
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/267239549

On the Classification of Computer Architecture

Article · July 2003


All content following this page was uploaded by Adnan Shaout on 21 November 2014.



On the Classification of Computer Architecture
Adnan Shaout*, and Taisir Eldos**

*University of Michigan – Dearborn, Department of Electrical and Computer Engineering, Dearborn, MI 48128
**Jordan University of Science & Technology, Department of Computer Engineering, Irbid, Jordan 22110

ABSTRACT

There are many computer architecture classification methods based on different criteria such as cost, capacity
(memory size, data word length and size of the secondary storage), performance, instruction set, component base and
others. The purpose of this paper is to review existing computer architecture classification methods. A brief
description of their philosophy, comparative analysis and applications will be presented. New classification methods
are introduced based on classification criteria such as number of storage hierarchy levels, number of addressable
fields, fault tolerance, processor identity, code morphing, and reconfigurability.

1. INTRODUCTION

There are many sources for computer classifications. The Association of Computing Machinery (ACM) has one of the most detailed computer classifications, where computers are classified based on control structures, arithmetic and logic structures, memory structures, I/O and data communications, and the integrated circuits used.

In this paper, classifications based on various criteria will be investigated and new classification methods will be proposed. Sections 2 through 9 describe the existing computer architectures that are based on the instruction sets of the computer system (CISC, RISC, MISC, HISC, WISC, ZISC and VLIW) and provide an analysis of the advantages and disadvantages of each class. In section 10, besides the classification of computers by their instruction sets, other criteria that have been used to classify computer systems are considered, such as cost, capacity, performance, component density, and many others. In section 11, new classification methods are introduced based on classification criteria such as number of storage hierarchy levels, number of addressable fields, fault tolerance, processor identity, code morphing, and reconfigurability. In section 12 we conclude.

2. COMPLEX INSTRUCTION SET COMPUTER (CISC)

The earliest processor designs used dedicated hardwired logic to decode and execute each instruction. That was appropriate for simple designs with few registers, but made architectures more complex and hard to build. Developers of computer systems took another approach; they built simple logic to control the data paths between the various elements of the processor, and used a microcode instruction set to control the data path logic. In those systems, the main processor has some built-in ROM, which contains groups of microcode instructions corresponding to each machine-language instruction (a macrocode instruction).

Because instructions could be retrieved much faster from a local ROM than from main memory, designers put as many instructions as possible into microcode. Microcode implementation allows using the same programming model among different hardware configurations, besides the advantage of easily modifying the instruction set. Some machines were optimized for scientific computing, others were optimized for business computing; however, since they all shared the same instruction set, programs could be moved from one machine to another without recompilation (but with a possible increase or decrease in performance depending on the underlying hardware). This kind of flexibility and power made microcoding the preferred way to build new computers for some time. Assembly language programming and the low memory size of the early days promoted the CISC style, leading to common features including 2-operand format, register-to-memory and memory-to-register instructions, multiple addressing modes for memory, variable-length instructions and many clock cycles per instruction. The hardware architectures typically had complex instruction-decoding logic, a small number of general-purpose registers and several special-purpose registers.

3. REDUCED INSTRUCTION SET COMPUTER (RISC)

RISC processors implement small instruction sets capable of running faster than CISC instructions. Developers set the goal of achieving one cycle
execution time. This objective could only be achieved if the instruction set was pipelined. Modern RISC processors feature a small instruction set, load/store architecture, fixed-length coding and hardware decoding, a large register set, delayed branching and one clock per instruction.

The phases in the execution path are typically: instruction fetch, decoding, operand fetch, execution, memory access and write back of the operation results. Pipelining is just the overlapped execution of the different phases of the execution path. The penalty for disruptions is paid in the form of lost or stall pipeline cycles. The effective instruction level parallelism exploited by traditional CISC microprocessors is nearly 2.5. Typical RISC processors go beyond the classical 3-level pipeline (fetch, decode, and execute) and use pipelines with 4, 5, or 6 levels, and up to 20 in modern processors. A deeper pipeline means more parallelism but also more coordination problems.

RISC processors depend on a complex memory hierarchy in order to work at full speed. In most of them, separate data and instruction caches try to avoid contention for the system bus when a fetch is overlapped with a register load or store, and that explains the necessity of proper management of the memory hierarchy. Those features are part of a common strategy to guarantee an uninterrupted pipeline flow and a high level of parallel execution of sequentially coded programs. Fixed-word encoding, hardwired decoding, delayed loads, delayed branches, etc., are just ways to achieve a regular pipeline flow.

RISC processors can be classified according to many measures that affect performance, like the word size, datapath width, pipeline depth, cache structure (split versus common, or on-chip versus off-chip), bus structure (Harvard versus Princeton), prefetch buffers and write buffers, register files (common versus private), register management (scoreboarding versus register renaming) and unit chaining capability.
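The overlapped execution just described can be made concrete with a small cycle-count model. The sketch below is illustrative only: it assumes an ideal in-order pipeline with no hazards or memory latency, and the stage names simply mirror the execution phases listed in the text.

```python
# Back-of-the-envelope pipeline timing, assuming an ideal in-order
# pipeline: one instruction enters per cycle and there are no hazards.
# The stage names mirror the execution phases named in the text.

STAGES = ["fetch", "decode", "operand fetch", "execute",
          "memory access", "write back"]

def cycles_unpipelined(n_instructions, n_stages=len(STAGES)):
    # Without overlap, each instruction occupies the whole datapath
    # before the next one starts.
    return n_instructions * n_stages

def cycles_pipelined(n_instructions, n_stages=len(STAGES), stalls=0):
    # With overlap, the first instruction takes n_stages cycles to fill
    # the pipeline; after that, one instruction completes per cycle.
    # Disruptions (flushed branches, cache misses) add stall cycles.
    return n_stages + (n_instructions - 1) + stalls

print(cycles_unpipelined(100))  # 600 cycles for 100 instructions
print(cycles_pipelined(100))    # 105 cycles with perfect overlap
```

The deeper the pipeline, the greater the potential parallelism, but every disrupted cycle lands directly in the `stalls` term, which is why branch handling receives so much attention in RISC designs.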

Statistics of real programs have shown that 15% of the instructions are branches, and around 50% of the forward-going branches and 90% of the backward-going branches are taken. To minimize the flushing effect on the pipeline, the branching decision is made in the decoding stage. This can be done only if the branching condition tests are very simple, for example a register compare with zero or a condition flag test. At the end of the decoding phase the processor can start fetching instructions from the new target. But in this decode cycle the next instruction after the branch has already been fetched, and in order to avoid stall cycles this instruction can be executed. In this case the branch is a delayed branch. From the programmer's point of view, the branch is postponed until after the next instruction is executed. The compiler tries to schedule a useful instruction in the location after the branch, which is called the delay slot. Processors with very deep pipelines schedule up to two delay slots and fill the rest with NOPs. Another technique is to predict the target of the branch.

The whole benefit of a RISC architecture can be defeated if the compiler is not sophisticated enough to rearrange instructions in the optimal order [1]. RISC architectures try to maximize the cooperation between hardware and software. Optimizing compilers are one of the essential components of RISC systems. This act of shifting the burden of code optimization from the hardware to the compiler was one of the key advances of the RISC revolution. Since the hardware was now simpler, the software had to absorb some of the complexity by aggressively profiling the code and making judicious use of the minimal instruction set and expanded register count. Thus, RISC machines devoted their limited transistor resources to providing an environment in which code could be executed as quickly as possible, trusting that the compiler had made the code compact and optimal [2].

4. RISC VERSUS CISC

Many of the techniques used in RISC processors can be implemented in CISC designs. It is possible to rewire the processor in order to execute most of the instructions in one cycle, or it is possible to use a pipelined microengine in order to speed up execution. The microengine could be a RISC kernel giving all the advantages of RISC without its disadvantages. However, RISC features can be introduced in CISC processors only at the expense of much more hardware. It is possible to program the pipeline of a CISC processor to use the dead time between the load and store of one instruction argument in memory. The microengine works in this case following a load/store model, and it dynamically reschedules the operations needed by the macrocode. This dynamic rescheduling is expensive compared to the software scheduling used in RISC processors: software scheduling must be done only once and then runs without complex hardware, while dynamic scheduling needs much more hardware logic.

CISC designers move complexity from software to hardware, making tradeoffs in favor of decreased code size at the expense of a higher cycles-per-instruction (CPI) count, while RISC designers move complexity from hardware to software, making tradeoffs in favor of a lower CPI at the expense of increased code size. CISC processors can still be made competitive with RISC processors if the cycle time is reduced, but RISC processors are better positioned to achieve greater reductions in the clock cycle time in the long run. The cycle time is determined by the following factors: pipelining depth, amount of logic in each stage and the VLSI technology used. RISC processors can achieve larger reductions in the clock cycle time with a lower investment in design time. Reducing the clock cycle time of CISC processors is possible, but much more difficult. The question is which design philosophy will be capable of climbing
the performance ladder faster in the next few years. RISC designs appear potentially much faster than CISC processors, but numbers show that both RISC and CISC are improving at 50-55% per year [3].

5. MINIMUM INSTRUCTION SET COMPUTER (MISC)

Increasing speed in the RISC processor creates a large bottleneck between the processor and the slower memory. To increase the memory accessing speed, it is necessary to use cache memory to buffer instruction and data streams. The cache memory brings in a whole set of problems which can complicate the system design and make the system more expensive. RISC processors are also relatively inefficient in handling subroutine calls and returns. An efficient subroutine mechanism is critical to the performance of a processor in supporting high-level languages. Many RISC processors use a large register file, which is windowed to assist in subroutine calls and returns. However, the register window must be big enough to handle a large set of input, output, and local parameters. The large register window wastes the most valuable resource in the RISC processor, and slows down the computer system during context switching. The principle of simplicity was not enforced enough to realize its full benefit. MISC architectures explore simplicity to its limit, by assuming 32 instructions.

The MuP21 chip is a MISC implementation with four instruction groups [4, 5]: transfer instructions, memory instructions, and arithmetic and register instructions. So far, only 24 instructions have been implemented in MISC, leaving some room for future expansion. Subtraction can be synthesized by complement and addition; OR can be synthesized by complement, AND, and XOR.

Potential applications for MuP21 include advanced video games, video test pattern generators, CAD design systems, telephone switching systems, handheld computers, high-speed communications systems, intelligent hard disk controllers, and robotic controllers.

6. HIGH-LEVEL INSTRUCTION SET COMPUTER (HISC)

A High-level Instruction Set Computer (HISC) is a general-purpose architecture proposed by Anthony Fong [6], targeted at high performance, implementation flexibility, expandability, better access control and system-dependent features, to meet today's demand for high computing power and multimedia applications.

HISC is a 64-bit architecture, which involves simple instructions of fixed length, entries of operand descriptors and application-oriented data types. The operands of an instruction are described by Operand Descriptors, records consisting of virtual addresses, data types, operand sizes, vector information, operand access codes and design- and system-dependent information for the operand. The data types of the operands include integer, floating-point number, BCD, character and string. The vector information includes the number of elements in the vector and the element spacing for vector operands.

HISC reduces the demand for conditional branching, as in RISC, by eliminating the looping count for operands of variable lengths and large size, as well as vectors. On the other hand, HISC will operate super-scalar on a higher level. The inter-dependency of operands will be much less, making it much more likely to operate super-scalar with two or more function units. HISC also keeps the vector information so that vector operations can be done by hardware.

HISC processors provide better encapsulation and potential performance gain. In a typical object-oriented system, an object is defined as a software bundle of persistent variables and related methods. The persistent variables record the persistent state of the object, while the methods implement the behavior of the object. The basic operations that can be applied to an object are object creation, object removal, method invocation and persistent variable invocation. Object creation creates an instance of a class or a prototype object, in which a new set of persistent variables is created while the methods are shared among objects with the same class or prototype. Object removal is done by a process called garbage collection: any object that is not referenced by any other object in the system will be removed. Method invocation passes control from the currently executing method to the invoked method.

In many software implementations, there is an object table for an object, which defines the name, access rights and the value of the persistent variables or methods residing in the object. This object table has a structure very similar to the Operand Descriptor Table (ODT) in HISC. The logical relationship between the HISC architecture and the object-oriented concept provides for easy mapping of an object representation to the HISC architecture.

7. WRITABLE INSTRUCTION SET COMPUTER (WISC)

The Writable Instruction Set Computer (WISC) is a stack-based architecture. New stack machine designs based on VLSI technology provide additional benefits not found on previous stack machines. These new stack computers use the union of their features to achieve a combination of speed, flexibility, and simplicity.

Stack machines offer processor complexity that is much lower than that of CISC machines, and overall system complexity that is lower than that of either RISC or CISC machines [7]. They do this without requiring
complicated compilers or cache control hardware for good performance. They also achieve competitive raw performance, and superior performance in most programming environments. Their first successful application area has been real-time embedded control environments, where they outperform other system design approaches. Stack machines also show great promise in executing logic programming languages such as Prolog, functional programming languages such as Miranda and Scheme, and artificial intelligence research languages such as OPS-5 and Lisp.

The major difference between this new breed of stack machine and the older stack machines is that large, high-speed dedicated stack memories are now cost effective. Previously, the stacks were kept mostly in program memory; newer stack machines maintain separate memory chips or even on-chip memory for the stacks. These stack machines provide extremely fast subroutine calling capability and superior performance for interrupt handling and task switching. Put together, these qualities create computer systems that are fast and compact.

Both hardware and software stacks have been used to support four major areas of computing requirements: expression evaluation, subroutine return address storage, dynamically allocated local variable storage, and subroutine parameter passing. The types include the Expression Evaluation Stack, Return Address Stack, Local Variable Stack, Parameter Stack and Combination Stacks.

The new generation of stack computers is based on the rich history of stack machine design and the new opportunities offered by VLSI fabrication technology. This combination produces a unique blend of simplicity and efficiency that has in the past been lacking in computers of all kinds. The features that produce these results and distinguish these machines from conventional designs are: multiple stacks with hardware stack buffers, zero-operand stack-oriented instruction sets, and the capability for fast procedure calls. These design characteristics lead to a number of features in the new stack machines, among them high performance without pipelining, very simple processor logic, very low system complexity, small program size, fast program execution, low interrupt response overhead, consistent program execution speeds across all time scales, and a low cost for context switching.

Many of the designs for these stack computers have their roots in the Forth programming language [7], because Forth forms both a high-level and an assembly language for a stack machine that has two hardware stacks: one for expression evaluation/parameter passing, and one for return addresses. The Forth language actually defines a stack-based computer architecture, which is emulated by the host processor while executing Forth programs. Although some stack machines are designed primarily to run Forth, they are quite practical to use in many applications programmed in conventional languages. Of special interest are those applications that require stack machines' special advantages: small system size, good response to external events, and efficient use of limited hardware resources.

8. ZERO INSTRUCTION SET COMPUTERS (ZISC)

ZISC is an integrated circuit based on a neural network, designed for applications which usually require supercomputers [8, 9]. ZISC offers high performance, is capable of operating in real time, and provides a very cost-effective way to solve such problems as pattern recognition and classification.

ZISC is an expert system which uses accumulated knowledge to recognize and classify objects or situations and take immediate decisions. ZISC does not need to be programmed, because it learns by example from samples of data. During a training session it is required to enter pairs of examples and solutions, and the built-in learning mechanism accumulates the knowledge. ZISC also has a generalization capability, which makes it possible to react to objects or situations which were not part of the learning examples.

ZISC has the ability to learn while performing classification tasks, i.e. the ZISC learning capability is not limited in time. It is also not limited in volume, because its chips can be cascaded to create a larger system, which is very important as it ensures that the system architecture will not change when technology density increases. ZISC cascadability means several chips can be linked together to build a wider network, without additional logic and without affecting the classification or learning performance. Those features make the ZISC very easy to use and capable of solving not precisely defined problems. ZISC has the ability to separate noise from the signal, and that makes it a perfect base for signal processing.

ZISC036 [10], as an example, is a fully integrated, digital implementation of the RBF-like (Radial Basis Function) model. One ZISC036 device has 36 neurons, but due to its cascadability the total number of neurons in a network is not limited. Each neuron has a register file for prototype storage, and a distance evaluation unit to ensure a high level of parallelism. In ZISC036, 14 bits are used for distance calculations. To perform a classification, the vector components are fed in sequence and processed in parallel by every neuron. With a ZISC operating at 20 MHz, 64 components can be fed and processed in 3.2 µs and the evaluation achieved within 0.5 µs after the last component has been fed. This level of performance allows more than 250,000 evaluations per second. It would require a 2,000 MIPS machine to achieve the same level of performance on a von Neumann processor.
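As a sketch of how such an RBF-like classifier behaves, the fragment below models prototype neurons with an influence radius. The L1 (Manhattan) distance, the radii and the sample data are assumptions made for illustration, not details taken from the ZISC036 itself.

```python
# Toy model of RBF-like classification: each "neuron" stores a prototype
# vector, an influence radius and a category. All values are illustrative.

def l1_distance(a, b):
    # Distance evaluation; performed in parallel by every neuron in
    # hardware, computed sequentially here.
    return sum(abs(x - y) for x, y in zip(a, b))

def classify(vector, neurons):
    # A neuron responds when the input falls inside its influence field;
    # the closest responding prototype decides the category.
    best = None
    for prototype, radius, category in neurons:
        d = l1_distance(vector, prototype)
        if d <= radius and (best is None or d < best[0]):
            best = (d, category)
    return best[1] if best else "unknown"

neurons = [
    ([0, 0, 0, 0], 3, "A"),
    ([9, 9, 9, 9], 4, "B"),
]
print(classify([1, 0, 1, 0], neurons))  # "A": distance 2, inside radius 3
print(classify([5, 5, 5, 5], neurons))  # "unknown": outside both fields
```

In a scheme like this, training amounts to storing new prototype/radius/category triples (and shrinking radii that respond wrongly), which is consistent with the chip's ability to keep learning while it classifies.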
The ZISC family is designed to extend the neural network beyond the boundary of a single chip without impact on performance. Some applications can also benefit from multi-layer configurations, where the decisions of one layer feed the input features of a second layer. This can be achieved either by connecting several chips or by using subsets of the network operating in a time-multiplexed mode.

9. VERY LONG INSTRUCTION WORD (VLIW)

Very Long Instruction Word (VLIW) describes an instruction-set philosophy in which the compiler packs a number of simple, non-interdependent operations into the same instruction word. When fetched from cache or memory into the processor, these words are easily broken up and the operations dispatched to independent execution units. VLIW, or EPIC, can perhaps best be described as software-based superscalar technology [11].

VLIW represents the ultimate in internal parallelism in microprocessor designs. There are two ways to make a microprocessor run faster: speed up its clock or make it perform more operations during each clock cycle. Speeding up the clock requires inventing smaller fabrication processes and adopting architectural features such as deep pipelines to keep the silicon busy. More operations per cycle imply building multiple function units on the same chip as well as executing enough instructions concurrently to keep those units busy.

The scheduling problem is the core problem of modern processor design [12]. Superscalar processors employ special hardware to uncover instruction dependencies. However, this approach goes only so far, since the scheduling hardware grows geometrically with the number of function units and eats more chip real estate. The alternative is to let the software do all the scheduling, and that is precisely what a VLIW design does. A smart compiler can examine a program, find all instructions with no dependencies, string them together in a very long batch, and execute them concurrently on an equally big array of function units. Very long instructions are typically between 256 and 1024 bits wide. Such long instructions contain many smaller fields, each of which directly encodes an operation for a particular function unit.

In hardware terms, a VLIW processor is very simple, consisting of little more than a collection of function units (adders, multipliers, branch units, etc.) connected by a bus, plus some registers and caches. This benefits semiconductor manufacturers, since more silicon goes to the actual processing (rather than being spent on branch prediction, for example), and a VLIW processor should run fast, as the only limit is the latency of the function units themselves. CISC implements such instructions as microprograms in a microcode ROM on the chip. Microcoding is the ultimate low-level language: synchronizing gates and buses and passing data between function units.

RISC eliminated microcoding in favor of hard-wired instructions. VLIW, on the other hand, is like taking that microcode off the chip and putting it into the compiler. The trouble is that writing microcode is unbelievably hard. VLIW becomes viable only if a smart compiler can write it for you. This difficulty has thus far confined VLIW machines to niches such as scientific array processing and signal processing.

The compiler packs groups of independent operations into very long instruction words in a way that uses all the function units efficiently during each cycle. The compiler discovers all the data dependencies, and then determines how to resolve them, reordering the whole program by moving blocks of code around. This process differs from a superscalar CPU, which uses special hardware to determine dependencies dynamically at run time (optimizing compilers can certainly improve the performance of a superscalar CPU, but the CPU does not depend on them). Most superscalar processors will detect dependencies and schedule parallel execution only within basic blocks (a group of consecutive statements with no halting or branching except at the end). To find more parallelism, a VLIW machine must look for operations from different basic blocks to pack into the same instruction. Trace scheduling is a common technique for doing this [12, 13, 14].

Among the issues of concern is the static nature of VLIW compiler optimizations [15]. How well will such programs perform when faced with dynamic run-time events (such as waiting for I/O) unforeseen at compile time? VLIW arose to meet the needs of scientific number crunching, but it might prove less capable on the sorts of object-oriented and event-driven programs that are more common in the PC community.

10. NEW CLASSIFICATION METHODS

Besides the classification of computers by their instruction sets, other criteria have been used, such as cost, capacity, performance, component density, and many others. This section looks at some of these classification criteria that can be used to classify computer systems.

10.1. Cost, Capacity and Performance

Computers may be compared on the basis of cost, capacity (memory size, data word length, size of secondary storage) and performance (speed, throughput and number of users). They may be classed broadly as: Supercomputers, Mainframes and Minicomputers, supporting a variable number of users and offering several hundreds of GIPS, down to Workstations offering several tens of GIPS. Personal desktop computers, Notebooks, Palmtops and Personal Data
Assistants offer portable performance at a reasonable price.

10.2. Component Density

For many years, computers have used integrated circuits to build the major internal computer components [16]. Based on the density of the components, computers can be classified as Small, Medium, Large and Very Large Scale Integration, the latter with hundreds of millions of transistors.

10.3. Word-ordering Schemes

There are two possible conventions for storing the bytes of a word in memory, and computers can be classified accordingly as Big Endian or Little Endian.

10.4. Communication

Data transfer among different points in a computer system is achieved through sets of lines carrying signals, called buses. Typically, we need data, address and control information to carry out such operations [11]. The way signals are multiplexed has resulted in double, triple and quadruple bus styles of architecture. In multiprocessor systems, multiple buses or crossbar switches are used.

10.5. Input/Output

Data exchange with the outside world is carried out through input/output ports, in either polling or interrupt-driven styles.

10.6. Number of Processors

Higher performance is achieved through the use of more than one processor. The number of processors used places a computer into one of the following categories: single processor, modestly parallel multiprocessor, parallel multiprocessor, and massively parallel computer.

10.7. Processor Level Parallelism

Processor level parallelism is concerned with offering multiple streams of data, instructions or both. The result is Single Instruction stream Single Data stream (SISD) machines, which are sequential machines; Single Instruction stream Multiple Data stream (SIMD); Multiple Instruction stream Single Data stream (MISD), for fault tolerance; and Multiple Instruction stream Multiple Data stream (MIMD) machines, which are capable of executing several programs simultaneously, each with its own data set.

10.8. Processor-Memory Dependency

In complex computing machines, there might be a large number of processors. Those processors are typically interconnected, and intercommunication is a major factor that contributes to the performance. Memory partitions in such systems are either tightly coupled or loosely coupled, resulting in tightly coupled multiprocessor systems (shared memory) and loosely coupled multiprocessor systems (distributed memory).

10.9. Program Control Mechanism

Conventional von Neumann computers use a program counter to sequence the execution of instructions in a program. This sequential execution style has been called control driven. Control-flow computers use shared memory to hold program instructions and data objects. Data flow computers perform a computation when all the operands are available, instead of being guided by a program counter. Data flow is one kind of data-driven architecture; the other is demand driven. It is a technique for specifying parallel computation at a fine-grain level, usually in the form of two-dimensional graphs [17, 18, 19].

10.10. Intelligence

This class of computers has a CPU and memory where information is represented in terms of structures of symbols. These computers do not have assets peculiar to human reasoning, i.e. they have no generalization capability and are not able to accumulate knowledge. They operate in terms of true-or-false logic. The CPU fetches instructions from memory and performs tasks according to an algorithm (written as a program). Intelligent computers are a class of computer hardware or software which is able to implement reasoning skills based on accumulated knowledge. Intelligent systems approaches are Expert Systems, Neural Networks and Fuzzy Logic.

Expert Systems are computer programs that can advise, analyze, categorize, communicate, consult, design, diagnose, explain, explore, forecast, form concepts, identify, interpret, justify, learn, manage, monitor, plan, present, retrieve, schedule, test or tutor. Such systems are good only in a particular domain, and require a large amount of memory to hold all the information and a powerful computer [20, 21].

Neural networks are based upon the structure of the brain, which consists of billions of cells called neurons. They need to be trained using one of the various training methods and learning rules [22, 23, 24, 25].

Fuzzy logic is a superset of conventional (Boolean) logic that has been extended to handle the concept of partial truth-values between "completely true" and "completely false", i.e. thinking of membership of a set as a degree rather than a yes-or-no situation. Fuzzy logic is used for solving problems in real-time systems that must react to an imperfect environment of highly variable, volatile or unpredictable conditions, in particular where it is hard to formalize a solution to the
programmed to work on one problem by partitioning problem in hand [26, 27, 28].
tasks and data objects. Synchronization and
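The degree-of-membership idea behind fuzzy logic can be sketched in a few lines; the triangular membership function and the temperature thresholds below are illustrative choices, not taken from the paper:

```python
def triangular(x, a, b, c):
    """Degree of membership in a triangular fuzzy set that rises from a,
    peaks at b, and falls back to zero at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# 22 degrees belongs to the (hypothetical) fuzzy set "warm", spanning
# 15-35 with its peak at 25, to degree 0.7 -- a partial truth-value
# rather than a plain true or false.
print(triangular(22.0, 15.0, 25.0, 35.0))  # prints 0.7
```

A crisp (Boolean) set would return only 0.0 or 1.0 here; the intermediate degrees are what let fuzzy controllers react smoothly to imprecise inputs.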
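The Big Endian and Little Endian conventions of Section 10.3 can be demonstrated by packing the same 32-bit word under both byte orders; a small Python sketch (illustrative only):

```python
import struct
import sys

word = 0x01020304

# Little Endian stores the least-significant byte at the lowest address;
# Big Endian stores the most-significant byte first.
print(struct.pack("<I", word).hex())  # prints 04030201
print(struct.pack(">I", word).hex())  # prints 01020304
print(sys.byteorder)                  # the convention of the host machine
```

The same word therefore occupies memory in reversed byte order on the two classes of machine, which is why data exchanged between them must be byte-swapped.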
11. NEW CLASSIFICATION METHODS

In this section we present several new classification methods based on different criteria.

11.1. Number of Storage Hierarchy Levels

Computers can be classified according to the number of hierarchy levels of data storage, starting from the internal registers, through cache memory and main memory, to magnetic or optical disks. Cache memory may consist of one to three levels.

11.2. Number of Addressable Fields

For a two-operand arithmetic instruction, five items need to be specified: the operation to be performed, the location of the first operand, the location of the second operand, the place to store the result, and the location of the next instruction to be executed. The 4-address machine specifies the four locations explicitly, in addition to the operation to be performed. However, the inclusion of a program counter in the processor eliminates the need to specify the location of the next instruction, which can be assumed to be the next in line except for branch instructions. Real machines are usually classified as belonging to the load/store, register-memory, or memory-memory classes. Modern RISC processors are of the load/store, sometimes called register-register, variety. These are 1.5-address machines in which memory-access instructions are limited to two: load and store. The range of choices of processor-state structure and instruction type trades off flexibility in the placement of operands and results against the amount of information that must be specified by an instruction.

11.3. Fault Tolerance

According to the capability of fault detection, computers can be classified into: non-fault-tolerant computers; partially fault-tolerant computers, with degradable performance; and completely fault-tolerant computers, where more than one processor performs the same task.

11.4. Processor Identity

SIMD computers can be subdivided into two categories. Coprocessor architectures have two or more processors that analyze the instruction stream concurrently, such as an integer processor and a floating-point processor. Multiple-unit architectures offer a central decoding unit, which starts execution units according to the instruction that has been decoded. The decoding unit, for example, can start an integer addition in the integer unit; one cycle later, it can start the floating-point multiplication unit.

11.5. Code Morphing

A revolutionary new approach to microprocessor design uses a compact hardware engine surrounded by a software layer. The hardware component is generally a simple high-performance engine like the VLIW, with a software layer, called the Code Morphing software, surrounding this engine. Programs designed for a certain processor are dynamically translated (morphed) into the hardware engine's native instruction set [29].

An example of such technologies is Transmeta's Crusoe. Blocks of x86 instructions are translated once, and the resulting translation is saved in a translation cache. The next time the (now translated) code is executed, the system skips the translation step and directly executes the existing optimized translation at full speed. This approach eliminates millions of transistors, replacing them with software. The current implementation of the Crusoe processor uses roughly one-quarter of the logic transistors required for an all-hardware design of similar performance. The result is smaller and more efficient hardware that is decoupled from the target instruction set architecture, making future upgrades easier.

11.6. Reconfigurability

Reconfigurable computing is intended to fill the gap between hardware and software, achieving potentially much higher performance than software while maintaining a higher level of flexibility than hardware. This type of computing is based upon Field Programmable Gate Arrays (FPGAs), and is being termed Dynamic Instruction Set Computers (DISC) and Flexible Instruction Set Computers (FISC). These devices contain an array of computational elements whose functionality is determined through multiple SRAM configuration bits. These elements, known as logic blocks, are connected using a set of programmable routing resources. In this way, custom circuits can be mapped to the FPGA by computing the logic functions of the circuit within the logic blocks and using the configurable routing to connect the blocks together to form the necessary circuit [30, 31, 32, 33].

Reconfigurable systems are usually formed with a combination of reconfigurable logic and a general-purpose microprocessor. The processor performs the operations that cannot be done efficiently in the reconfigurable logic, such as loops, branches, and possibly memory accesses, while the computational cores are mapped to the reconfigurable hardware. Run-time reconfiguration allows for the acceleration of a greater portion of an application, but may introduce some overhead, which limits the amount of acceleration possible. Because configuration can take milliseconds or longer, rapid and efficient configuration is a critical issue.

12. CONCLUSION

There are numerous methods of classifying computers. Computers can be classified according to a variety of criteria such as: cost, performance, memory capacity, number of users, processor level parallelism, targeted
applications, component base, intercommunication features, input/output architecture, processor-memory dependency, etc. Classes of computers are not absolute. Classification according to certain criteria may not be an easy task, since the differences between computer classes may be ambiguous. For example, designers of CISC processors implement more and more RISC features.

Classification on the instruction-set basis is one of the most common classification methods. The instruction set and addressing modes have a big influence on the processor architecture: the more developed the instruction set and the more complex the addressing modes available, the more complex the instruction decoding logic required. Variable instruction length also contributes to the complexity of the decoding logic. Table 1 summarizes the existing computer architectures with respect to various properties.

Post-RISC features are a result of transistor counts [1, 34]. Today, transistor counts are extremely high, and they are getting even higher. The problem now is not how to fit the needed functionality on one piece of silicon, but what to do with all these transistors. In fact, designers are actively looking for things to integrate onto the die to make use of the wealth of transistor resources, asking what to include rather than what to throw out. In conclusion, there is no single advantageous architecture. There is always a trade-off; simplifying hardware can put more burden on software, and vice versa. Design of computer systems, or the choice of commodity computer products for a particular application, requires consideration of a wide range of aspects relating to hardware, software, and their interfacing.

In this paper, new classification methods were introduced based on classification criteria such as the number of storage hierarchy levels, number of addressable fields, fault tolerance, processor identity, code morphing, and reconfigurability.

13. REFERENCES

[1] Bhandarkar, "RISC versus CISC: A Tale of Two Chips," Computer Architecture News, vol. 25, no. 1, March 1997.
[2] R. Russell and R. Grewell, "Software Aids Pull for Real-time RISC: RISC/CISC Tradeoffs," Electronic Engineering Times, vol. 51, September 1994.
[3] L. Gwennap, "Processor Performance Climbs Steadily," Microprocessor Report, vol. 9, no. 1, Jan. 23, 1995, pp. 17-20.
[4] Charles Moore and C. H. Ting, "Minimal Instruction Set Computer," Forth Dimensions, January 1995.
[5] C. Ting and C. Moore, "Mup21: A High Performance MISC Processor," Forth Dimensions, January 1995.
[6] Anthony Fong, "HISC: A High-level Instruction Set Computer," 7th European Simulation Symposium, pp. 406-410, Society for Computer Simulation, Oct. 1995.
[7] Phil Koopman, "Stack Computers & Forth" and "MISC M17," http://www.cs.cmu.edu/~koopman
[8] Robert David, Erin Williams, Ghislain de Trémiolles, and Pascal Tannhof, "Description and Practical Uses of IBM ZISC036," Virtual Intelligence - Dynamic Neural Networks, June 22-26, 1998.
[9] J-P LeBouquin, IBM Microelectronics ZISC, "Zero Instruction Set Computer, Preliminary Information," poster show, WCNN, San Diego, CA, 1994, and addendum to conference proceedings.
[10] C. S. Lindsey et al., "Experience with the IBM ZISC036 Neural Network Chip," International Journal of Modern Physics, p. 579.
[11] Kai Hwang and Zhiwei Xu, Scalable Parallel Computing, McGraw-Hill, 1998.
[12] J. Sanchez and A. Gonzales, "Instruction Scheduling for Clustered VLIW Architectures," International Symposium on System Synthesis (ISSS), 2000.
[13] K. Ebcioglu, E. Altman, S. Sathaye, and M. Gschwind, "Execution Based Scheduling for VLIW Architectures," Proceedings Europar '99, ILP and Uniprocessor Architecture, Sep. 1999.
[14] Sanjeev Banerjia, "Instruction Scheduling and Fetch Mechanisms for Clustered VLIW Processors," PhD thesis, Dept. of Electrical and Computer Engineering, North Carolina State University, 1998.
[15] P. Faraboschi, G. Desoli, and J. A. Fisher, "VLIW Architectures for DSP and Multimedia Applications - The Latest Word in Digital and Media Processing," IEEE Signal Processing Magazine, March 1998.
[16] Vincent Heuring and Harry Jordan, Computer Systems Design & Architecture, Addison Wesley, 1997.
[17] Arvind, L. Bic, and T. Ungerer, "Evolution of Data-Flow Computers," Advanced Topics in Dataflow Computing, Prentice Hall, 1991.
[18] G. R. Gao, "A Flexible Architecture Model for Hybrid Data-Flow and Control-Flow Evaluation," Advanced Topics in Data-Flow Computing, Prentice Hall, Englewood Cliffs, NJ, 1991, pp. 327-346.
[19] M. Takesue, "A Unified Resource Management and Execution Control Mechanism for Data Flow Machines," International Annual Symposium on Computer Architecture, ACM, 1987, pp. 90-97.
[20] Barry G. Silverman, "Survey of Expert Critiquing Systems: Practical and Theoretical Frontiers," Communications of the ACM, vol. 35, no. 4, 1992, pp. 106-127.
[21] P. K. Fink, J. C. Lusth, and Duran, "A General Expert System Design for Diagnostic Problem Solving," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 7, no. 5, Sep. 1985.
[22] A. Konig, "Survey and Current Status of Neural Network Hardware," Proceedings of the Int'l Conference on Artificial Neural Networks, 1995, pp. 391-410.
[23] M. W. Roth, "Survey of Neural Network Technology for Automatic Target Recognition," IEEE Transactions on Neural Networks, vol. 1, 1990, pp. 28-43.
[24] Y. Le Cun, L. D. Jackel, B. Boser, J. S. Denker, H. P. Graf, I. Guyon, Henderson, R. E. Howard, and W. Hubbard, "Handwritten Digit Recognition: Application of Neural Network Chips and Automatic Learning," IEEE Communications Magazine, Nov. 1989, pp. 41-46.
[25] R. Opitz, "Das Lernfahrzeug: Neural Network Application for Autonomous Mobile Robots," Advanced Neural Computers, Elsevier, 1990, pp. 373-379.
[26] M. S. Yang, "A Survey of Fuzzy Logic," Mathematical and Computer Modelling, vol. 18, no. 11, 1993, pp. 1-16.
[27] O. Cordon, F. Herrera, and M. Lozano, "On the Combination of Fuzzy Logic and Evolutionary Computation: A Short Review and Bibliography," in W. Pedrycz (Ed.), Fuzzy Evolutionary Computation, Kluwer Academic, 1997, pp. 33-56.
[28] M. Patyra, J. Grantner, and K. Kirby, "Digital Fuzzy Logic Controllers: Design and Implementation," IEEE Transactions on Fuzzy Systems, vol. 4, no. 4, Nov. 1996, pp. 439-459.
[29] http://www.transmeta.com
[30] M. Wirthlin and B. Hutchings, "A Dynamic Instruction Set Computer," Proceedings of the IEEE Workshop on FPGAs for Custom Computing Machines, Napa, CA, April 1995, pp. 99-107.
[31] G. Goossens et al., "Integration of Medium-Throughput Signal Processing Algorithms on Flexible Instruction-Set Architectures," Journal of VLSI Signal Processing, vol. 9, no. 1, 1995, pp. 49-65.
[32] Katherine Compton and Scott Hauck, "Configurable Computing: A Survey of Systems and Software," Northwestern University, Technical Report, 1999.
[33] J. Turley, "Soft Computing Reconfigures Designer Options," http://www.computer-design.com/Editorial/19
[34] Bhandarkar and Clark, "Performance from Architecture: Comparing a RISC and a CISC with Similar Hardware Organization," International Conference on Architectural Support for Programming Languages and Operating Systems, CA, April 1991, pp. 310-319.
Table 1a and Table 1b: Summary of Existing Architectures. The two tables compare WISC, HISC, MISC, and EPIC (Table 1a) and ZISC, FISC, CISC, and RISC (Table 1b) with respect to: instruction complexity (very complex / complex / moderate / simple), instruction type (advanced / moderate / basic), instruction set size (very large, >256; large, 64-256; medium, 32-64; small, <32), memory bandwidth (large / medium / small), control flow (asynchronous / synchronous), data flow (serial / parallel / serial-parallel), performance domination (hardware / software / hardware-software), applications (general purpose / special purpose), programming style (procedural / non-procedural), execution order (deterministic / speculative), instruction length (uniform / non-uniform), clocks per instruction (>1 / =1 / <1), and OS support needed (good / fair / poor).
