Computer Structures Readings and Examples 1971
RICHARD W. HAMMING
Bell Telephone Laboratories
EDWARD A. FEIGENBAUM
Stanford University
07-004357-4
1234567890 HDBP 7 9 8 7 6 5 4 3 2 1 0
This book was set in News Gothic by Graphic Services, Inc., printed on
permanent paper by Halliday Lithograph Corporation, and bound by
The Book Press, Inc. The designer was Elliot Epstein; the drawings were
done by John Cordes, J. & R. Technical Services, Inc. The editors were
Richard Dojny and J. W. Maisel. William P. Weiss supervised production.
Preface
The structures that we call computer systems continue to grow in complexity, in
size, and in diversity. This book is linked firmly to the nature of this growth. The
book is about the upper levels of computer structure: about instruction sets, which
define a computer system at the programming level; and about organizations of
processors, memories, switches, input-output devices, controllers, and communication
links, which provide the ultimate functioning system. These levels are just
emerging into well-defined systems levels, with developed symbolic techniques of
analysis and synthesis and accumulated engineering know-how, all expressed in a
crystallized representation. These aspects of computer systems have always existed,
of course, but only in rudimentary form. The classical four-box picture of a com-
puter (arithmetic unit, memory, input-output, and control) is certainly an effective
organization of components to process information. But multiple processors,
hierarchies of memories, and remote communications force the top level of organization
into a distinct level, requiring analysis and rational design. Similarly, the 25
instructions of the IBM 701 computer (developed around 1953) are certainly an instruction
set, indeed one worthy of study. But processors with dozens of registers and
almost unlimited logical circuitry again force the instruction set to become a topic
of rational analysis and design.
This book is tied to the emergence of these upper levels of organization: eight
years ago (a computer engineer’s half dozen) would have been too early to write
this book; eight years hence would be too late. Eight years ago the diversity and
complexity of computer structures were not sufficient to justify the attention this
book provides. This book would have been too thin. Eight years hence textbooks will
exist that treat these levels systematically. This book will then appear too descriptive.
But right now, as these aspects of computer structure are emerging, and with
systematic treatment still precluded, there is a need to make available material on
these levels for systematic reference and study. Our choice has been to present a
large set of examples, which illustrate the various design options and structural
possibilities, both in instruction sets and in overall configurations. These examples
are descriptions of actual computer systems, taken from the technical literature or
from technical reports and manuals. Descriptions of actual systems are to be much
preferred over idealized abstractions. The latter can reflect the real issues only after
successful systematization.
Not only are the chapters about actual computers, they present much detail. The
complexity of computers resides in part in their size and the multiplicity of their
parts, e.g., in their having 200 instructions rather than 20, or having to service
50 Teletypes rather than 2. It seems essential to describe computer systems in their
entirety, rather than via simplified vignettes. Again, this view stems from the existing
state of the art. Eight years hence, it will not necessarily hold.
We fall from grace on all the above principles, occasionally providing descriptions
of paper machines and partial descriptions of partial systems. But our feeling
that detail and reality are important remains. This is why this book is so large, and fit
for study rather than for reading.
the design needs they attempted to satisfy. Given that systematic analysis does not
yet exist, there is no substitute for extensive, critical understanding of the existing
examples of designed systems. We assume the student of computer engineering
comes to this book with a working knowledge of logical design. He should find it
possible to realize many of the systems described in this book at the next lower
levels of logic structure.
For the computer scientist, the levels of computer structure discussed in this book
constitute a substantial part of what he should know about the physical devices that
underlie his science. As we pass downward from these levels to lower ones (to
register-transfer systems, sequential logic circuits, combinatory circuits, continuous
circuits, and on down), the relevance of each level gradually fades. The levels of this
book, along with the register-transfer level, constitute the main aspects of computer
structure that the computer scientist must understand. It does not matter that they
are, as yet, basically empirical and descriptive. The computer scientist undoubtedly
will not be able to carry through the design of the systems described in this book
in terms of the lower logic levels, but this is not necessary for an appropriate grasp
of these upper levels of computer structure. Indeed, this is what it means for distinct
systems levels to exist.
For the electrical engineer, this book undoubtedly presents more examples than
he cares (or needs) to know. But an appropriate sampling, plus the overview
presented in the first three chapters, should suffice to give him some insight into
the elaborate growth that has occurred on top of the basic digital technology created
within electrical engineering.
The student of systems engineering may also find the material presented here
useful, as an example of a class of complex systems which has evolved several
distinct levels of representation. Again, the book undoubtedly presents too massive
a dose of detail for him, but the overview in the first chapters, plus a sampling
throughout the space of computer systems, should prove highly instructive.
We have goals for the book in addition to the educational ones. We think the book
can serve as a useful reference for the practicing computer engineer. The time is
past when every computer engineer knows about all computer systems because he
has lived through all of computer history. That position is now reserved for those of
us who are past forty (and still active). For the rest, a source book that provides the
cumulated design experience of the field is a useful substitute, especially so if it
contains enough detail so that a designer can reasonably evaluate the actual com-
puter systems that embody a particular design alternative.
Behind the goal of the book as a guide for the practicing computer designer
lies the feeling that the field of computer engineering needs to develop a sense of
history and of looking to the past for guidance. The fantastic advance in basic logic
technology (in speed, cost, and reliability) makes each day seem an absolutely
new one. But, of course, it is not. Many alternative designs have been tried out in
past systems, in ways relevant to current design. Thus, we have the goal of saving
some of the past in a form accessible to the future needs of computer design. This
goal is mixed with a certain archival feeling. Many of the systems in this book have
never been documented, other than in manuals and various elementary how-to
programming books.
A final goal comes from our feelings as computer scientists that the variety of
computer systems is a phenomenon worthy of study in its own right. This book carries,
therefore, an invitation to taxonomy: to asking how to classify the diversity of
forms of computer systems that are coming into existence. Taxonomic endeavors
usually take place in a field of natural systems, particularly biological systems. It
may seem strange that a domain of artificial systems calls for taxonomic activity.
But the demand for empirical classification exists whenever there is a population of
significant size and rich structure. Rudimentary classification efforts have occurred
for many populations of artifacts: for ships, for aircraft, for houses. This book
should amply confirm that computer systems are complex and diverse enough,
and undergoing enough continual proliferation and evolution, to command
significant taxonomic endeavor.
Enough is said in the first two chapters about the new notations introduced in
the book, so that nothing substantive need be added here. We apologize for inflicting
new notation on the reader. We feel that good notations are really quite important
for the aspects of computer structure described in this book. Much would be gained
by the whole field of computers (by users, programmers, engineers, planners,
buyers, sellers, manufacturers, students, and scientists) if relatively uniform
notations came into common use. Although we have no illusions about the perfection
of the notations we have introduced, we would be most happy if they cause a
rise in concern for standard notations and nomenclature.
A large number of distinct systems are described in substantial detail. We have
redescribed many of the systems in the common notation introduced in the book.
The accuracy of all these descriptions is a major problem. Even where the papers
are reproduced from the literature, this problem of accuracy remains, although
then it is not ours alone. Even though we have taken pains to obtain accurate in-
formation on the systems and to portray them faithfully in our various descriptions
and figures, there is no way we can be responsible for their ultimate accuracy. The
PMS and ISP figures, in particular, cannot be guaranteed to be accurate representa-
tions of the systems they purport to describe. Ultimately, one would like to have
simulation languages for such notations and to verify (up to the usual criteria of a
debugged program) that a system given by, say, an ISP description, simulates the
behavior of the target machine. But that day is still far off.
Our most fundamental acknowledgment is to the contributors to this volume,
not only for the articles they have written, but for the computers they have designed
and built, thereby creating a population of fascinating artifacts worthy of study. An
additional reason for reprinting their articles rather than simply describing their
computer systems is the importance of having available the views of the designers
themselves about the nature of their systems.
The research on the basic ideas underlying the notations was supported by
the Advanced Research Projects Agency of the Office of the Secretary of Defense
(F 44620-67-C-0058) and is monitored by the Air Force Office of Scientific Research.
We would like to extend an acknowledgment to the organizations that have
produced all of these computers, oftentimes it would seem in defiance of the laws
of economics. Perhaps, as the old saw has it, a computer manufacturer is simply a
computer’s way of breeding another computer. This might account for the tenacity
C. Gordon Bell
Allen Newell
Acknowledgments
R. H. Allmark and J. R. Lucking: Design of an Arithmetic Unit Incorporating
a Nesting Store, Proceedings of the International Federation of Information
Processing Congress 1962, pp. 694-698, North Holland Publishing Co.,
Amsterdam, Holland, by permission from American Federation of Information
Processing Societies (AFIPS), Spartan Books, Washington, D.C.

R. L. Alonso, H. Blair-Smith, and A. L. Hopkins: Some Aspects of the Logical
Design of a Control Computer, A Case Study, Transactions on Electronic
Computers, vol. EC-12, no. 6, pp. 687-697, December, 1963, by permission
of the authors and the Institute of Electrical and Electronics Engineers
(IEEE).

J. R. Hudson, W. H. Leonard, R. C. McReynolds, and G. Shapiro formed
the basis for the subsequent efforts. Of particular importance is the
work of J. 6 . Gregory in tuning the conceptual design to the real
world of technology.

Theodore R. Bashkow, Azra Sasson, and Arnold Kronfeld: System Design
of a FORTRAN Machine, Transactions on Electronic Computers, vol. EC-16,
no. 4, pp. 485-499, August 1967, by permission of the authors and the IEEE.
The authors acknowledge:
This research is supported by the Air Force Office of Scientific Research
Contract AF19(628)-2798.
Strela (Arrow), pp. 111-115; Instruction Logic of the MIDAC, pp. 115-121,
chap. 2, Programming and Coding, “Handbook of Automation, Computation,
and Control,” vol. 2, edited by Eugene M. Grabbe, Simon Ramo, and
Dean Wooldridge, Copyright © 1959 John Wiley & Sons, Inc., New York,
reprinted by permission.

J. Presper Eckert, Jr., James R. Weiner, H. Frazer Welsh, and Herbert F.
Mitchell: The UNIVAC System, American Institute of Electrical Engineers-
Institute of Radio Engineers Conference, pp. 6-16, December, 1951, by
permission of the authors and the IEEE. The authors acknowledge:
The UNIVAC System has been an over-all company project and
hundreds of people have participated. It is, therefore, difficult to
acknowledge the contributions of individuals. However, special mention
must be made of the contributions of Mr. H. Lukoff, Mr. E. I.
Blumenthal, Mr. L. D. Wilson, and Mr. J. D. Chapline, Jr. To the
Census Bureau a great debt of gratitude is owed for their continuous
support of the project.

Richard E. Monnier, Thomas E. Osborne, and David S. Cochran: The
HP Model 9100A Computing Calculator. This chapter is a compilation of
three articles: A New Electronic Calculator with Computerlike Capabilities,
by Richard E. Monnier, pp. 3-9; Hardware Design of the Model
9100A Calculator, by Thomas E. Osborne, pp. 10-13; and Internal
Programming of the 9100A Calculator, by David S. Cochran, pp. 14-16,
which appeared in the Hewlett-Packard Journal, volume 20, no. 1, September,
1968, by permission of the Hewlett-Packard Journal.

R. E. Porter: The RW-400: A New Polymorphic Data System, Datamation,
vol. 6, no. 1, pp. 8-14, January/February, 1960, by permission of,
published and Copyrighted © 1960 by F. D. Thompson Publications, Inc.,
Greenwich, Conn.

no. 2, pp. 223-235, April, 1962, by permission of the authors and the IEEE.
The authors acknowledge:
The authors gratefully acknowledge the contributions made to this
work by all members of the Atlas computer team at both Manchester
University and Ferranti Ltd.

B. W. Lampson, W. W. Lichtenberger, and M. W. Pirtle: A User Machine
in a Time-sharing System, Proceedings of the Institute of Electrical and
Electronics Engineers, vol. 54, no. 12, pp. 1766-1774, December, 1966,
by permission of the authors and the IEEE. The authors acknowledge:
The work for this paper was supported in part by the Advanced Research
Projects Agency, Department of Defense, Contract SD-185.
The software portion of the system was designed and written in part
by L. P. Deutsch, who is entitled to equal credit with the authors for
the ideas in this paper. L. Barnes also contributed significantly to the
final result.

J. H. Wilkinson: The Pilot ACE, by permission from Automatic Digital
Computation, pp. 5-14, National Physical Laboratory, Teddington,
England, March 25-28, 1953.

M. V. Wilkes and J. B. Stringer: Micro-programming and the Design of
the Control Circuits in an Electronic Digital Computer, Proceedings of
the Cambridge Philosophical Society, Pt. 2, vol. 49, pp. 230-238, April,
1953, by permission of the authors and the Cambridge Philosophical Society,
Cambridge, England. The authors acknowledge:
The authors wish to express their thanks to Mr. A. L. Freedman and
Mr. W. Renwick for assisting them in clarifying a number of points,
and to Professor D. R. Hartree, F.R.S., for his generous help with the
preparation of the paper.
Contents¹

Preface v
Acknowledgments x
Contributors xiii

Chapter 4 Preliminary Discussion of the Logical Design of an Electronic
    Computing Instrument - Arthur W. Burks, Herman H. Goldstine, and
    John von Neumann 92
Chapter 5 The DEC PDP-8 120
Chapter 6 The Whirlwind I Computer - R. K. Everett 137
Chapter 16 The LGP-30 and LGP-21
Chapter 17 IBM 650 Instruction Logic - John W. Carr III
Chapter 41 The IBM 7094 I, II
Chapter 8 The UNIVAC System - J. Presper Eckert, Jr., James R. Weiner,
    H. Frazer Welsh, and Herbert F. Mitchell 157
Chapter 33 The IBM 1800
Chapter 7 Some Aspects of the Logical Design of a Control Computer: A Case
    Study - R. L. Alonso, H. Blair-Smith, and A. L. Hopkins 146
Chapter 42 The SDS 910-9300 Series
Chapter 23 One-level Storage System - T. Kilburn, D. B. G. Edwards,
    M. J. Lanigan, and F. H. Sumner
Chapter 34 The Engineering Design of the Stretch Computer - Erich Bloch

¹This is a “virtual” contents, which means that because many of the computers are relevant to more than one part and section, we have used italic
type to indicate a nonsequential mapping for computers placed out of “physical” order. The reader might read (reference) the book according to the
virtual order.
Section 1 Processors with Greater than One Address per Instruction 191
Chapter 11 The Pilot ACE - J. H. Wilkinson 193
Chapter 12 ZEBRA, A Simple Binary Computer - W. L. van der Poel 200
Chapter 13 UNIVAC Scientific (1103A) Instruction Logic - John W. Carr III 205
Chapter 14 Instruction Logic of the MIDAC - John W. Carr III 209
Chapter 15 Instruction Logic of the Soviet Strela (Arrow) - John W. Carr III 213
Chapter 38 The RW-400: A New Polymorphic Data System - R. E. Porter
Chapter 19 The OLIVETTI Programma 101 Desk Calculator
Chapter 12 ZEBRA, A Simple Binary Computer - W. L. van der Poel
Chapter 9 The Design Philosophy of Pegasus, A Quantity-production Computer -
    W. S. Elliott, C. E. Owen, C. H. Devonald, and B. G. Maudsley
Chapter 16 The LGP-30 and LGP-21 217
Chapter 11 The Pilot ACE - J. H. Wilkinson
Chapter 8 The UNIVAC System - J. Presper Eckert, Jr., James R. Weiner,
    H. Frazer Welsh, and Herbert F. Mitchell
Chapter 17 IBM 650 Instruction Logic - John W. Carr III 220
Chapter 26 NOVA: A List-oriented Computer - Joseph E. Wirsching
Section 4 Desk Calculator Computers: Keyboard Programmable Processors with Small Memories 235
Section 5 Processors with Stack Memories (Zero Addresses per Instruction) 257
Section 1 Processors to Control Terminals and Secondary Memories (Input-output Processors) 303
Chapter 28 Microprogramming and the Design of the Control Circuits in an
    Electronic Computer - M. V. Wilkes and J. B. Stringer 335
Chapter 29 The Design of a General-purpose Microprogram-controlled Computer
    with Elementary Structure - Thomas W. Kampe 341
Chapter 20 The HP Model 9100A Computing Calculator - Richard E. Monnier,
    Thomas E. Osborne, and David S. Cochran
Chapter 32 A Microprogrammed Implementation of EULER on IBM System/360
    Model 30 - Helmut Weber
Chapter 30 A Command Structure for Complex Information Processing -
    J. C. Shaw, A. Newell, H. A. Simon, and T. O. Ellis 349
Chapter 31 System Design of a FORTRAN Machine - Theodore R. Bashkow,
    Azra Sasson, and Arnold Kronfeld 363
Chapter 32 A Microprogrammed Implementation of EULER on IBM System/360
    Model 30 - Helmut Weber 382
Section 2 Computers with One Central Processor and Multiple Input/Output Processors 396
Section 3 The IBM System/360-A Series of Planned Machines Which Span a Wide Performance Range 561
This book presents many examples of computer systems. It presents
them in enough detail so that meaningful engineering study and
analysis are possible. Most of these examples are presented by
using the original descriptions of them in the technical literature.
Others have been redescribed by us, especially where the original
descriptions existed only in technical manuals. In both cases there
are considerable discussion and analysis of the computer structures:
what problems they were intended to solve, what solutions
were adopted, and how these solutions have fared. Yet the emphasis
has remained on detailed descriptions precise enough so
that the systems themselves are available for independent study.
Why should one want to produce such a book? Collections of
reprintings from the technical literature are common in many
science and engineering fields, e.g., “Programming Systems and
Languages” [Rosen, 1967]. We have departed from this traditional
exercise in two ways, both of which seem important to us.
First, we have presented substantial amounts of detail: in effect,
block diagrams of computer structures and the equivalents of
programming manuals. These constitute neither good reading nor
a way of communicating the “essential ideas” in the field. Second,
we have introduced a system of notation and have used it not only
in the parts we ourselves have written but also to provide additional
(sometimes redundant) descriptions of computer systems in
the reprinted articles. Why should there be a book like this? The
reasons are several and require some background discussion.

Computer systems

Computer systems are one example of man’s more complex artificial
systems.¹ They have existed as successful engineering products
long enough to undergo radical evolution and to give rise
to a number of basic, unique technologies. They are sufficiently
complex that they have given rise to a science, that is, to a con-

¹We need not argue that they are his most complex system. That view
is myopic. Setting aside quasi-natural systems, such as cities and economies,
it is still the case that a modern aircraft carrier is more complex than a
modern computer by any reasonable measure.
²Here uniqueness can be claimed, perhaps, since few other artifactual
systems (again, excluding the quasi-natural ones) provide new phenomena
that require sustained scientific investigation to understand them. There
certainly is no science of aircraft carriers. But there is a computer science.

opment of this science and technology of computers (one of us
also likes to build computers). To understand why this particular
book seems to us to be the right way to push this development
at this particular time requires characterizing the current state
of computer-systems technology.
A computer system is complex in several ways. Figure 1 shows
the most important. There are at least four levels of system description,
possibly five, that can be used for a computer. These are not
alternative descriptions in the sense that anything said one way
can be said another. On the contrary, each level arises from abstraction
of the levels below it. Each does a job that the lower
levels could not perform because of the unnecessary detail they
would be forced to carry around.
A system (at any level) is characterized by a set of components,
of which certain properties are posited, and a set of ways of combining
components to produce systems. When formalized appropriately,
the behavior of the systems is determined by the behavior
of its components and the specific modes of combination used.

Fig. 1. Hierarchy of levels: computer structure. Recoverable entries, from the top of the figure down:
  Structures: Network/N, computer/C.
    Components: Processors/P, memories/M, switches/S, controls/K, transducers/T, data operators/D, links/L.
  Circuits: Arithmetic unit.
    Components: Registers, transfers, controls, data operators (+, -, etc.).
  Circuits: Counters, controls, sequential transducer, function generator, register arrays.
    Components: Flip-flops: reset-set/RS, JK, delay/D, toggle/T; latch, delay, one shot.
  Circuits: ... distributors, iterative networks.
    Components: AND, OR, NOT, NAND, NOR.
  Circuits: Amplifiers, delays, attenuators, multivibrators, clocks, gates, differentiator.
    Active components: Relays, vacuum tubes, transistors.
    Passive components: Resistor/R, capacitor/C, inductor/L, diode, delay lines.
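The characterization of a system given above (a set of components with posited properties, plus ways of combining them, with the behavior of the whole following from both) can be made concrete with a small sketch. It is illustrative only and is not taken from the book. It assumes a single primitive component, a two-input NAND gate (one of the components listed in Fig. 1), and takes "connect an output to an input" as the only mode of combination. The names nand, half_adder, and truth_table are hypothetical helpers chosen for this example.

```python
from itertools import product

# Assumed primitive component: a two-input NAND gate on the values 0 and 1.
def nand(a: int, b: int) -> int:
    return 1 - (a & b)

# Mode of combination: feed the output of one component into the input of another.
# Each function below is a system built only from NAND, and each can in turn be
# used as a component of a larger system.
def not_(a: int) -> int:
    return nand(a, a)

def and_(a: int, b: int) -> int:
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))

def xor(a: int, b: int) -> int:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def half_adder(a: int, b: int) -> tuple[int, int]:
    """A system one step up: returns (sum, carry) for two 1-bit inputs."""
    return xor(a, b), and_(a, b)

def truth_table(system, n_inputs: int) -> dict:
    """Behavior of a system, computed from its components and how they are wired."""
    return {bits: system(*bits) for bits in product((0, 1), repeat=n_inputs)}

if __name__ == "__main__":
    for inputs, outputs in truth_table(half_adder, 2).items():
        print(inputs, "->", outputs)
```

Nothing about half_adder is specified beyond its components and their interconnection, yet its full behavior table can be computed, which is the sense in which behavior is determined by components and modes of combination. Choosing NAND as the primitive is a convention; another primitive set would serve equally well.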
Part 1 | The structure of computers
Elementary circuit theory is an almost prototypic example. The
components are R’s, L’s, C’s, and voltage sources. The mode of
combination is to run wires between the terminals of components,
which corresponds to an identification of current and voltage at
these terminals. The algebraic and differential equations of circuit
theory provide the means whereby the behavior of a circuit can
be computed from the properties of its components and the way
the circuit is constructed.
There is a recursive feature to most system descriptions. A
system, composed of components structured in a given way, may
be considered a component in the construction of yet other systems.
There are, of course, some primitive components whose
properties are not explicable as the resultant of a system of the
same type. For example, a resistor is not to be explained by a
subcircuit but is taken as a primitive. Sometimes there are no
absolute primitives, it being a matter of convention what basis
is taken. For example, one can build logical design systems from
many different primitive sets of logical operations (AND and NOT,
NAND, OR and NOT, etc.).
A system level, as we have used the term in Fig. 1, is characterized
by a distinct language for representing the system (that
is, the components, modes of combination, and laws of behavior).
These distinct languages reflect special properties of the types of
components and of the way they combine. Otherwise, there would
be no point in adopting a special representation. Nevertheless,
these levels exist in the system analyst’s way of describing the same
physically existing system. The fact that the languages are highly
distinct makes it possible to be confident about the existence of
different system levels. Where we are fuzzy, as in the existence
of an additional intermediate level, it is because new representations
have not yet congealed into distinct formal languages. As
we noted, within each level there exists a whole hierarchy of
systems and subsystems. However, as long as these are all described
in the same language, e.g., a subroutine hierarchy, all given in
machine-assembly language, they do not constitute separate system
levels.
With this general view, let us work through the levels of computer
systems, starting at the bottom. Each level in Fig. 1 actually
has two languages or representations associated with it: an algebraic
one and a graphical one. These are isomorphic to each other,
the same entities, properties, and relations being given in both.
The lowest level in Fig. 1 is the circuit level. Here the components
are R’s, L’s, C’s, voltage sources, and nonlinear devices.
The behavior of the system is measured in terms of voltage, current,
and magnetic flux. These are continuously varying quantities associated
with various components, and so there is continuous behavior
through time. The components have a discrete number of
terminals, whereby they can be connected to other components.
Figure 2 shows both an algebraic and graphical description of
an inverter circuit, as well as an algebraic and graphical description
of its behavior. We note that its structure is specified first
as a circuit (a directed graph), with symbols for the arcs and nodes.
The particular circuit still is an abstraction because the transistor
Q1, the resistor R, and the stray capacitors Cs are given only token
values. The structure can be described symbolically by first writing
the relationship describing each of the components (i.e., Ohm’s
law, Faraday’s law, etc.) and then the equation which describes
the interconnection of the components (i.e., Kirchhoff’s laws). We
observe the behavior of the circuit (probably using an oscilloscope)
by applying an input ei(t) and observing an output eo(t). Alternatively,
if we solve the equations which specify the structure, we
obtain expressions which describe the behavior explicitly.

Fig. 2. Electronic-circuit level: inverter circuit. (Panels: Structure; Behavior.)

The circuit level is not in fact the lowest level that might be
used in describing a computer system. The devices themselves
require a different language, either that of electromagnetic theory
or of quantum mechanics (for the solid-state devices). It is usually
an exercise in a course on Maxwell’s equations to show that circuit
theory can be derived as a specialization under appropriately
restricted boundary conditions. Actually, even at its level of abstraction,
circuit theory is not quite adequate to describe computer
technology since there are a number of mechanical devices which
must be represented. Magnetic tapes and drums are most likely
to come to mind first, but card readers, card punches, and Teletype
terminals are other examples. These devices obey laws of motion
and are analyzed in units of mass, length, and time.
The next level is the logic level. It is unique to digital technology,
whereas the circuit level (and below) is what digital technology
shares with the rest of electrical engineering. The behavior
of a system is now described by discrete variables which take on
only two values, called 0 and 1 (or + and -, true and false, high
and low). The components perform logical functions: AND, OR,
NOT, NAND, etc. Systems are constructed in the same way as
at the circuit level, by connecting the terminals of components,
which thereby identify their behavioral values. The laws of boolean
algebra are used to compute the behavior of a system from
the behavior and properties of its components.
The previous paragraph described combinatorial circuits whose
outputs are directly related to the inputs at any instant of time.
If the circuit has the ability to hold values over time (store information),
we get sequential circuits. The problem that the combinatorial-level
analysis solves is the production of a set of outputs
at time t as a function of a number of inputs at the same time t.
As described in textbooks, the analysis abstracts from any transport
delays between input and output; however, in engineering
practice the analysis of delays is usually considered to be still part
of the combinatorial level. In Fig. 3 we show a combinatorial
network formed from combinatorial elements which realize three
boolean output expressions, O1, O2, and O3, as a function of the input
boolean variables A and B. Note that in the symbolic representation
of the structure we can write an expression that reflects the
structure of the combinatorial network, but, on reduction, the
boolean equations no longer reflect the actual structure of the
combinatorial circuit but become a model to predict its behavior.
The representation of a sequential switching circuit is basically
the same as that of a combinatorial switching circuit, although
one needs to add memory components, such as a delay element
(which produces as output at time t the input at time t - T). Thus
the equations that specify structure must be difference equations
involving time. Again, there is a distinction (even in representation)
between synchronous circuits and asynchronous circuits,
namely, whether behavior can be represented by a sequence of

shows a circuit for a NAND (or NOR) gate plus a table of its
behavior. It is evident that its behavior corresponds to that of the
NAND gate only if certain restrictions hold; namely, that one does
not look at the voltage (which is identified as the behavior variable
in the logic circuit) during certain periods when it is transient
(“settling down,” to use the common phrase). Thus the logic level
is an instance of the circuit level only in the same sense that the
circuit level is an instance of Maxwell’s equations: as a limiting
case in which certain features are deliberately ignored.
One buys a great deal from the specialization to logic circuits,
since one can compute the behavior of circuits at the logic level
that are extremely complex at the circuit level. The techniques
for doing so use an entirely different mathematical apparatus. In
general, we cross into another level when the representation at
the previous level provides information that is no longer relevant.
A lower level is concerned with explaining the behavior of a
certain structure, whereas the next highest level takes the lower
level as given (a primitive). The higher level is concerned not about
internal behavior but only how primitives are combined.
A glance at Fig. 1 shows that we have described only the lower
part of the logic level. There is another part, called the register-
transfer level (or RT level). This is still an uncertain level, a matter
of its behavior and a table that shows the resulting behavior over
Structure Behavior time. Here the graphical structure of the system includes registers
(N, I, S), transfers (S c S + l),data operators (S + 1, I N, etc.). >
” The flowchart shows the behavior of the control with time.
.-
r
a The register-transfer level is still uncertain because there is
Y
mi Sum n 0 0 0 I Ill I I 0
substantial agreement neither on the exact language to be used
Time, f for the level nor on the techniques of analysis and synthesis that
go with it. As we will note below, for both the circuit level and
pmj
the logic-circuit level there exist well-defined representations,
guaranteed, so to speak, by standard textbooks and college courses
that teach these levels. Standard texts on digital computers make
Sinput xr only informal vse of the RT $vel.
Rinput = 7 XI A 7X We have indeed a systems level in emergence here. If one
0 l,o 0,o
:7(XrVX) restricts the transfer operations to boolean operations and thinks
l,o 0,o 0,1
of a register as simply a set of 1-bit memories, one can write a
Sum (output) table set of logic equations for any register-transfer system. Furthermore,
if one considers the role of logic design in digital computers, this
has encompassed both sequential circuits and the register-transfer
e e Table 4o f NAND
Inputs Table of NOR Inputs
behavior behavior
Fig. 4. Sequential-switching-circuit sublevel of the logic level: computa-
tion of x +
1 from serial input string x.
0 0 0
we will discuss after we have finished describing it. The com- NOR logic element
1 1 1
1 1 0 0 NAND logic element OO ’
O (Structure) 0 0
(Structure) 1 0 0 1 1 1
ponents of an RT system are registers and functional transfers 1 0 0 1
between registers. A register is a device that holds a set of bits.’ 1 0 1 1
1 1 0 1
The behavior of the system is given by the time course of values 000 1 1 1 0
of these registers, i.e., their bit sets.
The system undergoes discrete operations, whereby the values
of various registers are combined according to some rule and then
are stored in another register (thus “transferred’)).The law of
combination may be almost anything, from the simple unmodified Circuit
level -15voltS Table o f circuit
transfer (A t B) to logical combination (A t B A C) to arithmetic behavior
(A t B + C). Thus a specification of the behavior, equivalent to -15 volts
the boolean equations of sequential circuits or the differential output
equations of the circuit level, is a set of expressions (often called
0-3 0
productions) which give the conditions under which such transfers Inputs 0 -3 -3 -3
-3
will be made. In Fig. 6 we give a picture of an RT system to +10JOltS -3 0 -3
compute the sum of integers. The figure includes the specification A -3 -3 0
Node -3 -3 -3 0
Multiple input inverter Circuit
‘This assumes that the elementary state variable of the system holds a bit (Structure) (Behavior)
(i.e., one of two values, such as 0 or 1). This need not be; sometimes the
elementary variable holds a decimal digit (one of 10 values) or a character
(one of, say, 48 values). For present purposes we can talk in terms of Fig. 5. Change of representation at the circuit level combinatorial-
bits, without losing anything thereby. switching sublevel boundary.
Chapter 1 7
I
LJ
I \ "
, I
level. The practicing logic designer (by now an institutionalized
position, on a par with that of circuit designer) has sequential and Structure Behavior
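
The contrast between the two sublevels described above can be made concrete with a small sketch. The Python fragment below is an illustration, not a transcription of Figs. 4 and 6. The function `serial_increment` models a sequential circuit that computes x + 1 from a serial input string, using one bit of memory and difference equations in t. The function `rt_sum` models a register-transfer system with registers N, I, and S that sums the integers from 1 to N. The function names, the least-significant-bit-first input order, the initial register contents, and the exact order of the transfers are all assumptions made for the example.

```python
def serial_increment(bits):
    """Logic level: compute x + 1 from a serial bit string, LSB first (assumed order).

    One memory element holds the carry. The difference equations are
    out(t) = x(t) XOR carry(t) and carry(t + 1) = x(t) AND carry(t).
    """
    carry = 1                      # adding 1 is modeled as an initial carry
    out = []
    for x in bits:
        out.append(x ^ carry)      # combinational output at time t
        carry = x & carry          # next state, available at time t + 1
    return out, carry              # a final carry of 1 signals overflow


def rt_sum(n):
    """RT level: registers N, I, S; the trace is the time course of register values."""
    if n < 0:
        raise ValueError("n must be non-negative")
    regs = {"N": n, "I": 0, "S": 0}          # assumed initial contents
    trace = [dict(regs)]
    while regs["I"] != regs["N"]:            # data operator: test I = N
        regs["I"] = regs["I"] + 1            # transfer I <- I + 1
        regs["S"] = regs["S"] + regs["I"]    # transfer S <- S + I
        trace.append(dict(regs))
    return regs["S"], trace


if __name__ == "__main__":
    print(serial_increment([1, 1, 0]))   # the number 3, bits LSB first
    print(rt_sum(4)[0])
```

In the first function the behavior is a bit stream governed by boolean equations; in the second it is a sequence of register contents governed by transfers, which is the distinction the text draws between the two parts of the logic level.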
incidentally, we use the representations of Fig. 7 for the sequential switching circuit of Fig. 4. That is, Fig. 7 may be viewed as an abstraction of the physical system in Fig. 4. To the logic designer the state system is a useful abstraction of a logic design. A design usually passes through the following problem representations:

1 The problem exists in a natural language.
2 The problem is converted to a state diagram (output as a function of state, and input).
3 The state diagram is represented as a state table and output table.
4 States are assigned (physical memory elements are used).
5 The excitation table and output tables are formed.
6 The excitation and output logic equations are written (constrained by the actual logic elements).
7 The sequential circuit is drawn.

Let us go to the next higher level, the program level. This not only is a unique level of description for digital technology (as was the logic level) but is uniquely associated with computers, namely, with those digital devices that have a central component that interprets a programming language. There are many uses of digital technology, especially in instrumentation and digital controls, which do not require such an interpretation device and hence have a logic level but no program level.

The components of the program level are a set of memories and a set of operations. The memories hold data structures which represent things both inside and outside the memory, e.g., numbers, payrolls, molecules, other data structures, etc. The operations take various data structures as inputs and produce new data structures, which again reside in memories. Thus the behavior of the system is the time pattern of data structures held in its memories. The unique feature of the program level is the representation it provides for combining components, that is, for specifying what operations are to be executed on what data structures. This is the program, which consists of a sequence of instructions. Each instruction specifies that a given operation (or operations) be executed on specified data structures. Superimposed on this is a control structure that specifies which instruction is to be interpreted next. Normally this is done in the order in which the instructions are given, with jumps out of sequence specified by branch instructions. Again, Fig. 8 shows a simple program, the data structures, and the behavior.

Two things separate the logic level from the program level. First, computer systems at the logic level are parallel devices, with all components active simultaneously. At the program level, computers are represented essentially as serial devices. Second, the program level, but not the logic level, is essentially linguistic in nature. At the program level things can be named, abbreviations can be used, decisions can be made, instructions are interpreted - all concepts that are strikingly absent from physical systems. Of course, they are not “really” absent since one can give a full description of the operation of a program at the logic level. But one does so by carrying in mind the set of physical behaviors discovered for computers that make them show the appropriate linguistic behavior at the program level. Thus, one does not “go to ALPHA if accumulator is negative”; one has a logic circuit that transfers the contents of the address field of the instruction register to the program counter, ANDing that transfer with the sign of the accumulator, so that it does not take place if the accumulator is not negative. Such a translation reveals how distinct is the system boundary between the register-transfer level and the program level. The size of the gap is also revealed in the ability of people to become expert programmers without knowing anything about any representations below the programming level.

The program level constitutes an entire technology in its own right, and one that carries within it most of the emergent characteristics of computer systems that make them worthy of a science. Among the programming languages alone, there are levels of language which are so distinct from each other as to constitute system levels fully as important as the ones exhibited in Fig. 1. Nevertheless, from the viewpoint of someone basically concerned with hardware systems, these can all be accounted a single level, at least for the present. The one aspect of programming systems that should be of most concern, that of operating systems, is still in such a fragmented state that it does not even begin to be a distinct system level.

One peculiarity of the program level is that there exists no universal representation for it, as there does for the circuit or logic-circuit level (and, it is to be hoped, soon for the register-transfer level). Each machine has its own machine language (and its own assemblers and command languages built on those machine languages). Each of these languages forms a complete system at the program level, applicable only to the machine in question. There is no universal machine language, although there is much in common at a conceptual level between all existing machine languages. There has existed a long-standing attempt within the programming field to develop an UNCOL (for Universal Computer Oriented Language) [Steel, 1961] that would play this role, but it has never been successful. The reasons are not far to seek. The role of the machine language is to be interpreted by the machine in order to produce behavior. It is not free to have arbitrarily desirable properties from our human viewpoint, since its details affect the efficient operation of the computer too much - how much space is devoted to the program, how much time is saved by a special order oriented to matrix multiply, etc. UNCOL was also attempting to fill the same role as machine languages, being one from which to compile a machine code for an arbitrary machine. Another reason why there has been no universal programming representation is that each particular machine language is a language, and so a universal description would seem to be a description of a class of languages. This is by no means impossible, as the wide use of notations such as Backus Normal Form (BNF) show.¹ Nevertheless, it has contributed to the lack of any universal notation.

¹We will propose a notation later. See also the work by F. Haney in his Generalized Instruction System (GIS) [Haney, 1968].

We now move to the fourth and last level. In Fig. 1 it is called the processor-memory-switch level, or PMS level for short. The name is not recognized, nor is any other, since the level exists only informally. Nevertheless, its existence is hardly in doubt. It is the view one takes of a computer system when one considers only its most aggregate behavior. It then consists of central processors, core memories, tapes, disks, input/output processors, communication lines, printers, tape controllers, busses, Teletypes, scopes, etc. The system is viewed as processing a medium, information, which can be measured in bits (or digits, characters, words, etc.). Thus the components have capacities and flow rates as their operating characteristics. All details of the program are suppressed, although many gross distinctions of encoding and information type remain, depending on the analysis. Thus one may distinguish program from data, or file space from resident monitor. One may remain concerned with the fact that input data are in alphameric and must be converted into binary, or are bit-serial and must be converted to bit-parallel.

We might characterize this level as the “chemical engineering view of a digital computer,” which likens it more to a continuous-process petroleum-distilling plant than to a place where complex FORTRAN programs are applied to matrices of data. Indeed, this system level is more nearly an abstraction from the logic level than from the program level, since it returns to a simultaneously operating flow system.

One might question whether there is a distinct systems level here. In the early days of computers almost all computer systems could be represented as in the diagram in M.I.T.’s Whirlwind computer programming manual in Fig. 9: with classic boxes of memory (storage), control, arithmetic, and input/output. Actually, this view of the computer in 1953 was considerably advanced; few texts on the logic design of computers in the 1960s have such a detailed model. This model has secondary memory (magnetic tape and drums in the Whirlwind’s case). The most interesting aspect of the model, which text writers omit, is any kind of switching (the bus of Fig. 9). The bus provides a communication path to link the other components. Certainly the pushbuttons (actually the console) is novel for such a model. Compare this with the diagram of a modern computer system in Fig. 10, which shows a two-processor UNIVAC 1108, the level of abstraction being the same as in Fig. 9. The arithmetic element of Fig. 9 has disappeared and is replaced by a processor (a combined control and arithmetic element) in Fig. 10. The central control of Fig. 9 is now distributed throughout the remaining components. The control in Fig. 10 is a combined unit for transforming a serial character-information stream into words. It also manages the transmission of a word vector between the primary memory and a terminal or a secondary memory. The Resource Allocation Diagram is introduced in Fig. 10 to describe the allocation (use), hence behavior, of the PMS components as a function of time. Chapter 2 describes these figures more fully.

Another indication of the emergence of the PMS level lies in the models used in most operations-research types of studies on computer systems. Again, in the early 1960s these were practically nonexistent. Now, with the advent of multiprogramming, multiprocessing, and time sharing, and the imminent arrival of computer networks, there are substantial numbers of such studies. The level of abstraction is always one that considers only flows and stocks of information, measured in bits (or an equivalent), perhaps divided into several subtypes. The concerns are bottlenecks, capacities, total flow rates, queuing problems, buffer sizes, and the like. All this indicates a system level above both the logic level and the program level.

There is no uniform language for representation at this level and even, as we noted, no standard name. We have used the term PMS in analogy to the use of RT for the register-transfer level. Processors, memories, and switches are the main kinds of components out of which systems at this level are built. If one names a number of components at the PMS level, as we did previously, one finds few switches in the list. “Busses” in our list would be one, although many would think first of their data transfer characteristics. But, as this book amply shows, what makes the PMS level both interesting and complex is the existence of switches which govern the pattern of information flow through the system. One reason why they seem buried is their association with other components as addressing systems. There are other components besides processors, memories, and switches, namely, links, transducers, and controls. But the first three, P, M, and S, seem appropriate to characterize the level.
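
The flow view of the PMS level can be illustrated with a small calculation. The sketch below, in Python, treats a path through a system as a chain of components, each with a flow rate in bits per second, and finds the bottleneck and the time needed to move a given amount of information. The class and function names, the component labels, and all the numbers are hypothetical; they are not taken from any system described in this book, and queuing and setup delays are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str        # hypothetical label, e.g. "Ms.disk"
    kind: str        # one of P, M, S, T, K, L
    rate: float      # sustained flow rate in bits per second (assumed figure)


def bottleneck(path):
    """Return the component that limits flow along a serial path."""
    return min(path, key=lambda c: c.rate)


def transfer_time(path, bits):
    """Seconds to move `bits` through the path, ignoring queuing and setup delays."""
    return bits / bottleneck(path).rate


if __name__ == "__main__":
    path = [
        Component("Ms.disk", "M", 2e6),
        Component("S.bus", "S", 8e6),
        Component("Mp.core", "M", 16e6),
    ]
    slowest = bottleneck(path)
    print(slowest.name, transfer_time(path, 4e6))
```

Nothing about the program being run appears in this model; only capacities and rates do, which is what distinguishes this level from the program level.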
It is not known whether there will be yet other systems levels, say one above the PMS level, as networks come into existence. The simplicity of the top level argues against it, but that may only show our narrow vision. It is important to realize that these levels are not sacrosanct. They depend strongly on physical technology. Thus, as we move toward integrated circuitry, there may emerge representations other than register-transfer diagrams, and the latter may never develop into a clear systems level. One could even imagine something happening to the circuit level, as continuous distributions became more important (although the use of equivalent circuits is well embedded in the engineering culture). We are not concerned with predicting any particular changes. We wish only to emphasize that the system-levels diagram of Fig. 1 is a reflection both of current technology and of our ways of analyzing given physical systems. As such, these levels have a certain impermanency about them.

Fig. 9. Automatic digital computation. (From the Whirlwind Computer Manual, M.I.T. By permission of the publishers.)

What is the problem?

The systems levels we have just described correspond to the technologies that are available for the analysis and synthesis of computer systems. Each of these levels exists, in fact, precisely to the extent that a technology has become well developed. Thus both the circuit level and the lower half of the logic level (combinatorial and sequential circuits) are highly polished technologies. They are what one learns today, if one wants to become a computer engineer. Textbooks exist, courses are taught, and there is a flourishing, cumulative technical literature. As we progress up the systems levels, matters become progressively worse. The register-transfer level is not yet well established, although there is considerable current activity in the area, and the next few years may see its universal establishment. Although programming is certainly well defined, each machine is a king in his own court, with no common technology of the program level that is relevant to the design of computer systems. The latter phrase must be added since we are taking a very specialized viewpoint here. We do not consider the world of programming research at all, it being entirely divorced from computer-systems design.¹ Finally, at the top, there is practically no consensus on the nature of the systems level.

¹This is not entirely true. Each level must provide coupling with adjacent levels. A major issue in computer design is the trade-off between hardware and software.

There is nothing very surprising about this state of affairs. It reflects accurately the fundamental fact that only in the past few years have computer systems become complex enough for the higher levels to emerge as distinct systems levels. When most computers could be described in the diagram of Fig. 9 - and such a diagram was reprinted innumerable times in the first decade - there was no need to have a technology at the PMS level. When registers were so expensive that one could count the registers of a processor on the fingers of one hand (no thumbs allowed), one did not need a register-transfer language in order to describe the
flows. In both cases, an informal block diagram conveyed all the information adequately.

The question of the programming level is somewhat different, since this level has existed as a formal language from the very start. Here the key aspect, it seems to us, is that, since well-defined languages existed, there was little pressure to find a better one. The fact that such languages were completely idiosyncratic to the machine, since they emerged as a product of the design itself, simply did not worry anyone overly much. Each language provided a design framework one could work into, and this seemed to suffice. It led, it is true, to the game of “We have another bit left in the mode field of the instruction - got another mode you’d like?” But this has only made computer designers feel that creating an order code was something of an art.

Thus we feel that the increased complexity of computer systems is making these higher system levels of increasing importance. Since this is only the second decade of the serious development of computer systems, these upper levels are not in very good shape. For instance, textbooks devote very little attention to the area. Textbooks (especially good ones) tend to be technique-oriented, giving most attention to what is known. (When we were students we always used to wonder why there were no mathematics texts which told you about the problems that were not solvable in closed form.) Thus the present need for some material at these higher levels constitutes a major motivation for this book.

There is a second feature of the current scene that enters into our motivation for this book. Around 1,000 different computer systems have been built. This represents a substantial amount of pragmatic experimentation. This is especially true at the programming level and PMS level, and also to some extent at the register-transfer level. Many things have been tried, many found worthwhile, and many found wanting. A good deal of reinvention goes on. Thus we are concerned that this history of experimentation not be lost. It is true that, if the underlying technology changes enough, the experience may become largely irrelevant, but this does not appear to us to be an imminent development.

We will admit also to a third concern, which does not stem from our role as computer engineers concerned with design, but from our role as computer scientists, fascinated with the phenomena of computers. The variety of about 1,000 computers represents the beginning of a proliferation of a species. It is not under biological control but rather under economic and intellectual control. Nevertheless, it is in every sense of the word an evolutionary population. We find ourselves feeling a little like naturalists must have felt when confronted with the proliferation of the organic world. We were at one time tempted to call this book “Computer Botany” and at another “Computer Taxonomy.” We feel that the attempt to gather, document, and classify these existing computers is a worthy endeavor in its own right. One might think that all this material is easily available. But the record fades rapidly, especially when much of it exists only as manufacturers’ manuals and papers in assorted proceedings.

The main reasons for producing this book and for its particular character are by now evident. There is a need for material on the upper levels of computer systems, both for teaching new students of computer science and engineering and for making the past record available for professional designers. Since the technologies are not well developed for the upper levels, it is not possible to write a textbook, making use only of well-accepted techniques, notations, and results. Instead, one settles for making available a collection of examples of systems, so that they can be studied and analyzed directly.

Notations

It remains to say a word about two notations we have introduced, both about our motivations for doing so and about their character. Some, but not all, of this is already implicit in the foregoing account.

We started simply to produce a set of readings in computer systems, motivated by the lack of detailed examples we could use in a course one of us (GB) was giving on computer design. As noted, we felt the need to expose the students to real examples of complex computer structures. As we gathered material we became impressed (depressed is actually a better term) with the diversity of ways of describing these higher levels. Even more, the amount of clumsy description - downright verbosity - even in purely technical manuals acted as a further depressant. The thought of putting such a congeries of descriptions between hard covers for one person to peruse and study was almost too much to contemplate. Gradually, we began to rewrite and condense many of the descriptions. As we did so, a set of common notations developed. Becoming aware of what was happening, we devoted a substantial amount of attention and effort to creating notational systems that have some consistency and, we hope, some chance of doing the job required. These are the PMS descriptive system for the PMS level (sic) and the ISP (Instruction-set processor) descriptive system for the program level. Each of these requires some comment on its nature and the role we think it should play.

The PMS descriptive system is meant to provide a notation for the top level of computer systems. Figure 10 is given in this notation. On the surface it is largely self-explanatory, given the
mnemonics of P for processor, M for memory, S for switch, T for transducer (hence also terminal), and K for control (since C is for computer). There is also L for link, but in most computer structures it is unnecessary to distinguish a separate link component, except to show connectivity. (It does become appropriate if communication delays exist.)

There is an issue about whether this small set of components is an appropriate set of primitives, but the issue is not of major proportions. The real issues in the development of the notation come from the stress of two opposite forces. On the one hand, one wants extremely compact notations for expressing computer systems. The systems are large in any event, and if there is much extra notational freight in the way of fixed formats, forced writing of what is already known and assumed, etc., then the notation will be neither useful nor used. On the other hand, there is a tremendous variety and quantity of information that potentially must be capable of being written into a description: word size, capacity, flow, operation rate, data-types, variations of operation rate for different classes of instructions, parity checking, technology, and on and on. Thus one needs a notation that responds to both these demands - and without being hopelessly complex and difficult to learn. Our attempt at a solution involves a basically simple language with comprehensive (and we think natural) ways of systematic abbreviation and abstraction.

The ISP descriptive system is meant to provide a uniform way of describing instruction sets, that is, of giving the information contained in a programming manual. It must provide the instruction format, the registers referenced by the instructions, the rules of interpretation of the instruction, and the semantics of each instruction in the processor’s repertoire. It must be able to do this for any existing computer, plus the expected extensions into the future. Its homeliest virtue is to make it possible to read the descriptions of the forty-odd computer systems described in this book, without having to fight a new notation for each system, and still to know in detail what the instructions really do.

Our attempt at a solution turns out not to be a generalized sort of instruction. Rather, it is very similar in flavor to a register-transfer scheme. The differences lie in being able to suppress all timing information and all detail that is not essential to understanding the instructions. ISP is not a variety of UNCOL, in which one can program; rather it is a language in which one can describe what any particular instruction set does. We thus avoid many of the pitfalls of the UNCOL-like efforts.

There is a price to be paid for introducing new notations, for they must be learned. We feel that the two systems we have introduced here are natural enough to require almost no learning for superficial use (e.g., looking at Fig. 10) and only modest amounts for full exploitation. They seem to us vastly preferable to the array of ad hoc notations that we were faced with initially (and with which we almost faced the reader). Still we are aware of the price.

A word should be said about antecedents. The PMS descriptive system is close to the way computer scientists talk informally about the top level of computer systems; no one effort in the environment stands out as a predecessor. Some notations, such as CPU (for central processing units), have become widespread. We clearly have assimilated them. Our modifications, such as Pc instead of CPU, are dictated entirely by the attempt to build a consistent notation over the whole range of computer systems. With respect to ISP, we have been heavily influenced by the work on register-transfer languages.¹ The one that we used most as a kernel from which to grow ISP was the work of Darringer and Parnas [Darringer, 1969]. In particular, their decision to work within the framework of ALGOL suited our own sensibilities, even though the final version of ISP departs from a sequential algorithmic language in a number of respects.

¹We have not been influenced in a direct way by the work of Iverson [Falkoff, Iverson, and Sussenguth, 1964] in the sense of patterning our notation after his. Nevertheless, his creation of a full description of the IBM System/360 in APL stands as an important milestone in moving toward formal descriptions of machines.

Finally, a word should be said about innocence and aspirations. We are putting PMS and ISP forward as two notations. They are that. But they also imply a particular view of digital processing. Thus they are not entirely innocent. It would be appropriate to explore fully this view and to justify the particular decompositions and definitions used. This is not to say that these views are peculiarly ours. They are implicit in the informal use of similar descriptive systems. However, the attempt to formalize a notation makes them more accessible. We accept the obligation to perform such an exploration. But this volume is not the place to do so, for that would turn it into something between a treatise and a textbook. For this book, it is appropriate to take these notations at face value. We have a companion volume in preparation that attempts the other job. This is an aspiration.

We have other aspirations as well. Notations in the computer world should turn into working tools. There are many tasks, such as the communicative one of this book, where the notation by itself is useful. Others are easy to imagine: writing specifications for new machines; being sure what the computer salesmen are selling; standardization of programming manuals, so that learning about a new machine is easier; etc. But there are other tasks where the
notations must become formal programming languages, so that analysis and synthesis procedures can be carried on automatically in their terms. As we have noted, the development of ISP and PMS germinated from purely notational issues. We have not let our aspirations to turn them into simulation languages delay our use of them for purely descriptive purposes. Thus we accept the obligation also to develop them as operational tools. That is also an aspiration and cannot be dealt with anywhere within this book.

Plan of the book
We now have enough background to explain the structure of the book. Two other chapters complete the introductory part. Chapter 2 provides an exposition of the PMS and ISP descriptive systems. As we have just noted, this does not attempt to explore seriously the view of digital processing implicit in these notations, although it does provide a small amount of motivation. A summary of the language conventions and parameter values is given at the end of the book in the appendix.
Chapter 3 provides a description of the space of computer systems. One can view all computer systems as occupying a space whose dimensions are the various important systems features. Many features of the actual systems are relatively locked together. For example, word size and number of instructions in the repertoire covary; no 12-bit machine has 200 instructions but several with over 32 bits do. Thus the number of significant dimensions of variation is much less than the total number of features of computer systems. Such a space provides a basic frame in which to choose representative computer systems for inclusion in the book. We hope Chap. 3 will also justify our feeling that there is a diversity and proliferation of computer systems that is worthy of serious study.
The remainder of the book is divided into five parts (2 to 6, with the introduction constituting Part 1), and each part into sections. Each chapter gives a description of a computer system that is an instance of the part and section. Usually a chapter describes only one computer or computer system, although there are a few exceptions in Part 6 on computer families.
A word needs to be said about the “Virtual” Table of Contents. Many of the example computers are relevant to more than one part and section. Physically, they have to be located at one place. But we have permitted multiple entries in the Contents, so that, for instance, Chap. 33 on the IBM 1800 appears in Sec. 1 of Part 2 as an example of a one-address ISP, in Sec. 1 of Part 4 as a terminal control, and finally in Sec. 2 of Part 5 as an example of a PMS with one central processor and multiple input/output processors (1 Pc, multi-Pio); physically it is located in the latter section. By using different type faces we hope the reader will not become confused between virtual and actual.
There is little point in outlining the content of the various parts and sections here. This is better done at the end of Chap. 3 after the computer space has been laid out.

References
Brackets are used to enclose author(s) and year of publication, e.g., [Darringer, 1969] or [Falkoff, Iverson, and Sussenguth, 1964]. A list of all the references in a chapter is given in code at the end of the chapter. The code refers to the bibliography at the end of the book. This 7- or 8-character code is as follows:
Characters 1:4   First four characters of the last name of author (or first author)
Character 5      First initial of author (or first author)
Characters 6:7   Year of publication - 1900
Character 8      (Optional) a, b, c, . . . , used to denote multiple referenced publications of author in a year.

References
DarrJ69; FalkA64; HaneF68; RoseS67; SteeT61; ZadeL63.
The PMS and ISP descriptive systems
The task of this chapter is to provide an introduction to the PMS descriptive system for the top computer-system level and to the ISP descriptive system for the program level. We take the view that informal notations exist and are in use. PMS and ISP are an attempt to tidy up these notations, to make them consistent and more powerful. Thus we depend on the reader already to understand implicitly much of the notation and how it is to be used. In consequence, there is no attempt in this chapter to provide a formal treatment of the whole system. The appendix, at the end of the book, contains a complete summary of the notation rules, including the component attributes and values, and their abbreviations (i.e., the main technical vocabulary). We will provide a brief discussion of the conceptual view underlying the two systems, since it is an appropriate way to make the notation understandable. But this is informal and heuristic.
The two descriptive systems are not independent. There is a common set of notational conventions for abbreviating, for giving parameter values, and so on. (The Appendix separates them.) Likewise, there exists, in effect, an ISP description for every PMS component, or, conversely, ISP statements imply particular PMS component structures. A natural way is to present PMS first, which will also serve to introduce the main notational devices. Then we will give ISP. Finally, we will add more comments on the relationship between PMS and ISP.

PMS level of description
Digital systems can be characterized most generally as systems that at any time exist in one of a discrete set of states and that undergo discrete changes of state with time. This is a highly abstract view. Nothing is said about what physical state corresponds to a system state; nothing is said about what laws of physics transform the system from one state to another. The states are given abstract labels: S1, S2, . . . . The transitions are provided by a state-transition table with many entries of the form: If the system is in state Si and the input is Ij, then the system is transformed to state Sk and evokes output Ol. (Alternatively, a state diagram has the same information.) The virtue of this "state-system" view is that it truly seems to capture what we mean by a discrete (or digital) system. Its disadvantage lies in this same comprehensiveness, which makes it impossible to deal with large systems because of their immense number of states (of the order of 10^(10^10) states for a big computer).¹
  ¹As we noted in Fig. 1 of Chap. 1, we actually describe some parts of the control mechanisms of computers by state-system diagrams; however, these are exceedingly small pieces. An example may be seen in Fig. 7 on page 7.
Existing digital computers can be viewed as discrete state systems that are specialized in three ways. These three specializations make possible a much more compact and useful description of these systems, the one that we call the PMS description.
First, the state is realized by a medium, called information, which is stored in memories. Thus, a core store of N words each of 32 bits is a digital device that can exist in one of 2^(32N) states. Similarly, all the states of a processor are made explicit in a set of registers: an accumulator, an address register, an instruction register, status register, etc. Each holds a specified number of bits. No permanent information is kept in digital devices except as encoded in bits in a memory. There are two qualifications to this blanket statement. First, the basic unit of information need not be the bit; it could be any base: One can have ternary machines, decimal machines, etc. Second, the sequential logic circuits that carry out operations in the system have intermediate states. But this is a strictly temporary affair while the operation is occurring, for example, the intermediate, inaccessible, partial results during a multiply operation. At the end (when the smoke has cleared, so to speak) all information carried over to the next operation has been encoded into bits in memories somewhere. At the PMS level we care only about the end result of such operations.
The second specialization of the general state-system view is that current digital computer systems consist of a small number of discrete subsystems linked together by flows of information. There is a distinct component called the memory, another called the central processor, another called the card reader, etc. This is analogous to the lumped-parameter specialization at the circuit level. Thus the natural representation of a digital computer system is as a graph which has component systems at the nodes and information flows as branches. Now, in fact, the discrete character of digital encoding in bits prevents there being any truly continuous digital devices (in analogy to the continuously distributed parameter circuits). But one can have distributed networks with very small components. Such iterated arrays are a topic of much
current investigation, as the possibility of manufacturing them by integrated-circuit techniques has emerged. These distributed networks look very different from the computer systems of today, although they are still digital systems. Thus, the representation as a flow network with functionally specialized nodes is a real specialization.
The third specialization of the general state-system viewpoint is that associated with each component in a digital system is a small number of discrete operations for changing its own state or the state of neighboring components. All transitions must occur through the application of these few operations, which are evoked as a function of the current state of the component. The total behavior of the system is built up from the repeated execution of the operations as the conditions for their execution become realized by the results of prior operations. The general state-system view is more general. The state-transition table for a system may exhibit an arbitrary pattern of immediate state transitions, without regard to how such transition would be physically realized.
To summarize, within this specialized view one wants a way of describing a system of an interconnected set of components, which are individual devices that have associated with them a set of operations that work on a medium of information, measured in bits (or some other base).
The major complication in this picture is the amount of detail involved in describing actual computers. It takes a whole manual, for instance, to describe the operations of a major computer, such as the IBM 7090. Thus the descriptive system must permit very compressed descriptions. It must also permit description of only those aspects of the components that are of interest, ignoring the rest. And what is of interest at the PMS level? Besides a description of the gross structure of a computer system, it is primarily the analysis of the amounts of information held in various components, the flows of information between components, and the distribution of the control that accomplishes these flows.
Thus a PMS-level description is analogous to the chemical engineer’s diagram of a refinery in which he is interested in various kinds of liquid and gas flow. He has to account for matter and energy loss with the system at various stages involving the transduction of materials from one form to another. A specific chemical plant’s external performance is measured in terms of its production flow rate for a given cost. With computers, external performance is concerned with the economical accomplishment of discrete tasks, but at the PMS level this translates into operation rates and cost of operations.
For the PMS level we ignore all the fine structure of information processing and consider a system consisting of components that work on a homogeneous medium called information. Information comes in packets, called i-units (for information units), and is measured in bits (or equivalent units, such as characters). I-units have the sort of hierarchical structure indicated by the phrase: A record consists of 300 words; a word consists of 4 bytes; a byte consists of 8 bits. A record, then, contains 300 × 4 × 8 = 9,600 bits. Each of these numbers (300, 4, 8) is called a length, since one often thinks of an i-unit as a spatial sequence of the next lower i-units of which it is composed. For example, one speaks of “word length” and of a record being “300 words long.”
Other than being decomposable into a hierarchy of factors, i-units have no other structure at the PMS level. They do have a referent, that is, a meaning. Thus it is possible to say of an i-unit that it refers to an employer’s payroll, to the pressure of a boiler, or to a prime number satisfying certain conditions. To do so, of course, the i-units encode the information necessary to make the reference. At the PMS level we are not concerned with what is referred to, but only with the fact that certain components transform i-units but do not modify their meaning. In fact, these meaning-preserving operations are the most basic information-processing operations of all, and they provide the basic classification of computer components.

PMS primitives
In PMS there are seven basic component types, each distinguished by the kinds of operations it performs:
Memory, M. A component that holds or stores information (i.e., i-units) over time. Its operations are reading i-units out of the memory and writing i-units into the memory. Each memory that holds more than a single i-unit has associated with it an addressing system by means of which particular i-units can be designated or selected. A memory can also be considered as a switch to a number of submemories. The i-units are not changed in any way by being stored in a memory.
Link, L. A component that transfers information (i.e., i-units) from one place to another in a computer system. It has fixed ports. The operation is that of transmitting an i-unit (or a sequence of them) from the component at one port to the component at the other. Again, except for the change in spatial position, there is no change of any sort in the i-units.
Control, K. A component that evokes the operations of other components in the system. All other components are taken to consist of a set of discrete operations, each of which, when evoked, accomplishes some discrete transformation of state.
With the exception of a processor, P, all other components are essentially passive and require some other active agent (a K) to set them into small episodes of activity.
Switch, S. A component that constructs a link between other components. Each switch has associated with it a set of possible links, and its operations consist of setting some of these links and breaking others.
Transducer, T. A component that changes the i-unit used to encode a given meaning (i.e., a given referent). The change may involve the medium used to encode the basic bits (e.g., voltage levels to magnetic flux, or voltage levels to holes in a paper card), or it may involve the structure of the i-unit (e.g., bit-serial to bit-parallel). Note that T’s are meaning-preserving but not necessarily information-preserving (in number of bits), since the encoding of the (invariant) meaning need not be equally optimal.
Data-operation, D. A component that produces i-units with new meanings. It is this component that accomplishes all the data-operations, e.g., arithmetic, logic, shifting, etc.
Processor, P. A component that is capable of interpreting a program in order to execute a sequence of operations. It consists of a set of operations of the types already mentioned (M, L, K, S, T, and D) plus the control necessary to obtain instructions from a memory and interpret them as operations to be carried out.

Throughout PMS (and ISP, too) an operation is taken to mean a transformation of bits from one specific memory to another. For instance, it is an operation to transmit a word of information from memory M to memory M'; it is a different operation to transmit a word from memory M' to M''. Similarly, it is an operation to add the contents of memory M to that of M' and a different operation to add the contents of M' to M''.
The reason for emphasizing this point is that one often talks as if addition were an operation, ignoring the specific locus of the operands. In a discussion of computer systems, an operation must include specification of the locus of its operands. The reason is that the physical devices that realize operations are always localized in space. If, for instance, we wish to have a physical device

Computer model (in PMS)
Components of the seven types can be connected to make stored-program digital computers, abbreviated by C. For instance, the classical configuration for a computer is
C := Mp-Pc-T-X
Here Pc indicates a central processor and Mp a primary memory, namely, one which is directly accessible from a P and holds the program for it. T is a transducer connected to the external environment, represented by X. (The colon-equals (:=) indicates that C is the name of what follows to the right.) Thus a computer is a central processor connected to its primary memory on the one hand and to a transducer on the other, which is what an input/output device is.
Actually the classic diagram had four components, since it decomposed the Pc into a control (K) and an arithmetic unit or data-operation (D):
[Diagram: Mp-K-T/Ms-X or Mp-D-T/Ms-X; the vertical connections to the fourth component are not recoverable.]
where the solid information-carrying lines are for instructions and their data, and the dotted lines signify control.
Often logic operations were lumped with control, instead of with data operations, but this no longer seems to be the appropriate way to decompose the system functionally.
If we associate local control of each component with the appropriate component, we get
[Diagram not recoverable.]
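The classical configuration Mp-Pc-T-X described under the computer model can be sketched as a small graph, following the earlier remark that a computer system is naturally represented with components at the nodes and information flows as branches. The sketch below is illustrative only: `PMS_TYPES`, `component_type`, and `parse_chain` are hypothetical names invented for this example, and the parser assumes a purely linear structure written with single hyphens.

```python
# Illustrative only: these names are not part of the PMS notation itself.
PMS_TYPES = {
    "M": "memory", "L": "link", "K": "control", "S": "switch",
    "T": "transducer", "D": "data-operation", "P": "processor",
}


def component_type(name: str) -> str:
    """Classify a component by its leading capital letter."""
    if name == "X":  # X stands for the external environment
        return "external environment"
    return PMS_TYPES.get(name[:1], "unknown")


def parse_chain(expression: str):
    """Split a linear structure such as 'Mp-Pc-T-X' into nodes and branches."""
    nodes = expression.split("-")
    # Each adjacent pair of components exchanges information.
    branches = list(zip(nodes, nodes[1:]))
    return nodes, branches


nodes, branches = parse_chain("Mp-Pc-T-X")
for name in nodes:
    print(name, "->", component_type(name))
print(branches)
```

Classifying by the first letter relies on the convention that every component type is named by a single capital letter, so suffixes such as the p in Mp or the c in Pc can be treated as further specification.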
carrying lines between K and Mp are instructions. Now, suppressing the K’s, then lumping the processor state memory, the data operators, and the control of the data-operations, and processor state memory to form a central processor, we again get
Mp-Pc-T-X
track. Note that the switches are realized by differing technologies. The first two S(random)’s are generally electronic (AND-OR gates) with selection times of 10 to 100 microseconds or perhaps electromechanical (relay). The S(linear) is the electromechanical action of a stepping motor or a pneumatic-driven, servomechanism-
system functioning is concerned. To the rest of the system all the M can do is to remember i-units, accepting and delivering them in the same form (voltages). In the Appendix at the end of this book we define for each type both a simple component and a compound component, reflecting in part this fact that complex subsystems can be put together to perform a single function from the viewpoint of the total system. For example, a typewriter may have 4-6 simple information transduction channels.

PMS notation
In the above discussions we used various notations to designate additional specifications for a component, for example, Mp for a functional classification, and S(cyclic) for a type of access function. There are many other additional specifications one wants to give, so many that it makes no sense to enumerate them all in advance. A fixed position notation, such as standard function notation, F(x,y,z), where the first, second, and third argument places have fixed interpretation, is not suitable. Instead we agree on a single general way of providing additional specifications. If X is a component, we can write
X(a1:v1; a2:v2; . . .)
to indicate that X is further specified by attribute a1 having value v1, attribute a2 having value v2, etc. Each parameter (as we call the pair a:v) is well defined independently of whatever other parameters are given; hence there is no significance to the order in which they are written or the number which have to be written.
According to this notation we should have written M(function:primary) or S(access-function:random) rather than Mp or S(random). This shows immediately the price paid for the general convention: It requires an excessive amount of writing (which would be even more apparent if a large number of parameters were given), and the extra information seems to be redundant in some cases. We compensate for these disadvantages by several conventions for abbreviating and abstracting parameters. All these conventions are listed in the Appendix. Let us illustrate them by showing some alternative ways of writing Mp:
M(function:primary)   Complete specification.
M(primary)            Drop the attribute “function,” since it can be inferred from the value.
M.primary             Use the value outside the parentheses, concatenated with a dot.
M.p                   Use an explicitly given abbreviation, namely, primary/p (only if it is not ambiguous).
Mp                    Drop the concatenation marker (the dot), if it is not needed to recover the two parts (all components are given by a single capital letter, here M).
Each of these rules corresponds to a natural tendency to abbreviate when redundant information is given; each has as its condition that recovery must be possible.
In the full description in the appendix each component is defined and given a large number of parameters, i.e., attributes with their domain of values. Throughout, we use the slash (/) to introduce abbreviations or aliases as we go.¹ Thus p is introduced as an abbreviation for “primary” by writing primary/p when “primary” is given as one of the values of the attribute “function” of a memory with respect to processors (see page 607). The list of parameters in the Appendix does not exhaust those aspects of a component that one might want to talk about. For instance, there are many distinct dimensions for any component in addition to the information dimension: packaging, physical size, physical location, energy use, cost, weight, style and color, reliability, maintainability, etc. Furthermore, each of these dimensions includes an entire set of parameters, just as the information dimension breaks out into the set of parameters we have given in the Appendix. Thus the descriptive system is an open one, and new parameters are definable at any occasion.
  ¹There is no difficulty in distinguishing this use from the use of the slash as a division sign; the latter takes priority, since it is the more specific use of the slash.
The very large number of parameters provides one of the major challenges to creating a viable scheme to describe computer systems. We have responded to this in part by providing automatic ways in which one can compress the descriptions by appropriate abbreviation while still avoiding a highly cryptic encoding of each separate aspect. Abstraction is another major area in which some conventions can help to handle the large numbers of parameters.
It often happens that one has only imperfect information about an attribute, or one wishes to give its value only approximately or partially. For instance, one attribute of a processor is the time taken by its operations. This attribute can be defined with a complex value:
Pc(operation-times: add: 4 μs, store: 4 μs, load: 4 μs, multiply: 16 μs, . . .)
That is, the value is a list of times for each separate operation. However, one might wish to give only the range of these numbers;
this is done without introducing a new attribute (i.e., operation-time-range) simply by indicating that the value is a range:
Pc(operation-time: 4 - 16 μs)
Similarly, one could have given typical times or average times (under some assumed frequency mix of instructions):
Pc(operation-time: 4 μs)
Pc(operation-time: average: 8.1 μs)
The primary advantage of this notational convention, which permits descriptions of values to be used in place of actual values whenever desired, is that it keeps the number of attributes that have to be defined much smaller than otherwise.

A PMS example using the DEC PDP-8
Let us now describe the PMS structure of an actual, though small, general-purpose computer, the DEC LINC-8, which is a PDP-8 with a LINC processor. Figure 1 gives the detailed PMS diagram. In explaining it, we will concentrate on making the notation clear rather than on discussing substantive features of the system (which are described in Chap. 5). A simplified PMS diagram of the system shows its essential structure:
[Simplified PMS diagram; legible fragments: P.display-T- and Pc('LINC)-Ms-.]
This shows the basic Mp-Pc-T-X structure of a C with the addition of a secondary memory (Ms) and two processors, one of which, Pc('LINC), has its own Ms. Two switches are used: the I/O Bus, which permits access to all the devices, and the Data Break to Mp via Pc for high-data-rate devices. There are many other switches in the actual system, as one can see from Fig. 1; for example, Mp is really one to eight separate modules connected by a switch S to Pc. Also there are many T’s connected to the input/output switch, Sio, which we collapsed as a single T, and similarly for S('Data Break).
Consider the Mp module. The specifications assert that it is made with core technology, that its word size is 13 bits (12 data bits plus one other with a different function); that its size is 4,096 words; and that its operation time is 1.5 μs. We could have written the same information as
M(function:primary; technology:core; operation-time: 1.5 μs; size: 4096 w; word: (12 + 1) b)
In Fig. 1 we wrote only the values, suppressing the attributes, since moderate familiarity with memories permits an immediate inference about what attributes are involved. For example, it is common knowledge that computer memories store information in words; therefore 4096 w must be the number of words in the memory. As another example, we did not specify the function of the additional bit in the word when we wrote (12 + 1) b. An informed reader will assume this to be a parity bit, since this is the common reason for having an extra bit in a word. If the extra bit had some unusual function, we would have needed to define it. That is, in the absence of additional information, the most common interpretation is to be assumed.
In fact, we could have been even more cryptic and still communicated with most readers:
M.core(1.5 μs/w; 4 kw; 12 b)
This corresponds to the phrase “A 12-bit, 1.5-μs, 4k core store,” which is intelligible to any computer engineer. The 4 kw stands for 4 × 1,024 = 4,096, which again is known to computer engineers; however, if someone less informed took it to be 4 × 1,000 = 4,000, no real harm would be done.
Consider the magnetic tapes for Pc. Since there are eight possible tapes that make use of the same controller, K, through a switch S, we label them #0 through #7. Actually, # is an abbreviation for index, which is an attribute like any other, whose values are integers. Since the attribute is a unique character, we do not have to write #:3 (although we could). The additional parameters give information about the physical attributes of the encoding. These are alternative values, and any tape has only one of them. We use a vertical bar ( | ) to indicate this (as in BNF notation for grammars). Thus, 75 | 112 in/s says that one can have a tape with a speed of 75 inches per second or one with 112 inches per second, but not a tape which can be switched dynamically to run at either speed.
For many of the components no further information is given. Thus, knowing that M.magnetic_tape is connected to a control and from there to the Pc tells generally what that K does. It is a “tape controller” which evokes all the actions of the tape, such as read, write, rewind; therefore these actions do not have to be done by Pc. The fact that there is only one K for many Ms’s implies that only one tape can be accessed at a time. Other information could be given, although that just provided is all that is usual in specifying a controller in an overall description of a system. (The next level of detail goes to the structure of the actual operations and instructions and belongs to the ISP level, not the PMS level.)
[Fig. 1: detailed PMS diagram of the DEC LINC-8. Legible labels include Mp(#0:7); T.console; T(Teletype; 10 char/s; 8 b/char; 64 char); T(paper tape; (reader; 300 char/s) | (punch; 100 char/s); 8 b/char); T(incremental point plot; 300 point/s; .01 in/point); T(card; reader; 200 | 800 card/min).]
We have used several different ways of saying the same thing in Fig. 1 in order to show the range of descriptive notations. Thus the 64 Teletypes are shown by describing a single connection through a switch and putting the number of links in the switch above the connecting line.
Consider, finally, the Pc in Fig. 1. We have given a few parameters: the data-types, the processor state, the descendants, etc. These few parameters hardly define a processor. Several other important parameters are easily inferred from the Mp. The basic operation time in a processor is a small multiple of the read time of its Mp. Thus it is predictable that Pc stores and reads information in 2 × 1.5 μs (one for instruction fetch, one for data fetch). Again, where this is not the case (as in the CDC 6600) it is necessary to say so. Similarly, the word size in the Pc is the same as the word size of the Mp: 12 data bits. More generally, the Pc must have instructions that take care of evoking all the components of the PMS structure. These instructions do not see the switches and controls as distinct entities; rather, they speak directly to the operation of the M’s and T’s connected via these switches and controls.
Other summary parameters could have been given for the Pc. None of them would come close to specifying its behavior uniquely, although to those knowledgeable in computers still more can be inferred from the parameters given. For instance, knowing both the data-types available in a Pc and the number of instructions, one can come very close to predicting exactly what the instructions are. Nevertheless, the way to describe a Pc in full detail is not to add larger and larger numbers of summary parameters. It is more direct and more revealing to develop a description at the level of instructions, which is the ISP description.
Let us end this introduction to the PMS descriptive system by returning to a critical item in its design philosophy. A descriptive scheme for systems as complex and detailed as digital computers must have the ability to range from extremely complete to highly simplified descriptions. It must permit highly compressed descriptions as well as extensive ones and must permit the selective suppression or amplification of whatever aspects of the computer system are of interest to the user. PMS attempts to fulfill these criteria by providing simple conventions for detailed description with additional conventions that permit abbreviation and abstractions, almost without limit. The result is a notation that may seem somewhat fluid, especially on first contact in such a brief introduction as this. But once assimilated, PMS seems to allow some of the flexibility of natural language within enough notational controls to enhance communication considerably.

ISP level of description
The behavior of a processor is completely determined by the nature and sequence of its operations. This sequence is completely determined by a set of bits in Mp, called the program, and a set of interpretation rules that specify how particular bit configurations evoke the operations. Thus, if we specify the nature of the operations and the rules of interpretation, the actual behavior of the processor depends solely on the particular program in Mp (and also on the initial state of data). This is the level at which the programmer wants the processor described (and which the programming manual provides) since he himself wishes to determine the program. Thus the ISP (Instruction-set processor) description must provide a scheme for specifying any set of operations and any rules of interpretation.
Actually, the ISP descriptive scheme need only be general enough to cover some broad range of possibilities adequate for past and current generations of machines along with their likely descendants. As we saw earlier when discussing the PMS level, there are certain restrictions that can be placed on the nature of a computer system, specializing it from the more general concept of a discrete state system. It processes a medium, called information; it is a system of discrete components linked together by information transfers; and each component is characterized by a small set of operations. These assumptions are built into the PMS descriptive scheme in an integral way. Similarly, for the ISP level we can add two more such restrictions, which will in turn provide the shape of its descriptive scheme.
The first specialization is that a program can be conceived as a distinct set of instructions. Operationally, this means that some set of bits is read from the program in Mp to a memory within P, called the instruction register, M.instruction/M.i. This set of bits then determines the immediately following sequence of operations. Only a single operation may be determined, as in setting a bit in the internal state of the P; or a substantial number of operations may be determined, as in a “repeat” instruction that evokes a search through Mp. In a typical one- or two-address machine the number of operations per instruction ranges from two to five. In any event, after this sequence of operations has occurred, the next instruction to be fetched from Mp is determined and obtained. Then the entire cycle repeats itself.
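The fetch, execute, and next-instruction sequence described above can be sketched as a short loop. The machine below is entirely hypothetical: the operation names LOAD, ADD, STORE, JMP, and HLT, the single accumulator, and the use of Python tuples in place of bit patterns are all assumptions made to keep the example small, and none of them comes from a machine described in the text.

```python
def interpret(mp, start=0, max_cycles=1000):
    """Run the program held in mp (the primary memory) until HLT; return the accumulator."""
    acc = 0          # processor state: a single accumulator
    pc = start       # processor state: address of the next instruction
    for _ in range(max_cycles):
        ir = mp[pc]              # fetch: Mp -> instruction register
        op, addr = ir
        pc += 1                  # by default the next instruction follows in sequence
        if op == "LOAD":
            acc = mp[addr]
        elif op == "ADD":
            acc += mp[addr]
        elif op == "STORE":
            mp[addr] = acc
        elif op == "JMP":
            pc = addr            # the instruction itself determines its successor
        elif op == "HLT":
            return acc
        else:
            raise ValueError(f"unknown operation {op!r}")
    raise RuntimeError("cycle limit reached without HLT")


# Program in cells 0-3, data in cells 4-6 of the same memory.
mp = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HLT", 0), 2, 3, 0]
result = interpret(mp)
print(result, mp[6])
```

Program and data share one memory here, so the behavior depends only on the contents of mp and the interpretation rules coded in the loop, which mirrors the claim that a processor's behavior is fixed by the program in Mp together with the rules of interpretation.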
The cycle of activity we have just described is called the inter- +, -, ×, /, × 2^n, ∧, ∨, ⊕, concatenation, etc., which are evoked
pretation cycle, and the part of the P that performs it is called by the instruction-set-interpreter part of a processor.
the interpreter. The effect of each instruction can be expressed The specialization is that all the data-operations can be char-
entirely in terms of the information held in memories at the end acterized as working on various datu-types. For example, there
of the cycle (plus any changes made to the outside world). During is a data-type called the signed integer, and there are data-opera-
execution, operations may have internal states of their own as tions that add two signed integers, subtract them, multiply them,
sequential circuits which are not represented as bits in memories. take their absolute value, test for which of the two is greater, etc.
But by the end of the interpretation cycle, whatever effect is to A data-type is a compound of two things: the referent of the bit
be carried on to a later time has been staticized in bits in some pattern (e.g., that this set of bits refers to an integer in a certain
mem0ry.l range) and the representation in the bit pattern (e.g., that bit 31
The second additional specialization is on the data-operations. is the sign, and bits 30 to 0 are the coefficients of successive
A processor’s total set of operations can be divided into two parts. powers of 2 in the binary representation of the integer). Thus
One part contains those necessary to operate other components a processor may have several data-types for representing numbers:
given in the PMS diagram: links, switches, memories, transducers, unsigned integers, signed integers, single precision floating point,
etc. The operations associated with these components and the double precision floating point, etc. Each of these is a distinct
extent to which they can be indirectly controlled from P are highly data-type, because it requires distinct operations to process it. On
restrained by the basic nature of the components and their con- occasion, operations for several data-types may all be encoded into
trols. The second part contains those operators associated with a a single instruction with a data-type subfield that selects whether
processor’s D component. So far we have said nothing at all about the data are fixed or floating point. The operations are still sepa-
them, except to exclude them completely from all PMS com- rate, no matter how packaged, and so their data-types remain
ponents except P. These are the operations that produce bit pat- distinct.
terns with new meaning-that do all the “real” processing or With these two additional specializations-instructions and
changing of informatiom2 If it were not for data-operations, the data-types-we can define an ISP description of a processor. A
system would merely transmit information. As we noted in our processor is completely described at the ISP level by giving its
original definitions (page 17) a 1’ (including a D) is the only com- instruction set and its interpreter in terms of its operations, data-
ponent capable of directly changing information. A P can create, types, and memories.
modify, and destroy information in a single operation. As we noted Let us concentrate first on the instruction set, leaving the
earlier, D’s are like the primitive components in an analog com- interpreter until later. The effect of each instruction is described
puter. Later, when we express instruction sets as simple arithmetic by an instruction-expression, which has the form
expressions, the D’s are the primitive operators, for example,
condition + action-sequence
cated most easily by examples. The same is true of the condition,
which is a standard expression involving boolean values and
relations among memory contents.
Before we get to the examples, let us note two features of the
action sequence. The first is that each action in the sequence may
itself be conditional, i.e., of the form “condition →
action-sequence.” The second is that some actions are sequentially
dependent on each other, because the result of one is used as an
input to the other; on other occasions a set of actions are
independent and can occur in parallel. The normal situation is the
parallel one. Thus, in the action sequence

Y1 ← X1; Y2 ← X2; Y3 ← X3; Y4 ← X4

all the transfers of information may be considered simultaneous.
In particular, all the X’s have their values defined by the situation
before the transfer. For example, if A and B are two registers, then

(A ← B; B ← A)

exchanges the contents of A and B. When sequence is required,
the term “next” is used: thus

(A ← B; next B ← A)

transfers the contents of B to A and then transfers it back to B,
leaving both A and B holding the original contents of B (and so
this contrived example is essentially just A ← B).

An ISP example using the DEC PDP-8
The memories, operations, instructions, and data-types all need
to be declared for a processor. Again these are most easily
explained by example, although full definitions are given in the
Appendix at the end of the book. Consequently, let us examine
the ISP description of the Pc of the PDP-8, given in Fig. 2 (the
PDP-8 is explained fully in Chap. 5). Throughout the book the
ISP descriptions of computers follow a more highly structured
format than the ISP notation requires, in order to help the reader
see the similarities among the computers.

Processor state. We first need to specify the memories of the Pc
in detail, providing names for the various bits. Thus,

AC<0:11>    the accumulator

is a memory called AC, with 12 bits, labeled 0 to 11 from
the left. Comments are given in italics;¹ in this case, the comment
is that AC is called the accumulator (by the designers of the
PDP-8). AC corresponds to an actual register in the Pc. However,
the ISP does not imply any particular implementation, and names
may be assigned to various sets of bits purely for descriptive
convenience. The colon is used to denote a range or list of values.
Alternatively, we could have listed each bit, separating the bit
names by commas, as

AC<0,1,2,3,4,5,6,7,8,9,10,11>

Having defined a second memory, L (which has only a single bit),
one could define a combined register, LAC, in terms of L and
AC as

LAC<L,0:11> := L□AC

The colon-equal (:=) is used for definition, and the middle square
box (□) denotes concatenation. Note that the bit named L of
register LAC merely happens to correspond to the 1-bit L register.

¹There are a few features of the notation, such as the use of italics, which
are not easily carried over into current computer character sets. Thus, the
ISP of Fig. 2 is a publication language.

Primary memory state. In dealing with addressed memory, either
Mp or various forms of working memory within the processor, we
need to indicate multidimensional arrays. Thus

Mp[0:7777₈]<0:11>

gives primary memory as consisting of 10000₈ (i.e., base 8) words
of 12 bits each, being addressed as indicated. Such an address does
not necessarily reflect the switching structure through which the
address occurs, though it often will. (Needless to say, it reflects
only addressing space, and not how much actual M is available
in a PMS structure.) In general, only memory within the processor
will occur as operands of the processor’s operators. The one
exception is primary memory (Mp), which was defined as a memory
external to a P but directly accessible from it.
In writing memories it is natural to use base 10 for all numbers
and to consider the basic i-unit of the memory to be a bit. This
is always assumed unless otherwise indicated. Since we used base
8 numbers above for specifying the addressing range, we indicated
the change of number base by a subscript, in standard fashion.
If a unit of information other than the bit were to be used, we
would subscript the angle brackets. Thus

Mp[0:7777₈]<0:1>₆₄

reflects the same memory. The choice carries with it, of course,
some presumption of organization in terms of base 64 characters,
but this would show up in the specification of the operators (and
is not true, in fact, of the PDP-8). We can also have
multidimensional memories (i.e., arrays), though no examples occur in
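The difference between parallel actions and actions separated by “next” can be made concrete with a small simulation. The following is a minimal Python sketch, not part of the ISP notation; the helper names `parallel` and `sequential` and the use of a dictionary for registers are assumptions made purely for illustration.

```python
def parallel(state, transfers):
    """Apply (dest, src) transfers simultaneously.

    Every source is read from the state as it was before any
    destination is written, as in the ISP action (A <- B; B <- A).
    """
    old = dict(state)
    new = dict(state)
    for dest, src in transfers:
        new[dest] = old[src]
    return new


def sequential(state, *steps):
    """Apply groups of parallel transfers one after another ('next')."""
    for transfers in steps:
        state = parallel(state, transfers)
    return state


regs = {"A": 1, "B": 2}

# (A <- B; B <- A): both transfers see the old values, so A and B swap.
swapped = parallel(regs, [("A", "B"), ("B", "A")])

# (A <- B; next B <- A): the second transfer sees the new A.
chained = sequential(regs, [("A", "B")], [("B", "A")])

print(swapped)  # {'A': 2, 'B': 1}
print(chained)  # {'A': 2, 'B': 2}
```

Copying the state before writing is what makes the transfers independent of the order in which they are listed.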
Fig. 2. These add the extra dimensions with an extra pair of
brackets, for example,

M[a:b][c:d] . . . [g:h]<x:y>

The PDP-8 memory might better be described as:

Mp[0:7][0:31][0:127]<0:11>

representing 8 memory fields with 32 pages per field, 128 words
per page, and 12 bits per word.

Instruction format. It is possible to have several names for the
same set of bits; e.g., having defined instruction<0:11> we define
the format of the instruction as follows:

op<0:2> := instruction<0:2>
indirect_bit/ib := instruction<3>
page_0_bit/p := instruction<4>
page_address<0:6> := instruction<5:11>

The colon-equal (:=) is used to allow us to assign names to various
parts of the instruction. In effect, we are making a definition which
is equivalent to the conventional diagram for the instruction:

[Diagram: the 12-bit instruction word divided into the fields op (bits 0 to 2), ib (bit 3), p (bit 4), and page_address (bits 5 to 11)]

sequence, separated by the condition arrow (→). In this case the
condition is an expression of the form (op = octal digit). Recall
that op is instruction<0:2>, and so this expresses the condition that
the operation code of the machine have a particular value. Each
condition has been given a name in passing; e.g., “and” is the name
of (op = 0). This provides the correspondence between the
operation code and the mnemonic name of the operation code. If this
correspondence had been established elsewhere, or if we did not
care what numerical operation code the “and” instruction is, we
could have written

and → (AC ← AC ∧ M[z])

We would not have known what condition the name “and” stood
for but could have surmised (with little difficulty) that it was
simply an equality test on the operation code. We will do this
on a number of the ISP descriptions later in the book. Most
generally the form of an instruction is written as

two’s complement add/tad (:= op = 1) →
    (L□AC ← L□AC + M[z])

Here, we simultaneously define the action of the tad instruction,
its name, an abbreviation for the name, and the conditions for tad’s
execution. The parentheses are, in effect, a remark to allow an
inline definition. For example, the above single ISP statement is
equivalent to
Pc State
    AC<0:11>                              Accumulator
    L                                     Link bit/AC extension for overflow and carry
    PC<0:11>                              Program Counter
    Run                                   1 when Pc is interpreting instructions or "running"
    Interrupt_state                       1 when Pc can be interrupted; under programmed control
    IO_pulse_1; IO_pulse_2; IO_pulse_4    IO pulses to IO devices

Mp State
    Extended memory is not included.
    M[0:7777₈]<0:11>
    Page_0[0:177₈]<0:11> := M[0:177₈]<0:11>          special array of directly addressed memory registers
    Auto_index[0:7]<0:11> := Page_0[10₈:17₈]<0:11>   special array when addressed indirectly, is incremented by 1

Pc Console State
    Keys for start, stop, continue, examine (load from memory), and deposit (store in memory) are not included.
    Data switches<0:11>                   data entered via console

Instruction Format
    instruction/i<0:11>
    io_p4_bit := i<9>
    sma := i<5>                           μ bit for skip on minus AC, operate 2 group
    sza := i<6>                           μ bit for skip on zero AC
    snl := i<7>                           μ bit for skip on non zero Link
    μ                                     microcoded instruction or instruction bit(s) within an instruction

Instruction Interpretation Process
    Run ∧ ¬(Interrupt_request ∧ Interrupt_state) → (          no interrupt interpreter
        instruction ← M[PC]; PC ← PC + 1; next                fetch
        Instruction_execution);                               execute
    Run ∧ Interrupt_request ∧ Interrupt_state → (             interrupt interpreter
        M[0] ← PC; Interrupt_state ← 0; PC ← 1)

Operate Instruction Set
    The microprogrammed operate instructions: operate group 1, operate group 2, and extended arithmetic are defined as a separate
    instruction set.
    Operate_execution := (
        cla (:= i<4> = 1) → (AC ← 0);                         μ clear AC. Common to all operate instructions.
        opr_1 (:= i<3> = 0) → (                               operate group 1
            cll (:= i<5> = 1) → (L ← 0); next                 μ clear link
            cma (:= i<6> = 1) → (AC ← ¬AC);                   μ complement AC
            cml (:= i<7> = 1) → (L ← ¬L); next                μ complement L
            iac (:= i<11> = 1) → (L□AC ← L□AC + 1); next      μ increment AC
            ral (:= i<8:10> = 2) → (L□AC ← L□AC × 2 {rotate});    μ rotate left
            rtl (:= i<8:10> = 3) → (L□AC ← L□AC × 2² {rotate});   μ rotate twice left
            rar (:= i<8:10> = 4) → (L□AC ← L□AC / 2 {rotate});    μ rotate right
            rtr (:= i<8:10> = 5) → (L□AC ← L□AC / 2² {rotate}));  μ rotate twice right
        opr_2 (:= i<3,11> = 10) → (                           operate group 2
            skip_condition ⊕ (i<8> = 1) → (PC ← PC + 1); next μ AC,L skip test
            skip_condition := ((sma ∧ (AC < 0)) ∨ (sza ∧ (AC = 0)) ∨ (snl ∧ L))
            osr (:= i<9> = 1) → (AC ← AC ∨ Data switches);    μ "or" switches
            hlt (:= i<10> = 1) → (Run ← 0));                  μ halt or stop
        EAE (:= i<3,11> = 11) → EAE_instruction_execution)    optional EAE description
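The instruction-format definitions (op, ib, p, page_address) amount to selecting bit fields from a 12-bit word whose bit 0 is the leftmost bit. The following is a minimal Python sketch of that selection; the helper names `bits` and `decode` and the sample word 1234 (octal) are hypothetical and chosen only for illustration.

```python
WORD_BITS = 12  # the PDP-8 word length used throughout the ISP description


def bits(word, first, last, width=WORD_BITS):
    """Return bits first..last of word, where bit 0 is the LEFTMOST bit."""
    length = last - first + 1
    shift = width - 1 - last          # distance of bit `last` from the right end
    return (word >> shift) & ((1 << length) - 1)


def decode(instruction):
    """Split a 12-bit instruction into the fields named in the format."""
    return {
        "op": bits(instruction, 0, 2),             # op<0:2> := instruction<0:2>
        "ib": bits(instruction, 3, 3),             # indirect_bit := instruction<3>
        "p": bits(instruction, 4, 4),              # page_0_bit := instruction<4>
        "page_address": bits(instruction, 5, 11),  # instruction<5:11>
    }


fields = decode(0o1234)
print(fields)  # {'op': 1, 'ib': 0, 'p': 1, 'page_address': 28}
```

Because ISP numbers bits from the left, the shift is computed from the right-hand end of the word rather than taken directly from the bit index.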
28 Part 1 I The structure of computers
additional conventions into the language, e.g., list the instructions
in a table with their mnemonic names in a special column, rather
than write the whole affair as an expression. (In fact, if you
examine the first page of Fig. 2, you will note that the entire
description of the PDP-8 Pc is a single expression.) The reason is that
although many processors fit such a format very well, not all do
so, e.g., microprogrammed machines. By making the ISP
description a general expression for evoking action-sequences, we obtain
the generality we need to cover all the variations. We will have
two examples with the PDP-8 itself: the microprogrammed feature
and the fact that the interpretive cycle simply becomes part of
the total expression for the behavior of the processor.
Let us now consider the action-sequence. We use standard
mathematical infix notation. Thus we write

AC ← AC ∧ M[z]

This indicates that the word in Mp at address z is ANDed with
the accumulator and the result left in the accumulator. It is
assumed that the operation designated by ∧ is well understood. (The
←, of course, is the transmit operation.) Each processor will have
a basic set of operations that work on data-types of the machine.
Here the data-type is simply the 12-bit word viewed as an array
of bits.
Operators need not involve memories actually within the Pc
(the processor state). Thus,

M[z] ← M[z] + 1

expresses a change in a word in Mp directly. That this must be
mechanized in the PDP-8 by means of some register in Pc is
irrelevant to the ISP description.
We also use functional notation; for example,

AC ← abs(AC)

is defined as a conditional expression (in the manner of ALGOL
or LISP):

z<0:11> := (
    ¬ib → z″;
    ib ∧ (10₈ ≤ z″ ≤ 17₈) → (M[z″] ← M[z″] + 1); next
    ib → M[z″])

The right arrow (→) is analogous to the conditional sign used in
the main instruction, equivalent to the “if . . . then . . .” of
ALGOL. The parentheses are used to indicate grouping in the
usual fashion. However, we arrange expressions on the page to
make reading easier.
As the expression for z shows, we permit conditionals within
conditionals and also the nesting of definitions (z is defined in terms
of z″). Again, we should emphasize that the structure of such
definitions may reflect the underlying hardware organization, but
it need not. When describing existing processors, as in this book,
the ISP description often reflects the hardware. But if one were
designing a processor, the ISP expressions would be stated as
design objectives for the RT structure, and the latter might differ
considerably.
Special note should be taken of the opr instruction (op = 7)
in Fig. 2, since it provides a microprogramming feature. There
are two separate options depending on instruction<3> being 0 or
1. But common to both is the operation of clearing the AC (or
not), associated with instruction<4>. Then, within one option
(instruction<3> = 0) there are a series of independently executable
actions (following the clearing of L); within the other
(instruction<3> = 1), there are three independently settable control
actions. The nested conditionals and the use of “next” to force
sequential behavior make it easy to see exactly what is going on
(in fact a good deal easier than describing it in natural language,
as we have been doing).
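The conditional definition of the effective address z can be read as an ordinary function. The following is a minimal Python sketch under stated assumptions: the directly formed address z″ is passed in as the parameter `z2` (how it is formed is not shown here), memory is a Python list, and the incremented word is wrapped to 12 bits. The function name `effective_address` is hypothetical.

```python
WORD_MASK = 0o7777  # 12-bit words (assumption: increments wrap modulo 2**12)


def effective_address(M, ib, z2):
    """Return z given the indirect bit ib and the direct address z2 (z'')."""
    if not ib:
        return z2                          # not ib -> z''
    if 0o10 <= z2 <= 0o17:                 # auto-index registers 10..17 (octal)
        M[z2] = (M[z2] + 1) & WORD_MASK    # M[z''] <- M[z''] + 1; next
    return M[z2]                           # ib -> M[z'']


M = [0] * 4096
M[0o10] = 0o100
M[0o20] = 0o300

direct = effective_address(M, 0, 0o10)     # direct addressing
auto = effective_address(M, 1, 0o10)       # indirect via an auto-index register
plain = effective_address(M, 1, 0o20)      # indirect, no auto-indexing
```

The increment happens before the final read, which is what the “next” in the definition enforces.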
(rather than by the individual operation circuits) is implied in the
expressions for each instruction, and by the expression for the
effective address. The only thing that is left is to fetch the next
instruction and to execute it.
In a standard machine, there is a basic principle that defines
operationally what is meant by the “next instruction.” Normally
the current instruction address is incremented by 1, but other
principles are used (e.g., on a processor with a cyclic Mp). In
addition, several specific operations exist in the repertoire that can
affect what program is in control. The basic principle acts like
a default condition: If nothing specific happens to determine
program control, the normal “next” instruction is taken. Thus, in
the PDP-8 we get an interpretation process that is essentially the
classic fetch-execute cycle (ignoring interrupts):

Run → (instruction ← M[PC]; PC ← PC + 1; next    fetch
    Instruction_execution)                       execute

The sequence is evoked so long as Run is true (i.e., its bit value
is 1). The processor will simply cycle through the sequence,
fetching and then executing the instruction. In the PDP-8 there exists
a halt operation that sets Run to be 0, and the console keys can,
of course, stop the computer. It should be noted that the ISP
descriptions in this book do not, generally, include console behavior.
A state diagram (Fig. 3) is useful to represent the behavior of
the instruction-interpretation process. As an instruction is
interpreted, the system moves from state to state. Any of the states
can be null, in which case a simple transition is to be made to
the successor of the null state. The K(instruction interpreter)
controls these movements according to the information in the
instruction. Which states are null and which of multiple alternative
transitions occur depend on the instruction being interpreted.

[Fig. 3: state diagram of the instruction-interpretation process. Recoverable labels: instruction fetch (read); operation decoding; operand address calculation (ov.r, ov.w); operand fetch (read); operation specified; operand store (write); return for string or vector data; instruction complete, fetch next instruction. Upper states are Mp controlled, lower states are Pc controlled. Note: Any state may be null.]

Within each state, various operations are carried out, under
the control of subordinate K’s. Note that the upper states in Fig.
3 are controlled by the Mp whereas the lower ones are controlled
by the Pc. We have tried to use a simple mnemonic scheme to
label these states: o for operation, q for instruction, a for access,
r for read, and w for write. Similarly, we prefix the state with t
to indicate the time duration of the state, and we may prefix the
state by s.
Figure 3 is somewhat more detailed than is usual. We will use
it in Chap. 3 to describe a number of different processors. However,
the figure simplifies the familiar fetch-execute cycle:

Fetch: {oq, aq}
t.fetch = toq + taq
Execute: {oo, ov.r, av.r, o, ov.w, av.w}
t.execute = too + tov.r + tav.r + . . . + tov.r
            + tav.r + . . . + to + tov.w + tav.w

Consider, by way of example, the tad instruction of the PDP-8,
using the general state diagram of Fig. 3. From the ISP, the net
effect is

Run → (instruction ← M[PC]; PC ← PC + 1; next
    tad (:= op = 1) → (L□AC ← L□AC + M[z]))

where

z<0:11> := (specifies the effective-address calculation process)

The state diagram has more detail to explain the computer’s
behavior with respect to timing and its temporary registers. (Note
a complete state diagram for the physical PDP-8 is given in Fig.
11 of Chap. 5.) The actual state table appears on page 31.
Notice again that the ISP description does not determine the
way the processor is to be organized to achieve this sequencing
or to take advantage of the fact that many instructions lead to
similar sequences. All it does is specify unambiguously what
operations must be carried out for a program in Mp. The ISP
description does specify the actual format of the instruction and how it
enters into the total operation, although sometimes indirectly. For
example, in the case of the and instruction (op = 0), the definition
of AC shows that the AC does not depend on the instruction, and
the definition of z shows that z depends on other fields of the
instruction (indirect_bit, page_0_bit, page_address). Likewise, the
form of the ISP expression shows that AC and PC both enter into
the instruction implicitly. That is, in the ISP description all
dependence on memory is explicit.¹

Data-types and data-operations
This completes the description of the ISP for the PDP-8. For more
complex machines the number of data-types and the operations
on them are much more extensive. Then the data-types may be
declared independently of the instruction set, in the same manner
as we declared memory.
In fact, the one major piece of organization in the structure
of processors at the ISP level that has not appeared in our example
involves the data-types. Each data-type has a set of operations
that are proper to it. Add, subtract, multiply, and divide are all
proper to any numerical data-type, as well as absolute value and
negation. Not all of these need exist in a computer just because
it has the data-type, since there are several alternative bases, as
well as some levels of completeness. For instance, notice that the
PDP-8 first of all does not have multiply and divide (unless one
has its special option), thus having a relatively minimal level of
arithmetic operations, and second, it does not have a subtract
operation, using a two’s complement add, which permits negation
(-AC) to be accomplished by complementation (¬AC) followed
by add 1. Still, the options are rather few, provided one has
decided to include a given data-type in the repertoire. In the
Appendix at the end of the book are given with each of the data-types
(or classes thereof) the sets of operations that are proper to that
data-type.
The PDP-8, for example, does not have several data
representations for what is, externally considered, the same entity. An
operator that does a floating add and one that does an integer add
are not the same. However, we will denote both by the same
symbol (in this case, +), indicating the difference parenthetically
after the expression. Alternatively, the specification of the data
type can be attached to the data. Thus, in the IBM 7094 we have
the instructions

¹This is not correct, actually. In physically realizing an ISP description,
additional memories may be utilized (they may even be necessary). It can
be said that in the ISP description these memories are implicit. However,
a consistent and complete description of an ISP can be made without use
of these additional memories whereas with, say, a single-address machine
it does not seem possible to describe each instruction without some
reference to the implicit memories, as we see in the effective-address
calculation procedures where definitions look much like registers.
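The fetch-execute interpretation cycle given above for the PDP-8 (ignoring interrupts) can be sketched as a short loop. This is a minimal Python illustration, not the ISP itself: the class `Pc`, the functions `interpret` and `toy_execute`, and the `max_cycles` guard are hypothetical names introduced here, and the toy execution step recognizes only a halt, taken to be 7402 (octal), and treats every other word as doing nothing.

```python
ADDRESS_MASK = 0o7777
HLT = 0o7402  # operate group 2 with the halt bit set


class Pc:
    """Just enough processor state for the interpretation cycle."""

    def __init__(self, memory):
        self.M = memory
        self.PC = 0
        self.run = 1
        self.instruction = 0


def interpret(pc, execute, max_cycles=1000):
    """Run -> (instruction <- M[PC]; PC <- PC + 1; next Instruction_execution)."""
    cycles = 0
    while pc.run and cycles < max_cycles:
        pc.instruction = pc.M[pc.PC]          # fetch
        pc.PC = (pc.PC + 1) & ADDRESS_MASK    # default "next instruction" rule
        execute(pc)                           # execute
        cycles += 1
    return cycles


def toy_execute(pc):
    """Hypothetical execution step: only halt has an effect."""
    if pc.instruction == HLT:
        pc.run = 0


memory = [0] * 4096
memory[3] = HLT
pc = Pc(memory)
cycles = interpret(pc, toy_execute)
print(cycles, pc.PC, pc.run)  # 4 4 0
```

Passing the execution step in as a function mirrors the way the ISP separates the interpreter from the instruction set.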
State   Time   Transfers                 Comment

sfetch:
soq     toq    MA ← PC;                  Calculate the address of the instruction, q, and calculate the address of the next
               PC ← PC + 1               instruction, q + 1. The address is stored in the address register, MA, used
                                         to control the access.
saq     taq    MB ← M[MA]                Fetch the data from memory location, M[MA] (i.e., essentially M[PC]), and place
                                         the result in a buffer (temporary) register.

so      to     L□AC ← L□AC + MB          Do the operation specified by the instruction.
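The operation L□AC ← L□AC + M[z] treats the link bit and the accumulator as one 13-bit register, and negation is obtained by complementing and adding 1. The following is a minimal Python sketch of both; the function names `tad` and `negate` and the integer representation of the registers are assumptions made for illustration.

```python
MASK12 = 0o7777    # 12-bit accumulator
MASK13 = 0o17777   # 13-bit concatenation of L and AC


def tad(L, AC, operand):
    """Two's complement add on the 13-bit register formed by L and AC."""
    lac = ((L << 12) | AC) + operand   # concatenate, then add the memory word
    lac &= MASK13                      # keep 13 bits; a carry out of AC changes L
    return lac >> 12, lac & MASK12     # split back into (L, AC)


def negate(AC):
    """Negation by complementation followed by add 1."""
    return ((~AC & MASK12) + 1) & MASK12


print(tad(0, 0o7777, 1))        # (1, 0): the carry out of AC sets the link
print(tad(0, 5, negate(5)))     # (1, 0): 5 + (-5) leaves AC = 0
```

Subtraction can therefore be programmed as a negate followed by tad, which is why no separate subtract operation is needed.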
Add → (AC ← AC + M[e]);
Add and carry logical word/ACL → (
    AC ← AC + M[e] {unsigned integer});
Floating add/FAD → (AC ← AC + M[e] {sf});
Unnormalized floating add/UFA → (AC ← AC + M[e] {suf});
Double-precision floating add/DFAD → (
    ACMQ ← ACMQ + M[e]□M[e + 1] {df});
Double-precision unnormalized floating add/DUFA → (
    ACMQ ← ACMQ + M[e]□M[e + 1] {duf})

We also use braces as a modifier for the operation-type. For
example, shifting (left or right) can be a multiplication or division by
a base, but it is not always an arithmetic operation. In the PDP-8,
for instance, we have

L□AC ← L□AC × 2 {rotate}

where the end bits L and AC<11> are connected when a shift
occurs (the operator is also referred to as a circular shift).
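The {rotate} modifier can be illustrated by treating L and AC as a single 13-bit ring. The following is a minimal Python sketch; the function names `rotate_left` and `rotate_right` and the `times` parameter are hypothetical, with `times=2` standing in for the “rotate twice” forms.

```python
MASK12 = 0o7777
MASK13 = 0o17777


def rotate_left(L, AC, times=1):
    """Circular left shift of the 13-bit register formed by L and AC."""
    lac = ((L & 1) << 12) | (AC & MASK12)
    for _ in range(times):
        # The bit leaving the left end (L) re-enters at the right end (AC<11>).
        lac = ((lac << 1) & MASK13) | (lac >> 12)
    return lac >> 12, lac & MASK12


def rotate_right(L, AC, times=1):
    """Circular right shift of the 13-bit register formed by L and AC."""
    lac = ((L & 1) << 12) | (AC & MASK12)
    for _ in range(times):
        # The bit leaving the right end (AC<11>) re-enters at the left end (L).
        lac = (lac >> 1) | ((lac & 1) << 12)
    return lac >> 12, lac & MASK12


print(rotate_left(1, 0))         # (0, 1): L moves into AC<11>
print(rotate_left(0, 0o4000))    # (1, 0): AC<0> moves into L
print(rotate_right(0, 1))        # (1, 0): AC<11> moves into L
```

Unlike an arithmetic multiplication by 2, no bit is lost: thirteen rotations in either direction restore the original contents.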
The first one, without a special indicator of data-type, is taken
to be integer addition; the next, unsigned integer; the next, single
precision floating point; the next, unnormalized single precision
floating point; the next, double precision floating point; and the
last, unnormalized double precision floating point. Although there
are often clues that could be used to infer which form of addition
is being defined (e.g., double precision takes two words) we label
all but the integer operation.
We use braces { } to differentiate which operation is being
performed in the above examples. Thus, above, the data-type is
enclosed in braces and refers to all the memory elements
(operands) of the expression. Alternatively, we use braces as a modifier
on any memory to signify the information meaning. For example,
a fixed point to floating point data-conversion operation would be
given as

AC{floating} ← AC{fixed}

In general, the nature of the operations used in processors are
sufficiently familiar to the computer professional that no definitions
are required, and they can all be taken as primitive. It is necessary
only to have agreed upon conventions for the different data
representations used. The Appendix provides the basic abbreviations.
In essence, a data-type is made up recursively of a concatenation
of subparts, which themselves are data-types. This concatenation
may be an iteration of a data-type to form an array. Fig. 4 shows
the structure of various data-types and how each is built from more
primitive data-types.
If required, an operation can be defined in terms of other
(presumably more primitive) operations. It is necessary first to
define the data format explicitly (including perhaps some
additional memory). Variables for the operands are permitted in the
natural way. For example, binary single-precision floating-point
multiplication on a 36-bit machine could be defined in terms of
the data fields as follows:
x1 := normalize(x2) {sf} := (
    (x1 mantissa = 0) → (x1 exponent := 0);
    ((x2 mantissa ≠ 0) ∧ (x2<0> = x2<1>)) → (
        x1 mantissa := x2 mantissa × 2;
        x1 exponent := x2 exponent - 1; next
        x1 := normalize(x2) {sf}))
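The recursive normalize definition can be mirrored by a short recursive function. The following is a minimal Python sketch under stated assumptions: the mantissa is held as the bit pattern of a two's complement fraction of `width` bits, bit 0 is the sign bit, and a value counts as normalized when its two leftmost bits differ. The function name, the `width` parameter, and the choice to return a (mantissa, exponent) pair are illustrative and not taken from the source.

```python
def normalize(mantissa, exponent, width):
    """Shift the mantissa left until its two leftmost bits differ.

    mantissa: bit pattern of a two's complement fraction, `width` bits wide.
    Returns the normalized (mantissa, exponent) pair.
    """
    mask = (1 << width) - 1
    mantissa &= mask
    if mantissa == 0:
        return 0, 0                          # zero mantissa: exponent := 0
    sign = (mantissa >> (width - 1)) & 1     # bit 0 (leftmost)
    nxt = (mantissa >> (width - 2)) & 1      # bit 1
    if sign == nxt:
        # mantissa := mantissa x 2; exponent := exponent - 1; next normalize
        return normalize((mantissa << 1) & mask, exponent - 1, width)
    return mantissa, exponent


print(normalize(0b0001, 0, 4))   # (4, -2): two left shifts
print(normalize(0b1111, 5, 4))   # (8, 2): three left shifts of a negative value
print(normalize(0b0000, 7, 4))   # (0, 0)
```

Each left shift doubles the fraction, so the exponent is decremented to keep the represented value unchanged.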
The notational aspect is our use in ISP of a mnemonic
abbreviation scheme for data-types. We have already used sf for single
precision floating point. More generally, as Table 1 shows, an
abbreviation is made up of a letter giving the precision, a letter
giving the name, and a letter giving the length. A full treatment
can be found in the Appendix.

Table 1 Abbreviations used to name data-types

Precision             Data-type-name               Length-type
fractional/f          boolean/b                    *scalar
quarter/q             sign                         vector/v
half/h                decimal digit/digit/d        matrix
*single/s             octal digit/octal/o          array
double/d              character/char/ch/c          string/st
triple/t              byte/by
quadruple/q           syllable
multiple/m            word/w
integer (e.g., 10)    signed integer/i
                      unsigned integer/ui
                      fraction/fr
                      fixed/mixed/mx
                      floating/real/f
                      unnormalized-floating/uf
                      complex real/complex/cx

Examples:
w       word
bv      boolean vector
i       integer
sfr     single precision fraction
mx      mixed
di      double integer
10d     10 decimal digit (scalar)
3.ch    3 character (scalar)
chst    character string
sf      single precision floating
suf     single precision unnormalized floating
df      double precision floating
duf     double precision unnormalized floating

*May be optionally omitted from name

The simple naming convention does not take into account all
that is known about a data-type. The information carrier for the
data is only partially included in the length characteristic. Thus
the carrier should also include the data base and the sign
convention for representing negative numbers. The common sign
conventions are sign magnitude, true complement (i.e., two’s
complement for base 2), and radix-1 complement (i.e., one’s complement
for base 2).
For each of the data-types the processor must have the implied
operators. In fact, being able to represent a particular entity is
useful only if particular transformations can be carried out on the
entity. The most primitive operation is data movement (i.e.,
transmission). Data movement can be thought of as a complex operation
consisting of accessing (locating), reading, and writing. Data-types
which represent numbers require the ability to perform the
arithmetic operations +, -, ×, /, abs( ), sqrt, max, min, etc. The
address integer is a special case of an arithmetic quantity, and
often only additive arithmetic operations (+ and -) are available
for it. Boolean scalars (or vectors) require some subset of the 16
logical operations (sufficient subsets are ¬, ∧ or ¬, ∨). When
character strings are represented, the concatenation, deletion, and
transmission operations are required. Alternatively, we can look
to string processing languages like SNOBOL or COMIT to see the
operations they require. If the strings also represent numeric
quantities, then the arithmetic operations are necessary. Almost all
arithmetic and symbolic data require relational operations
between two quantities, yielding a boolean result (true or false).
These relational operators are = and ≠, but for arithmetic
quantities they include >, ≥, <, ≤. The more complex structured
data-types (e.g., vectors and arrays) also have a range of certain
primitive operations such as scalar accessing and transmission. Typical
operations of vectors are search and element-by-element compare
operations.

Relationship between PMS and ISP
In the introduction to this chapter we discussed briefly the
relationship between PMS and ISP. With the two described, we can
now be more precise. There are really two questions here. First,
where do these two descriptive systems fit in with respect to the
general hierarchical view of computer structures discussed in
Chap. 1? Second, what is the relationship between a PMS diagram
of a processor and the ISP of that same processor? The questions
are related, but each is best answered separately.
With respect to the first question, the PMS system describes
the topmost system level (recall Fig. 1 of Chap. 1), above the
programming, logic, and circuit levels. It lacks a characteristic that
all these other levels share, namely, that of providing a complete
description of the computer’s performance. The programming
manual (with timing) tells everything that is significant about the
performance of the computer (if it runs error-free). The same is
true of the full description at the register-transfer level, the
logic-circuit level, and on down to the electrical circuit level. But the
PMS level is only an approximate description, from which only
certain aspects of the system’s performance can be calculated.
The ISP does not constitute a distinct system level. Rather, it First of all, every memory in the ISP description corresponds
describes the interface between two levels, the register-transfer to a memory in the PMS description. The data operations in ISP
level and the programming level. It is used to define the components imply corresponding D’s in PMS and every occurrence of transmit
of the programming level (instructions, operations, and (←) implies a corresponding link between the M’s and D’s on the
sequences of instructions) in terms of the next lower level. In right hand side and the M on the left, being written into. That
principle, and usually in fact, the language of the lower level is the instructions of the ISP are evoked only under certain condi-
used to describe the components and modes of connections, one tions implies that a control (Koperation-decode) exist in the PMS
level up. In many ways ISP is a register-transfer language (in structure. Similarly, the simple, two-state stored-program model
symbolic rather than graphical form-but as we noted in Chap. (instruction-fetch, instruction-execute) for the interpreter implies
1, there appear always to be two such isomorphic notations at an interpreter control (Kinterpreter). The action-sequence of each
each system level). However, ISP has been extended by allowing instruction, if it contains any semi-colons or next’s, requires addi-
the instruction-expression to be a general linguistic expression for tional K and possibly additional M (if the structure involves em-
a computation, just as if ISP were FORTRAN or ALGOL. This bedded operations such as (A + B) x (C + D)). Thus for every
is what permits us to talk of ISP as not necessarily determining ISP component there is an implied component in the PMS struc-
the exact set of physical registers and transfer paths. The instruc- ture of the processor.
tion-expressions describe the functions to be performed without The PMS diagram model for a computer shown initially on page
entirely committing to the RT structure. 17 has the “natural units” implied by the ISP description (with
If the ISP is the interface language between the RT and pro- the exception of the instruction format part) as suggested on page
gramming levels, what is its relationship to PMS, which is one 24. The data-operations D are therefore implied each time an
level above? Every PMS component has associated with it a set operation is written. Each process implies a control which we
of operations and a control structure for getting those operations lump into the single K of the figure. The model also shows both
executed in connection with the arrival of various external signals. the arrival of instructions and the flow of data between the proc-
As we noted earlier in the chapter, there is an ISP description essor (P) and memory (Mp).
for each operation in its context of control. That is, ISP is the There are several memories within Pc which are not explicitly
interface language for describing all PMS components in terms shown on page 17. These include temporary memory within D
of the register-transfer level, not just P. It happens that only one and the K for carrying out complex arithmetic operations. The
of these PMS components, the processor, carries with it an entire interpreter control has temporary memory, of course. Finally,
new systems level-the programming level. All the other compo- other kinds of memories have been omitted to simplify the model.
nents have no analog of the programming level and interface In multiprogrammed computers a mapping control and memory
directly to the register-transfer level (or even in simple cases to would be used, and in pipeline or highly parallel processors there
the logic-circuit level). Precisely because of the simplicity, we have would be temporary memory for various buffering (e.g., instruc-
not bothered to develop ISP descriptions of other components of tions and data). The Appendix lists the various memories of the
components other than processors. processor.
The second question, namely, the relation between the ISP and K(P), the control for the processor above, controls data move-
PMS descriptions of the same processor, arises from the ability ment among the Mp and M.processor,state and evokes the data-
to represent PMS components recursively as PMS structures made operations of D. Functionally, K(P) can be broken into several
up from more elementary PMS components. Thus, Mp(32 kw, 16 b) parts, each of which is responsible for a part of the overall instruc-
can be considered as compounded of 32k memories, M(l w, 16 b), tion interpretation and execution process, and each corresponds
with an addressing switch, %random. Indeed, if one carries this to a part of the ISP description. This decomposition is allowed
to the limit, where the M’s are single bit memories (flip-flops), in PMS, and if we did so, each component would contain an
the S’s are one bit gates, a couple of specific K’s are defined for independent control for its own domain, e.g., a K(D), K(Mp),
AND and OR, etc., then it is possible to draw a PMS diagram K(1nstruction-set interpreter). More elaborate processor structures
isomorphic to any logic circuit. Thus, a processor (P) can be rep- imply having controls for functions like multiprogram mapping.
resented as a PMS involving M’s, K’s, D’s, s’s, etc., and at varying The K(1nstruction-set interpreter) is the supervisory component
levels of detail. Since we also have a description of this same P which causes other processor K’s to be utilized in a complex
in ISP, it is appropriate to consider the correspondence. processor. In an ISP description of a C, the interpreter usually
Chapter 2 I The PMS and ISP descriptive systems 35
selects only the next instruction and then after decoding (or examining it) proceeds to have the instruction executed by K(instruction execution).
By giving a resource allocation diagram along with the state
function of various programs and subprograms. They may show Mp memory occupancy in a multiprogrammed environment. Some other time scales of particular interest are the instruction(s), short instruction sequences or subprograms, and the program times. The first two time scales are influenced predominantly by the hardware, and the latter time scale is influenced by software and the external environment.
The resource allocation diagrams also can describe the utilization of the C’s resources over time (e.g., throughout the instruction-interpretation process) and provide a basis for more detailed analysis and design.
The design problem at the PMS-ISP interface is mainly one of resources scheduling.
order must the processing be performed? How are the jobs interlocked?
We do not attempt to answer the above questions but intend only to show the relationship of the various parts which define the problem. ISP implies a certain structure (conversely, PMS behavior is specified in terms of the ISP language). A particular ISP structure and a program denote a certain path through a state space as specified by a state diagram. Finally, the physical resources (in PMS) are constrained to operate according to the state diagram as expressed by using a resources allocation diagram. The resource allocation diagram can then be used to evaluate the structure’s performance (in PMS) at a higher level (e.g., the number of instructions/second it executes).
[Figure not reproduced; legible labels: “Instruction fetch from Mp”, “Data operand fetch from Mp”, “t - time spent in a state”.]
[Figure not reproduced; legible labels: “State diagram (behavior)”, “ISP (description and program)”, “RT (description, behavior)”, “Instruction-set and Instruction Execution Process”.]
Summary
The above description format conveys a rather narrow-minded view of the ISP structure of computer systems. However, almost all present computers fit easily into such a format. We do not presume to say whether it will suffice for future ISPs.
With the introduction given here and with the definitions and example in the Appendix at the end of the book, it should be possible to understand all the PMS diagrams and ISP descriptions used throughout the book.
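The recursive view of PMS components discussed in this chapter, in which Mp(32 kw, 16 b) is considered as compounded of one-word memories M(1 w, 16 b) behind an addressing switch, can be illustrated with a small sketch. The sketch is in Python and is not PMS or ISP notation; the class name Component, the function decompose_memory, and the reading of k as 1,024 are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """A hypothetical stand-in for a PMS component (kind is 'M', 'S', 'K', ...)."""
    kind: str
    name: str
    words: int = 0            # capacity in words; meaningful for memories only
    bits_per_word: int = 0
    parts: List["Component"] = field(default_factory=list)

    def total_bits(self) -> int:
        """Bits held, counted from the sub-components when a decomposition is given."""
        if self.parts:
            return sum(part.total_bits() for part in self.parts)
        return self.words * self.bits_per_word


def decompose_memory(words: int, bits_per_word: int) -> Component:
    """Expand Mp(words, bits) into one-word memories plus an addressing switch."""
    cells = [Component("M", f"M[{i}]", 1, bits_per_word) for i in range(words)]
    switch = Component("S", "S.random")   # the switch itself stores no bits
    return Component("M", "Mp", words, bits_per_word, parts=[switch] + cells)


# Assumption: 32 kw is read as 32 * 1024 words.
mp = decompose_memory(32 * 1024, 16)
cell_count = sum(1 for part in mp.parts if part.kind == "M")
print(cell_count, mp.total_bits())   # prints: 32768 524288
```

The decomposed structure holds the same number of bits as the undecomposed Mp, which is the sense in which the two descriptions are of the same component at different levels of detail.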
Chapter 3
The computer space
Introduction
The preceding two chapters have provided a view of a computer system as an organized hierarchy of many levels: physical devices, electronic circuits, logic circuits, register-transfer systems, programs, and PMS systems. We must remember that these are levels of description for what, after all, remains the same physical system. Each higher level describes more of the total system, but with a loss of detail. As this is an engineered system, great care is taken that each level represent adequately all the behavior necessary to determine the performance of the system. In natural systems too there are often many levels of description (e.g., in biological systems, from the molecule to the organelle to the cell to the tissue to the organ to the organism).
However, in natural systems we usually depend on statistics to eliminate the details of lower levels and permit aggregation, and they always do so imperfectly. In computer systems, on the other hand, the aggregation is intended to be perfect. It fails, of course, and so both error detection and error correction exist as fundamental activities in computer systems. But these imperfections are ascribed to the system itself and not to our description of it, which is just the opposite from how we treat natural systems. Only the PMS level of description is natural, in the sense of not being the intended result of the design. This is because performance is defined ultimately at the programming level. The aggregations and simplifications that go into a PMS description (e.g., measuring power by bits per second) are approximations, just as they are for any natural system (e.g., measuring the productivity of the economy by gross national product).
We have provided descriptive systems for the top levels of the hierarchy: the PMS level and the ISP level, the latter defining the basic components of the programming level in terms of the RT level just below. These are the two descriptions that are of most concern in the overall design of a computer system. We did not define the lower levels, because they go beyond the focus of this book. Neither did we define the program level, partly because there exists no uniform description (no common programming language) and partly because the computer designer works mostly at the interface, defining the instruction set. This latter is what the ISP provides.¹
¹ An increasingly popular view is that the program and RT levels (with ISP in between) are one, thus erasing the difference between hardware and software. The boundary appears to us not quite so invisible. We take the important task to be drawing the boundary in the right place for any specific design.
PMS and ISP permit the description of an indefinite number of computer systems - indeed, all that come within the scope of the current design art. (They might even be taken as a definition of what that current art is.) Some 10^4 to 10^5 individual computer systems have in fact come into existence, each of which can be described in PMS and ISP. They are not all radically individual. There are about 10^3 types of computer systems represented, if we define two systems with the same Pc to be of the same type. (By exercising various options, a single computer type could take on 10^5 different forms.)
Of these thousand-odd types, we present in this book just 40.² What sort of total population do we have here? What does our minuscule sample look like when compared with the whole? More fundamentally, what are the significant aspects of the computer systems that should be used in a comparison or classification? These are the questions we will try to deal with in this chapter. We can be neither comprehensive nor elegant. There has simply not yet been done the necessary study on which to base an adequate taxonomy of computer systems. But we can present a rough picture based on the common lore of the field, filled in with our own predilections.
² Counting each of the families in Part 6 as one computer. The IBM System/360 is actually a series.
For any system, either an entire computer, C, or a component, such as P, M, or S, it is convenient to distinguish its function, its performance, and its structure. The system is designed to operate in some task environment; to accomplish such tasks is its function. How well it does these tasks is its performance. Evaluation of performance is normally restricted to these tasks. Although it is always noteworthy when a system can perform adequately outside its specified domain (e.g., when a business computer is also a good control computer), it is rarely worth noting when a system cannot perform those tasks it was not built to perform. Thus, function denotes scope, and performance denotes an evaluation within that scope.
Structure denotes those aspects of the system that allow it to perform. This includes descriptions of its subcomponents and how they are organized. Performance of subcomponents often may be considered structure as far as the whole system is concerned, especially if the performance can be taken as given. For example, early digital transmission-oriented telephone lines came in two capacities, ~200 bits/sec and ~2,000 bits/sec. From the viewpoint of the telephone system, these are performance measures;
from the viewpoint of a computer system with remote terminals, these are structural parameters.
Typically, design proceeds in a context in which the function of the to-be-developed system is taken as given and certain structures are available; the problem is to construct a structure that achieves adequate performance.
These terms apply to any designed system. For example, consider automotive vehicles. Function is a classification by use: cars to carry people, trucks to carry goods, racers to win competitions, antiques to satisfy nostalgia and collectors’ pride. Performance is those aspects of behavior relevant to function: maximum speed, power-to-weight ratio, cargo capacity, run versus not run for an antique, and so on. Structure is such things as number of wheels, shape of the vehicle, stroke volume, and gear ratios. Structure determines performance, although from the standpoint of design, of course, causality runs the other way: from function to performance to structure.
There are, then, three main ways to classify or describe a computer system: according to its function, its performance, or its structure. Each consists in turn of a number of dimensions. It is useful to think of all these dimensions as making up a large space in which any computer system can be located as a point. In such a space all the thousand computer types built to date constitute a sparse scatter, clustering (it is to be hoped) in various regions that make sense functionally and economically. The 40 computer types in this book sample this larger scatter in some way, to give a picture both of the entire space and of the part already explored.
How many dimensions are there in this computer space? Indefinitely many, if one wants to locate a computer with ultimate precision. In fact, if one wants to go all the way, one might as well give the PMS and ISP descriptions (and down through the RT, logic, circuit, and device levels). The virtue of thinking of such a space is to abstract to a small number of dimensions, and to select those that are most relevant. Of the functions, one wants those that most influence the design; of the performance, one wants those that make the largest difference; of structure those that not only affect performance but represent possible design choices by the computer engineer. In addition, one wants dimensions along which there is significant variation. Those aspects of computer systems which are common to all, such as the use of binary devices, though of supreme interest are not part of the computer space.
What are the dimensions of the computer space? As we remarked earlier, there is no sufficiently comprehensive theory of computer systems to tell us. Considerable lore has grown up from experience to date in designing machines. But at some point one must simply propose a set of dimensions and let them justify themselves after the fact. Table 1 gives our set for function and structure. Table 3 (page 52) gives our set for performance.
Table 1 gives only a single dimension for computer system function and 19 for computer structure; Table 3 gives 8 for performance. However, the dimensions are not all independent. Many of the structure dimensions are highly (though not perfectly) correlated. Thus, in Table 1 we have put the structure dimensions in seven horizontal groups, with the one at the left-hand side being the most relevant. (In the first structure group, we have also added two temporal dimensions, since a strong correlation with time exists.) For performance, the dimensions form a tree structure, where the higher dimensions are essentially aggregate summaries of the lower ones. Finally, there is a general correlation between overall performance and the various structure dimensions, in Table 1, with increasing performance as one moves down the dimensions. We have left off two important dimensions because we do not have values; these are reliability (mean time between failures per operation) and physical size density (e.g., bits/ft^3), both of which increase with generation.
With each dimension we have indicated the range of possible values. For some (Pc speed, for example) this is a numerical quantity. However, for most, the range is a discrete set of design choices, which may or may not have a simple ordering. Clearly, these discrete values are selections from a meaningful subspace of design choices, but mostly we do not know how to construct that subspace. The values given are those that have arisen in practice, and they serve to classify the computers in the book. Obtaining a more rational subspace is a task for future research.
The body of the chapter will be taken up with a discussion of each of these dimensions, where we will discuss further their definition, the basis for their selection, and the reasons behind the arrangements of Tables 1 and 3. We give the entire set of dimensions here at the beginning, both for later reference and to emphasize the view of a single computer space in which computer systems can be located. We will refer to Tables 1 and 3 from now on simply as the computer space or, more narrowly, as the computer structure space, the computer performance space, etc.
History
Like all systems subject to variation and selection, computers have evolved through time. So striking and rapid has been this evolution that the concept of “generation” has become firmly embedded in the computer engineering culture (to say nothing of the marketing culture and the view of the lay public). It is at best an ambiguous term, having none of the sharpness of its root term in biological evolution, where it is possible to draw a strict genealogical tree.
Nevertheless, the term is useful in stressing that the history of computer systems is not just a story of particular men discovering or building particular things, but of a somewhat more impersonal and widespread series of advances that have changed computer systems radically.
The generations are best defined solely in terms of logic technology: The first generation is that of vacuum tubes (1945-1958), the second generation is that of transistors (1958-1966), and the third generation is that of integrated circuits (1966- ). In fact, current usage describes hybrid logic technology machines, such as the IBM System/360, as third generation, and so this extension must be included. What will be called fourth generation is yet to emerge; most likely it will be medium and large scale integrated circuits with possibly integrated circuit primary memory.
It is a measure of American industry’s generally ahistorical view of things that the title of “first” generation has been allowed to be attached to a collection of machines which were some generations removed from the beginnings by any reasonable accounting. Mechanical and electromechanical computers existed prior to electronic ones. Furthermore, they were the functional equivalents of electronic computers and were realized to be such. They were also separated by a wide gap in performance and structure, both from each other and from vacuum tube machines. Thus, by reasonable reckoning, we are currently in the fifth generation of computers, not the third. But usage is now too well established to change.
Actually, it was not always viewed thus. Figure 1 reproduces a genealogical tree of the early computers prepared by the Na-
[Figure 1: tree diagram not reproduced; its levels are labeled Present generation, First generation, Predecessors, and Roots.]
Fig. 1. The “family tree” of computer design. The remarkable growth of electronic computing systems in the Western world began primarily through government support of research and development in the universities. The need for data-processing facilities of increased capacity inspired further support for their development in both educational institutions and private industry. The current generation of computers is predominantly the result of development by private industry. The tree lists many of the machines developed in these ways. At the roots are the contributions of many existing technologies to the rapid growth from electromechanical to electronic systems. Some of the milestones are ENIAC (Electronic Numerical Integrator and Computer), the first electronic computer; EDVAC (Electronic Discrete Variable Automatic Computer), the first internally stored-program computer and first acoustic delay-line storage; MADM (Manchester Automatic Digital Machine), the first index registers (B lines) and first cathode-ray-tube electrostatic storage; MTC (Memory Test Computer), the first core-storage computer. (Courtesy of National Science Foundation.)
Computer function
Scientific
Business
Control
Communications
(switching/store and forward)
File control
Terminal
Time sharing
Mechanical
Electromechanical                                    1930     10^-1   1000
(Fluidics)                                           (1970)   10^-2
Vacuum tube                                 first    1945     10^-3   10
Transistor                                  second   1958     10^-5   ~1
Hybrid                                               1964     10^-6
Integrated/IC                               third    1966     10^-7   0.1
Medium to large-scale integrated/MSI-LSI    fourth?  197?     10^-8   0.01
Linear (stack)
Linear (queue)
Bilinear tape (large) >10^5
Cyclic-random disk (medium) magnetic card (large)
Cyclic drum (large) drum (small) photostore (large) >10^6
Random core (medium) core (smaller) >10^7
Content film (small) >10^8
Associative integrated circuit >10^9
Mp concurrency Interprocess communication
Serial by bit
Parallel by word
Multiple instruction streams, 1Pc
Multiple data streams (arrays)
1 instruction buffer
n instruction buffer
Look-aside memories
Pipeline processing
tional Science Foundation in 1959. Notice that the Harvard Mark machines, which were constructed from relays (hence electromechanical) are accorded the place of honor as first generation (but Babbage is nowhere to be seen).
It is not appropriate to provide here an adequate history of computer technology. The early story has often been told, starting with Babbage and early mechanical calculators, through Hollerith punched cards, on to the relay calculators at Bell Laboratories and Harvard, up to the birth of electronic machines with ENIAC, and finally to the stored-program concept with the von Neumann machine at the Institute for Advanced Studies (IAS), EDSAC at Cambridge University, and EDVAC at the University of Pennsylvania (with the contemporary developments by ZUSE in Germany often left out). And there have been a few scattered attempts to tell some of the story of the last three generations. But to date no really satisfactory historical account has been given. This is due in part to recency and in part to the difficulties of evaluating and sorting out the significant developments of a very complex technology undergoing rapid growth.
What is appropriate here is to view the evolution of computer systems as measured by the dimensions of computer space and to localize the examples of this book in relation to calendar time and other computers. The concept of generation has led others to attempt the same thing by constructing a family tree, Fig. 1 being but one example. But the relationship between computers is not nearly as simple as such a tree implies. We prefer to plot a straightforward time chart,¹ as shown in Fig. 2, in which we group the machines by manufacturer and within each group, by acknowledged family relationship (for example, 701-704-709-etc.). There is clearly relatively closer kinship within a company than between companies. One advantage of such a time chart is its depiction of the life history of a single system, showing how long it takes for computer systems to go from paper through prototype to production.
¹ Whereas we have checked the Time Chart numerous times for accuracy, we make no claim about the number of errors it still has. We have relied on the following source data: (1) Original papers. These are mostly shown on the chart as “p”. Normally the reader can infer that the work presented in a paper occurs prior to the actual publication. There are notable exceptions (e.g., the core memory, and Atlas papers) which were first published to lay claims to certain ideas. (2) Historical reviews. Primary historical papers include: Rosen [1969] and Serrell [1962]. Secondary historical review papers include: Bowden [1953], Campbell [1952], Chase [1952], Nisenoff [1966], and Samuel [1957]. (3) Encyclopedia. (4) Computer surveys. Two sources have been used: The Adams Associates Computer Characteristics Quarterly, published since 1960 [Adams, 1960; Adams Assoc., 1966, 1967, and 1968]; and Martin H. Weik’s four Surveys of Domestic Electronic Digital Computer Systems [Weik, 1955; Weik, 1961 (third); and Weik, 1964 (fourth)]. The Adams’ Charts give the date of first delivery, and the Weik Survey gives the date the computer was first operating. (5) Manufacturer, organization or person supplied dates. In a few cases we have asked directly for specific operational and delivery information.
Not all computer types are shown on the chart, there being about 250 out of the estimated 1,000 types. Lack of space (and of perseverance) accounts for the omissions. The major United States manufacturers, as well as some minor ones, and all machines of substantial historical interest are represented. All the machines discussed in this book are gathered together on a separate line (though they also occur elsewhere, if appropriate). Foreign machines are omitted, unless they are described in this book. In addition, the machines of many early minor manufacturers are missing (ALWAC, ELECOM, etc.).
The second part of the time chart arranges many computers by word size, to give the reader our classification. Unfortunately, only a few samples are given, owing to space limitations. Thus, the density on the graph does not indicate the true density of existing machines. Many small computers, which are dedicated to a particular task, are beginning to be built and a comparatively small number of very large computers have been built. On the bottom fine line we place the machines in this book.
The third part of the time chart deals with technology by listing events along various dimensions that have been significant in the evolution of computers. Besides the dimensions in the computer space we have also added some dimensions describing software systems. Although we have not been able to deal with the programming level in this book (except for the ISP interface), its development is clearly as important as that of the hardware, and there exists strong mutual interaction between the two.
The fourth (and final) part of the time chart gives selected technological events leading up to the development of the computer. It includes the early work of Babbage, desk calculators, and the Bell Labs and Harvard calculators.
Many stories can be read from the chart. For example, note that the early Bell Telephone Laboratories relay calculator was used remotely at Dartmouth in 1940, about 20 years prior to remote use of time-shared computers. Note also that successful manufacturers tend to have a small number of computer families, but add members as the technology dictates. (We omit the exodus of computer companies.) We hope the reader gets as much enjoyment from browsing the chart as we have (even after we put it together!).
The computer space in Table 1 and the time chart in Fig. 2 provide an overall framework. We are now ready to consider each of the dimensions individually, starting with those of system function, then the performance, and finally structure.
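The idea of locating a computer system as a point in the computer space, with one coordinate per dimension, can be made concrete with a small sketch before the dimensions are taken up one at a time. The sketch is in Python and is only illustrative: the class ComputerPoint, the function differing_dimensions, the two machines, and the particular choice of three structure dimensions are assumptions, although the values used (vacuum tube, transistor, first, second, parallel by word) are among those that appear in Table 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ComputerPoint:
    """A hypothetical point in the computer space: one value per dimension."""
    name: str
    function: str
    structure: Dict[str, str] = field(default_factory=dict)


def differing_dimensions(a: ComputerPoint, b: ComputerPoint) -> List[str]:
    """Names of the structure dimensions on which two points do not coincide."""
    dimensions = sorted(set(a.structure) | set(b.structure))
    return [d for d in dimensions if a.structure.get(d) != b.structure.get(d)]


# Two invented machines, used only to show the comparison.
machine_a = ComputerPoint(
    "machine A", "scientific",
    {"logic technology": "vacuum tube", "generation": "first",
     "Pc concurrency": "parallel by word"},
)
machine_b = ComputerPoint(
    "machine B", "scientific",
    {"logic technology": "transistor", "generation": "second",
     "Pc concurrency": "parallel by word"},
)

print(differing_dimensions(machine_a, machine_b))
# prints: ['generation', 'logic technology']
```

That the two invented machines differ on logic technology and generation together, and on nothing else, is a small instance of the correlation among structure dimensions noted above.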
[Fig. 2. Time chart of computer systems, 1940 to 1970 (chart not reproduced). Part 1: machines grouped by manufacturer or organization and by family; legible rows include SDS/Scientific Data Systems, DEC/Digital Equipment Corporation, CDC/Control Data Corporation, Honeywell, Burroughs, RCA/Radio Corporation of America, UNIVAC, Ferranti Corporation, Manchester University, University of Illinois, Cambridge University, University of Pennsylvania, Bell Telephone Laboratories, and Harvard University. Part 2: computers arranged by word size. Part 3: software and hardware lines (operating systems; discrete simulation languages; list processing/string manipulation; algebraic manipulation languages; algorithmic languages; assemblers and loaders; mapping; Pc concurrency; Pc function; PMS structure; secondary memory; memory technology; primary memory (size; width; time)). Part 4: events leading up to the computer, including Charles Babbage (1792-1871), the Difference Engine and Analytical Engine, and the Bell Telephone Labs calculators, with the first, second, and third generations marked. Legend entries include announcement for sale, scheduled, delivered first, withdrawn, operational, paper (p), reasonably compatible series (k), and upward compatible (ku).]
Function
The most striking fact about function is the existence of only a single dimension, and with only a
few values. Perhaps we have taken a simplistic view of the functions that computers perform, but we
think our computer space represents reality: To wit, there is remarkably little shaping of computer
structure to fit the function to be performed.

At the root of this lies the general-purpose nature of computers, in which all the functional
specialization occurs at the time of programming and not at the time of design. However, it might
seem that specialized environments would not require all the generality, so that functional
adaptation would still be possible. But this appears not to be so for two reasons. First, the level
of operations of the Pc (as defined in the ISP) is too basic to reflect the kind of specialization
offered by the environment (think of information-transfer or conditional-transfer operations).
Second, all environments ultimately require a variety of tasks in addition to the main specialized
task. These include at least language compilation or assembly, readable formatted output, debugging
aids, and other utility routines. By the time these have been added, a substantial requirement for
generality has been generated.

However, this is not the whole story. A second part is the difference between the computer type and
the specific configuration assembled for a task. The latter is often carefully specialized to the
function to be performed. But this is mostly the amount of Mp, the amount and types of Ms, and the
number and types of T's. Within limits, these are all items that can be attached to any type of
computer (i.e., to any Pc) and are handled in an environment-independent way. Thus there is little
specialization of computer types, but great specialization of particular configurations. That this
should be the case indicates something about the nature of the functional specialization: that it
can be expressed adequately in gross PMS terms, as more bits of storage and more data rate.

There is still more to the story. Some functional specialization exists, as indicated in the
dimension. This depends primarily on two kinds of things beyond the reach of the configurational
adaptation described above. The first consists of demands for reliability, ruggedness, small size,
etc. These have strong effects on design, but below the ISP and PMS levels. The second consists of
demands for large amounts of processing power. One response to this again affects design at the
lower levels of logic, devices, and circuitry and has little impact on design at the ISP and PMS
level. But response is also possible in terms of the data-types that are built into the ISP. Large
machines have data-types that are appropriate to their tasks (with operations to match), and these
affect the
Chapter 3 I The computer space 47
design. In fact, this effect is the substance of the functional specialization shown in the
computer-space dimension.

Finally, there is one last part of the story, and it is the most interesting of all. Various groups
of computer engineers have felt strongly from time to time that functional specialization should
exist, and they have set out to create such machines. These efforts have often produced machines
that were different from the existing main line of computers, i.e., were appropriately specialized.
But the net effect of almost all such attempts has been that the new idea was seen to be good in
general for all computers and was taken back into the main line of computers. Thus, what started
out to be a functional separation turned out to be simply a way to produce rapid development of a
more universally applicable computer. A classic example is the expansion of input/output facilities
in creating a functionally specialized business machine, which simply led to better I/O facilities
for all computers. We will have more to say about such examples as we discuss the values along the
dimension.

Computer-system function

Scientific. The first machines were clearly designed for scientific calculations. In fact, Aberdeen
Proving Grounds funded the early work on the ENIAC for the computation of ballistic firing tables.
And the image used frequently by the early computer designers was the computer as a statistical
clerk, the arithmetic unit being the desk calculator, the memory the work sheet, and the program
the instructions that the mathematician gave to the clerk.

From a design standpoint, scientific computation has posed two striking requirements. The first is
the great accuracy of the numbers, which has led to word lengths of 36 to 60 bits (11 to 18 decimal
digits of significance) and arises from the propagation of roundoff error during repeated arithmetic
operations. The second is the emphasis on fast arithmetic operations, i.e., on arithmetic power. In
the early machines the standard rule for estimating computation times was to count the number of
multiplications in a program; all else could be neglected. The arithmetic unit has developed to
where the floating point multiply is hardly more expensive than floating point add. This requirement
on fast arithmetic, however, has really been directed at the logical design level, not at the ISP or
PMS level. Thus, the main effect at the ISP is the adoption of long word lengths, floating point
data-types (in addition to integers), and an extensive repertoire of arithmetic operations in the
ISP. The main PMS effect is the emphasis on the classic “statistical clerk” PMS design.

The press for increased arithmetic processing has led in recent times to the development of various
forms of Pc concurrency, as in the look-ahead of Stretch (Chap. 34) and the n-instruction buffer of
the CDC 6600 (Chap. 39). This might be considered a unique functional specialization for scientific
computation. It is too early to tell, but it is our impression that, although the needs for
scientific computation initiated the exploration of concurrency and parallelism, we will eventually
see them in all computers above a certain power, whatever the task domain. Physical limits on
component speed and signal propagation will make these techniques universally attractive.

A better case for permanent specialization can be made in the special algorithm computers, which
compute the fast Fourier transform or do vector operations. Here we finally have systems whose whole
design is responsive to a narrow class of problems. This may extend to the very special kinds of Pc
parallelism exhibited by the ILLIAC IV (Chap. 27), although there is substantial generality in such
systems.

Business. In the early days of electronic computing it was felt by many that there was a major
functional separation between business computing and scientific computing.¹ Scientific problems were
“large computing, small input/output”; business problems were “small computing, large input/output.”
Certainly most of the existing computers, designed for scientific computation, had poor input/output
facilities. The IBM 701, for example, used the Pc to control everything dynamically, actually
catching the bits from running tapes on the fly (by executing well-timed small loops).

These design efforts for business computers resulted in the IBM 702 (and subsequently the IBM 705,
708, and 7080). This machine had two major innovations for IBM: It used characters, and it had a PMS
structure that permitted more flexible and voluminous input/output. The latter feature was
immediately incorporated into scientific computers, e.g., into the 709, and then into all large
scientific computers as separate input/output control (either Kio or Pio), for it was realized that
there were also demands on input/output for scientific calculation. Thus the bifurcation was
temporarily halted.

The specialization to characters as a basic type (as opposed to long words) was already present in
the IBM 702 but did not have its effect until 5 years later with the development of the IBM 1401
(Chap. 18). The latter machine was adapted to business, both in being character-based and in being
small enough so that small businesses could afford it. It was extremely successful (many thousands
were produced) and certainly represents a successful functional specialization for business.
However, it is interesting that the specialization has not been maintained, for the IBM System/360
(Chaps. 43 and 44) is again a single machine, although it has in essence two internal ISP’s, one
centered around characters and the other around floating point data-types, that is, a business and a
scientific specialization residing side by side.²

Control. The third functional value is a computer used for control in real time. Examples are
process-control computers, aerospace computers, and laboratory instrument-control computers. The role
of the computer is to act as a sophisticated control (K) in some larger physical process, and thus it
plays a subordinate role. Their relatively late arrival was due to the high cost and unreliability of
early computers, as well as to the lack of necessary interface equipment.

The functional specialization is seen most strongly in the word size, which reflects the appropriate
numerical data-type. The numbers used in control processes are generated by physical devices and are
rarely better than 0.1 percent accurate. Since elaborate arithmetic calculations are not called for,
the numbers, and hence the word size, can be around 12 bits. Most control computers have been 12 to
18 bits/word. A second specialization, again reflecting appropriate data-types, is that all control
computers are binary and have boolean operations. This arises because many of the external
conditions to be sensed and effected are binary in nature.

About the only other functional specialization of control computers is the interrupt³ capability to
allow them to respond to many potentially simultaneous external conditions in real time. This
provides apparent parallelism, though still using a sequential processor. This is another possible
example of functional specialization leading to reunification rather than divergence, for it has
again been widely accepted that all general-purpose computers must have good interrupt capabilities.
However, in actuality, interrupts, though not existing in early computers, were developed to obtain
good input/output facilities, not for control computers.

Chapters 7 and 29 give examples of aerospace computers, and Chap. 33 describes the IBM 1800, which is
specifically designed for process control. As these examples show, a complex ISP is not necessarily
required. This in part reflects the fact that control computers may retain their programs over their
whole lifetime, so that programming and reprogramming is less important. (It is not absent, however,
and so this is not a very strong functional adaptation.)

Communication. The functional specialization of communication could be taken as a subfunction of a
control computer. The function is mainly to behave as a switch. In a message-switching application
the computer transfers messages from terminals (and links) into primary (and sometimes secondary)
memories and then transfers them to other terminals (and links). In message switching, messages are
first stored and then forwarded. The computer in a telephone exchange functions as a very
sophisticated switch control. Here the computer reads the off-the-hook signal, detects the dialed
numbers, rings the dialed parties, and finally sets the switches to connect the telephones together.
In some instances, when it answers information inquiries about new telephone numbers or reroutes
calls to other phones, it functions as a memory. Thus a communications computer is functionally a
switch or a control for a switch.

The main distinction between control computers and communications computers is that the task
environment of the latter, since it consists of digitally encoded messages (even in the case of the
voice telephone exchange), can be handled directly by the communications computer. That is, the
communications computer can do the work of transshipment and storage as well as control.

There are no pure examples of communications computers in this book. However, the Pio’s serve
essentially the same function within a single computer (Part 4, Sec. 1), and they can profitably be
examined from this viewpoint.

File Control. We list this as a separate specialization only because a number of computers have been
built to do exactly this task. The specialization is easily described: It is a communication
computer with the messages being characters (since they are built for business), and with the large
memory (the file) being considered to be part of the system. There are no examples of file-control
computers in this book, but the early IBM 305 and UNIVAC file computers serve this function. An IBM
1800 is used as the control for a 10^12-bit photo-optical memory, for example.

Terminal. Since it is possible to obtain a separate computer system whose only function is to run a
display, we have listed this as a separate functional specialization. In fact, it is better viewed
(and almost always occurs) as a component of a larger computer system,

¹ Such feelings are still extant, but we are concerned here not with the validity of the feelings
but with what they led to at a particular period of computer development.
² The story above has been told exclusively in terms of IBM machines. Although this does not distort
the picture too strongly in terms of total movements of the field, since IBM dominated the market,
concurrent developments were taking place throughout the field. UNIVAC I was the first computer
built by a manufacturer and did not have the idiosyncrasies we ascribe to IBM; on the other hand,
the marketing effort for it was nil.
³ Apparently introduced in the UNIVAC 1103.
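The word-length figures quoted above for scientific and control machines can be checked with a little arithmetic. The sketch below is an illustration added for that purpose, not something taken from the book. It assumes that an n-bit binary word carries about n times log10(2) decimal digits, and that resolving a quantity to a given relative accuracy needs about log2(1/accuracy) bits. The function names are hypothetical.

```python
import math


def decimal_digits(bits: int) -> float:
    """Approximate decimal digits of significance carried by a binary word."""
    return bits * math.log10(2)


def bits_for_accuracy(relative_accuracy: float) -> int:
    """Smallest whole number of bits that resolves the given relative accuracy."""
    return math.ceil(math.log2(1.0 / relative_accuracy))


if __name__ == "__main__":
    # Word lengths quoted in the text for scientific machines.
    for bits in (36, 60):
        print(f"{bits}-bit word: about {decimal_digits(bits):.1f} decimal digits")
    # Accuracy quoted in the text for numbers produced by physical devices.
    print(f"0.1 percent accuracy: at least {bits_for_accuracy(0.001)} bits")
```

Under these assumptions, 36 and 60 bits round to 11 and 18 decimal digits, which matches the range given for scientific machines, and 0.1 percent accuracy needs about 10 bits, which is consistent with control-computer words of around 12 bits.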
i.e., as a special Pio. The DEC 338 is such a P.display and is described both later in this chapter
and in detail in Chap. 25.

Time-sharing. The requirement to have a large number of users in simultaneous conversational
interaction with a single large machine has bred a new specialization, that of the time-sharing
computer. All the computers described above can be time-shared (even if they do not have interrupts
or inherent multiprogramming). However, the emphasis on this mode of operation with the particular
timing and flexibility requirements of human users doing general computing at consoles in multiple
software systems has led to a number of innovations in design. The most important are the
virtual-memory techniques for achieving multiprogramming (described in Part 3, Sec. 6). There is
also substantially increased complexity of PMS structure to handle the integration of large files,
swapping memories, and the huge software systems that seem to be endemic to time-sharing systems. It
is still too early to tell whether any of the design responses will produce permanent specialization
or will again simply be the first instigation of design features that will become universally used.

In summary, we see that there is functional specialization and that it translates mostly into total
size of the machine and into the data-types available. Many of the other design aspects created in
response to functional specialization have instead become the common property of all machines.

Performance

For a device that does a complex job, it is meaningless to ask for a single precise index of
performance. It is like asking for the average speed of a given model of car over its lifetime
without specifying who will own it, where he will drive it, and what sort of terrain he will
encounter along the way. Notice that the difficulty is as much in the complexity of the task
environment as in the complexity of the internal workings of the machine. Specify everything about
the environment, and the performance can often be given in a single figure. It may be hard to
determine, but at least it is well defined. If you know the terrain and road conditions perfectly
and how the car was driven, then from the structure of the car it is possible to figure out the
instantaneous velocity and from this to construct the average speed.

To put this in terms of computers, given a particular configuration for a computer system, given a
particular program, and given a particular set of input data, it is possible to determine all
aspects of the performance: how long it took, how much space was used, whether it was correct, and
so on. But we are not interested in such specifics. We want to know how well the computer system
performs, given some vague notion of the kind of task (programs and data) that will be used with it.
Although we know that we cannot have adequate measures, we believe that there is something that can
be said about the performance, something that tells us that a CDC 6600 is many times more powerful
in actual performance than a PDP-8.

An interesting way to look at the problem of specifying performance is to play a simple game: We
will give you a number, say 4. You are to give the best description of computer systems involving
only that many parameters (equivalently, dimensions or attributes). That is, what is the best
description of a computer that can be stated in four numbers? The game is easier to play if we speak
of the dimensions, rather than the information content of the description (in bits, say).¹ We have
still not defined “best,” of course. It can be taken to mean the best prediction of the relative
ordering of the computer systems; better on the index means better on the same task.²

To start at the beginning, what single number would you give to characterize a computer’s power?
Such a question makes most people uncomfortable, since strong feelings exist for at least two kinds
of numbers, dealing with speed and memory, respectively. If forced, we would probably settle for
something related to processing speed. The cycle time of the primary memory is a possibility because
for simple machines it determines (limits) the operation rate. It is a structural parameter, but
that is no reason to avoid it as a performance index. The average number of instructions per second,
or operations per second, is a better indicator. Since the latter does not take into account the
size of the word being processed, perhaps average bits processed per second is the best single
number. (We measure this number at the processor, and it may include both the instruction and data
streams.)

To take an average we must adopt some weightings. The simplest scheme is simply to add all the
instruction (or operation) times and divide by their number. This is equivalent to weighting them
equally, the rare ones and the common ones. If we want to do better than that we need some data.
Several sets of relative frequencies of instruction types, called “mixes,” have been used in the
literature. Table 2 gives four examples. The Gibson mix is

¹ It is not fair, of course, to invent tricks to encode many conceptually independent dimensions
into a single one, just to beat the limit. On the other hand, composite dimensions, such as average
operation time, are perfectly acceptable.
² Definitional precision is not appropriate, since we are not attempting to deal seriously with the
technical questions of indices, only to illustrate the issues.
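The two averaging schemes just described, equal weighting and weighting by a mix of relative frequencies, can be compared in a few lines. This is a minimal sketch under stated assumptions: the instruction classes, their times, and the frequencies are all hypothetical illustrations, and they are not the Gibson mix or any other mix from Table 2.

```python
from typing import Mapping


def equal_weight_average(times_us: Mapping[str, float]) -> float:
    """Add all instruction times and divide by their number."""
    return sum(times_us.values()) / len(times_us)


def mix_weighted_average(times_us: Mapping[str, float],
                         mix: Mapping[str, float]) -> float:
    """Weight each instruction time by its relative frequency in the mix."""
    total_weight = sum(mix.values())
    return sum(times_us[name] * mix[name] for name in mix) / total_weight


# Hypothetical instruction times in microseconds (not from any real machine).
TIMES_US = {"load_store": 2.0, "add": 2.0, "multiply": 10.0, "branch": 1.0}

# Hypothetical relative frequencies (not the Gibson mix or any mix in Table 2).
MIX = {"load_store": 0.5, "add": 0.25, "multiply": 0.05, "branch": 0.2}

if __name__ == "__main__":
    print("equal weighting:", equal_weight_average(TIMES_US), "microseconds")
    print("mix weighting:  ", mix_weighted_average(TIMES_US, MIX), "microseconds")
```

With these made-up figures the equal-weight average is 3.75 microseconds, while the mix-weighted average is about 2.2 microseconds, because the slow multiply is rare in the assumed mix. The gap between the two numbers is the reason the text asks for frequency data before averaging.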
50 Part 1 I The structure of computers
probably the best known. The best source for such data comes from instruction counts of running
programs.

Knight takes the view (Fig. 3) that a single number can be used to indicate power, and his formula
has been evaluated for some 300 computers [Knight, 1966]. His formula is the product of three
factors: processing time, memory size (in words), and word length. The formula was derived (roughly)
to measure power so that technological change could be modeled. Applying the formula is like
measuring automotive-vehicle power as a product of speed, weight, and the number of wheels. (Such an
indicator is roughly proportional to a car’s momentum.) Thus, although it is a reasonable
single-number indication for power, a computer buyer could not use it directly.

Taking averages, as in the case of mixes, suggests a more sophisticated approach. A collection of
programs, called a “bench mark,” is developed that does a variety of different tasks. Then the one
number is the time it takes to do this collection. Such a bench mark generates its own frequencies
of occurrence of the primitive instructions. It brings in a number of additional dimensions that
affect performance: the instruction code, the size of Mp, programming skill, input/output devices,
etc. It also carries with it an implicit frequency of different kinds of task demands (how much of
the set involves compiling, how much number crunching, how much I/O, etc.).

There are severe practical problems in carrying out such measurements on many computers, since the
problems must be coded and run on all the systems. It is somewhat easier if the task set is
restricted to programs coded in a procedure-oriented language, such as FORTRAN, where all computers
accept FORTRAN. Nevertheless, although it has often been done to compare two systems, only
occasionally has it been done for even a modest number. We feel that for a general-purpose computer
the compiler-derived bench mark is a reasonable single-performance number. Much actual use will be
with the compiler, and good compilers produce code to rival hand coding, so that special features of
the machine are utilized. Cox [1968] compares several, using hand coding and compilers for several
tasks.

There is a difficulty with the bench-mark scheme that is inherent in its strongest advantage, that
of doing a total problem and thus integrating all features of the computer. The number obtained
depends not only on the type of computer, for example, an IBM 704, but on the exact configuration,
for example, 16 kwords of Mp versus 32 kwords, and even on the operating system and the software
(which version of FORTRAN). Thus, although the number perhaps comes closest to an adequate
single-performance figure, it becomes much less of a parameter characterizing the structure of the
computer than one characterizing a contingent total system.

Let us underscore again the distinction between the computer type and the particular configuration
(possibly including basic software) assembled in a particular installation. Computer systems are
designed with certain forms of variability. To specify a CDC 1604 is to specify many things, such as
the ISP of the Pc, the cycle time of Mp, the K’s used to control secondary memories (Ms), and
interfaces to the external world. But it leaves open many other
things, e.g., the types and sizes of Ms and the size of Mp. On some computers it can even leave open
part of the ISP (e.g., the multiply/divide options on many small machines), or the speed of the Pc
and Mp (e.g., in the IBM System/360).

When we ask questions about computer systems, we should be clear whether we are talking about a
computer “type,” such as CDC 1604, or whether we are talking about a particular installation, with
all the variability specified. It is possible to describe either with PMS and ISP, provided we
recognize that the diagrams for the types represent maximal possibilities for assembling particular
systems. This is how almost all the PMS and ISP diagrams in this book were prepared. From the point
of view of our “number game,” if we are talking about computer types, we might prefer numbers that
do not depend on the particular configuration.

If two numbers were available for describing performance, what would they be? Clearly there are
several directions to go. One could fractionate the bench mark, so that one has a bench mark for
arithmetic-rich tasks and a bench mark for others (a composite of compiling and data processing).
One could decompose the processing rate into, say, operations per second and word size (from which
bits per second can be recaptured approximately). Alternatively, one could retain only a single
number for processing rate and add a measure of the memory available, e.g., size of Mp (in bits). Of
the three we would choose the last, especially if we were talking about a particular installation
rather than computer types, for which Mp size remains variable.

We can continue this game through several numbers. Table 3 shows some of our choices. Various
parameters drop out or change only when they are decomposed into other parameters from which they
can be recovered. Thus, initially Mp must be measured in bits, but when the word size is given, Mp
is more reasonably measured in words. One of the reasons for exposing such a list is to emphasize
its judgmental and approximate character. There is as yet no way to validate such proposals for
brief descriptions.

If we had bench marks, which are themselves only approximations at measuring performance, we might
look at how well the parameters in Table 3 predict the bench marks. But there remain the difficulties
of how to take into account the additional aspects of the total system (e.g., compiler efficiency)
that are implied in the bench mark. Alternatively, one might want to construct a mixed description
of bench-mark numbers and measurements of the kind in Table 3. Then the relationship between bench
marks and these other measurements would become an indirect measure of the efficiency of the rest of
the system.

We have discussed performance in a crude and cavalier way, but this accurately reflects the state of
the art. There are no precise measures for performance. There are precise structure and performance
measures of individual components (e.g., memory size, and speed and word length, and processor
instruction times). When designers (and users) are faced with obtaining a certain total performance
for a given cost, the only method is that of the bench mark, because the task is such a significant
variable. If performance is to be increased, unless the task is sufficiently trivial, it is
difficult to predict what effect changing even the most direct structural variables will have (e.g.,
memory speed).

Structure

We now turn from function and performance, which provide design constraints and objectives, to the
dimensions of structure, which provide the space in which the design is actually cast. A structural
dimension is one in which the designer can attain any of the values along the dimension by
relatively direct means. Thus a machine is completely specified by listing all its values along the
structural dimensions. From this, the system’s function and its performance within that function
can be determined.

What dimensions should be selected for structure? The viewpoint is distinctly different from that of
performance, where one
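[Table 3 residue. Column heading: number of parameters allowed (1, 2, 3, 4, 5). Parameter entries, in
the order they survive: Pc(i.rate:(b/s)); Mp(size:(b)); Pc(operation-rate:(op/s)); Pc(i.width:(b));
Ms(size:(b)); Mp(i.(words)); Ms(i.(words)). The assignment of entries to columns is not recoverable.]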
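The decomposition discussed around Table 3, in which a single processing rate in bits per second is split into an operation rate and a word width, and Mp is restated in words once the word size is known, can be sketched as follows. This is one possible reading of the text, not a reconstruction of the table. The class names, field names, and the sample machine are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoNumberDescription:
    """Two parameters: processing rate and primary-memory size, both in bits."""
    pc_bits_per_second: float
    mp_size_bits: int


@dataclass(frozen=True)
class ThreeNumberDescription:
    """Three parameters: the bit rate is split into operation rate and word width."""
    pc_ops_per_second: float
    pc_word_bits: int
    mp_size_words: int

    def recombine(self) -> TwoNumberDescription:
        """Recover the coarser description from the finer one."""
        return TwoNumberDescription(
            pc_bits_per_second=self.pc_ops_per_second * self.pc_word_bits,
            mp_size_bits=self.mp_size_words * self.pc_word_bits,
        )


# Hypothetical machine: 100,000 operations per second, 36-bit words, 32,768 words of Mp.
EXAMPLE = ThreeNumberDescription(
    pc_ops_per_second=100_000.0, pc_word_bits=36, mp_size_words=32_768
)

if __name__ == "__main__":
    print(EXAMPLE.recombine())
```

The point of the sketch is the direction of recovery: the coarser description can be computed from the finer one, which is the sense in which the text says parameters change only when they are decomposed into others from which they can be recovered.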
averages and combines many features to summarize effective output. This tends to obscure structure.
For structure, one wants maximally independent aspects which are easily obtained if selected as a
design choice. For example, if the computer designer had only a single dimension to describe a
computer, he would undoubtedly select the logic technology used in the Pc and Ks. This tells him a
good deal about many aspects of the computer's structure. In fact, the technology and the average
bits processed per second by the Pc are correlated, and so each can be used to predict the other,
though only imperfectly. If one is interested in performance, effective bits per second is
preferred; if one is interested in design, technology is preferred.

The computer space in Table 1 presents our choice of the major structure dimensions. There is even
less means to validate the choice of dimensions here than there is for performance. Nevertheless,
there are a few hallmarks. Perhaps the most important is redundancy (the opposite side of the coin
from independence, mentioned above). Several dimensions of structure may covary, so that giving any
one of them is tantamount to giving the others. This covariation need not come from physical
dependence; it may arise from the nature of an appropriate design and good engineering practice.
Such a cluster of covarying dimensions is likely to indicate an important dimension (which one among
the correlates is to be used is a secondary matter). Table 1 is organized in terms of such clusters,
with one of each selected as the main representative and placed at the left.

A second hallmark derives from the hierarchical nature of computer systems. Generally a description
of a system consists of the union of the description of its parts, plus a description of the
interconnections. This is the basic style of PMS, for example. But there are a few features that
affect the total system, i.e., affect many components. These are usually rather important.
Technology is a prime example.

Yet a third clue is that the dimensions discriminate the actual population of computers. If all
machines had single-address instructions, for instance, there would be no sense in using number of
addresses per instruction as a dimension. Any computer engineer who had studied machines at all
would know this to be true of all computers. Thus one looks for dimensions that spread the machines
out evenly into a substantial number of categories.

If the dimensions of the space are known, a computer is supposed to be defined by a single point.
For most existing computers this is actually the case. However, if a computer system were
complicated enough, say consisting of several processors, each built with different technologies and
having a different number of addresses per instruction, then such a representation would not be
possible. For instance, the Rice University computer uses vacuum tubes, transistors, and
integrated-circuit logic. But such complexities are rare; time and good engineering practice work
against it. If it were necessary to consider such cases, then additional dimensions (e.g., for
secondary and tertiary logic) could be added, or several points in the space for a given computer
could be used.

The computer-structure space is thus our choice of the seven most important dimensions. It is our
response, so to speak, to playing the number game, given only seven descriptors. They are arranged
in order of importance, although clearly no simple way exists to validate such an order. But, if we
were to have only three attributes to describe the structure of a computer system, we would pick
logic technology, word size, and PMS structure (i.e., what processors exist with what functions).

At this point we are ready to proceed through the space, describing the various dimensions and
discussing how the computer systems in this book illustrate various points along them. We take up
each major dimension separately. A few of the correlated dimensions are accorded separate sections,
but most are discussed along with the main dimension.

Technology

Computers are constrained by the physical technology from which they are constructed. It is not just
that new technologies provide greater speed, size, and reliability at less cost, although of course
they do that. But technologies dictate the kinds of structures that can be considered and thus come
to shape our whole view of what a computer is. For instance, the emergence of the PMS system level
is due to advances in technology. Prior to transistor technology, it did not make sense to think of
elaborate PMS structures. The costs of the various parts were too high and the reliabilities were
too low. When, occasionally, such a machine was in fact designed, it invariably proved too far ahead
of its time to succeed. An example in this book might be the RW-400, described in 1960 (Chap. 38). A
more classic example is the Analytical Engine of Babbage, which he designed in 1844 and was never
able to complete.¹ The technology of the time was entirely mechanical, and its crude state accounts
for a large share of the failure. Thus the technology is by all odds the most important single
attribute to know about the computer system.

Many technologies go into making up a computer. Each type of component typically uses a different
one. In current (so-called

¹ Thus, the first real digital computer established the precedent of failing by a large margin to
meet the expected dates of completion and full operation.
third-generation) machines the Pc may use hybrid- and integrated-circuit technology for its logic, thin-film technology for the Pc generalized registers, core technology for the Mp, electromechanical technology for tapes and disks (with integrated circuits for logic), mechanical technology for card punches and typewriters, and even manual technology for mounting tapes and disk packs. The existence of all these technologies poses major issues of systems balance, issues which are only imperfectly resolved. For example, it remains true in the current generation that input/output is not in balance with the internal structures. This is due to the crude state of terminal technology, so that it appears to cost too much to provide an appropriate solution.¹
¹Although beside the point of the current discussion, one reason why these imbalances appear to be “permanent” is that the time constant for change in the technology is of the same order as the time constant for human beings (i.e., systems analysts, programmers, and users) to understand the imbalance. Before system imbalance is diagnosed and solved, the terms of the problem change, inducing new imbalances.
The heterogeneity of technologies is not a consequence of cost/benefit analysis; rather, each represents the forefront technology for the type of device shown. (There is, of course, cost/performance exchange for any component, but this is usually within a technology.) Thus there is a sense in which the leading technology can be used to represent them all. This is the technology used for the logic level and is the one listed in the computer space. If it is known that transistor logic is used in the Pc of a computer, it is a safe prediction that Ms is electromechanical, Mp is core, Tio is electromechanical printers and punches, etc. This reflects the fact that technology develops and hence becomes locked with calendar time. Thus a prediction is from logic technology to date and then to all other things known to be current at that date.
This correlation of date with technology is given in the computer space along with the generation. It can also be seen in the time chart. The correspondences must be taken as very rough only. The technologies are listed in increasing power (and decreasing cost). The dates run in exactly the same order. The one exception is fluidics, which has been introduced very recently and is a special technology for ruggedness, reliability, and direct external coupling in certain control systems. (Small fluidic computers are at the early prototype stage.)
Alongside the technology dimension we list the dimensions: Pc speed (operations per second), and cost (dollars per million operations), all of which vary directly (or inversely) with logic technology. In general, costs are extremely difficult to determine, especially when technological costs are of interest rather than market costs (which reflect numerous other factors). Nevertheless the effect of technology on costs has been so striking (while simultaneously pushing up performance along all other dimensions) that it seemed necessary to give a measure of cost in Table 1, no matter how crude.
We have indicated only a few of the dimensions that are correlated with technology. In fact, the only dimensions in Table 1 that are independent of technology are the word length and the Pc addresses/instruction. All the rest show dependence on technology. For some, such as memory speed and size, there is a direct correlation. For others, such as PMS structure and Pc concurrency, the development of more complex versions (the leading edge, so to speak) depends on technology, but there is free use of all versions that are in existence at any given time. There are still other dimensions of importance, not shown in Table 1, that have also changed with technology, e.g., electric-power consumption.
One way to see both what varies and what is independent of technology is to compare selected machines. For instance, Whirlwind (Chap. 6), a first-generation system, and the IBM 1800 (Chap. 33), a third-generation system, have reasonably similar ISP descriptions, if one ignores index registers, which were not invented at the time of Whirlwind’s design. However, they have very different PMS structures. In Whirlwind, the early system, the transfer of information between Tio’s and Ms was under program control of the Pc. The existing Pc registers and transfer gates were used because it was too expensive to have separate ones. In the 1800, which uses hybrid circuits, it is economical to have additional subsystems devoted to special functions; hence there are many Pio’s operating independently of the main Pc. It was not cost alone that limited the complexity of first-generation vacuum-tube systems. The large physical size of tubes introduced substantial transmission delays; their large power consumption added dependency on a cooling system; and their limited life and deteriorating nature constrained the number of tubes that could be used in a system requiring high reliability.
The IBM 700 scientific series (701, 704, 709, 7090, 7040, 7044, 7094 I and II) offers another comparison, where there is an evolving structure over time, hence across technologies, but where for reasons of compatibility the ISP’s have remained almost constant (except for the 701). Again we see radical increases both in performance (Pc speed increases by a factor of 5 from the 701 to the 704 and another 10 to the 7094 II) and PMS complexity. But various other features, though not affecting compatibility, were locked in with the ISP and remained fairly constant. For example, Mp size went to 32 kw (kilowords) early in the series with the 704; and
it took a jerry-rigged modification to get 64 kw on a 7094 toward the end of the lifetime of the series (see Chap. 41, page 517).
Throughout this section we have referred to technology as the dominant factor in the computer. Does this mean that computer development waits upon new fundamental windfalls? We have been lucky in getting the transistor and, to a lesser degree, the integrated circuit from external efforts. However, core memories were invented for the computer and resulted because of need. Read-only memories have also resulted both from development at the circuit level and from pressure above, requiring the memories to be developed. All the electromechanical secondary memories (i.e., magnetic tape, drums, disks, and photostores) have resulted from the computer's needs. Thus, although technology is dominant, the computer often forces the development.
The Pc operation rate is strongly correlated with logic technology, as we have indicated in the computer space. Our discussion about technology and generations is also about operation rate. The principal reason for the higher operation rate is faster logic technology. Technology also has a secondary effect on increasing speed. More reliable devices allow large computers to be built. Smaller devices allow higher device densities, thus decreasing stray capacitance and inductance and shortening transmission delays. Smaller components also allow increased interconnection density.
Operation rate is also relatively highly correlated with total performance. If we hold the structure and concurrency constant, the simplest way to increase performance is by increasing the clock rate. The increase in the performance/cost ratio over the past two decades of computer evolution has come primarily through higher operation rates. The two 16-bit computers already mentioned, Whirlwind (Chap. 6) and the IBM 1800 (Chap. 33), provide a nice comparison of the evolution. With a difference of 10 years and two generations, their cost ratio is ~10:1 whereas performance is ~1:5 and the internal clock rates are also ~1:5.¹
¹However, it is not as dramatic an example as we could find. By picking a better third-generation example we might get a cost ratio of ~100:1 and a performance ratio of ~1:10.

Information structure: word length, information base, and data-types
All computers structure their information in a hierarchy of units, which we defined as an i-unit in Chap. 2. For example, the IBM System/360 starts with the bit; then the byte, which is 8 bits; then the word, which is 4 bytes; then the record, which is a variable number of words. In between, playing minor roles, are decimal digits (4 bits), the halfword, and the double word. A number of features of the design are related to this hierarchical organization of data. Before we consider them, we need to characterize the organization itself. One characteristic of this organization, the word length (in bits), gives most of the information, the rest of the hierarchy adding only a little.
Let us see why this is so. At the bottom there is the bit, encoded in two-state devices. Although other numbers of states are possible, and ternary (three-state) machines have been proposed occasionally, digital technology has developed exclusively to handle binary information. There are several reasons for this. The first is the requirement for high reliability and high signal-to-noise ratios in the basic devices. Generally a basic n-state device (that is, one not built up from other k-state devices) is realized by breaking a continuous physical dimension, such as voltage, current, or magnetic flux, into n discrete levels or regions. Reliability and signal-to-noise ratio then depend on keeping adequate separation. This is easiest to do with two states (e.g., in the limit they become on-off devices) and becomes progressively more difficult as n increases. The second reason is the simplicity of the logical design for binary representations. A basic device for combining two ternary digits must deal with 3 x 3 = 9 configurations, rather than 2 x 2 = 4 configurations for the binary case. This also gets worse as n increases.
A final reason (the coup de grace, so to speak) is that no one has ever found striking advantages for the resulting processing structure in having more than two states. Thus there are no compelling reasons to suffer the first two disadvantages. In short, what might have been an important dimension on which to distinguish computers, namely, the number of states in the basic encoding, turns out instead to be one of the great uniformities in digital technology.
Information base. That the physical devices deal ultimately in bits does not imply that the information processing must be organized in terms of bits. It is possible to select an arbitrary base (one with any number of states) and construct the entire ISP in its terms. A base unit is represented physically, of course, as a set of bits. If one wanted a base 13 machine, for example, one would have to use at least 4 bits (with 16 states) to encode it. But no operations at the ISP level would refer to anything but base units and data structures built up from sets of base units, and there would be no way to manipulate directly the bits that represented the base. Thus, using a base other than binary obtains whatever advantages might accrue to n-state units, without any of the disadvantages at the device level.
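The arithmetic behind the base-13 example and the 3 x 3 versus 2 x 2 comparison can be checked with a short sketch. It is only an illustration under simple assumptions: the function names are hypothetical, chosen for this example, and the code models counting of states rather than any particular machine's ISP.

```python
def bits_per_base_unit(base: int) -> int:
    """Fewest bits whose 2**bits states can encode one unit of the given base."""
    if base < 2:
        raise ValueError("a base needs at least two states")
    # (base - 1) is the largest code value; its bit length is the width needed.
    return (base - 1).bit_length()


def unused_states(base: int) -> int:
    """Bit patterns left over when one base unit is stored in whole bits."""
    return 2 ** bits_per_base_unit(base) - base


def two_input_configurations(n_states: int) -> int:
    """Input combinations a device combining two n-state digits must handle."""
    return n_states * n_states


if __name__ == "__main__":
    # Columns: base, bits per unit, unused bit patterns, two-input configurations.
    for base in (2, 3, 10, 13):
        print(base, bits_per_base_unit(base), unused_states(base),
              two_input_configurations(base))
```

For base 13 this gives 4 bits with 3 of the 16 patterns unused, and the two-input count grows as the square of the number of states, which is the sense in which the logical design "gets worse as n increases."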
Computers have been built with a variety of different bases, the main ones being binary, decimal, and character. The character has shifted between a 6-bit character and an 8-bit character (byte).¹ The arguments for bases other than binary (which represents the natural base of the computer) all hinge on the alphabets used externally by human beings and the desire to avoid conversions into a different representation inside the computer. With universal acceptance of higher languages, such as FORTRAN and ALGOL, this argument has also lost much of its force. In fact, all third-generation machines are binary. Nevertheless, in the fifties there was much controversy over which base to use, and the machines presented in this book exhibit all three bases.
¹Seven bits have been proposed for communication purposes but have never been made the basis of a machine, as far as we know.
There is little difference between binary and decimal computers in their ISP organization. However, there is a great difference between these two and character machines. The latter are designed for handling text and are constructed to deal with variable-length strings of characters. Correspondingly, they deemphasize numerical computation. Both these decisions affect the ISP considerably. Thus, in the computer space we indicate the base dimension along with the word-length dimension. The two together make up a single dimension.
Word length. Let us now examine the role of word length. The word is the first major information unit above the base. It is defined as n bits for a binary computer or n digits for a decimal computer (character machines being excluded as not having a fixed word length). Sometimes there are intermediate units, but they always play a minor role and we can disregard them at this stage. As we noted earlier, the main determinant of word length has been the function of the total system: large word lengths for arithmetic systems, small word lengths for control systems (and character strings for business). Thus, only within narrow limits is the word length a free design choice.
However, the interesting thing about word length is not so much its determinant as the way it affects other aspects of the total system design. This starts with a design decision that the unit of information transfer between components will be a word. As soon as this becomes the case, then registers in various components must hold a word, since that is what arrives or is to be transmitted. Thus the word becomes the information unit of the Mp, and most of the registers of the Pc hold one word. The instruction is designed to fit into one word, since that is the number of bits that is obtained “at once” and hence can be used to effect the next time increment of processing.
Once these basic features are set, others follow. An integer number of any smaller units, such as the character, should fit into a word, since otherwise a set of words will not provide a homogeneous sequence of subunits. (That is, only five 6-bit characters fit into 32 bits, so that a set of 32-bit words filled with 6-bit characters has a number of 2-bit holes in it. This can complicate algorithms that deal with long character strings.) The constraint of compatibility is not so strong with Ms, since speeds are slow enough to permit conversion algorithms (either hardware or software). Still, the system is simpler (and therefore usually will work better) if incommensurabilities of information units do not exist. Thus, to pick an example, the number of parallel tracks on magnetic tapes tends to divide evenly into the word length. IBM tapes for the 700 series of 36-bit machines have six data tracks; for the System/360, which has a 32-bit word, the tapes have eight data tracks.
There is an interesting correlation between the word length of a computer and the number of data-types that it makes available. As we saw in Chap. 2, the operations in a computer can be classified according to the type of data they operate upon. Each data-type tends to have a certain set of operations appropriate to it (for example, +, -, ×, and / for numbers) and the decision to include a data-type carries with it the decision to include its operations. Thus the number of operations tends to grow with the number of data-types. The total amount of hardware in a computer grows as the word size (because data paths are word-parallel²) and also as the number of operations. Thus machines with large word size tend to be large machines and have many data-types and many operations. (“Large” as an adjective for machines invariably means big and expensive, hence, given economics, capable of doing large amounts of processing.)
²The issue of bit-serial versus bit-parallel is discussed subsequently.
There are two additional, somewhat independent, features that support the relationship between word size, number of data-types, and size of computer. First, with a large system there will already be available many of the pieces necessary to add additional operations. That is, the marginal cost of a new operation goes down as the system grows. Therefore, given a large system, there is a tendency to add more operations. The number of operations per data-type is not easy to increase; rather, one adds new data-types. Second, with small word lengths, one cannot define many worthwhile data-types that will fit into a word, and multiple-word data-types are left to the programmer to define with software. With large word lengths there are many different worthwhile data-types that fit into the word, for instance, decompositions of the word into partial words, or into character strings. Each of these requires
additional operations, since the initial data-types involve the entire word or some large part of it (i.e., the word, address, and integer operations).
In sum, the word length stands as an indicator of many aspects of the machine. It not only tells something about the basic organization of many components but indicates how big the computer is, both in number of data-types and number of operations. Figure 2 shows time lines of well-known computers with their word length, with a special time line for the ones in this book. Five groups are suggested in the figure which classify these computers.¹ The classes overlap, and to separate a computer into one of two classes requires more knowledge (e.g., the number of data-types). For example, the 24-bit SDS 9300 and CDC 3200 appear in the same class with the 36-bit IBM 7090 just because both machines have floating point hardware and, in fact, perform comparably for arithmetic tasks.
¹The class number is essentially [log₂(Mp word length) - 2].
The one design choice that makes word length have few of the consequences just described is making a computer bit-serial rather than bit-parallel. In many machines information transfers are conducted on a single bit stream (especially Pc-Mp transfers). Coincident with this is the construction of operations on a bit-by-bit basis. This works well for arithmetic and logical operations. Time is traded for hardware. The cost of the system becomes independent of word length, but the processing rates go down correspondingly. This design decision was an extremely important one when logic was expensive and unreliable. It has become less so in the current era, where processors and transfer paths are relatively few in number while both the cost and the reliability of components have improved. However, as large parallel processors are considered (~10³ P’s), bit-serial processors again become a serious design alternative. (See the serial computers of Part 3, Sec. 2.)
In summary, word length is an important dimension, and we find many characteristics either proportional to or inversely proportional to it. To be sure, these relations hold only for current design practice, as we have seen with the bit-serial designs. The main-line computers in Part 2 are ordered according to increasing word length.
Data-types. We have presented the number of data-types as being correlated with word length and also with computer size through the effect on number of operations. Although far from perfect, there is a rough order in which specific data-types are included in a computer. We have listed the main types in such an order in the data-type dimension of the computer space. (See Chap. 2 for their definitions.) To be located at a point on this dimension (say at floating point) means to have all the data-types below it on the dimension (i.e., word, address, integer, boolean). Occasionally machines which violate this have arisen. Decimal machines do not generally have boolean data-types, and there has been some attempt at machines with only floating point, i.e., without a separate integer type (e.g., the CDC G20²).
²Originally the Bendix G-20.
The reason behind this cumulation of data-types in a fixed order is that certain general tasks must be performed by any computer. It must transmit data between the Pc and Mp, and this transmission has nothing to do with the meaning or content of the data; thus there is always the “unit of transmission,” which is the word (except on character machines). Next, all computers manipulate addresses to achieve generality (e.g., to compile), providing for a second data-type. Next come integers, since almost all algorithms make use of arithmetic (this could conceivably be absent in some communications computers), and on up to floating point numbers, multiple precision, and vector and string operations. At each stage the uses are more specialized so that lower ones cannot be eliminated, except for a few cases such as handling addresses as regular integers.

Addresses per instruction and processor state
The number of addresses in an instruction has been a traditional way of describing processors (i.e., their ISP’s) and hence the computer systems containing these processors.³ We use it in Parts 2 and 3 to separate the different processors.
³Although used mostly to describe Pc’s, the description applies to any processor.
Originally the dimension was simple: one-, two-, three-, and four-address machines were constructed. It has become somewhat more complex. A “one plus one” machine has one address for data and one for determining the next instruction, and is to be distinguished from a two-address machine, which uses both addresses for data. Index registers and so-called general registers provide instruction schemes which lie somewhere between one- and two-address organizations. When processors admit several instruction formats or variable-length instructions, matters become even more complicated.
A correlated dimension in the computer space is the amount of processor state, that is, the number of bits that exist in the processor, as described in the ISP. This is the amount of information that can be held at the end of one instruction to provide the processing context for the next instruction. It consists of a number of status and mode bits (in modern machines packaged into regis-
ters, but in earlier machines simply scattered around in the processor), the next instruction address, the accumulator and other arithmetic registers, the index registers, and other general registers making up a “scratch-pad” memory. It is a simpler descriptor of the ISP than addresses per instruction, since it is independent of the number and variety of instruction formats. It is easy to define processor state generally for any ISP, but difficult to define addresses per instruction.
The processor state is not the total number of bits in the processor, since there may be registers in the physical system that are used within the interpretation of one instruction but which carry no information between instructions. Address registers for obtaining operands from Mp are the most common such “underground” or “temporary” registers, but there can be others. We implied this distinction by defining processor state in terms of the ISP rather than the physical processor.
The correlation between the processor state and the number of addresses per instruction is not simple, since it rests on two separate issues. For the first, note that larger programs perform transformations on the state of Mp (or even Ms or Tio’s) and are not concerned with the state of the processor. Processor state enters only because, in decomposing the total algorithm into a series of small steps, it is not possible (or efficient) to make each step a transformation from Mp to Mp. Basically, this happens because the instruction does not hold enough information to specify the Mp-to-Mp transformations. For example, if one wants to add two numbers, two operands are required, and an instruction must contain at least two addresses; if it does not, then an intermediate state (i.e., processor state) must be created to hold the information while the additional instructions are fetched. Thus, one-address organizations require the most processor state, with less for two- and three-address organizations. This consideration stops at three (two operands and a result) because only a few elementary operations are more than binary. The processor state cannot be eliminated entirely, however, since there must be at least an instruction address (a program register) to maintain continuity of the program.
The second source of correlation between processor state and addresses per instruction comes from differential access time to processor registers and to Mp. As long as there is an appreciable differential, substantial gain in processing power can be obtained from increasing processor state. This derives, again, from the structure of algorithms which generate intermediate results that are used almost immediately afterward and then are of no further interest. Rapid temporary storage and retrieval are beneficial under these conditions. Thus, working against higher address organization is the extra time to store in Mp results that need only temporary storage. Thus, also, index registers and general registers almost always imply increased processor state, although they need not do so logically (that is, the registers could exist in Mp and still have their effect on the instruction format).
With interrupts and multiprogramming the processor state gains additional significance, since it is the amount of information that has to be saved and restored when switching programs. For example, in the Honeywell H-800, an early three-address computer, the processor state per program consisted only of the program counter and index registers, and when io-halts occurred during processing, the Pc was switched immediately to another program. Eight programs could run concurrently (by having a total processor state of 64 program registers). In present computers with general-register state, often 25 to 100 words must be stored, which implies an appreciable time for switching contexts.
We can now consider briefly the different organizations according to addresses per instruction. To show the common similarities, we give in Fig. 4 a state diagram that can be used for all processors. In common is the basic idea of the stored program: Fetch an instruction, determine what the instruction is to do, then execute it (the fetch-execute cycle). Other than this, only a part of the state diagram will be applicable to a given processor type.
As shown in the computer space, the addresses-per-instruction dimension starts with zero addresses, then one address, then one plus indexing, one plus general registers, and on up to two, three, and variable addresses. However, from an expository viewpoint one should follow a different course, starting with single-address machines, then indexing, then two- and three-address machines, then general registers, and finally the zero-address and variable-address organizations. This not only puts the more common organizations first but makes it easy to relate the organizations to each other.
P(1 address) and P(1 + index address). These Pc’s constitute most first-, second-, and simple third-generation computers. The earliest outline of the structure was the IAS computer (Chap. 4), which has come to be known as the von Neumann computer. Although fundamentally like the IAS computer, EDSAC’s adaptation appears to be the closest prototype to this class. Although EDSAC is not described, it influenced M.I.T.’s Whirlwind I significantly (Chap. 6).
A significant change to the IAS machine was the addition of the index register (called B-tubes) in the Manchester University machine in the early 1950s. The evolution can be seen by comparing the first and third generations using Whirlwind (Chap. 6) and
[Fig. 4: processor state diagram. Recoverable labels: fetch (read), operand fetch, store (write), operation specified, operand address calculation, return for string or vector data, fetch next instruction. States are marked as Mp controlled or Pc controlled. Note: Any state may be null.]
the IBM 1800 (Chap. 33) or looking at the IBM 701-7094 evolution in Part 6, Sec. 1. Index registers are motivated by the frequent occurrence, in 1 address systems, of circuitous address calculations that involve first computing the address (e.g., the index of an array in Mp) and then planting it just ahead in the instruction stream in order to make use of it as an address. Providing a set of index registers introduces a second address into the instruction, even though of extremely limited function. Thus we classify processors with indexing as having (1 + x) addresses per instruction.¹ An alternative view of index registers suggests that they double the number of data-types by allowing operations on vector data elements rather than just scalars.
¹Indirect addressing, on the other hand, does not add to the addresses per instruction; rather, it introduces a second operation per instruction.
For the 1 address processor, the processor state (Mps) typically consists of the program counter (instruction location counter), an Accumulator/AC, a Multiplier-Quotient register/MQ (the extension of AC), and one or more Index registers/X/XR.
With only one address in the instruction, the one arithmetic register, A, must be used for temporary results. Thus an effective-address integer (z) is computed as a function of the address part (v part) of the instruction (9) and the index registers. This process is typically
    z := v + X[j]
where X[j] is the jth index register as specified in the instruction.
There are several forms for the transmission operators between A and Mp.
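Before turning to those forms, the effective-address rule z := v + X[j] can be made concrete with a minimal sketch. It is not taken from any machine described in this book: memory is modeled as a plain list, word-length wraparound is ignored, and treating an instruction with no index register selected as z = v is an assumption made for the example. The names effective_address and sum_vector are hypothetical.

```python
from typing import List, Optional


def effective_address(v: int, X: List[int], j: Optional[int] = None) -> int:
    """Return z := v + X[j]; with no index register selected, z is simply v."""
    if j is None:
        return v  # assumption: an unindexed instruction uses v unchanged
    return v + X[j]


def sum_vector(Mp: List[int], base: int, length: int) -> int:
    """Add up `length` consecutive words of Mp starting at address `base`."""
    X = [0]   # one index register, X[0], counting through the vector
    A = 0     # the accumulator
    while X[0] < length:
        z = effective_address(base, X, 0)
        A = A + Mp[z]      # A <- A + Mp[z]
        X[0] = X[0] + 1    # step the index register to the next element
    return A


if __name__ == "__main__":
    Mp = [0, 0, 0, 10, 20, 30, 0, 0]
    print(effective_address(3, [2], 0))
    print(sum_vector(Mp, 3, 3))
```

The loop never rewrites the address part v; only the index register changes, which is the point of indexing as opposed to planting a computed address in the instruction stream.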
    A ← z              load immediate
    A ← Mp[z]          load direct
    A ← Mp[Mp[z]]      load indirect
    Mp[z] ← A          store direct
    Mp[Mp[z]] ← A      store indirect
In indirect operations a convention may be required to determine what address in Mp[z] is to be used.
Similarly, the binary operations (+, -, ×, /, ∧, ∨, ⊕, concatenation, etc.) are generally of the form¹
    A ← A b Mp[z]
    Mp[z] ← u A
In both the above cases, exclusion of the operations that place results in Mp[z] stems from the added cost of including the symmetrical function and the marginal utility of such a function, which stems from the result of applying u not being available for further processing.
The transmission, unary, and binary operators account for almost all operations in these computers. If we allow A to stand for any part of the Mps, rather than just the accumulator, then the instructions not included above are input/output data transmission, e.g.,
    Mp ← T and T ← Mp
and conditional execution
    (branch if zero AC) → ((AC = 0) → (P ← z))
Having index registers requires operations to process them. At a minimum they must be loaded and stored (usually from and to Mp), i.e.,
But simple operations on an X are also desirable; for example,
    X ← X + 1
Here X is used to point to (access) the next element in a vector. More complex operations can be carried out by placing X in the A register, via the program steps:
    A ← X              load A with X
    A ← f(A)           manipulate A
    X ← A              load X with A
An operation to add k to X would then be
    MP[ZI + MP[ZI
    X ← Mp[z]
which assumes no transmission paths between X and A. Ideally we would like to perform any operation directly on X as simply
    X ← X + k
From this begins the idea that X should look like the main arithmetic register, A. This is, no doubt, one evolutionary path to general-register processors.
Part 2, Sec. 1 is devoted entirely to 1 address computers in the first three generations. They were the “main line” of computer development.
P(2 address) and P(3 address). The computers in Part 3, Sec. 1 have instructions which contain multiple addresses per instruction. The addresses (v) specify operands in Mp (Fig. 4). The Mps decreases as the number of addresses per instruction increases, since the operands need not be held temporarily between instructions (i.e., each instruction performs a complete operation).
The instruction form for the 3 address computer is
av.w (Fig. 4). MIDAC (Chap. 14) and Strela (Chap. 15) are typical
three-address computers.
A 2 address computer does not necessarily require more processor
state than a 3 address computer, since the operations can
correspond to

    ...

and

    ...

However, sometimes extra Mps is usual. The RW-400 (Chap. 38) has
an accumulator, and operations generally terminate with results
both in primary memory, Mp[v2], and in the accumulator. The branch
on accumulator instructions allows results to be checked directly
without referring to Mp. An especially nice instruction in 2
address computers is the transmission instruction (a special-case
unary operation): Mp[v2] ← Mp[v1].
The IBM 1401 (Chap. 18) has two registers, A-address and
B-address, which hold v1 and v2 and can be loaded by the v1 and v2
parts of the instruction. These registers point to (address)
operands and do not contain data. The remaining processor state is
the Instruction-address. The 1401 has instructions with no address
parts, and these instructions take as operand addresses the values
of A-address and B-address as of the previous instruction. The 1401
instruction-interpreter state diagram is given in Chap. 18
(Fig. 3). The state-diagram specialization (Fig. 4) is roughly:

    oq, aq, oo {ov.r1, av.r1, ov.r2, av.r2, o, ov.w2, av.w2} ...
               {ov.r1, av.r1, ov.r2, av.r2, o, ov.w2, av.w2}

where the sequence delimited by the { . . . } is the operation on
a character; because the 1401 operates on variable-length strings,
it is repeated until the end of the string.

P(n + 1 address). Processors with n + 1 addresses deviate only
slightly from the n-address processors above. The final, or +1,
address explicitly specifies the address of the next instruction. As
such, it can be used with any instruction set. There are two reasons
why +1 addressing is used. First, freedom is provided in the
placement of each instruction within the program address space.
Second, the next instruction address can be calculated in parallel
with the execution of the current instruction.
For computers with cyclic memories (Part 3, Sec. 2), the +1
address allows both data and the next instruction to be specified
independently, providing the opportunity to arrange the program
and data in an optimum fashion. Since each instruction completion
time depends on the location of data, it is desirable that the next
instruction location be variable rather than the implicit next
address used for most processors. This is almost universal practice
in computers with Mp.cyclic (see LGP-30 in Chap. 16 for an
exception).
Microprogrammed processors may use the +1 address to locate
the next instruction, and there may be several such next addresses.
Microprogram subroutines tend to be short (intrinsic to interpreting
an instruction set), and there are many jump addresses. The
increased speed from not having to compute the next instruction
address is worth the added space cost. The IBM System/360 Model
30 (Chap. 32) shows the use of multiple (+1) addresses and if
classified according to our scheme would be at least a
P(microprogram; 3 + 1 address).

P(general register). The general register processor has a small
array of registers that can be used for multiple functions. These
have fast access compared with the Mp, so that it pays to do as much
processing as possible within them. Since the general register array
is small, it requires only a small address (3 to 8 bits). Thus the
instruction format contains fields for one (or more) general
registers. There must still exist addressing for Mp, though this
never exceeds a single address. Thus we classify general registers
machines as (1 + g) addresses per instruction.
The organization of a (1 + g) system can vary from something
very close to a (1 + x) organization, in which essentially every
instruction involves some Mp information, to an organization in
which the only Mp instructions are transfers between Mp and Mps
(the processor state holding the general registers), and there is a
two- or three-address instruction set involving only Mps (see the
CDC 6600 in Chap. 39). That is, from a data point of view the
Mps acts like a directly addressable Mp.
The processor state of a general register processor is invariably
held entirely within the general register array (rather than having
additional independent registers). This is due in part to an already
available mechanism (the array) and in part to the need for program
switching, which is somewhat simplified by having all the
Mps held in a single homogeneous memory.
The general registers typically perform a variety of functions:

1 Arithmetic registers (accumulator and the accumulator
  extension for the multiplier-quotient).
2 Index registers.
3 A second index register or base register; if the program
  addresses (v) are short, a base register is needed to address
  any area of Mp.
4 Subroutine linkage registers.
7 Address pointers to data arrays and lists.
8 Temporary data storage for intermediate results.
9 Temporary program storage for short program loops.

The power of a general register processor is obtained because
the registers can serve many functions. Thus the operations on
these registers can be extensive, because the operations need not
be duplicated in other parts of the structure. For example, special
operations for index registers are not necessary because the
operations for integers apply universally to both the accumulator
and index registers. Of course, such generality requires compromises.
The stack computer is faster for problems which can utilize stacks,
whereas the general register Pc must utilize Mp for the stack(s)
and does not have the encoding efficiency of a pure stack processor
(see below). In addition, the assignment (and reassignment) of
general registers is most crucial, since they are a scarce resource
with many uses. A general register organization allows processors
with a high degree of parallelism to be constructed, since several
instruction subsequences can be executed concurrently.
The actual number of registers is rather critical and depends
not only on the algorithms of tasks coded but also on the
technology. In multiprogramming and interrupt computers, the program
switching time increases with the number of registers. Thus the
upper bound on the number of registers is both cost and program
switching time.
We would expect to find instructions which produced the
following effects.

Addresses/instruction

g, g1, g2, g3 are instruction parts specifying a general register, G.
v, v1, v2, v3 are Mp addresses specified as a function of instruction
and general registers (for example, v := (address + G[g]) or
v := (address + G[g1] + G[g2]) in the IBM System/360).

General registers can be thought of as an outgrowth
(generalization) of the 1 + x processors, as we have already suggested.
Alternatively, they can be thought of as evolving from a 2 or 3
address structure. The UNIVAC 1103A, a 2 address processor
(Chap. 13), was no doubt a forerunner of the general register
UNIVAC 1107 and 1108. Pegasus (Chap. 9) is, we think, about the
earliest computer to use general registers (1956). In Part 2, Sec.
2 we discuss four general registers computers.

P.stack (0 addresses per instruction). From a PMS viewpoint the
P.stack is built around having a first-in-last-out memory (Mstack)
as part of the processor state. Conceptually, it is built around the
fact that computations can often be sequenced so that no explicit
names (i.e., addresses) are required for temporary results. All
operations are performed on the top of the stack. As each partial
result is computed, it is pushed down in the stack and appears
again to participate as an operand at exactly the appropriate point
in later calculation. Thus the stack operates as an implicit memory
for all intermediate products and not only are transfers between
P and Mp avoided but space in the instruction for Mp addresses
is eliminated.
Instructions in such a system consist only of operations, since
all their operands are in the stack. Thus the instruction format
is that of zero addresses per instruction. There must, of course,
be some addressing of Mp (just as in a general-register
organization). However, the addresses for Mp themselves sit in the
stack so that the instruction contains only the transfer (load or
store) operation, not the address. There still must exist some way
of getting fresh data in the stack, and all P.stacks have at least one
operation that loads an address written in the program stream onto
the top of the stack.
Why there should be this happy correspondence between
calculations to be performed and stack memories requires a little
explication. It rests fundamentally on the phrase
structuring of calculation in which each partial result is required
at one and only one point, so that each subcomputation can be
nested in the program (and hence its result nested in the stack)
Chapter 3 I The computer space 63
in the same order as it will occur as operand to the one operation
that uses it.
There are several arguments against a P.stack. Multiple stacks
are often required. Part of the power of a P.stack is derived from
having higher-speed Mps for the stack. Yet only the top few (2 ~ 8)
registers of the stack can be in Mps. When M.stack overflows into
Mp, the speed of operations can become much worse than not
having a stack at all. A simpler implementation, for example,
P.general,registers, is as fast and perhaps more general. Another
difficulty with the stack is the inability to access other than the
top. If full addressing is provided, then the organization has
become almost general register. Yet another difficulty arises from
inhomogeneity of data-types, especially if several of them are
packed into a single word (the width of the stack). Thus, for
instance, in one stack machine (the Burroughs B 5000 in Chap. 22)
there is a completely separate nonstack ISP for string
manipulation.
A simple numerical computation is given in Table 4 as a
comparison of the P.stack, P.1 address, and P.general,registers. Here,
the P.stack is probably shown at its best as there are no
array-indices calculations or program-flow manipulations involving
testing, etc. The criteria we measure are the algorithm encoding
space and the problem running time.
The kinds of instructions interpreted by a P.stack are typically:

                    Interpreter state
Operation           sequence                 Example
Load                oq, aq, oo, ov.r, av.r   M.stack-top ← Mp[v]
Store               oq, aq, oo, ov.w, av.w   Mp[v] ← M.stack-top
Unary operation     oq, aq, oo, o(u)         M.stack-top ← u M.stack-top
Binary operation    oq, aq, oo, o(b)         M.stack-top ← M.stack-top b
                                               M.stack-top-1

Variable numbers of addresses per instruction. Although there are
a few operations that require the specification of three or more
addresses, these are of such low frequency that no machine has
ever been built (or seriously proposed, for that matter) that has
more than three data addresses and one next-instruction address.
(Some of the microprogrammed processors have more than one
next-instruction address, and they often do several operations in
parallel in one instruction.)
However, there have been developed processors that can have
a variable number of operands. Most of these involve the use of
an instruction that is larger than a single Mp word. Thus, bringing
in the first word of an instruction, which contains the operation
code, determines how many additional operands are needed and
hence how many additional words to obtain from Mp. (In a
character-based system this may require several reads per operand;
in a word-based system this may be one or two operands per read.)
The gain in such a system is the higher average density of
operations per instruction, bought at the price of extra Mp accesses.
Most such variable-address processors have a mixture of one,
two, and three addresses per instruction, simply a mix of the types
already considered. The fundamental limit to such variability is
the processor state (plus the additional within-instruction
temporary state). This, of physical necessity, must be finite, and the
number of addresses must yield an amount of information that is
less than this total state. Otherwise the processor cannot hold onto
it to process it.¹ Thus the various processors which claim to operate
from a higher language (see the P.languages of Part 4, Sec. 4) must
in fact either translate into another simpler programming
language, as does the FORTRAN machine (Chap. 31), or become an
interpreter which processes a small amount of a language
statement before the rest.

PMS structure
The idea that there is significant higher organization to computers
is relatively new. Texts on logical design of computers develop
a model based on an arithmetic section, input/output devices, a
memory for holding instructions and data, and a single control
to force the other components to interact. A PMS diagram of an
early model is given in Fig. 5 (X represents an external agent,
usually a man). The Whirlwind I manual-model figure (page 10)
used in Chap. 1 was rather highly developed because it had a
secondary memory and switching. Figure 6 is a PMS diagram
which reflects this more accurate model. Often computer designers
lump the devices at the periphery and call them all input/output;
these devices are both input/output terminals (T) and secondary
memories (Ms).

¹If it processes a large amount of information, but in pieces (i.e.,
sequentially in real time), it is not really executing a single
instruction based on all the addresses but has decomposed the total
computation, just as a single address organization has.

Fig. 5. Early model of a stored program digital computer PMS diagram.
Table 4 Comparison of stack, general registers, and accumulator Pc
for evaluating the expression: f = (a - b)/(c - d × e)

                          Stack          General registers           Accumulator
Program size:
  Address integer/ai      6 ai           6 ai + 8 ai(gr)             8 ai
  Operation parts/o       4 o            7 o                         8 o
Number of Mp references
  for data:
Program size for          6 × (18 + 1)   6 × (18 + 6 + 4)            8 × (18 + 6)
  hypothetical example    + 4 × 6        + 1 × (6 + 2 × 4)
  machines:               = 138          = 182                       = 192
Program size in bits      B850 13: 168   IBM System/360:             IBM 7090:
  among specific C's:                      208 (above¹)                288 (above¹)
                                           224 (actual)                360 (actual)
                                           + base register overhead
                                             (0 ~ 192)*
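The stack column of Table 4 can be checked with a small sketch. It evaluates f = (a - b)/(c - d × e) on a zero-address machine and recomputes the three hypothetical program sizes. The operand order of a binary operation follows the form given for P.stack instructions (top ← top b top-1); the reading of the Table 4 terms as an 18-bit address, a 1-bit flag, a 6-bit operation code, and a 4-bit register field is an assumption, as are all names and the sample data.

```python
# Illustrative zero-address evaluation of f = (a - b)/(c - d*e).
# Field widths and all names are assumptions made for this sketch.
import operator

BINARY = {"sub": operator.sub, "mul": operator.mul, "div": operator.truediv}


def run_stack_program(program, Mp):
    """Interpret load/store/binary instructions against a push-down stack."""
    stack = []
    for op, *addr in program:
        if op == "load":
            stack.append(Mp[addr[0]])          # M.stack-top <- Mp[v]
        elif op == "store":
            Mp[addr[0]] = stack.pop()          # Mp[v] <- M.stack-top
        else:
            top = stack.pop()
            below = stack.pop()
            stack.append(BINARY[op](top, below))  # top <- top b (top-1)
    return Mp


PROGRAM = [
    ("load", "e"), ("load", "d"), ("mul",),    # d * e
    ("load", "c"), ("sub",),                   # c - d * e
    ("load", "b"), ("load", "a"), ("sub",),    # a - b
    ("div",), ("store", "f"),                  # (a - b)/(c - d * e)
]

Mp = run_stack_program(PROGRAM, {"a": 9, "b": 3, "c": 10, "d": 2, "e": 4})

address_instructions = sum(1 for ins in PROGRAM if len(ins) == 2)
operation_only = len(PROGRAM) - address_instructions

# Program sizes in bits for the hypothetical machines of Table 4.
stack_bits = address_instructions * (18 + 1) + operation_only * 6
general_register_bits = 6 * (18 + 6 + 4) + 1 * (6 + 2 * 4)
accumulator_bits = 8 * (18 + 6)
```

The six address-bearing instructions and four operation-only instructions correspond to the "6 ai" and "4 o" entries of the stack column.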
If we separate each component according to its function, assign
control (K) to each element, and finally introduce the processor
(P), we get the structure of Fig. 7. Of course, a large part of P
is a data operator (D). The processor has the behavioral properties
attributed to the structure of Fig. 5. If we include the control
within each component, we get Fig. 8 from Fig. 7.
To consider larger structures, consisting of several Mp's, P's,
Ms's, and T's, one might think to expand the system as shown in
Fig. 9, in which we connect everything through a single switch.
If the central S has sufficient power for multiple conversations,
this indeed provides maximum generality. However, although

Fig. 6. Early computer model (with Ms and S) PMS diagram.
Fig. 7. General computer model (with distributed control) PMS diagram.
Fig. 8. General computer model (without K) PMS diagram.
Fig. 11. Tree-structured computer (1 Pc) PMS diagram.
shared among a set of T's and Ms's. (That is, one purchases a single
magnetic-tape controller for, say, four magnetic tapes.) The shared
K also explains why only one of a given class of devices (e.g.,
magnetic tapes) can operate at a time. As technology changes
(especially costs), these separate K's may disappear.
Nearly all the computers discussed in this book fit the lattice
model of Fig. 10. However, it is not unlikely that structures will
be or have been built that do not conveniently fit it. For example,
NOVA (Chap. 26) does not fit the model nicely, although the more
complex ILLIAC IV arithmetic-computer portion (Chap. 27) does.
The values along the PMS structure dimension of the computer
space have been generated from the general model and laid out
in the order of their evolution. This evolution is strictly from less
complex to more. The seemingly more complex network structures,
such as the duplexed computers, are not necessarily as complex
as a single multiprocessor computer. Duplex computers have been
used for some time. The slow evolution to the parallel processor
structure is due primarily to limitations in technology. A
structured computer with a distributed control is more expensive than
a tightly integrated design with shared function. In addition,
multiprogramming (a question of software) must be present to
allow multiprocessing.
The PMS structure plays only a minor role in obtaining
multiprocessing and parallel processing. The classical debate about
building large computers has always been resolved by building
a single large processor (e.g., the CDC 6600 and Stretch, Chaps.
39 and 34). Proponents of multiprocessors say that one can always
add several large processors to a structure and increase the
performance of a one-processor structure. In Part 6, Sec. 3, when we
discuss the IBM System/360, we advocate multiprocessing.
Today there is no parallel processing in the form suggested
in Chap. 37. We include a discussion of parallel processing on the
bet that it will come in the future. Part 5 is dedicated to moving
along the PMS structure dimension.
The simple 1 Pc structure shown in Fig. 11 is a tree. Although
there are no values on the information rates, the nature of the
fixed and time-multiplexed switches indicates that perhaps the top
two T's, one Ms, and one of the bottom T's can all be active at
a given time. In Fig. 12 a 1 Pc, 2 Pio computer is given. Here
we note that the control of one secondary memory is by a Kio
rather than the Pio. (The Kio cannot fetch its next instruction from
Mp and must rely on Pc for control.) Note that there is necessarily
a lattice connection between the 2 Mp and the Pc, 2 Pio, and
Kio. The special cases of P.display, multiprocessors, P(array | wired
algorithm), and parallel processing are all realized from the general
model of Fig. 10.

Switching
A principal issue of a computer design at the PMS level is
switching (as we indicated in the preface). Unfortunately, we do not
illuminate switching problems in this book except to provide
examples. The switching dimension of the computer space is
correlated with PMS structure, as we have just seen. To have a more
complex structure, more complex intercommunication (switching)
is required. Figure 13 shows the various logical switches, together
with some of the more common implementations. The switch
parameters are also given in the Appendix of this book. Each of
the switching issues will be discussed in turn as they apply to
various parts of the structural model (Fig. 10). The reader should
note that Fig. 13 has relatively primitive switches. More complex
switches can be formed by cascading (connecting) the primitives
together. (A noncomputer example is the manner in which
telephone exchanges are constructed and interconnected together.)
Fig. 13. Logical and physical switch structures PMS diagrams.
  Group I. Hierarchical switches for connecting a components to b
  components for 2-way conversations:
    S.gate (switching at a, or switching at b)
    S(duplex; 1 a; n b; concurrency: 1; n S.gate), radial or bus/chain
    S(dual-duplex), radial or bus/chain
    S(time-multiplex; cross-point; m a; n b; concurrency: 1)
    S(cross-point; m a; n b; concurrency: min(m,n); m × n S.gate)
    S(dual-duplex; cross-point; radial)
  Group II. Non-hierarchical switching for interconnecting a
  components for 2-way conversations:
    S(duplex; non-hierarchical; concurrency: 1)
    S(duplex; non-hierarchical; central)
    S(k-trunk; non-hierarchical; m a; concurrency: min(m/2, k))

¹A switch which allows communication in one direction between two
ports.
²A switch which allows communication in either direction but only one
direction at a time.
³A switch which allows concurrent communication between two ports.
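The concurrency parameter carried in the Fig. 13 labels can be restated as simple arithmetic. The sketch below is only an illustration: the function names are hypothetical, the formulas follow the labels as read from the figure, and the integer division in the k-trunk case (whole pairs of components) is an assumption.

```python
# Sketch of the concurrency parameter attached to a few switch types.
# Function names are illustrative; formulas follow the Fig. 13 labels.


def crosspoint_concurrency(m, n):
    """S(cross-point; m a; n b): at most min(m, n) simultaneous conversations."""
    return min(m, n)


def crosspoint_gate_count(m, n):
    """A full cross-point is labeled as m x n S.gate elements."""
    return m * n


def time_multiplexed_concurrency(m, n):
    """S(time-multiplex; cross-point; m a; n b): one conversation at a time."""
    return 1


def trunk_concurrency(m, k):
    """S(k-trunk; non-hierarchical; m a): min(m/2, k), counting whole pairs."""
    return min(m // 2, k)


# Example: four processors sharing two memories through a cross-point.
conversations = crosspoint_concurrency(4, 2)
gates = crosspoint_gate_count(4, 2)
```

Under this reading, a cross-point buys its concurrency with gate count that grows as the product of the two port counts, while the time-multiplexed form keeps one path and gives up concurrency.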
maintain control of many K's by giving a K a single instruction
task. At the completion of the task the K signals the processor
that the task has been completed.
The switch provides a link between processor and controls for
the secondary memory or the terminals and is parameterized by
the number of processors, the number of controls, the number of
simultaneous conversations, and who originates the dialogue. In
these switches the control of information transmission is always
by the processor. The evolution has been approximately as follows:

1 S(null; 1P; 1K; concurrency: 1; initiator: P)
  P and K are connected during data transfers.
2 S(simplex | half-duplex | full-duplex/duplex; 1P; 1K;
  concurrency: 1; initiator: P, K)
  Each K operates independently because it can return or
  request communication with P when control task is
  completed.
3 S(dual-duplex; 2P; 1K; concurrency: 2; initiator: P, K)
  Duplex paths from dual P's to each K for reliability.
4 S(cross-point; pP; kK; concurrency: min(p,k); initiator: P, K)
  General case of multiple P's and K's with communication
  among the components.

The early machines used the first structure, and concurrent
operation of controls was possible only by starting several controls
and by very carefully programming the timing for the data
transfers. Two conditions occurred to cause this: The buffering for a
T or an Ms was associated with the processor, and the control
could not signal the processor. Although rather trivial to
implement, the idea (item 2 above) of allowing a K to signal the
processor did not occur until after the idea of arithmetic processor
traps was incorporated into processors. The interrupt was used
as the method by which a K communicated its desire to converse
with a P. The early IBM 709 provided a separate, independent
processor for handling the communication with input/output
equipment. Simultaneous processor-to-input/output or
secondary-memory dialogues could take place (provided the devices were
connected to the right processor). In most of the early computers,
part of the control function (data buffering) was associated with
the Pc, and, as such, only one device could operate at a time. This
stemmed from the comparatively high cost of registers, so that
links were established for a fixed period of time during a
complete block transfer of data.
In some of the military computers a duplicate set of K's is
provided for reliability. The more elaborate switching structures
(types 3 or 4 above) are rarely used between Pio's and K's; thus
to work on a peripheral requires the use of the rest of the
computer. The S.dual-duplex is becoming more common; it provides
a method of off-line operation for maintaining better component
utilization and a more reliable structure.

Control-terminal and control-secondary-memory switching. The
switches which link a control with a particular terminal or
secondary memory are generally fairly straightforward. Normally, a fixed
duplex switch is used. However, a dual-duplex switch is used if
multiple access paths to the component are required. The switch
links a secondary memory to a control during the transmission
of relatively long information units (e.g., records). A typical
example of such a switch is the bus structure used when magnetic
tape units connect to a common control. Only one of the units
operates at a time (although all can be rewinding simultaneously).
The switches are far less interesting than those above. Because
they are nearer the periphery, failure in them does not imply a
failure in the complete system.

Processor function
The emergence of complex PMS structures is coincident with the
development of functionally specialized processors. In the simple
computers of Figs. 5 to 9 there is place only for Pc. In the general
lattice there can be a Pc specialized to perform no input/output
operations; one or more Pio's specialized to communicate with
the T's and Ms's and even to organize information in Mp for
transshipment; additional Pio's specialized to handle graphic
displays (hence P.display); and even P's specialized to work on
specific data-types (for example, P.array) or specific algorithms (e.g.,
the fast Fourier transform). In addition, any of these processors
may be realized by microprogramming, which is to say, by having
its ISP interpreted by a specialized P.microprogram.
Although the existence of various functionally specialized
processors is coupled most closely with the PMS structure
dimension, the processors themselves are defined primarily by the
data-types they can process. In this they agree entirely with the
computer-system-function dimension. Possibly the processor-function
dimension should be considered simply an extension of the
computer-system-function dimension. On the other hand, the inclusion
of microprogrammed processors really extends the PMS structure
dimension to where a P can be seen as a cascade of two P's.
The processor-function dimension in the computer space is laid
out in an evolutionary way, so that its correspondence with PMS
structure is clear. P.microprogram is put at the beginning of the
dimension ahead of Pc, not because it occurs earlier in
evolutionary development, but because it extends the PMS dimension
down into the processor. Any of the P’s along the dimension can realize several ISP’s within a single physical processor. IBM has
be attained by a P.microprogram. exploited this feature extensively in the System/360 (Part 6,
As an actual dimension characterizing a total computer it must Sec. 3), which is by far the most ambitious use of microprogram-
be viewed cumulatively (similarly to the data-type dimension). ming. One can argue that without the additional payoff, which
Thus, if a computer has a Pio, it also has a Pc, and if it has a P.array was used to ease the transition to a new incompatible computer
it also ha5 the prior ones. There are numerous exceptions to this, system by providing emulation of the old system, the micropro-
such as small Pc’s with €‘.displays (hence with no Pio’s). This gramming would be marginal.
evolutionary ordering does not correspond to complexity or num- Several P.microprogram design approaches have emerged:
ber of data-types in the P. Pc and P.array are the most complex; Kampe (Chap. 29) presents a design based on a short word; the
Pi0 and P.vector,move are least. internal processor is very much like a conventional processor. At
We will make a few brief comments on each functional type, the other extreme, the IBM System/360 (Chap. 32) is based on
taking them in the order of the dimension. a long word which allows multiple operations to be coded in
parallel. (The parallel operations are necessary to gain an accept-
able performance level.) Thompson Ram0 Wooldridge called their
Microprogram processor (P.microprogram).The term microprogram-
ming was introduced initially in “The Best Way to Design an AN/UYK a “stored logic” computer, and it provided the ability
Automatic Calculating Machine” (Wilkes, 1951~). We use “micro- to use primary memory for defining the ISP. The IBM System/36O
Model 25 (page 567) also iises this approach. The Hewlett-Packard
programmed” to mean that an ISP is defined by an interpreter
program residing in an internal Mp, processed by an internal desk calculator (Chap. 20) shows the use of microprogramming
processor (the €‘.microprogram). Thus the structure is really an on a relatively circumscribed, but complex, task.
external processor (ISP) being defined by the computer formed as
Central processors (Pc).These processors interpret an instruction
P : = Mp(interna1; read-only)-P.microprogram set for manipulating arithmetic, logical, and symbolic data-types.
In all simple systems it is the only processor and thus does all
The operations that microprogram processors perform are tasks. The growth of processor specialization can be described in
primitive in comparison with other processors. The task of the terms of relieving the Pc of simpler functions that require sub-
microprocessor is to interpret the instructions of the ISP it is stantial processing time but do not make full use of the devices
realizing. This involves mostly data transfers among the registers within the Pc, such as the arithmetic units. Crucial to this issue
of the processor state (Mps) plus simple boolean tests. Although is the time it takes the Pc to switch from one task to another (recall
it must handle all the data-types of the larger ISP, it does so only the discussion on Mps, the processor state), since many of the jobs
as bit fields to be extracted and transferred from one register to that are extracted to specialized processors are demand jobs, such
another. The complex data operations (e.g., multiplication) are as input/output.
carried out by other units (D’s). In fact, if a complex instruction With the removal of tasks from the Pc, it becomes more spe-
set were to be used for the P.microprogram, the external processor cialized. A very pure example of this is the Pc of the CDC 6600
might as well be implemented directly in hardware. In very (Chap. 39), which has no input/output instructions of any kind
minimal P’s, for example, C(PDP-8) in Chap. 5, the ISP is essen- in the Pc. That is, not only has the control and management of
tially already at the level of a microprogram ISP, as shown by the communication and transmission with the T’s and Ms’s been re-
inclusion of instruction that can be microcoded. moved from the Pc, but the act of initiation has been removed
The long lag between the idea of microprogramming and its as well and placed in the Pio’s. Thus, the 6600 Pc is just an
more widespread adoption is due to several reasons. Early ISP’s engine for working on the arithmetic, logical, and symbolic (ad-
were comparatively straightforward, so that a microprogram ap- dress) data-types.
proach was not economically justified. The interpretation overhead The mixture of operations to be performed in most complex
time is higher than with the hardwired approach, and unless algorithms prevents specialization of the Pc from going very far,
complex functions are realized this time becomes objectionable. e.g., from there being a P.arithmetic, for with every switch be-
In addition, suitable read-only memories were not developed until tween capabilities distributed in distinct P’s there must be inter-
the mid 1960s (though it is unclear whether this is cause or effect). communication of the components, which introduces an overhead
An additional feature of using a P.microprogram is the ability to cost in processing time.
72 Part 1 I The structure of computers
Input/output processors (Pio). The Pio specializes in the management of peripherals (secondary memories and
terminals). Pio's are also called peripheral processors, data channels, and channels.¹ The tasks a Pio and its
subordinate peripherals perform are the transmission of information between Ms and Mp; the transmission of
information between some extracomputer real-time system (e.g., human); and the transmission of information
outside the C, via a T to some other information media (e.g., a card reader, card punch, line printer, etc.).
All the above tasks are similar and often are considered the same, though in principle they can be quite
different. A task in this environment is the management of some quanta of information, whether it be one bit
or character, a voice message, or a record or file from magnetic disk or magnetic tape. Thus a Pio does not
usually change any information; it is merely an interpreter for moving information. There are three
exceptions: computation is required for error detection and/or correction; computation is required if
recoding and reformatting are done; and computation is required when search operations are carried out on Ms
without Pc intervention.

¹These terms are usually used without distinguishing between a Pio and a Kio, that is, whether the device
interprets a sequential program (and thus is capable of sustained independent activity) or only decodes a
single instruction.

To accomplish the above tasks requires a fairly simple instruction set. Typically it contains jump (branch);
data transmission within Mp to initialize process variables; simple counting ability, e.g., to control error
retries; subroutine calling; interrupt process handling; initializing KMs or KT; testing the state of KMs or
KT; and sometimes code conversion (data in one code format is converted to another code). Thus substantial
arithmetic and logic facility is not needed. Part 4, Sec. 1 provides a detailed discussion of Pio's.

Display processors (P.display). The P.display is a complex Pio that processes information for display
terminals. The data-type is a representation of a complex graphic object, e.g., lines, points, curves, and
spatially localized text. The representations vary considerably from system to system, using various list
pointers and vector encodings. The operations on the data-types include the maintenance of the display (due
to the short-term persistence of the CRT); the selective modification of the representation under commands
from the T.display or the Pc, such as adding or deleting a line, inserting text, etc.; the control of
T.inputs such as keyboards, light pens, joysticks; and the performance of more complex spatial
transformations, such as translation, rotation, scale change, and determination of hidden lines.

The P.display is a good example of a highly complex but specialized data-type for which there are substantial
local operations to perform, that is, where no interaction is needed with a complex algorithm (that requires
the Pc). Users of displays wish to correct, modify, and transform the display in geometrically simple ways
(in effect, edit and view) between processing of the graphic information by complex algorithms. Thus the
graphic display is a prime candidate for the development of a specialized processor.

The DEC 338 (Chap. 25) is typical of these processors, being neither the simplest nor the most complex (e.g.,
it does not have rotation or hidden line elimination instructions).

Array processors (P.array). The array processor might be considered a more general Pc. It has been proposed
or discussed in the literature for some time. (See bibliography for Chap. 27, page 329.) The information unit
processed is an array of one (vector) or two (matrix) dimensions. Instructions are provided to operate on
these data. The specification of algorithms for a P.array is based on the assumption that an operation can be
carried out in parallel for array elements. Actually, both serial (sequential) and parallel (concurrent)
execution can be implemented. Both structures have the same logical characteristics, from an ISP viewpoint,
and may differ only in execution rate. The three array processors, ILLIAC IV (Chap. 27), NOVA (Chap. 26), and
the IBM 2938 (page 577), are discussed in Part 4, Sec. 2 (page 315).

Vector-move processors. The vector-move processor is a special-case P.array. It is capable only of moving a
word vector at some location in Mp to some other location within Mp. Because of its limited instruction set,
such a P is found only in computers which require constant Mp shuffling. This condition arises either because
of a hierarchy of Mp speeds or because the programs must have a particular structure before they can be
interpreted by the processor. A time-shared computer might require such a processor for multiprogram memory
management. It is therefore common to find block (vector) transmission instructions in a Pc. The IBM
System/360 has Pio(Storage channel) for this function (page 577).

Special algorithm processors (P.algorithm). Only a small number of special algorithm processors have been
specified and/or implemented. High performance is almost guaranteed by hardwiring and through specialization.
The time to fetch the algorithm (instruction fetch time) and many of the references to Mp for temporary data
are eliminated by hardwiring. A hardwired algorithm can easily outperform a stored program by a factor of 10
to 100. The lack of these processors in systems stems mainly from lack of market demand.
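The single operation of the vector-move processor described above can be sketched in a few lines. This is only an illustration under our own assumptions: Mp is modeled as a Python list of words, and the name vector_move and its parameters are hypothetical, not taken from any machine discussed in the book. The one subtlety is that source and destination vectors may overlap, so the copy direction must be chosen accordingly.

```python
def vector_move(mp, source, destination, length):
    """Move `length` words within Mp from `source` to `destination`.

    `mp` is a list standing in for primary memory (an assumption of
    this sketch). The copy runs upward or downward so that overlapping
    source and destination vectors are still moved correctly.
    """
    if (length < 0 or min(source, destination) < 0
            or max(source, destination) + length > len(mp)):
        raise IndexError("vector lies outside Mp")
    if destination <= source:
        indices = range(length)              # copy lowest word first
    else:
        indices = range(length - 1, -1, -1)  # copy highest word first
    for i in indices:
        mp[destination + i] = mp[source + i]


mp = list(range(10))
vector_move(mp, 0, 2, 5)   # overlapping move toward higher addresses
print(mp)
```

A Pc with a block-transmission instruction would perform the same loop in hardware; the sketch only makes the addressing explicit.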
Chapter 3 I The computer space 73
It is not clear that the special algorithm processors meet our criteria for being a processor, because of the
rather limited functions they perform. In fact, some so-called processors are just K’s or D’s, since they
have no instruction location counter and interpret only a single instruction at a time, requesting each new
instruction from a superior component.

Algorithms which have been hardwired (or proposed) include the fast Fourier transform using the Cooley-Tukey
algorithm; cross-correlation, autocorrelation, and convolution processing; polynomial and power-series
evaluation; floating-point array processing; and neural network simulation.¹

Language processors (P.language). Language P’s interpret a language that has been designed to some external
criteria, such as a procedure-oriented language (ALGOL or FORTRAN) or a list language (IPL-VI). Thus
complexity takes the form of a complex data-type for the “instruction,” rather than a complex data-type for
processing (e.g., floating complex numbers). If such processors were extended to do all the things a Pc also
does, then they would become more complex than a Pc. However, to date, most of them are experimental and
focus exclusively on language interpretation. In Part 4, Sec. 4, several examples are presented. It is worthy
of note that of the three P.language's only EULER (Chap. 32) has been implemented in hardware using a
P.microprogram.

Memory access

The most useful classification of memories is according to their accessing algorithm.² These are queue (i.e.,
access according to first-in-first-out discipline); stack (i.e., access according to first-in-last-out
discipline); linear (e.g., a tape with forward read and rewind); bilinear (e.g., a tape with forward and
backward read); cyclic (e.g., a drum); random (e.g., core); and content and associative. All these memories
are explicitly addressed except the stack and queue, which deliver an implicitly specified i-unit on each
read.

Memory size and basic operation times (i.e., the time constants in the access algorithm) are important too,
of course. But once a distinction is made between Mp and Ms, then for any given technological era there have
existed characteristic sizes and speeds for memories of a specified access algorithm. Where there has been
variation, either it has been linear with size (e.g., buying two boxes of magnetic core Mp versus buying one)
or there has been a narrow range of cost/performance tradeoff (as in data rate for magnetic tapes, in which
modest increases in density and tape speed can be bought for substantially increased dollars). Table 5 shows
the relative price, size, and performance of various memories. The memory-size versus information-rate plot
(Fig. 14) shows the clustering of memories and their suitability for a particular function.

From a technology standpoint, Mp’s have been constrained to either cyclic- or random-access memories
(although one can easily construct any type from random-access memories). In Part 2, Sec. 1 we have not
separated the machines according to whether they used cyclic- or random-access memories. The early
first-generation computers used cyclic-access memories. Part 3, Sec. 2 presents only the cyclic-access
memories.

Similarly, Ms’s have been constrained to be cyclic or linear, although quasi-random access has been achieved
with some disks and magnetic-card memories (random by block and linear or cyclic within a block). Any Ms’s
can be part of almost any computer structure. Thus there is no large effect of Ms structure on the main
design features of computer systems, and they are not discussed to any extent in the remainder of the book.
Our discussion of memory type below deals exclusively with Mp and Mps.

Stack and queue memories (M.stack, M.queue). Data elements in a stack and queue are not accessed explicitly,
as we noted above. The stack has some rather unique properties that aid in the compilation and evaluation of
nested arithmetic expressions. Although there are no machines employing stacks exclusively for primary
memory, there are stacks in some arithmetic processors. Part 3, Sec. 5 is devoted to processors with stack
memories (i.e., with stacks in the processor state).

The IPL-VI machine (Chap. 30) is the only computer in the book to have its entire memory organized as a list
of stacks. Although no hardware exists that inherently behaves as a stack or queue,³ it can be simulated by a
random-access memory. A shift register capable of shifting in either of two directions is a stack.
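The remark that a stack or queue can be simulated by a random-access memory can be made concrete with a short sketch. It assumes a fixed-size Python list as the random-access memory; the class names RamStack and RamQueue and their methods are ours, chosen for illustration only. A stack needs a single pointer, while a queue needs a head index and a count so that it can wrap around the memory.

```python
class RamStack:
    """First-in-last-out access over a fixed-size random-access array."""

    def __init__(self, size):
        self.cells = [None] * size  # the random-access memory
        self.top = 0                # index of the next free cell

    def push(self, item):
        if self.top == len(self.cells):
            raise OverflowError("stack full")
        self.cells[self.top] = item
        self.top += 1

    def pop(self):
        if self.top == 0:
            raise IndexError("stack empty")
        self.top -= 1
        return self.cells[self.top]


class RamQueue:
    """First-in-first-out access over a circular region of the array."""

    def __init__(self, size):
        self.cells = [None] * size
        self.head = 0    # index of the oldest item
        self.count = 0   # number of items currently held

    def put(self, item):
        if self.count == len(self.cells):
            raise OverflowError("queue full")
        self.cells[(self.head + self.count) % len(self.cells)] = item
        self.count += 1

    def get(self):
        if self.count == 0:
            raise IndexError("queue empty")
        item = self.cells[self.head]
        self.head = (self.head + 1) % len(self.cells)
        self.count -= 1
        return item
```

In both cases the caller never supplies an address: the i-unit delivered on each read is implied by the pointer state, which is exactly the sense in which these memories are not explicitly addressed.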
expensive, simple, producible memory. By the second generation the cost of Mp.random (though still more
expensive than an Mp.cyclic) was about equal to the processor logic. The incremental cost for an Mp.random in
a large system was then small, whereas the performance gain could be a factor of up to 3,000 (access time of
10 microseconds versus 30 to 30,000 microseconds). Some of the first-generation machines were reimplemented
using transistors (the LGP-30 became the LGP-21). Only a few new cyclic-access machines were introduced in
the second generation. Most notable was the low-cost Packard-Bell PB-250 using transistor logic and
magnetostrictive delay lines (a derivative of the Bendix G-15 and NPL ACE).

Nearly all these computers use some form of n + 1 addressing. The memory is organized on a digit-by-digit
serial basis for a word (e.g., ZEBRA with binary and IBM 650 with decimal). Hence, the arithmetic or logic
function hardware is implemented for only a single digit. An operation is done for the entire word by
iterating over all digits in time; thus the cost of a serial computer is nearly independent of its word
length.

Because of the cyclic and synchronous nature of these Mp's, it is difficult to synchronize them with
secondary memories and terminals (which are also synchronous). The very early machines had no large secondary
memories. In some cases, where magnetic tape was used, it was added at very low performance (low density, low
speed, and, therefore, low data rates) so that synchronization was not a problem. In other cases a small
random-access core
[Fig. 14. Memory size versus effective information rate (in bits/sec) for various memories. (x) indicates width of information, in bits.]
memory was added to provide synchronization between the two memories (for example, IBM 650).

Random-access memories (Mp.random). Random-access memories were used late in the first generation, and they
have remained the predominant memory during the second and third generations. It is unlikely that their
popularity will decline unless content-addressable memories can be constructed sufficiently cheaply (if
then). The earliest first-generation random-access memories were electrostatic and depended on maintaining a
charge on plates of an array of capacitors. The most common was the Williams tube (invented by F. C. Williams
at the University of Manchester) which works in essence like a CRT, with the beam used to charge a capacitor
array at the tube face [Williams and Kilburn, 1949]. Other schemes included an array of capacitors which were
selected by digital logic (Pilot, Chap. 35).

Late in the first generation Forrester [1951] invented the core memory, which rapidly became the predominant
primary-memory
component. It is unlikely that it will be replaced in the near future; the most likely candidate is
large-scale integrated-circuit arrays of flip-flops.

The random-access memory seems nearly perfect for the Mp’s of present computers. Of course, enthusiasm for
this memory may be based on not knowing how computers would have developed if we had not had them. However,
with little or no effort an M.random can be a stack, a queue, a linear, a cyclic, and even (within limits) a
content or associative memory. It is an organization which is very hard to beat.

Content-addressable and associative memories. It is possible to conceive of many exotic accessing
capabilities, and numerous proposals have been made involving either theoretical structures or experimental
prototypes. Since no particular varieties have become widespread, terminology is still variable.
Content-addressable memories are usually taken to mean a collection of cells of predetermined size (i.e., a
fixed i-unit) such that if one presents as “address” the contents of a predetermined part of the cell (the
tag or content address) then the contents of the entire cell will be retrieved. An associative memory is
usually taken to mean a system such that, when presented with an item of information, it delivers one or more
“associated” items of information. The principle of association is variable, yielding different kinds of
associative memories. Content-addressable memories provide a form of association, as do all memories, in
fact. Thus the term “associative memory” tends to denote forms of association different from familiar ones,
that is, forms that presumably have less sharp constraints imposed by the structure of memory (as opposed to
the structure of the information in the memory).

No examples exist of a computer with a content-addressable memory as its primary-memory structure. However,
both the IBM 360 Model 67 (page 571) and Model 85 (page 574) use 8- and approximately 1,000-word
content-addressable memories, respectively, to increase performance (in both cases they are transparent to
the program). The CDC 6600 instruction buffer is in effect a small content-addressable memory. In the above
three cases, the content-addressable memories vary in size and position in the structure; however, the
pattern of use is common. There is a large but slower Mp.random behind the content-addressable memory. The
purpose of the fast small content-addressable memory is to hold local, current data so that an access will
not have to be made to the random-access memory.

Small prototype associative addressable M’s have been constructed, but they are normally based on
random-access memories under the control of special hardware. There are immediate uses for
content-addressable memories with a large information-content address. For example, the read-only memories
for microprogram processors use long words principally because content-addressable memories are not
available. Ideally a microprogrammed processor would like to look at a fairly large processor state to
determine what action is to be taken in the microprogram. It is interesting to speculate about the evolution
of computers if a content-addressable memory had been developed in place of the random-access memory.

Mp concurrency

Multiprogramming is the simultaneous existence of multiple, independent programs within Mp being processed
sequentially or in parallel by one or more processors. Multiprogramming provides each user program with a
memory space independent of other users. It may provide, in addition, the sharing by several users (for
independent use, not for communication) of a block of Mp, which thus does not have to be duplicated. For
example, operating systems software, including compilers, assemblers, loaders, and editors, can be usefully
shared.

The ability to have multiple programs gives rise to a corresponding problem of communication between
programs. We have defined this as a correlated dimension in the computer space (interprogram communication)
and will discuss it in the next section. The issues it raises are just the opposite from those raised by the
requirement for multiple programs, which are discussed in this section. Here we are concerned with protecting
one program from another, with assuring that no unjustified communication will occur, and with obtaining
appropriate space in Mp so that multiple programs can run.

The requirement for protection is obvious. If two independent programs are to be resident in Mp at the same
time, they must not have access to each other’s space. Not only would such access (especially for writing)
have disastrous consequences when the programs are running, but they would be entirely unpredictable and
undebuggable from the viewpoint of the programmer of each individual program. Thus this requirement is
absolute; i.e., it must be highly reliable. This implies a hardware solution, although purely software
schemes are possible in special cases.

The requirement for appropriate space is somewhat more subtle. Certainly there must be enough space in Mp for
all the programs that are to be resident simultaneously. It must be possible to find that space, assign it to
a new program, and make it available again when that program is finished. But what kind of space will do?
Must it be a single interval of Mp, large enough for the total program with data? And if the program is
assembled or compiled
in Mp and is removed temporarily to make room for another program, must it be brought back into the exact
same addresses into which it was originally assembled?

The key issue resides in the kind of intercommunications that hold within a program and its data, for these
determine how and in what way a program is interconnected and depends on the specific Mp addresses that it
occupies. These connections are of two kinds: explicit addresses present in the program and data and implicit
relations between addresses due to addressing algorithms (e.g., that programs are laid out sequentially in
Mp, or that the elements of an array are to be accessed by indexing and hence must occupy consecutive
addresses). Again, although some purely software solutions to the space issue exist, hardware is involved in
a fundamental way.

Thus, the two main questions of program concurrency¹ (protection and space assignment) imply basic design
features of a computer system. It might seem that they imply separate features and should be separate
dimensions in the computer space. In fact, each proposal for how to solve the space-assignment problem also
contains a particular proposal for the protection problem. Thus we treat them as a single dimension.

¹See also Randell and Kuehner [1968].

Virtual-address space and Mp mapping. Before considering various solutions to Mp concurrency (i.e., the
values along the dimension), let us introduce two concepts in terms of which all current solutions can be
understood. Consider a particular program, PROGRAM-1, one of many that might wish to reside in the Mp.
PROGRAM-1 assumes a set of addresses, some explicitly and some implicitly, in the addressing algorithm it
uses. PROGRAM-1 requires a memory space that has addresses that satisfy all these requirements, the implicit
and explicit ones. Other than that it does not care how these addresses are realized. Let us call this
address space required by PROGRAM-1 its virtual memory, Mv. Thus, each program has its own virtual memory.
(You might think of this as having its own Mp, except, as we shall see, this Mp may be many times bigger than
any actual Mp and still be entirely feasible.)

Actually to run PROGRAM-1 requires that it be placed in the real Mp in such a way that the real addresses of
Mp containing it satisfy all the requirements, that is, that it be a faithful image of the virtual memory.
Thus there must be some memory mapping that maps the virtual addresses into the actual memory. Once PROGRAM-1
is placed in Mp there must be some process that takes each virtual address (as it occurs to be processed in
an instruction) and finds the actual address in Mp, so that the correct contents can be obtained.

This might seem simply a complicated and abstract way to view matters, but it becomes essential as soon as we
realize that the computer can have hardware memory mappings other than the familiar direct-addressing
structure of Mp. Furthermore, if this mapping is given the right properties, it may solve some of the
space-assignment and protection problems for Mp concurrency. What we have really done is to divorce the
addressing required by the programs from that provided by the physical computer, so that we can redesign it
(via the memory mapping) to meet new design requirements that were not apparent when the original
random-addressing schemes were created.

Let us make the notion of memory mapping more precise. The program contains virtual addresses, z (that is,
symbols in the program that denote addresses are taken to denote addresses in Mv). During the execution of
the program, whenever there is a reference to an address z (either explicitly via an address calculation or
implicitly via, say, getting the next instruction), a computation occurs on z to obtain the actual address in
Mp. This computation is part of the Pc, just as is an automatic indexing or indirect-addressing calculation.
It takes as input not just the virtual address z but information on where the program is located in Mp. The
latter information is called the map, and a program’s map information is determined when it is placed into Mp
on a given run. Thus, using our ISP notation, and calling the address calculation f, we get

    Mv[z] := Mp[f(z,map)]

That is, the information in virtual memory at virtual address z is the same as the information in actual
memory at address f(z,map).

This whole scheme is built to permit programs to be placed in Mp’s in various ways, e.g., relocated or
scattered around, and still make it possible to run the program. Any such scheme brings a solution to the
protection problem, namely, that for some values of z the above calculation cannot take place or is invalid
(i.e., there is no mapping for z). This can correspond to a violation of protection, which can then be
prevented. All calculations may even be permissible, but f is so arranged that it never produces an address
in anyone else’s part of Mp.

The memory map is part of each user’s program. With many users, it must reside in Mp, since there will not be
enough space in Mps to hold a large amount of mapping information. However, when a program is being executed,
some part of the mapping information becomes part of the Mps (i.e., at least the Mp address
of the rest of the map). In addition, the map may contain special access control information, such as whether
a part may be read, read as data, written, or read as program. The map can also collect statistical
information concerning whether a part of the program has been used or has been changed (written).

Random-access memories for Mp constrain the mapping by requiring linear addresses of the form Mp[0:p], since
the mapping calculation must be economical (as it is performed with very high frequency). We would not
consider a map structure which provides every word in Mv to be mapped into an arbitrary word in Mp, for this
would require a map exactly the same size as Mv. With many programs in Mp, there would be little room for
anything but maps. Similarly, the amount of processing in f, the calculation, must be very minimal. These two
aspects constrain the mapping scheme strongly.

The constraint to linear addresses appears to force the structure of virtual memory to consist of a
multidimensional array. This can be one-dimensional, Mv[0:n], or two-dimensional, Mv[0:s][0:m]. It could be
of higher dimension, but the need seems not to have been felt (since within any single dimension one can have
multidimensional arrays as one normally does in a regular Mp). However, the two-dimensional array, which also
is called segmented addresses, since it can be taken as a discrete collection of s + 1 segments each of m + 1
linear addresses, has advantages in terms of the mappings; namely, segments can be placed disjointly in Mp
without fear that virtual-address calculations will cross from one segment to another.

With this introduction to the problems of multiprogramming we will look at some of the hardware schemes.
Table 6 provides a summarization of them, including a brief description of how each scheme operates.

No special mapping hardware. If no hardware exists in the Pc to accomplish a memory address mapping, then
when the address
Table 6

Each entry gives the hardware designation (arranged in order of increasing hardware complexity), the method
of memory allocation among multiple users, and the limits of the particular method (example of use).

No relocation, Mv ≤ Mp:

Hardware designation: Conventional computer; no memory-allocation hardware.
  Method: No special hardware. Completely done by interpretive programming.
  Limits: Completely interpretive programming required. Very high cost in time is paid for generality.
  (JOHNNIAC interpreting JOSS)

Hardware designation: 1 + 1 users. Protection bit for each memory cell.
  Method: A protection bit is added to each memory cell. The bit specifies whether the cell can be written or
  accessed.
  Limits: Only 1 special user + 1 other user is allowed. User programs must be written at special locations
  or with special conventions, or loaded or assembled into place. The time to change bits if a user job is
  changed makes the method nearly useless. No memory allocation by hardware. (IBM 1800)

Hardware designation: 1 + 1 users. Protection bit for each memory page.
  Method: A protection bit is added for each page. (See above scheme.)
  Limits: No memory allocation by hardware. (SDS Sigma 2)

Hardware designation: Page-locked memory.
  Method: Each block of memory has a user number which must coincide with the currently active user number.
  Limits: Not general. Expensive. Memory relocation must be done by conventions or by relocation software. A
  fixed, small number of users are permitted by the hardware. No memory allocation by hardware. A program
  cannot be moved until it is run to completion. (IBM System/360)

Hardware designation: One set of protection and relocation registers (base address and limit registers). Also
called boundary registers.
  Method: All programs written as though their origin were location 0. The relocation register specifies the
  actual location of the user, and the protection register specifies the number of words allowed.
  Limits: As users enter and leave, primary-memory holes form, requiring the moving of users. Pure procedures
  can be implemented only by moving impure part adjacent to pure part. (CDC 6600, PDP-6)

Hardware designation: Two sets of protection and relocation registers. Two segments.
  Method: Similar to above. Two discontiguous physical areas of memory can be mapped into a homogeneous
  virtual memory.
  Limits: Similar to above. Simple, pure procedures with one data array area can be implemented. (UNIVAC
  1108, PDP-10)

Hardware designation: n ≥ 3 sets of protection and relocation registers.
  Method: Similar to above. More similar to page mapping.
  Limits: Has not been used in any conventional computer.

Mapping, Mv ≥ Mp:

Hardware designation: Memory page mapping.
  Method: For each page (2^6 to 2^12 words) in a user's virtual memory, corresponding information is kept
  concerning the actual physical location in primary or secondary memory. If the map is in primary memory, it
  may be desirable to have "associative registers" at the processor-memory interface to remember previous
  reference to virtual pages, and their actual locations. Alternatively, a hardware map may be placed between
  the processor and memory to transform processor virtual addresses into physical addresses.
  Limits: Relatively expensive. Not as general as following method for implementing pure procedures. (Atlas,
  CDC-3500, SDS-940)

Hardware designation: Memory page/segmentation mapping.
  Method: Additional address space is provided beyond a virtual memory above by providing a segment number.
  This segment number addresses or selects the page tables. This allows a user an almost unlimited set of
  addresses. Both segmentation and page map look-up is provided in hardware. May be thought of as
  two-dimensional addressing.
  Limits: Expensive. Little experience to judge effectiveness. (GE 645, IBM 360/67)

Hardware designation: Indirect references through a descriptor table to segments.
  Method: All data are considered part of a descriptor array which is referred to by a number. A descriptor
  table indexed by the descriptor number is used to locate the array in Mp and give its size.
  Limits: An indirect reference must be made to the description table in Mp. (B 5500)
z is encountered in the program, the information at Mp[z] will be obtained. There are still, however, two different ways to obtain the effect of a virtual memory.
First, one can operate interpretively, with a software system taking the place of hardware. That is, the programs of all the users are in a nonmachine language (e.g., a higher procedure-oriented language), and each access in the language is processed by the software interpreter before an access is made to Mp. It is clear that all the logical power of a memory mapping is available with this scheme. The only drawback is the loss of efficiency from the interpretation, which may range from a factor of 5 to 100. Consequently this scheme is used only in special circumstances, such as multiuser time-shared conversational algebraic languages.
The second scheme is to modify the code at the time it is placed in the Mp for a given run, so that all addresses in the code correspond to the actual Mp addresses used. That is, an assembly or translation operation is performed each time the program is placed in Mp. The advantage of this scheme is that no further address calculations are necessary. There are three disadvantages. Assembly operations are expensive so that, although the scheme is tolerable if the program is brought in once and run to completion, it is not tolerable if programs are continually being swapped in and out of Mp. In addition, the program must be laid into continuous intervals of Mp corresponding to predetermined segments of the program, for assembly occurs on a static representation of the program and cannot unravel the potential effect of address algorithms. Finally, the size of Mv (i.e., the addresses used externally) must be not greater than Mp.
Relative to these software schemes, one interpretive and very expensive and one involving assembly (i.e., compilation) and loading, the hardware schemes to be described appear as address interpreters, where the cost of continuous interpretation has been made tolerable.
Every reference Mv[z] takes place as
    Mv[z] := (¬Mp[z]⟨protect.bit⟩ → Mp[z];
              Mp[z]⟨protect.bit⟩ → protection violation ← 1)
That is, any reference to a word with a protect bit causes an error. The other two schemes protect on the basis of blocks of words.
Protection and relocating register(s) hardware. A protection and relocation register mechanism is used in four schemes of Table 6. These provide either one concatenated, one additive, two additive, or n additive register pairs for mapping a single program into one, one, two, or n nonadjacent blocks in Mp. The authors know of no schemes where more than three registers are used; this would really be akin to using a more general page map. Generally, these schemes restrict Mv ≤ Mp.
An additive protection and relocation register pair is shown in Fig. 15 in which four users are occupying a Mp[0:7999]. Each user program is written to occupy a continuous address space in a virtual Mv. Thus in ISP, when Pc is running programs for user-j, which address Mv[z], with z varying from 0 to vj - 1, the mapping uses actual memory. The action is
    Mv[z] := ((z < Protection) → Mp[z + Relocation];
              (z ≥ Protection) → (Protection violation ← 1))
Protection and Relocation are the two registers that specify mapping. The implementation of this scheme generally takes the form of adding the contents of the relocation register after all address calculations have taken place. Thus, in PMS we might think of the structure
    Mp-K(address translation)-Pc
       M(Protection, Relocation)
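The additive protection and relocation mapping described on this page can be sketched in a few lines. This is an illustration only: the function names, the exception class, and the register values for the example user (Relocation = 3000, Protection = 2000) are assumptions and are not taken from Fig. 15.

```python
class ProtectionViolation(Exception):
    """Raised when a virtual address falls outside the user's block."""


def map_address(z, protection, relocation):
    """Map virtual address z to a physical address in Mp.

    Mirrors the rule: z < Protection selects Mp[z + Relocation];
    z >= Protection sets the protection violation.
    """
    if 0 <= z < protection:
        return z + relocation
    raise ProtectionViolation(f"address {z} is outside 0..{protection - 1}")


def read_mv(mp, z, protection, relocation):
    """Read Mv[z] for the user described by the two registers."""
    return mp[map_address(z, protection, relocation)]


# Hypothetical example: an 8,000-word Mp, one user loaded at 3000 with 2,000 words.
mp = list(range(8000))      # each word holds its own physical address
word = read_mv(mp, 10, protection=2000, relocation=3000)
```

Because the relocation is added only after all other address calculation, the user program itself never sees physical addresses, which is why swapping the program to a different block requires changing only the two registers.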
[Figure labels: Protection; hardware registers when user 2 is running; "user-memory" addresses in 1,000s of words; "absolute memory" addresses in 1,000s of words.]
restricts each block to be the same size. Note that Mv can be greater than Mp. In addition, parts of the virtual memory may remain unused.
There are two ways the above scheme is usually implemented:
    Mp-M.map-Pc
2 The map is retained in Mp and referenced by a protection and relocation register which are set for the particular active user. In order to avoid making references to Mp for each word reference to Mv by a Pc, a small, fast M(content address) is placed between Pc and Mp. The PMS structure is
    [PMS diagram labels: L(data), Pc]
The Burroughs B 5000 (Chap. 22) and the later B 8500 have a mapping that is more closely integrated into the Pc because they
Fig. 16. Memory allocation using a page allocation map.
Interprogram communication
The dimension of interprogram communication is completely correlated with the multiprogramming dimension as we have previously noted. To have a problem of intercommunication, there must be a structure of components that require communication. At the simplest level the dimension is represented by a single program, and there is no need for intercommunication. Variables of the
program are completely accessible to the whole program, and the address space is essentially uniform.
The second value of the dimension, subroutine calling, produces a hierarchy of communication contexts. There is not a fixed number of levels to the hierarchy, since each subroutine may call others ad nauseam. When subroutines are present, address names and values within the subroutine become addresses which are local to that part of the subprogram. Such a structuring is apparent when looking at the higher-level languages such as FORTRAN, ALGOL, and PL/I, where there are explicit statements for controlling the names (addresses) that are available to each of the parts of the program. The concept of subroutine structure has been with us almost from the first programs.
The next value of the dimension relates to signaling within a single process. It is akin to subroutines embedded in hardware. These are called extracodes and were perhaps first suggested for the Atlas (Chap. 23). Each extracode can be looked at as just a call to a specific subroutine. The variables of the user (caller's) program are made available to the called (extracode defined) program. The calling usually is accompanied by a context shift, in which a completely different program (one that is used by any number of calling programs) takes command to interpret the instruction. This scheme is used in systems which are controlled by a special software monitor. When a function such as the input or output of a file is required, the main program issues a call to the monitor to make the transfer. (In theory, the monitor knows about conditions in the system and has the capability to perform the complex function.) A central monitor control can then begin to run another program if the request is one which would normally halt the computer. This form of communication is useful to supply extra facilities to users and to have a method of knowing what the users are doing (e.g., so that equipment will be better utilized).
As more complex program structures are directly represented by the hardware, the intercommunication complexity also increases beyond the simple subroutine call. If a segmented-memory scheme is used, the problem of communicating between the segments can be solved in a range of ways. The value of the range would be somewhere between ignoring the problem with the hardware and providing methods for naming of addresses between the communicating segments.
In the above cases, the communication among the various programs or parts of programs is done explicitly by one program to another program. The instruction trap does not fit this view so nicely. Here, conditions occurring within a single process which are not explicitly called cause another part of the program to be called. Typical conditions which cause traps are arithmetic results outside expected range or erroneous program conditions (e.g., trying to call someone else's program). The trap causes a change in context that is synchronized with the process causing it. Trapping is a form of program interruption; a trap is an intraprocess interrupt as distinct from interprocess interrupts.
Intercommunication between two independent processes (being carried out by two independent components) is usually accomplished by using the program interrupt. The interrupting process requests that a program interrupt occur in a component (interruptee). The interrupter's request is acknowledged by the interruptee, and a change of process state occurs in the interruptee; a new process is then run in the interruptee on behalf of the interrupter. The program interrupt is used among processors in a multiprocessor system and between 1Pc and nPio's. A control K may also use the program-interrupt request to communicate with its superior Pio or Pc. For example, a Pio does not usually have the logical capability to execute an algorithm which would decide what action is to be taken for various error conditions.
Usually the interruptee is equipped with certain logic which is capable of arranging priorities of requesting interrupters. The typical kinds of interrupt requests are component faults (e.g., parity error), a timer has counted down, and various task completions (e.g., a program has completed, a tape unit has rewound, a disk arm has stopped moving, a certain record has been found on tape, a buffer is full).
State diagrams would show how each of the communication methods above are similar to one another. A typical interrupt state diagram is shown in Fig. 17. There are four states: normal process interpretation, process state saving, interrupt process interpretation, and process state restoration. The sequence is as follows:
1 Normal instruction interpretation is occurring in the interruptee.
2 The interrupter requests an interrupt.
3 After some delay, t.acknowledgment, a state is reached in which part of the interruptee's process state is saved.
4 After t.acknowledgment + t.save, a program is running in the interruptee in response to the interrupter.
5 The interrupt program is run for t.interrupt.
6 At the completion of the interrupt program, the original process state is restored in the interruptee.
7 After t.restore, normal processing resumes in the interruptee.
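The interrupt sequence above can be traced with a small timing sketch. The four state names follow the text; the function names, the use of abstract time units, and the sample values are assumptions made only for illustration.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal process interpretation"
    SAVING = "process state saving"
    INTERRUPT = "interrupt process interpretation"
    RESTORING = "process state restoration"


def interrupt_timeline(t_acknowledgment, t_save, t_interrupt, t_restore, t_request=0):
    """Return (time, state entered) pairs for one interrupt in the interruptee."""
    start_saving = t_request + t_acknowledgment     # state saving begins
    start_interrupt = start_saving + t_save         # interrupt program begins
    start_restoring = start_interrupt + t_interrupt # state restoration begins
    resume_normal = start_restoring + t_restore     # normal processing resumes
    return [
        (start_saving, State.SAVING),
        (start_interrupt, State.INTERRUPT),
        (start_restoring, State.RESTORING),
        (resume_normal, State.NORMAL),
    ]


def context_switch_overhead(t_acknowledgment, t_save, t_restore):
    """Time spent moving between states rather than running either program."""
    return t_acknowledgment + t_save + t_restore


# Hypothetical times, in arbitrary clock units.
timeline = interrupt_timeline(2, 5, 20, 5)
overhead = context_switch_overhead(2, 5, 5)
```

With these sample values the interrupt program starts 7 units after the request and the interruptee resumes normal processing at 32 units, of which 12 units are overhead that grows with the amount of process state saved and restored.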
The significant attributes of the system are the various times required to move from state to state. These times are directly related to the amount of process state which must be saved (and restored) when switching context.
The intercommunication problem is probably the least understood dimension in the computer space. It is rather intimately related to the ISP, in that the various calling methods (implicitly and explicitly) depend on the ISP. Also, the amount of processor state (a function of the ISP) affects the response time for making context transitions. Most interrupt systems allow several independent classes and/or sources of interrupters. The classes are arranged in priority so that lower-level interrupters are ignored until higher-level interrupt programs are run to completion (see Chap. 42 on the SDS 910-9300 series). The design problems associated with intercommunication are not those of implementa-
[Fig. 17 labels: interrupt request from interrupter; interpret instruction in Mp (interpretation in interrupted state); no interrupt request; t.restore; interrupt program execution.]
Fig. 17. State diagram for the interrupt process.
the processor we assume that almost every internal register-transfer operation requires one or more clock times. (A simple multiply operation usually takes between n/2 and 2n clock times.) We do not mean to rule out multiple simultaneous internal operations within the processor, but they are exceptions. With only a view of a processor's registers, it is easy to tell if multiple operations are possible. Most of these processors do only one operation at a time. As a rule, the simple processor is locked to the primary-memory cycle time (usually core). Approximately 2 to 10 events (clock times) are available within the processor. For example, the PDP-8 (Chap. 5) has four events, and the IBM 7090 (Chap. 41) has 10 events. A precise measure of parallelism would count the number of operations per clock time for given program conditions.
Multiple instruction streams, 1 Pc. The only example of this structure in the book is the CDC 6600. Opportunities for such a structure are possible with the parallel computer suggested by Lehmann (Chapter 37).
Multiple data streams. The most obvious implementation of multiple data streams with one or more instruction streams is the array processor. Part 4, Section 2 is devoted to these structures.
the instructions. Stretch (Chap. 34) and the CDC 6600 (Chap. 39) use instruction buffers. A small, restricted content-addressable memory holds a block of instructions. In the simplest case of these computers a block of memory, relative to the instruction counter, is kept in the local instruction buffer memory.
Look-aside buffering (slave) memories. Look-aside is a more general form of instruction buffering because both instructions and commonly accessed data tend to migrate to the faster look-aside memory. This scheme is discussed for the IBM System/360 Model 85 (page 574). The look-aside memory suggested by Wilkes [1965] is a content-addressable memory for retaining the active (most recently used) memory words.
[Fig. 18 legend: operation time to determine instruction q; access time to determine instruction q; operation time to determine datum v; access time to determine datum v; operation time for instruction; operation time to determine operation of instruction; total instruction time.]
Fig. 18. Time-function diagram for a pipeline processor.
Pipeline processing. Pipeline (assembly-line) concurrency is the name given to a system of multiple functional units, each of which is responsible for partial interpretation and execution of the instruction stream. A pipeline processor has several partially completed instructions in process at one time. Each processor stage operates on a specific part of the instruction, e.g., instruction fetch, effective-address calculation, operand fetching, execution of operation specified by the instruction, and results storing. A PMS diagram for a pipeline processor is given in Fig. 19. Thus there is a separate functional unit for each state suggested by the state diagram of Fig. 4. There must be interlocks so that sequence is preserved, i.e., so that results are not used until they are available. Figure 18 shows a time/function diagram of a pipeline processor. There are at least three instructions being interpreted simultaneously. Although we have not extended Fig. 18, we would expect the processor in the sketch to operate on about eight instructions
at one time. Note that the processor sometimes completes later instructions first. In this model there is only one instruction fetching, one operand fetching, and one operand storing unit, while there are multiple data operation units. The particular number of each type of unit is obviously not fixed for all structures but depends heavily on the memory system, the number of instruction streams, and the ISP.
A processor may require many data-operation units in order to avoid bottlenecks. Each unit is independent and may be functionally capable of carrying out only selected tasks. Multiple data-operations are normally desirable in a pipeline processor so that several operations can be carried out at a time, since most of the processing time within the processor is spent on the operations (e.g., multiplication, division, shifting, etc.).
[Fig. 19 labels: M.instructions; instruction fetch; M.data; data setup; execution; M.data; data restore.]
Fig. 19. Example of processor parallelism by spatially independent control function (pipeline processing) PMS diagram.
Conclusions
You now have our view of the important aspects of the stored-program computer. We have tried to organize the parameters as dimensions so that a computer can be viewed as a point (or points) in a multidimensional space. The previous discussion has enumerated the values of one dimension, while (in effect) holding the values of other dimensions constant. The dimensions are highly correlated, especially with cost and evolutionary time. We have been brief in presenting the dimensions because the book is primarily about computer examples. However, one should be able to recognize the dimensions and values when they are encountered within the context of a particular computer.
The remainder of the book is organized around these dimensions. The examples lose the identity of dimensions because they are descriptions of points in the space (computers). Furthermore, the descriptions themselves are not especially organized around these dimensions but are based on the designer's own view of his machine.
References
AdamA60,66,67,68; AdamC60; ArbuR66; ArdeB66; BowdB53; CampR50; CasaC62; ChasC52; CoxJ68; DennJGS; FlynM66; ForrJFjl; GibsC66; KnigK66; MolnC67; NiseN66; RandB68; RoseS69; SamuA57; SerrR62; WeikM55,61,64; WilkM51a,65; WillF49.
PART 2
The instruction-set processor: main-line computers
Section 1 Processors with one address per instruction
The SDS 910-9300 series
The SDS 910-9300 computers are illustrative of typical, second-generation, 24-bit computers. The computers are discussed in Part 6, Sec. 2, page 542. Chapter 42 also attempts to show how implementation affects performance for the series.
The LGP-30 and LGP-21
The LGP-30 and later LGP-21 are presented in Chap. 16 and discussed in Part 3, Sec. 2, page 216.
IBM 650 instruction logic
The IBM 650 (Chap. 17) is a one plus one address computer. Its attributes as a cyclic-memory computer, though hardly apparent at the ISP level, are discussed in Part 3, Sec. 2, page 216.
The IBM 7094 I, II
Part 6, Sec. 1 shows the evolution of the IBM 36-bit scientific computers. The IBM 7094 II (Chap. 41) is presented for many reasons (page 517). Among them are its effect on the later IBM System/360 and its position as the standard large scientific computer of the late fifties and early sixties.
The UNIVAC system
The UNIVAC system, first delivered in March, 1951, was later known as UNIVAC I. UNIVAC (UNIVersal Automatic Computers) was the second computer to be manufactured by the Eckert-Mauchly Computer Corporation, subsequently a division of Remington-Rand.
UNIVAC is a single-address, decimal computer with 12 digits/word. Two instructions are stored per word. In effect, UNIVAC is a decimal version of the IAS computer. The Mp consists of 1,000 words, made up of 10 words/delay line. Each delay line requires 404 microseconds to recirculate.
UNIVAC is significant because it was the most important computer during the early 1950s. Its performance record is discussed in Chap. 8. The UNISERVO magnetic-tape system was rather advanced for 1950, considering performance, error checking, and buffering. Particularly nice is the ability to partition the input/output system for off-line printing and key punching.
One-level storage system
The 48-bit Atlas was developed at Manchester University and subsequently manufactured by Ferranti Corp. (now part of International Computers and Tabulators). The development began about 1960, and the paper was written in 1962. The importance of Atlas with respect to current and future machines is discussed in Part 3, Sec. 6, page 274.
The engineering design of the Stretch computer
The IBM Stretch (also called the IBM Model 7030) single-address computer (Chap. 34) is one of the earliest computers built to provide maximum computing power subject to no apparent cost, size, and producibility constraints. A discussion of its importance is given in Part 5, Sec. 2, page 396.
PART I
Chapter 4 Preliminary discussion of the logical design of an electronic computing instrument
machine. We proceed to discuss what quantities the memory should store for various types of computations.
2.2. In the solution of partial differential equations the storage requirements are likely to be quite extensive. In general, one must remember not only the initial and boundary conditions and any arbitrary functions that enter the problem but also an extensive number of intermediate results.
a For equations of parabolic or hyperbolic type in two independent variables the integration process is essentially a double induction. To find the values of the dependent variables at time t + Δt one integrates with respect to x from one boundary to the other by utilizing the data at time t as if they were coefficients which contribute to defining the problem of this integration.
Not only must the memory have sufficient room to store these intermediate data but there must be provisions whereby these data can later be removed, i.e. at the end of the (t + Δt) cycle, and replaced by the corresponding data for the (t + 2Δt) cycle. This process of removing data from the memory and of replacing them with new information must, of course, be done quite automatically under the direction of the control.
b For total differential equations the memory requirements are clearly similar to, but smaller than, those discussed in (a) above.
c Problems that are solved by iterative procedures such as systems of linear equations or elliptic partial differential equations, treated by relaxation techniques, may be expected to require quite extensive memory capacity. The memory requirement for such problems is apparently much greater than for those problems in (a) above in which one needs only to store information corresponding to the instantaneous value of one variable [t in (a) above], while now entire solutions (covering all values of all variables) must be stored. This apparent discrepancy in magnitudes can, however, be somewhat overcome by the use of techniques which permit the use of much coarser integration meshes in this case, than in the cases under (a).
2.3. It is reasonable at this time to build a machine that can conveniently handle problems several orders of magnitude more complex than are now handled by existing machines, electronic or electro-mechanical. We consequently plan on a fully automatic electronic storage facility of about 4,000 numbers of 40 binary digits each. This corresponds to a precision of 2^-40 ≈ 0.9 × 10^-12, i.e. of about 12 decimals. We believe that this memory capacity exceeds the capacities required for most problems that one deals with at present by a factor of about 10. The precision is also safely higher than what is required for the great majority of present day problems. In addition, we propose that we have a subsidiary memory of much larger capacity, which is also fully automatic, on some medium such as magnetic wire or tape.
3. First remarks on the control and code
3.1. It is easy to see by formal-logical methods that there exist codes that are in abstracto adequate to control and cause the execution of any sequence of operations which are individually available in the machine and which are, in their entirety, conceivable by the problem planner. The really decisive considerations from the present point of view, in selecting a code, are more of a practical nature: simplicity of the equipment demanded by the code, and the clarity of its application to the actually important problems together with the speed of its handling of those problems. It would take us much too far afield to discuss these questions at all generally or from first principles. We will therefore restrict ourselves to analyzing only the type of code which we now envisage for our machine.
3.2. There must certainly be instructions for performing the fundamental arithmetic operations. The specifications for these orders will not be completely given until the arithmetic unit is described in a little more detail.
3.3. It must be possible to transfer data from the memory to the arithmetic organ and back again. In transferring information from the arithmetic organ back into the memory there are two types we must distinguish: Transfers of numbers as such and transfers of numbers which are parts of orders. The first case is quite obvious and needs no further explication. The second case is more subtle and serves to illustrate the generality and simplicity of the system. Consider, by way of illustration, the problem of interpolation in the system. Let us suppose that we have formulated the necessary instructions for performing an interpolation of order n in a sequence of data. The exact location in the memory of the (n + 1) quantities that bracket the desired functional value is, of course, a function of the argument. This argument probably is found as the result of a computation in the machine. We thus need an order which can substitute a number into a given order: in the case of interpolation, the location of the argument or the group of arguments that is nearest in our table to the desired value. By means of such an order the results of a computation can be introduced into the instructions governing that or a different computation. This makes it possible for a sequence of instructions to be used with different sets of numbers located in different parts of the memory.
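The idea in 3.3 of substituting a computed number into the address part of an order can be sketched as follows. This is a modern illustration under stated assumptions, not the machine's actual code: the memory is modeled as a Python dictionary, an order as an (operation, address) pair, and the names store_number, substitute_address, execute_load and the "LOAD" operation are all hypothetical.

```python
def store_number(memory, location, number):
    """Transfer of a number as such: the old contents are replaced entirely."""
    memory[location] = number


def substitute_address(memory, order_location, new_address):
    """Transfer of a number that is part of an order: only the address changes."""
    operation, _old_address = memory[order_location]
    memory[order_location] = (operation, new_address)


def execute_load(memory, order_location):
    """Carry out a LOAD order by fetching the word its address part names."""
    operation, address = memory[order_location]
    if operation != "LOAD":
        raise ValueError(f"location {order_location} does not hold a LOAD order")
    return memory[address]


# A table of five values at locations 100..104 and one order at location 0.
memory = {0: ("LOAD", 100)}
for i in range(5):
    store_number(memory, 100 + i, 10 * i)

argument_index = 3  # stands in for a result computed by the machine
substitute_address(memory, 0, 100 + argument_index)
value = execute_load(memory, 0)
```

The same LOAD order now fetches a different table entry each time the computed index changes, which is how one sequence of instructions can serve different sets of numbers.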
To summarize, transfers into the memory will be of two sorts: Total substitutions, whereby the quantity previously stored is cleared out and replaced by a new number. Partial substitutions, in which that part of an order containing a memory location-number (we assume the various positions in the memory are enumerated serially by memory location-numbers) is replaced by a new memory location-number.
3.4. It is clear that one must be able to get numbers from any part of the memory at any time. The treatment in the case of orders can, however, be more methodical since one can at least partially arrange the control instructions in a linear sequence. Consequently the control will be so constructed that it will normally proceed from place n in the memory to place (n + 1) for its next instruction.
3.5. The utility of an automatic computer lies in the possibility of using a given sequence of instructions repeatedly, the number of times it is iterated being either preassigned or dependent upon the results of the computation. When the iteration is completed a different sequence of orders is to be followed, so we must, in most cases, give two parallel trains of orders preceded by an instruction as to which routine is to be followed. This choice can be made to depend upon the sign of a number (zero being reckoned as plus for machine purposes). Consequently, we introduce an order (the conditional transfer order) which will, depending on the sign of a given number, cause the proper one of two routines to be executed.
Frequently two parallel trains of orders terminate in a common routine. It is desirable, therefore, to order the control in either case to proceed to the beginning point of the common routine. This unconditional transfer can be achieved either by the artificial use of a conditional transfer or by the introduction of an explicit order for such a transfer.
3.6. Finally we need orders which will integrate the input-output devices with the machine. These are discussed briefly in 6.8.
3.7. We proceed now to a more detailed discussion of the machine. Inasmuch as our experience has shown that the moment one chooses a given component as the elementary memory unit, one has also more or less determined upon much of the balance of the machine, we start by a consideration of the memory organ. In attempting an exposition of a highly integrated device like a computing machine we do not find it possible, however, to give an exhaustive discussion of each organ before completing its description. It is only in the final block diagrams that anything approaching a complete unit can be achieved.
The time units to be used in what follows will be:
    1 μsec = 1 microsecond = 10^-6 seconds
    1 msec = 1 millisecond = 10^-3 seconds
4. The memory organ
4.1. Ideally one would desire an indefinitely large memory capacity such that any particular aggregate of 40 binary digits, or word (cf. 2.3), would be immediately available, i.e. in a time which is somewhat or considerably shorter than the operation time of a fast electronic multiplier. This may be assumed to be practical at the level of about 100 μsec. Hence the availability time for a word in the memory should be 5 to 50 μsec. It is equally desirable that words may be replaced with new words at about the same rate. It does not seem possible physically to achieve such a capacity. We are therefore forced to recognize the possibility of constructing a hierarchy of memories, each of which has greater capacity than the preceding but which is less quickly accessible.
The most common forms of storage in electrical circuits are the flip-flop or trigger circuit, the gas tube, and the electro-mechanical relay. To achieve a memory of n words would, of course, require about 40n such elements, exclusive of the switching elements. We saw earlier (cf. 2.2) that a fast memory of several thousand words is not at all unreasonable for an all-purpose instrument. Hence, about 10^5 flip-flops or analogous elements would be required! This would, of course, be entirely impractical.
We must therefore seek out some more fundamental method of storing electrical information than has been suggested above. One criterion for such a storage medium is that the individual storage organs, which accommodate only one binary digit each, should not be macroscopic components, but rather microscopic elements of some suitable organ. They would then, of course, not be identified and switched to by the usual macroscopic wire connections, but by some functional procedure in manipulating that organ.
One device which displays this property to a marked degree is the iconoscope tube. In its conventional form it possesses a linear resolution of about one part in 500. This would correspond to a (two-dimensional) memory capacity of 500 × 500 = 2.5 × 10^5. One is accordingly led to consider the possibility of storing electrical charges on a dielectric plate inside a cathode-ray tube. Effectively such a tube is nothing more than a myriad of electrical capacitors which can be connected into the circuit by means of an electron beam.
Actually the above mentioned high resolution and concomitant memory capacity are only realistic under the conditions of television-image storage, which are much less exigent in respect to
the reliability of individual markings than what one can accept in the storage for a computer. In this latter case resolutions of one part in 20 to 100, i.e. memory capacities of 400 to 10,000, would seem to be more reasonable in terms of equipment built essentially along familiar lines.

At the present time the Princeton Laboratories of the Radio Corporation of America are engaged in the development of a storage tube, the Selectron, of the type we have mentioned above. This tube is also planned to have a non-amplitude-sensitive switching system whereby the electron beam can be directed to a given spot on the plate within a quite small fraction of a millisecond. Inasmuch as the storage tube is the key component of the machine envisaged in this report we are extremely fortunate in having secured the cooperation of the RCA group in this as well as in various other developments.

An alternate form of rapid memory organ is the acoustic feedback delay line described in various reports on the EDVAC. (This is an electronic computing machine being developed for the Ordnance Department, U.S. Army, by the University of Pennsylvania, Moore School of Electrical Engineering.) Inasmuch as that device has been so clearly reported in those papers we give no further discussion. There are still other physical and chemical properties of matter in the presence of electrons or photons that might be considered, but since none is yet beyond the early discussion stage we shall not make further mention of them.

4.2. We shall accordingly assume throughout the balance of this report that the Selectron is the modus for storage of words at electronic speeds. As now planned, this tube will have a capacity of $2^{12} = 4,096 \approx 4,000$ binary digits. To achieve a total electronic storage of about 4,000 words we propose to use 40 Selectrons, thereby achieving a memory of $2^{12}$ words of 40 binary digits each. (Cf. again 2.3.)

4.3. There are two possible means for storing a particular word in the Selectron memory, or, in fact, in either a delay line memory or in a storage tube with amplitude-sensitive deflection. One method is to store the entire word in a given tube and then to get the word out by picking out its respective digits in a serial fashion. The other method is to store in corresponding places in each of the 40 tubes one digit of the word. To get a word from the memory in this scheme requires, then, one switching mechanism to which all 40 tubes are connected in parallel. Such a switching scheme seems to us to be simpler than the technique needed in the serial system and is, of course, 40 times faster. We accordingly adopt the parallel procedure and thus are led to consider a so-called parallel machine, as contrasted with the serial principles being considered for the EDVAC. (In the EDVAC the peculiar characteristics of the acoustic delay line, as well as various other considerations, seem to justify a serial procedure. For more details, cf. the reports referred to in 4.1.) The essential difference between these two systems lies in the method of performing an addition; in a parallel machine all corresponding pairs of digits are added simultaneously, whereas in a serial one these pairs are added serially in time.

4.4. To summarize, we assume that the fast electronic memory consists of 40 Selectrons which are switched in parallel by a common switching arrangement. The inputs of the switch are controlled by the control.

4.5. Inasmuch as a great many highly important classes of problems require a far greater total memory than $2^{12}$ words, we now consider the next stage in our storage hierarchy. Although the solution of partial differential equations frequently involves the manipulation of many thousands of words, these data are generally required only in blocks which are well within the $2^{12}$ capacity of the electronic memory. Our second form of storage must therefore be a medium which feeds these blocks of words to the electronic memory. It should be controlled by the control of the computer and is thus an integral part of the system, not requiring human intervention.

There are evidently two distinct problems raised above. One can choose a given medium for storage such as teletype tapes, magnetic wire or tapes, movie film or similar media. There still remains the problem of automatic integration of this storage medium with the machine. This integration is achieved logically by introducing appropriate orders into the code which can instruct the machine to read or write on the medium, or to move it by a given amount or to a place with given characteristics. We discuss this question a little more fully in 6.8.

Let us return now to the question of what properties the secondary storage medium should have. It clearly should be able to store information for periods of time long enough so that only a few per cent of the total computing time is spent in re-registering information that is “fading off.” It is certainly desirable, although not imperative, that information can be erased and replaced by new data. The medium should be such that it can be controlled, i.e. moved forward and backward, automatically. This consideration makes certain media, such as punched cards, undesirable. While cards can, of course, be printed or read by appropriate orders from some machine, they are not well adapted to problems in which the output data are fed directly back into the machine, and are required in a sequence which is non-monotone with respect to the order of the cards. The medium should be capable of remembering very large numbers of data at a much smaller price
than electronic devices. It must be fast enough so that, even when it has to be used frequently in a problem, a large percentage of the total solution time is not spent in getting data into and out of this medium and achieving the desired positioning on it. If this condition is not reasonably well met, the advantages of the high electronic speeds of the machine will be largely lost.

Both light- or electron-sensitive film and magnetic wires or tapes, whose motions are controlled by servo-mechanisms integrated with the control, would seem to fulfil our needs reasonably well. We have tentatively decided to use magnetic wires since we have achieved reliable performance with them at pulse rates of the order of 25,000/sec and beyond.

4.6. Lastly our memory hierarchy requires a vast quantity of dead storage, i.e. storage not integrated with the machine. This storage requirement may be satisfied by a library of wires that can be introduced into the machine when desired and at that time become automatically controlled. Thus our dead storage is really nothing but an extension of our secondary storage medium. It differs from the latter only in its availability to the machine.

4.7. We impose one additional requirement on our secondary memory. It must be possible for a human to put words on to the wire or other substance used and to read the words put on by the machine. In this manner the human can control the machine's functions. It is now clear that the secondary storage medium is really nothing other than a part of our input-output system, cf. 6.8.4 for a description of a mechanism for achieving this.

4.8. There is another highly important part of the input-output which we merely mention at this time, namely, some mechanism for viewing graphically the results of a given computation. This can, of course, be achieved by a Selectron-like tube which causes its screen to fluoresce when data are put on it by an electron beam.

4.9. For definiteness in the subsequent discussions we assume that associated with the output of each Selectron is a flip-flop. This assemblage of 40 flip-flops we term the Selectron Register.

5. The arithmetic organ

5.1. In this section we discuss the features we now consider desirable for the arithmetic part of our machine. We give our tentative conclusions as to which of the arithmetic operations should be built into the machine and which should be programmed. Finally, a schematic of the arithmetic unit is described.

5.2. In a discussion of the arithmetical organs of a computing machine one is naturally led to a consideration of the number system to be adopted. In spite of the longstanding tradition of building digital machines in the decimal system, we feel strongly in favor of the binary system for our device. Our fundamental unit of memory is naturally adapted to the binary system since we do not attempt to measure gradations of charge at a particular point in the Selectron but are content to distinguish two states. The flip-flop again is truly a binary device. On magnetic wires or tapes and in acoustic delay line memories one is also content to recognize the presence or absence of a pulse or (if a carrier frequency is used) of a pulse train, or of the sign of a pulse. (We will not discuss here the ternary possibilities of a positive-or-negative-or-no-pulse system and their relationship to questions of reliability and checking, nor the very interesting possibilities of carrier frequency modulation.) Hence if one contemplates using a decimal system with either the iconoscope or delay-line memory one is forced into a binary coding of the decimal system, each decimal digit being represented by at least a tetrad of binary digits. Thus an accuracy of ten decimal digits requires at least 40 binary digits. In a true binary representation of numbers, however, about 33 digits suffice to achieve a precision of $10^{10}$. The use of the binary system is therefore somewhat more economical of equipment than is the decimal.

The main virtue of the binary system as against the decimal is, however, the greater simplicity and speed with which the elementary operations can be performed. To illustrate, consider multiplication by repeated addition. In binary multiplication the product of a particular digit of the multiplier by the multiplicand is either the multiplicand or null according as the multiplier digit is 1 or 0. In the decimal system, however, this product has ten possible values between null and nine times the multiplicand, inclusive. Of course, a decimal number has only $\log_{10} 2 \approx 0.3$ times as many digits as a binary number of the same accuracy, but even so multiplication in the decimal system is considerably longer than in the binary system. One can accelerate decimal multiplication by complicating the circuits, but this fact is irrelevant to the point just made since binary multiplication can likewise be accelerated by adding to the equipment. Similar remarks may be made about the other operations.

An additional point that deserves emphasis is this: An important part of the machine is not arithmetical, but logical in nature. Now logics, being a yes-no system, is fundamentally binary. Therefore a binary arrangement of the arithmetical organs contributes very significantly towards producing a more homogeneous machine, which can be better integrated and is more efficient.

The one disadvantage of the binary system from the human point of view is the conversion problem. Since, however, it is completely known how to convert numbers from one base to
another and since this conversion can be effected solely by the use of the usual arithmetic processes there is no reason why the computer itself cannot carry out this conversion. It might be argued that this is a time consuming operation. This, however, is not the case. (Cf. 9.6 and 9.7 of Part II. Part II is a report issued under the title Planning and Coding of Problems for an Electronic Computing Instrument.¹) Indeed a general-purpose computer, used as a scientific research tool, is called upon to do a very great number of multiplications upon a relatively small amount of input data, and hence the time consumed in the decimal to binary conversion is only a trivial percentage of the total computing time. A similar remark is applicable to the output data.

¹ See Bibliography [Goldstine and von Neumann, 1963b, 1963c, 1963d]. References in this chapter are all to this report.

In the preceding discussion we have tacitly assumed the desirability of introducing and withdrawing data in the decimal system. We feel, however, that the base 10 may not even be a permanent feature in a scientific instrument and consequently will probably attempt to train ourselves to use numbers base 2 or 8 or 16. The reason for the bases 8 or 16 is this: Since 8 and 16 are powers of 2 the conversion to binary is trivial; since both are about the size of 10, they violate many of our habits less badly than base 2. (Cf. Part II, 9.4.)

5.3. Several of the digital computers being built or planned in this country and England are to contain a so-called “floating decimal point”. This is a mechanism for expressing each word as a characteristic and a mantissa; e.g. 123.45 would be carried in the machine as (0.12345, 03), where the 3 is the exponent of 10 associated with the number. There appear to be two major purposes in a “floating” decimal point system both of which arise from the fact that the number of digits in a word is a constant, fixed by design considerations for each particular machine. The first of these purposes is to retain in a sum or product as many significant digits as possible and the second of these is to free the human operator from the burden of estimating and inserting into a problem “scale factors”, i.e. multiplicative constants which serve to keep numbers within the limits of the machine.

There is, of course, no denying the fact that human time is consumed in arranging for the introduction of suitable scale factors. We only argue that the time so consumed is a very small percentage of the total time we will spend in preparing an interesting problem for our machine. The first advantage of the floating point is, we feel, somewhat illusory. In order to have such a floating point one must waste memory capacity which could otherwise be used for carrying more digits per word. It would therefore seem to us not at all clear whether the modest advantages of a floating binary point offset the loss of memory capacity and the increased complexity of the arithmetic and control circuits.

There are certainly some problems within the scope of our device which really require more than $2^{-40}$ precision. To handle such problems we wish to plan in terms of words whose lengths are some fixed integral multiple of 40, and program the machine in such a manner as to give the corresponding aggregates of 40 digit words the proper treatment. We must then consider an addition or multiplication as a complex operation programmed from a number of primitive additions or multiplications (cf. §9, Part II). There would seem to be considerable extra difficulties in the way of such a procedure in an instrument with a floating binary point.

The reader may remark upon our alternate spells of radicalism and conservatism in deciding upon various possible features for our mechanism. We hope, however, that he will agree, on closer inspection, that we are guided by a consistent and sound principle in judging the merits of any idea. We wish to incorporate into the machine, in the form of circuits, only such logical concepts as are either necessary to have a complete system or highly convenient because of the frequency with which they occur and the influence they exert in the relevant mathematical situations.

5.4. On the basis of this criterion we definitely wish to build into the machine circuits which will enable it to form the binary sum of two 40 digit numbers. We make this decision not because addition is a logically basic notion but rather because it would slow the mechanism as well as the operator down enormously if each addition were programmed out of the more simple operations of “and”, “or”, and “not”. The same is true for the subtraction. Similarly we reject the desire to form products by programming them out of additions, the detailed motivation being very much the same as in the case of addition and subtraction. The cases for division and square-rooting are much less clear.

It is well known that the reciprocal of a number $a$ can be formed to any desired accuracy by iterative schemes. One such scheme consists of improving an estimate $X$ by forming $X' = 2X - aX^2$. Thus the new error $1 - aX'$ is $(1 - aX)^2$, which is the square of the error in the preceding estimate. We notice that in the formation of $X'$, there are two bona fide multiplications (we do not consider multiplication by 2 as a true product since we will have a facility for shifting right or left in one or two pulse times). If then we somehow could guess $1/a$ to a precision of $2^{-5}$, 6 multiplications (3 iterations) would suffice to give a final result good to $2^{-40}$. Accordingly a small table of $2^4$ entries could be used to get the initial estimate of $1/a$. In this way a reciprocal $1/a$
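The iteration $X' = 2X - aX^2$ described in 5.4 can be tried out numerically. What follows is a minimal Python sketch and not part of the report: the names `refine_reciprocal` and `error`, and the use of exact fractions in place of 40-digit registers, are assumptions made for illustration, and the initial estimate is simply supplied by the caller rather than read from a table.

```python
from fractions import Fraction


def refine_reciprocal(a, x0, iterations=3):
    """Improve an estimate x0 of 1/a by repeating X' = 2X - a*X*X.

    Hypothetical helper for illustration; exact fractions stand in
    for the machine's fixed-length registers.
    """
    a = Fraction(a)
    x = Fraction(x0)
    for _ in range(iterations):
        # Two true multiplications (a*x, then the result times x);
        # the factor 2 would be a shift in the machine.
        x = 2 * x - a * x * x
    return x


def error(a, x):
    """Return |1 - a*x|, the error measure used in the text."""
    return abs(1 - Fraction(a) * Fraction(x))


if __name__ == "__main__":
    a = Fraction(31, 32)
    x0 = Fraction(1)                 # 1 - a*x0 = 1/32 = 2**-5
    x3 = refine_reciprocal(a, x0)    # 3 iterations, 6 multiplications
    print(error(a, x0), error(a, x3))
```

With $a = 31/32$ and initial estimate $X = 1$ the starting error $1 - aX$ is exactly $2^{-5}$, and since each step squares the error, three iterations leave an error of exactly $2^{-40}$.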
is not in excess of $(n - v + 1)/2^{v+1}$ since there are $n - v + 1$ terms in the sum; since, moreover, each $p_n(v)$ is a probability, it is not greater than 1. Hence we have

$$p_n(v) \le \min\left[\frac{n - v + 1}{2^{v+1}},\ 1\right].$$

Finally we turn to the question of getting an upper bound on $a_n = \sum_{v=1}^{n} p_n(v)$. Choose $K$ so that $2^K \le n \le 2^{K+1}$. Then

$$a_n = \sum_{v=1}^{K-1} p_n(v) + \sum_{v=K}^{n} p_n(v) \le \sum_{v=1}^{K-1} 1 + \sum_{v=K}^{n} \frac{n}{2^{v+1}} \le K - 1 + \frac{n}{2^K}.$$

This last expression is clearly linear in $n$ in the interval $2^K \le n \le 2^{K+1}$, and it is $= K$ for $n = 2^K$ and $= K + 1$ for $n = 2^{K+1}$, i.e. it is $= \log_2 n$ at both ends of this interval. Since the function $\log_2 n$ is everywhere concave from below, it follows that our expression is $\le \log_2 n$ throughout this interval. Thus $a_n \le \log_2 n$. This holds for all $K$, i.e. for all $n$, and it is the inequality which we wanted to prove.

For our case $n = 40$ we have $a_{40} \le \log_2 40 \approx 5.3$, i.e. an average length of about 5 for the longest carry sequence. (The actual value of $a_{40}$ is 4.62.)

5.7. Having discussed the addition, we can now go on to the subtraction. It is convenient to discuss at this point our treatment of negative numbers, and in order to do that right, it is desirable to make some observations about the treatment of numbers in general.

Our numbers are 40 digit aggregates, the left-most digit being the sign digit, and the other digits genuine binary digits, with positional values $2^{-1}, 2^{-2}, \ldots, 2^{-39}$ (going from left to right). Our accumulator will, however, treat the sign digit, too, as a binary digit with the positional value $2^0$, at least when it functions as an adder. For numbers between 0 and 1 this is clearly all right: The left-most digit will then be 0, and if 0 at this place is taken to represent a + sign, then the number is correctly expressed with its sign and 39 binary digits.

Let us now consider one or more unrestricted 40 binary digit numbers. The accumulator will add them, with the digit-adding and the carrying mechanisms functioning normally and identically in all 40 positions. There is one reservation, however: If a carry originates in the left-most position, then it has nowhere to go from there (there being no further positions to the left) and is “lost”. This means, of course, that the addend and the augend, both numbers between 0 and 2, produced a sum exceeding 2, and the accumulator, being unable to express a digit with a positional value $2^1$, which would now be necessary, omitted 2. That is, the sum was formed correctly, excepting a possible error 2. If several such additions are performed in succession, then the ultimate error may be any integer multiple of 2. That is, the accumulator is an adder which allows errors that are integer multiples of 2; it is an adder modulo 2.

It should be noted that our convention of placing the binary point immediately to the right of the left-most digit has nothing to do with the structure of the adder. In order to make this point clearer we proceed to discuss the possibilities of positioning the binary point in somewhat more detail.

We begin by enumerating the 40 digits of our numbers (words) from left to right. In doing this we use an index $h = 1, \ldots, 40$. Now we might have placed the binary point just as well between digits $j$ and $j + 1$, $j = 0, \ldots, 40$. Note that $j = 0$ corresponds to the position at the extreme left (there is no digit $h = j = 0$); $j = 40$ corresponds to the position at the extreme right (there is no position $h = j + 1 = 41$); and $j = 1$ corresponds to our above choice. Whatever our choice of $j$, it does not affect the correctness of the accumulator’s addition. (This is equally true for subtraction, cf. below, but not for multiplication and division, cf. 5.8.) Indeed, we have merely multiplied all numbers by $2^{j-1}$ (as against our previous convention), and such a “change of scale” has no effect on addition (and subtraction). However, now the accumulator is an adder which allows errors that are integer multiples of $2^j$; it is an adder modulo $2^j$. We mention this because it is occasionally convenient to think in terms of a convention which places the binary point at the right end of the digital aggregate. Then $j = 40$, our numbers are integers, and the accumulator is an adder modulo $2^{40}$. We must emphasize, however, that all of this, i.e. all attributions of values to $j$, are purely convention, i.e. it is solely the mathematician’s interpretation of the functioning of the machine and not a physical feature of the machine. This convention will necessitate measures that have to be made effective by actual physical features of the machine, i.e. the convention will become a physical and engineering reality only when we come to the organs of multiplication.

We will use the convention $j = 1$, i.e. our numbers lie between 0 and 2 and the accumulator adds modulo 2.

This being so, these numbers between 0 and 2 can be used to represent all numbers modulo 2. Any real number $x$ agrees modulo 2 with one and only one number $\bar{x}$ between 0 and 2, or, to be quite precise: $0 \le \bar{x} < 2$. Since our addition functions modulo 2, we see that the accumulator may be used to represent and to add numbers modulo 2.

This determines the representation of negative numbers: If $x < 0$, then we have to find the unique integer multiple of 2, $2s$
($s = 1, 2, \ldots$) such that $0 \le \bar{x} < 2$ for $\bar{x} = x + 2s$ (i.e. $-2s \le x < 2(1 - s)$), and represent $x$ by the digitalization of $\bar{x}$.

In this way, however, the sign digit character of the left-most digit is lost: It can be 0 or 1 for both $x \ge 0$ and $x < 0$, hence 0 in the left-most position can no longer be associated with the + sign of $x$. This may seem a bad deficiency of the system, but it is easy to remedy, at least to an extent which suffices for our purposes. This is done as follows:

We usually work with numbers $x$ between $-1$ and 1, or, to be quite precise: $-1 \le x < 1$. Now the $\bar{x}$ with $0 \le \bar{x} < 2$, which differs from $x$ by an integer multiple of 2, behaves as follows: If $x \ge 0$, then $0 \le x < 1$, hence $\bar{x} = x$, and so $0 \le \bar{x} < 1$, the left-most digit of $\bar{x}$ is 0. If $x < 0$, then $-1 \le x < 0$, hence $\bar{x} = x + 2$, and so $1 \le \bar{x} < 2$, the left-most digit of $\bar{x}$ is 1. Thus the left-most digit (of $\bar{x}$) is now a precise equivalent of the sign (of $x$): 0 corresponds to + and 1 to -.

Summing up:

The accumulator may be taken to represent all real numbers modulo 2, and it adds them modulo 2. If $x$ lies between $-1$ and 1 (precisely: $-1 \le x < 1$), as it will in almost all of our uses of the machine, then the left-most digit represents the sign: 0 is + and 1 is -.

Consider now a negative number $x$ with $-1 \le x < 0$. Put $x = -y$, $0 < y \le 1$. Then we digitalize $x$ by representing it as $x + 2 = 2 - y = 1 + (1 - y)$. That is, the left-most (sign) digit of $x = -y$ is, as it should be, 1; and the remaining 39 digits are those of the complement of $y = -x = |x|$, i.e. those of $1 - y$. Thus we have been led to the familiar representation of negative numbers by complementation.

The connection between the digits of $x$ and those of $-x$ is now easily formulated, for any $x \gtrless 0$. Indeed, $-x$ is equivalent to

$$2 - x = \{(2^1 - 2^{-39}) - x\} + 2^{-39} = \left(\sum_{i=0}^{39} 2^{-i} - x\right) + 2^{-39}.$$

(This digit index $i = 1, \ldots, 39$ is related to our previous digit index $h = 1, \ldots, 40$ by $i = h - 1$. Actually it is best to treat $i$ as if its domain included the additional value $i = 0$; indeed $i = 0$ then corresponds to $h = 1$, i.e. to the sign digit. In any case $i$ expresses the positional value of the digit to which it refers more simply than $h$ does: This positional value is $2^{-i} = 2^{-(h-1)}$. Note that if we had positioned the binary point more generally between $j$ and $j + 1$, as discussed further above, this positional value would have been $2^{-(h-j)}$. We now have, as pointed out previously, $j = 1$.) Hence its digits obtain by subtracting every digit of $x$ from 1 (by complementing each digit, i.e. by replacing 0 by 1 and 1 by 0) and then adding 1 in the right-most position (and effecting all the carries that this may cause). (Note how the left-most digit, interpreted as a sign digit, gets inverted by this procedure as it should be.)

A subtraction $x - y$ is therefore performed by the accumulator, Ac, as follows: Form $x + y'$, where $y'$ has a digit 0 or 1 where $y$ has a digit 1 or 0, respectively, and then add 1 in the right-most position. The last operation can be performed by injecting a carry into the right-most stage of Ac, since this stage can never receive a carry from any other source (there being no further positions to the right).

5.8. In the light of 5.7 multiplication requires special care, because here the entire modulo 2 procedure breaks down. Indeed, assume that we want to compute a product $xy$, and that we had to change one of the factors, say $x$, by an integer multiple of 2, say by 2. Then the product $(x + 2)y$ obtains, and this differs from the desired $xy$ by $2y$. $2y$, however, will not in general be an integer multiple of 2, since $y$ is not in general an integer.

We will therefore begin our discussion of the multiplication by eliminating all such difficulties, and assume that both factors $x$, $y$ lie between 0 and 1. Or, to be quite precise: $0 \le x < 1$, $0 \le y < 1$.

To effect such a multiplication we first send the multiplier $x$ into a register AR, the Arithmetic Register, which is essentially just a set of 40 flip-flops whose characteristics will be discussed below. We place the multiplicand $y$ in the Selectron Register, SR (cf. 4.9) and use the accumulator, Ac, to form and store the partial products. We propose to multiply the entire multiplicand by the successive digits of the multiplier in a serial fashion. There are, of course, two possible ways this can be done: We can either start with the digit in the lowest position (position $2^{-39}$) or in the highest position (position $2^{-1}$) and proceed successively to the left or right, respectively. There are a few advantages from our point of view in starting with the right-most digit of the multiplier. We therefore describe that scheme.

The multiplication takes place in 39 steps, which correspond to the 39 (non-sign) digits of the multiplier $x = 0, \xi_1, \xi_2, \ldots, \xi_{39} = (0.\xi_1 \xi_2 \ldots \xi_{39})$, enumerated backwards: $\xi_{39}, \ldots, \xi_2, \xi_1$. Assume that the $k - 1$ first steps ($k = 1, \ldots, 39$) have already taken place, involving multiplication of the multiplicand $y$ with the $k - 1$ last digits of the multiplier: $\xi_{39}, \ldots, \xi_{41-k}$; and that we are now at the $k$th step, involving multiplication with the $k$th last digit: $\xi_{40-k}$. Assume furthermore, that Ac now contains the quantity $p_{k-1}$, the result of the $k - 1$ first steps. [This is the $(k - 1)$st partial product. For $k = 1$ clearly $p_0 = 0$.] We now form $2p_k = p_{k-1} + \xi_{40-k}\,y$, i.e.
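The step $2p_k = p_{k-1} + \xi_{40-k}\,y$ stated above can be followed through all 39 steps in a short program. This is a minimal Python sketch under stated assumptions, not the report's circuit: the names `multiplier_digits` and `multiply` are hypothetical, exact fractions are used instead of fixed-length registers (so no digits are dropped from Ac), and both factors are taken to satisfy $0 \le x < 1$, $0 \le y < 1$ as in the text.

```python
from fractions import Fraction

DIGITS = 39  # number of non-sign binary digits in a word


def multiplier_digits(x):
    """Return [xi_1, ..., xi_39] for 0 <= x < 1 (lower digits truncated).

    Hypothetical helper: xi_i is the digit with positional value 2**-i.
    """
    scaled = int(Fraction(x) * 2**DIGITS)  # floor, because x >= 0
    return [(scaled >> (DIGITS - i)) & 1 for i in range(1, DIGITS + 1)]


def multiply(x, y):
    """Form x*y by scanning the multiplier from its right-most digit.

    Each step halves the sum of the previous partial product and
    (digit * multiplicand): 2*p_k = p_(k-1) + xi_(40-k) * y.
    """
    xi = multiplier_digits(x)
    y = Fraction(y)
    p = Fraction(0)                      # p_0 = 0
    for k in range(1, DIGITS + 1):
        digit = xi[(40 - k) - 1]         # xi_(40-k); the list is 0-indexed
        p = (p + digit * y) / 2          # the division by 2 is a right shift
    return p                             # p_39


if __name__ == "__main__":
    print(multiply(Fraction(5, 8), Fraction(3, 4)))
```

For $x = 5/8 = (0.101)_2$ only $\xi_1$ and $\xi_3$ are 1, so the partial product stays 0 until step 37 and the final $p_{39}$ equals $15/32$.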
the position of the binary point in $xy$ is appropriately assigned. Specifically: Let the binary point of $xy$ be between digits $l$ and $l + 1$. $x$ has the binary point between digits $j$ and $j + 1$, and its sign digit is 0, hence its range is $0 \le x < 2^{j-1}$. Similarly $y$ has the range $0 \le y < 2^{k-1}$, and $xy$ has the range $0 \le xy < 2^{l-1}$. Now the ranges of $x$ and $y$ imply that the range of $xy$ is necessarily $0 \le xy < 2^{j-1}\,2^{k-1} = 2^{j+k-2}$. Hence $l = j + k - 1$. Thus it might seem that our actual positioning of the binary point (immediately right of the sign digit, i.e. $j = k = 1$) is still a mere convention.

It is therefore important to realize that this is not so: The choices of $j$ and $k$ actually correspond to very real, physical, engineering decisions. The reason for this is as follows: It is desirable to base the running of the machine on a sole, consistent mathematical interpretation. It is therefore desirable that all arithmetical operations be performed with an identically conceived positioning of the binary point in Ac. Applying this principle to $x$ and $y$ gives $j = k$. Hence the position of the binary point for $xy$ is given by $j + k - 1 = 2j - 1$. If this is to be the same as for $x$ and $y$, then $2j - 1 = j$, i.e. $j = 1$ ensues; that is, our above positioning of the binary point immediately right of the sign digit.

There is one possible escape: To place into Ac not the left 39 digits of $xy$ (not counting the sign digit 0), but the digits $j$ to $j + 38$ from the left. Indeed, in this way the position of the binary point of $xy$ will be $(2j - 1) - (j - 1) = j$, the same as for $x$ and $y$.

This procedure means that we drop the left $j - 1$ and right $40 - j$ digits of $xy$ and hold the middle 39 in Ac. Note that positioning of the binary point means that $x < 2^{j-1}$, $y < 2^{j-1}$ and $xy$ can only be used if $xy < 2^{j-1}$. Now the assumptions secure only $xy < 2^{2j-2}$. Hence $xy$ must be $2^{j-1}$ times smaller than it might be. This is just the thing which would be secured by the vanishing of the left $j - 1$ digits that we had to drop from Ac, as shown above.

If we wanted to use such a procedure, with those dropped left $j - 1$ digits really existing, i.e. with $j \ne 1$, then we would have to make physical arrangements for their conservation elsewhere. Also the general mathematical planning for the machine would be definitely complicated, due to the physical fact that Ac now holds a rather arbitrarily picked middle stretch of 39 digits from among the 78 digits of $xy$. Alternatively, we might fail to make such arrangements, but this would necessitate to see to it in the mathematical planning of each problem, that all products turn out to be $2^{j-1}$ times smaller than their a priori maxima. Such an observance is not at all impossible; indeed similar things are unavoidable for the other operations. [For example, with a factor 2 in addition (of positives) or subtraction (of opposite sign quantities). Cf. also the remarks in the first part of 5.12, dealing with keeping “within range”.] However, it involves a loss of significant digits, and the choice $j = 1$ makes it unnecessary in multiplication.

We will therefore make our choice $j = 1$, i.e. the positioning of the binary point immediately right of the sign digit, binding for all that follows.

5.10. We now pass to the case where the multiplier $x$ and the multiplicand $y$ may have either sign + or -, i.e. any combination of these signs.

It would not do simply to extend the method of 5.8 to include the sign digits of $x$ and $y$ also. Indeed, we assume $-1 \le x < 1$, $-1 \le y < 1$, and the multiplication procedure in question is definitely based on the $\ge 0$ interpretations of $x$ and $y$. Hence if $x < 0$, then it is really using $x + 2$, and if $y < 0$, then it is really using $y + 2$. Hence for $x < 0$, $y \ge 0$ it forms

$$(x + 2)y = xy + 2y,$$

for $x \ge 0$, $y < 0$ it forms

$$x(y + 2) = xy + 2x,$$

for $x < 0$, $y < 0$, it forms

$$(x + 2)(y + 2) = xy + 2x + 2y + 4,$$

or since things may be taken modulo 2, $xy + 2x + 2y$. Hence correction terms $-2y$, $-2x$ would be needed for $x < 0$, $y < 0$, respectively (either or both).

This would be a possible procedure, but there is one difficulty: As $xy$ is formed, the 39 digits of the multiplier $x$ are gradually lost from AR, to be replaced by the right 39 digits of $xy$. (Cf. the discussion at the end of 5.8.) Unless we are willing to build an additional 40 stage register to hold $x$, therefore, $x$ will not be available at the end of the multiplication. Hence we cannot use it in the correction $2x$ of $xy$, which becomes necessary for $y < 0$. Thus the case $x < 0$ can be handled along the above lines, but not the case $y < 0$.

It is nevertheless possible to develop an adequate procedure, and we now proceed to do this. Throughout this procedure we will maintain the assumptions $-1 \le x < 1$, $-1 \le y < 1$. We proceed in several successive steps.

First: Assume that the corrections necessitated by the possibility of $y < 0$ have been taken care of. We permit therefore $y \gtrless 0$. We will consider the corrections necessitated by the possibility of $x < 0$.

Let us disregard the sign digit of $x$, which is 1, i.e. replace it by 0. Then $x$ goes over into $x' = x - 1$ and as $-1 \le x < 0$, this $x'$ will actually behave like $(x - 1) + 2 = x + 1$. Hence our multiplication procedure will produce $x'y = (x + 1)y = xy + y$,
and therefore a correction $-y$ is needed at the end. (Note that we did not use the sign digit of $x$ in the conventional way. Had we done so, then a correction $-2y$ would have been necessary, as seen above.)
We see therefore: Consider $x \gtrless 0$. Perform first all necessary steps for forming $x'y$ ($y \gtrless 0$), without yet reaching the sign digit of $x$ (i.e. treating $x$ as if it were $\ge 0$). When the time arrives at which the digit $\xi_0$ of $x$ has to become effective, i.e. immediately after $\xi_1$ became effective, after 39 shifts (cf. the discussion near the end of 5.8), at which time Ac contains, say, $\bar{p}$ (this corresponds to the $p_{39}$ of 5.8), then form
$$\bar{\bar{p}} = \begin{cases} \bar{p} & \text{for } \xi_0 = 0 \\ \bar{p} - y & \text{for } \xi_0 = 1 \end{cases}$$
This $\bar{\bar{p}}$ is $xy$. (Note the difference between this last step, forming $\bar{\bar{p}}$, and the 39 preceding steps in 5.8, forming $p_1, p_2, \ldots, p_{39}$.)
Second: Having disposed of the possibility $x < 0$, we may now assume $x \ge 0$. With this assumption we have to treat all $y \gtrless 0$. Since $y \ge 0$ brings us back entirely to the familiar case of 5.8, we need to consider the case $y < 0$ only.
Let $y'$ be the number that obtains by disregarding the sign digit of $y$, which is 1, i.e. by replacing it by 0. Again $y'$ acts not like $y - 1$, but like $(y - 1) + 2 = y + 1$. Hence the multiplication procedure of 5.8 will produce $xy' = x(y + 1) = xy + x$, and therefore a correction $-x$ is needed. (Note that, quite similarly to what we saw in the first case above, the suppression of the sign digit of $y$ replaced the previously recognized correction $-2x$ by the present one $-x$.) As we observed earlier, this correction $-x$ cannot be applied at the end to the completed $xy'$ since at that time $x$ is no longer available. Hence we must apply the correction $-x$ digitwise, subtracting every digit at the time when it is last found in AR, and in a way that makes it effective with the proper positional value.
Third: Consider then $x = 0.\xi_1 \xi_2 \ldots \xi_{39} = (\xi_1, \xi_2, \ldots, \xi_{39})$. The 39 digits $\xi_1 \ldots \xi_{39}$ of $x$ are lost in the course of the 39 shifts of the multiplication procedure of 5.8, going from right to left. Thus the operation No. $k + 1$ ($k = 0, 1, \ldots, 38$, cf. 5.8) finds $\xi_{39-k}$ in the right-most stage of AR, uses it, and then loses it through its concluding right shift (of both Ac and AR). After this step $39 - (k + 1) = 38 - k$ further steps, i.e. shifts, follow, hence before its own concluding shift there are still $39 - k$ shifts to come. Hence the positional values are $2^{39-k}$ times higher than they will be at the end. $\xi_{39-k}$ should appear at the end, in the correcting term $-x$, with the sign $-$ and the positional value $2^{-(39-k)}$. Hence we may inject it during the step $k + 1$ (before its shift) with the sign $-$ and the positional value 1. That is to say, $-\xi_{39-k}$ in the sign digit.
This, however, is inadmissible. Indeed, $\xi_{39-k}$ might cause carries (if $\xi_{39-k} = 1$), which would have nowhere to go from the sign digit (there being no further positions to the left). This error is at its origin an integer multiple of 2, but the $39 - k$ subsequent shifts reduce its positional value $2^{39-k}$ times. Hence it might contribute to the end result any integer multiple of $2^{-(38-k)}$, and this is a genuine error.
Let us therefore add $1 - \xi_{39-k}$ to the sign digit, i.e. 0 or 1 if $\xi_{39-k}$ is 1 or 0, respectively. We will show further below that with this procedure there arise no carries of the inadmissible kind. Taking this momentarily for granted, let us see what the total effect is. We are correcting not by $-x$ but by $\sum_{i=1}^{39} 2^{-i} - x = 1 - 2^{-39} - x$. Hence a final correction by $-1 + 2^{-39}$ is needed. Since this is done at the end (after all shifts), it may be taken modulo 2. That is to say, we must add $1 + 2^{-39}$, i.e. 1 in each of the two extreme positions. Adding 1 in the right-most position has the same effect as in the discussion at the end of 5.7 (dealing with the subtraction). It is equivalent to injecting a carry into the right-most stage of Ac. Adding 1 in the left-most position, i.e. to the sign digit, produces a 1, since that digit was necessarily 0. (Indeed, the last operation ended in a shift, thus freeing the sign digit, cf. below.)
Fourth: Let us now consider the question of the carries that may arise in the 39 steps of the process described above. In order to do this, let us describe the $k$th step ($k = 1, \ldots, 39$), which is a variant of the $k$th step described for a positive multiplication in 5.8, in the same way in which we described the original $k$th step loc. cit. That is to say, let us see what the formula (1) of 5.8 has become. It is clearly $2p_k = p_{k-1} + (1 - \xi_{40-k}) + \xi_{40-k}\,y'$, i.e.
$$2p_k = p_{k-1} + y'_k, \qquad y'_k = \begin{cases} 1 & \text{for } \xi_{40-k} = 0 \\ y' & \text{for } \xi_{40-k} = 1 \end{cases} \tag{2}$$
That is, we add 1 ($y$'s sign digit) or $y'$ ($y$ without its sign digit), according to whether $\xi_{40-k} = 0$ or 1. Then $p_k$ should obtain from $2p_k$ again by halving.
Now the addition of (2) produces no carries beyond the $2^0$ position, as we asserted earlier, for the same reason as the addition of (1) in 5.8. We can argue in the same way as there: $0 \le p_h < 1$ is true for $h = 0$, and if it is true for $h = k - 1$, then (2) extends it to $h = k$ also, since $0 \le y'_k \le 1$. Hence the sum in (2) is $\ge 0$ and $< 2$, and no carries beyond the $2^0$ position arise.
Fifth: In the three last observations we assumed $y < 0$. Let us now restore the full generality of $y \gtrless 0$. We can then describe
the equations (1) of 5.8 (valid for $y \ge 0$) and (2) above (valid for $y < 0$) by a single formula,
$$2p_k = p_{k-1} + y''_k, \qquad y''_k = \begin{cases} y\text{'s sign digit} & \text{for } \xi_{40-k} = 0 \\ y \text{ without its sign digit} & \text{for } \xi_{40-k} = 1 \end{cases} \tag{3}$$
Thus our verbal formulation of (2) applies here, too: We add $y$'s sign digit or $y$ without its sign, according to whether $\xi_{40-k} = 0$ or 1. All $p_k$ are $\ge 0$ and $< 1$, and the addition of (3) never originates a carry beyond the $2^0$ position. $p_k$ obtains from $2p_k$ by a right shift, filling the sign digit with a 0. (Cf. however, Part II, Table 2 for another sort of right shift that is desirable in explicit form, i.e. as an order.)
For $y \ge 0$, $xy$ is $p_{39}$; for $y < 0$, $xy$ obtains from $p_{39}$ by injecting a carry into the right-most stage of Ac and by placing a 1 into the sign digit in Ac.
Sixth: This procedure applies for $x \ge 0$. For $x < 0$ it should also be applied, since it makes use of $x$'s non-sign digits only, but at the end $y$ must be subtracted from the result.
This method of binary multiplication will be illustrated in some examples in 5.15.
5.11. To complete our discussion of the multiplicative organs of our machine we must return to a consideration of the types of accumulators mentioned in 5.5. The static accumulator operates as an adder by simultaneously applying static voltages to its two inputs, one for each of the two numbers being added. When steady-state operation is reached the total sum is formed complete with all carries. For such an accumulator the above discussion is substantially complete, except that it should be remarked that such a circuit requires at most 39 rise times to complete a carry. Actually it is possible that the duration of these successive rises is proportional to a lower power of 39 than the first one.
Each stage of a dynamic accumulator consists of a binary counter for registering the digit and a flip-flop for temporary storage of the carry. The counter receives a pulse if a 1 is to be added in at that place; if this causes the counter to go from 1 to 0 a carry has occurred and hence the carry flip-flop will be set. It then remains to perform the carries. Each flip-flop has associated with it a gate, the output of which is connected to the next binary counter to the left. The carry is begun by pulsing all carry gates. Now a carry may produce a carry, so that the process needs to be repeated until all carry flip-flops register 0. This can be detected by means of a circuit involving a sensing tube connected to each carry flip-flop. It was shown in 5.6 that, on the average, five pulse times (flip-flop reaction times) are required for the complete carry. An alternative scheme is to connect a gate tube to each binary counter which will detect whether an incoming carry pulse would produce a carry and will, under this circumstance, pass the incoming carry pulse directly to the next stage. This circuit would require at most 39 rise times for the completion of the carry. (Actually less, cf. above.)
At the present time the development of a static accumulator is being concluded. From preliminary tests it seems that it will add two numbers in about 5 μsec and will shift right or left in about 1 μsec.
We return now to the multiplication operation. In a static accumulator we order simultaneously an addition of the multiplicand with sign deleted or the sign of the multiplicand (cf. 5.10) and a complete carry and then a shift for each of the 39 steps. In a dynamic accumulator of the second kind just described we order in succession an addition of the multiplicand with sign deleted or the sign of the multiplicand, a complete carry, and a shift for each of the 39 steps. In a dynamic accumulator of the first kind we can avoid losing the time required for completing the carry (in this case an average of 5 pulse times, cf. above) at each of the 39 steps. We order an addition by the multiplicand with sign deleted or the sign of the multiplicand, then order one pulsing of the carry gates, and finally shift the contents of both the digit counters and the carry flip-flops. This process is repeated 39 times. A simple arithmetical analysis, which may be carried out in a later report, shows that at each one of these intermediate stages a single carry is adequate, and that a complete set of carries is needed at the end only. We then carry out the complement corrections, still without ever ordering a complete set of carry operations. When all these corrections are completed and after round-off, described below, we then order the complete carry mentioned above.
5.12. It is desirable at this point in the discussion to consider rules for rounding-off to $n$ digits. In order to assess the characteristics of alternative possibilities for such rules properly, and in particular the role of the concept of "unbiasedness", it is necessary to visualize the conditions under which rounding-off is needed.
Every number $x$ that appears in the computing machine is an approximation of another number $x'$, which would have appeared if the calculation had been performed absolutely rigorously. The approximations to which we refer here are not those that are caused by the explicitly introduced approximations of the numerical-mathematical set-up, e.g. the replacement of a (continuous) differential equation by a (discrete) difference equation. The effect of such approximations should be evaluated mathematically by the person who plans the problem for the machine, and should not be a direct concern of the machine. Indeed, it has to be handled
by a mathematician and cannot be handled by the machine, since its nature, complexity, and difficulty may be of any kind, depending upon the problem under consideration. The approximations which concern us here are these: Even the elementary operations of arithmetic, to which the mathematical approximation-formulation for the machine has to reduce the true (possibly transcendental) problem, are not rigorously executed by the machine. The machine deals with numbers of $n$ digits, where $n$, no matter how large, has to be a fixed quantity. (We assumed for our machine 40 digits, including the sign, i.e. $n = 39$.) Now the sum and difference of two $n$-digit numbers are again $n$-digit numbers, but their product and quotient (in general) are not. (They have, in general, $2n$ or $\infty$ digits, respectively.) Consequently, multiplication and division must unavoidably be replaced by the machine by two different operations which must produce $n$ digits under all conditions, and which, subject to this limitation, should lie as close as possible to the results of the true multiplication and division. One might call them pseudo-multiplication and pseudo-division; however, the accepted nomenclature terms them as multiplication and division with round-off. (We are now creating the impression that addition and subtraction are entirely free of such shortcomings. This is only true inasmuch as they do not create new digits to the right, as multiplication and division do. However, they can create new digits to the left, i.e. cause the numbers to "grow out of range". This complication, which is, of course, well known, is normally met by the planner, by mathematical arrangements and estimates to keep the numbers "within range". Since we propose to have our machine deal with numbers between $-1$ and 1, multiplication can never cause them to "grow out of range". Division, of course, might cause this complication, too. The planner must therefore see to it that in every division the absolute value of the divisor exceeds that of the dividend.)
Thus the round-off is intended to produce satisfactory $n$-digit approximations for the product $xy$ and the quotient $x/y$ of two $n$-digit numbers. Two things are wanted of the round-off: (1) The approximation should be good, i.e. its variance from the "true" $xy$ or $x/y$ should be as small as practicable; (2) The approximation should be unbiased, i.e. its mean should be equal to the "true" $xy$ or $x/y$.
These desiderata must, however, be considered in conjunction with some further comments. Specifically: (a) $x$ and $y$ themselves are likely to be the results of similar round-offs, directly or indirectly inherent, i.e. $x$ and $y$ themselves should be viewed as unbiased $n$-digit approximations of "true" $x'$ and $y'$ values; (b) by talking of "variances" and "means" we are introducing statistical concepts. Now the approximations which we are here considering are not really of a statistical nature, but are due to the peculiarities (from our point of view, inadequacies) of arithmetic and of digital representation, and are therefore actually rigorously and uniquely determined. It seems, however, in the present state of mathematical science, rather hopeless to try to deal with these matters rigorously. Furthermore, a certain statistical approach, while not truly justified, has always given adequate practical results. This consists of treating those digits which one does not wish to use individually in subsequent calculations as random variables with equiprobable digital values, and of treating any two such digits as statistically independent (unless this is patently false).
These things being understood, we can now undertake to discuss round-off procedures, realizing that we will have to apply them to the multiplication and to the division.
Let $x = (.\xi_1 \ldots \xi_n)$ and $y = (.\eta_1 \ldots \eta_n)$ be unbiased approximations of $x'$ and $y'$. Then the "true" $xy = (.\zeta_1 \ldots \zeta_n \zeta_{n+1} \ldots \zeta_{2n})$ and the "true" $x/y = (.\omega_1 \ldots \omega_n \omega_{n+1} \omega_{n+2} \ldots)$ (this goes on ad infinitum!) are approximations of $x'y'$ and $x'/y'$. Before we discuss how to round them off, we must know whether the "true" $xy$ and $x/y$ are themselves unbiased approximations of $x'y'$ and $x'/y'$. $xy$ is indeed an unbiased approximation of $x'y'$, i.e. the mean of $xy$ is the mean of $x$ ($= x'$) times the mean of $y$ ($= y'$), owing to the independence assumption which we made above. However, if $x$ and $y$ are closely correlated, e.g. for $x = y$, i.e. for squaring, there is a bias. It is of the order of the mean square of $x - x'$, i.e. of the variance of $x$. Since $x$ has $n$ digits, this variance is about $1/2^{2n}$. (If the digits of $x'$ beyond $n$ are entirely unknown, then our original assumptions give the variance $1/(12 \cdot 2^{2n})$.) Next, $x/y$ can be written as $x \cdot y^{-1}$, and since we have already discussed the bias of the product, it suffices now to consider the reciprocal $y^{-1}$. Now if $y$ is an unbiased estimate of $y'$, then $y^{-1}$ is not an unbiased estimate of $y'^{-1}$, i.e. the mean of $y$'s reciprocal is not the reciprocal of $y$'s mean. The difference is $\sim y^{-3}$ times the variance of $y$, i.e. it is of essentially the same order as the bias found above in the case of squaring.
It follows from all this that it is futile to attempt to avoid biases of the order of magnitude $1/2^{2n}$ or less. (The factor $1/12$ above may seem to be changing the order of magnitude in question. However, it is really the square root of the variance which matters and $\sqrt{1/12} \sim 0.3$ is a moderate factor.) Since we propose to use $n = 39$, therefore $1/2^{78}$ ($\sim 3 \times 10^{-24}$) is the critical case. Note that this possible bias level is $1/2^{39}$ ($\sim 2 \times 10^{-12}$) times our last significant digit. Hence we will look for round-off rules to $n$ digits for the "true" $xy = (.\zeta_1 \ldots \zeta_n \zeta_{n+1} \ldots \zeta_{2n})$ and $x/y = (.\omega_1 \ldots \omega_n \omega_{n+1} \omega_{n+2} \ldots)$. The desideratum (1) which we formulated previously, that the variance should be small, is still valid. The […]
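The orders of magnitude quoted in 5.12 can be checked numerically. The following is a minimal sketch in Python, not part of the original report. It assumes, as the text does, that the dropped digits are equiprobable and independent; the helper name dropped_digit_stats and the sample value x_true are hypothetical. It computes the mean and variance of k dropped binary digits in units of the last retained digit, shows that centred errors bias a square by exactly their variance, and evaluates the constants for n = 39.

```python
from fractions import Fraction
import math

def dropped_digit_stats(k):
    """Mean and variance of k dropped binary digits, in units of the last
    retained digit, assuming all 2**k digit patterns are equally likely."""
    values = [Fraction(j, 2**k) for j in range(2**k)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

mean, var = dropped_digit_stats(8)
# The variance approaches 1/12 as more digits are dropped.
print(float(var), 1 / 12)

# Squaring: with centred errors e (mean zero), the bias of (x' + e)^2
# against x'^2 equals the variance of e.
x_true = Fraction(3, 8)  # arbitrary illustrative value
errors = [Fraction(j, 2**8) - mean for j in range(2**8)]
bias = sum((x_true + e) ** 2 - x_true**2 for e in errors) / len(errors)
print(bias == var)

n = 39
print(math.sqrt(1 / 12))  # the "moderate factor", about 0.29
print(2.0 ** (-2 * n))    # 1/2^78, about 3.3e-24
print(2.0 ** (-n))        # 1/2^39, about 1.8e-12
```

Exact rational arithmetic is used so that the equality between bias and variance holds without floating-point noise.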
[…] multiplication time becomes much longer in a machine with floating binary since one must perform shifts and round-offs as well as additions. It would seem reasonable in this case to place the time of an addition as about 1/3 to 1/2 of a multiplication. At this rate it is clear that the number of additions in a problem is as important a factor in the total solution time as are the number of multiplications. (For further details concerning the floating binary point, cf. 6.6.7.)
5.14. We conclude our discussion of the arithmetic unit with a description of our method for handling the division operation. To perform a division we wish to store the dividend in SR, the partial remainder in Ac and the partial quotient in AR. Before proceeding further let us consider the so-called restoring and non-restoring methods of division. In order to be able to make certain comparisons, we will do this for a general base $m = 2, 3, \ldots$.
Assume for the moment that divisor and dividend are both positive. The ordinary process of division consists of subtracting from the partial remainder (at the very beginning of the process this is, of course, the dividend) the divisor, repeating this until the former becomes smaller than the latter. For any fixed positional value in the quotient in a well-conducted division this need be done at most $m - 1$ times. If, after precisely $k = 0, 1, \ldots, m - 1$ repetitions of this step, the partial remainder has indeed become less than the divisor, then the digit $k$ is put in the quotient (at the position under consideration), the partial remainder is shifted one place to the left, and the whole process is repeated for the next position, etc. Note that the above comparison of sizes is only needed at $k = 0, 1, \ldots, m - 2$, i.e. before step 1 and after steps $1, \ldots, m - 2$. If the value $k = m - 1$, i.e. the point after step $m - 1$, is at all reached in a well-conducted division, then it may be taken for granted without any test, that the partial remainder has become smaller than the divisor, and the operations on the position under consideration can therefore be concluded. (In the binary system, $m = 2$, there is thus only one step, and only one comparison of sizes, before this step.) In this way this scheme, known as the restoring scheme, requires a maximum of $m - 1$ comparisons and utilizes the digits $0, 1, \ldots, m - 1$ in each place in the quotient. The difficulty of this scheme for machine purposes is that usually the only economical method for comparing two numbers as to size is to subtract one from the other. If the partial remainder $r_n$ were less than the dividend $d$, one would then have to add $d$ back into $r_n - d$ in order to restore the remainder. Thus at every stage an unnecessary operation would be performed. A more symmetrical scheme is obtained by not restoring. In this method (from here on we need not assume the positivity of divisor and dividend) one compares the signs of $r_n$ and $d$; if they are of the same sign, the dividend is repeatedly subtracted from the remainder until the signs become opposite; if they are opposite, the dividend is repeatedly added to the remainder until the signs again become like. In this scheme the digits that may occur in a given place in the quotient are evidently $\pm 1, \pm 2, \ldots, \pm(m - 1)$, the positive digits corresponding to subtractions and the negative ones to additions of the dividend to the remainder.
Thus we have $2(m - 1)$ digits instead of the usual $m$ digits. In the decimal system this would mean 18 digits instead of 10. This is a redundant notation. The standard form of the quotient must therefore be restored by subtracting from the aggregate of its positive digits the aggregate of its negative digits. This requires carry facilities in the place where the quotient is stored.
We propose to store the quotient in AR, which has no carry facilities. Hence we could not use this scheme if we were to operate in the decimal system.
The same objection applies to any base $m$ for which the digital representation in question is redundant, i.e. when $2(m - 1) > m$. Now $2(m - 1) > m$ whenever $m > 2$, but $2(m - 1) = m$ for $m = 2$. Hence, with the use of a register which we have so far contemplated, this division scheme is certainly excluded from the start unless the binary system is used.
Let us now investigate the situation in the binary system. We inquire if it is possible to obtain a quasi-quotient by using the non-restoring scheme and by using the digits 1, 0 instead of 1, $-1$. Or rather we have to ask this question: Does this quasi-quotient bear a simple relationship to the true quotient?
Let us momentarily assume this question can be answered affirmatively and describe the division procedure. We store the divisor initially in Ac, the dividend in SR and wish to form the quotient in AR. We now either add or subtract the contents of SR into Ac, according to whether the signs in Ac and SR are opposite or the same, and insert correspondingly a 0 or 1 in the right-hand place of AR. We then shift both Ac and AR one place left, with electronic shifters that are parts of these two aggregates.
At this point we interrupt the discussion to note this: multiplication required an ability to shift right in both Ac and AR (cf. 5.8). We have now found that division similarly requires an ability to shift left in both Ac and AR. Hence both organs must be able to shift both ways electronically. Since these abilities have to be present for the implicit needs of multiplication and division, it is just as well to make use of them explicitly in the form of explicit orders. These are the orders 20, 21 of Table 1, and of Table 2, Part II. It will, however, turn out to be convenient to arrange some details in the shifts, when they occur explicitly under the control of those orders,
differently from when they occur implicitly under the control of a multiplication or a division. (For these things, cf. the discussion of the shifts near the end of 5.8 and in the third remark below on one hand, and in the third remark in 7.2, Part II, on the other hand.)
Let us now resume the discussion of the division. The process described above will have to be repeated as many times as the number of quotient digits that we consider appropriate to produce in this way. This is likely to be 39 or 40; we will determine the exact number further below.
In this process we formed digits $\xi'_i = 0$ or 1 for the quotient, when the digit should actually have been $\xi_i = -1$ or 1, with $\xi_i = 2\xi'_i - 1$. Thus we have a difference between the true quotient $z$ (based on the digits $\xi_i$) and the quasi-quotient $z'$ (based on the digits $\xi'_i$), but at the same time a one-to-one connection. It would be easy to establish the algebraical expression for this connection between $z'$ and $z$ directly, but it seems better to do this as part of a discussion which clarifies all other questions connected with the process of division at the same time.
We first make some general remarks:
First: Let $x$ be the dividend and $y$ the divisor. We assume, of course, $-1 \le x < 1$, $-1 \le y < 1$. It will be found that our present process of division is entirely unaffected by the signs of $x$ and $y$, hence no further restrictions on that score are required.
On the other hand, the quotient $z = x/y$ must also fulfil $-1 \le z < 1$. It seems somewhat simpler, although this is by no means necessary, to exclude for the purposes of this discussion $z = -1$, and to demand $|z| < 1$. This means in terms of the dividend $x$ and the divisor $y$ that we exclude $x = -y$ and assume $|x| < |y|$.
Second: The division takes place in $n$ steps, which correspond to the $n$ digits $\xi'_1, \ldots, \xi'_n$ of the pseudo-quotient $z'$, $n$ being yet to be determined (presumably 39 or 40). Assume that the $k - 1$ first steps ($k = 1, \ldots, n$) have already taken place, having produced […]
[…] $|x| < |y|$), and if it is true for $h = k - 1$, then (4) extends it to $h = k$ also, since $r_{k-1}$ and $\mp y$ have opposite signs. The last point may be elaborated a little further: because of the opposite signs […]
Hence we have always $|r_k| < |y|$, and therefore a fortiori $|r_k| < 1$, i.e. $-1 < r_k < 1$.
Consequently in equation (4) one summand is necessarily $> -2$, $< 2$, the other is $\ge -1$, $< 1$, and the sum is $> -1$, $< 1$. Hence we may carry out the operations of (4) modulo 2, disregarding any possibilities of carries beyond the $2^0$ position, and the resulting $r_k$ will be automatically correct (in the range $> -1$, $< 1$).
Third: Note however that the sign of $r_{k-1}$, which plays an important role in (4) above, is only then correctly determinable from the sign digit, if the number from which it is derived is $\ge -1$, $< 1$. (Cf. the discussion in 5.7.) This requirement however is met, as we saw above, by $r_{k-1}$, but not necessarily by $2r_{k-1}$. Hence the sign of $r_{k-1}$ (i.e. its sign digit) as required by (4), must be sensed before $r_{k-1}$ is doubled.
This being understood, the doubling of $r_{k-1}$ may be performed as a simple left shift, in which the left-most digit (the sign digit) is allowed to be lost; this corresponds to the disregarding of carries beyond the $2^0$ position, which we recognized above as being permissible in (4). (Cf. however, Part II, Table 2, for another sort of left shift that is desirable in explicit form, i.e. as an order.)
Fourth: Consider now the precise implication of (4) above. $\xi'_k = 1$ or 0 corresponds to $\mp = -$ or $+$, respectively. Hence (4) may be written
$$r_k = 2r_{k-1} + (1 - 2\xi'_k)\,y$$
i.e.
$$2^{-k} r_k = 2^{-(k-1)} r_{k-1} + \left(2^{-k} - 2^{-(k-1)}\xi'_k\right) y$$
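The recurrence (4), in the form $r_k = 2r_{k-1} + (1 - 2\xi'_k)y$ with $r_0 = x$, can be exercised with a short Python sketch. This is an illustration under stated assumptions and not the report's own formulation: exact rational arithmetic stands in for the 40-digit registers, a zero remainder is treated as having sign digit 0, and the names nonrestoring_divide, quotient_from_digits and q (for $\xi'_k$) are hypothetical. The two identities checked at the end follow from summing the recurrence over $k = 1, \ldots, n$; they are a derivation made here, offered as a plausibility check.

```python
from fractions import Fraction

def nonrestoring_divide(x, y, n):
    """Non-restoring binary division via r_k = 2*r_{k-1} + (1 - 2*q_k)*y,
    r_0 = x. The digit q_k is 1 (subtract y) when r_{k-1} and y have the
    same sign digit and 0 (add y) otherwise. Returns (digits, r_n)."""
    x, y = Fraction(x), Fraction(y)
    assert y != 0 and abs(x) < abs(y)
    r = x
    digits = []
    for _ in range(n):
        same_sign = (r < 0) == (y < 0)  # sensed before r is doubled
        q = 1 if same_sign else 0
        r = 2 * r + (1 - 2 * q) * y
        digits.append(q)
    return digits, r

def quotient_from_digits(digits):
    """Quotient estimate from the signed digits 2*q_k - 1 (each -1 or +1),
    with positional values 2**-k."""
    return sum(Fraction(2 * q - 1, 2**k) for k, q in enumerate(digits, start=1))

x, y, n = Fraction(3, 8), Fraction(-5, 8), 10
digits, r_n = nonrestoring_divide(x, y, n)
z = quotient_from_digits(digits)

# Summing the recurrence gives the exact identity x = y*z + 2**-n * r_n.
assert x == y * z + r_n / 2**n

# The same value from the 0/1 digits with positional values 2**-(k-1):
# a term -1 and a final unit 2**-n appear.
z_quasi = sum(Fraction(q, 2 ** (k - 1)) for k, q in enumerate(digits, start=1))
assert z == z_quasi - 1 + Fraction(1, 2**n)

print(float(z), float(x / y))
```

Because the remainder never exceeds the divisor in absolute value, the estimate z differs from x/y by at most one unit of the last position, $2^{-n}$.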
Fifth: If we do not wish to get involved in more complicated round-off procedures which exceed the immediate capacity of the only available adder Ac, then the above result suggests that we should put $n + 1 = 40$, $n = 39$. The $\xi'_1, \ldots, \xi'_{39}$ are then 39 digits of the quotient, including the sign digit, but not including the right-most digit.
The right-most digit is taken care of by placing a 1 into the right-most stage of Ac.
At this point an additional argument in favor of the procedure that we have adopted here becomes apparent. The procedure coincides (without a need for any further corrections) with the second round-off procedure that we discussed in 5.12.
There remains the term $-1$. Since this applies to the final result, and no right shifts are to follow, carries which might go beyond the $2^0$ position may be disregarded. Hence this amounts simply to changing the sign digit of the quotient $z$: replacing 0 or 1 by 1 or 0, respectively.
This concludes our discussion of the division scheme. We wish, however, to re-emphasize two very distinctive features which it possesses:
First: This division scheme applies equally for any combinations of signs of divisor and dividend. This is a characteristic of the non-restoring division schemes, but it is not the case for any simple known multiplication scheme. It will be remembered, in particular, that our multiplication procedure of 5.9 had to contain special correcting steps for the cases where either or both factors are negative.
Second: This division scheme is practicable in the binary system only; it has no analog for any other base.
This method of binary division will be illustrated on some examples in 5.15.
5.15. We give below some illustrative examples of the operations of binary arithmetic which were discussed in the preceding sections.
Although it presented no difficulties or ambiguities, it seems best to begin with an example of addition.
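As a companion to the worked examples of this section, the signed multiplication procedure of 5.10 can be restated as a short executable sketch. The Python below is an illustration under stated assumptions, not the report's own notation: it uses a small word length n in place of 39, keeps the full double-length product exactly (the contents of Ac and AR together, with no round-off), and the names digits and multiply are hypothetical. Each step adds either $y$ without its sign digit or $y$'s sign digit, depending on the current multiplier digit, and then halves; the end corrections are $1 + 2^{-n}$ for $y < 0$ and $-y$ for $x < 0$, taken modulo 2.

```python
from fractions import Fraction

def digits(v, n):
    """Sign digit and n fractional digits of v in [-1, 1), with negative
    numbers represented modulo 2 (two's complement)."""
    v = Fraction(v)
    assert -1 <= v < 1
    scaled = v * 2**n
    if scaled.denominator != 1:
        raise ValueError("v must be a multiple of 2**-n")
    raw = int(scaled) % 2 ** (n + 1)
    sign = raw >> n
    frac = [(raw >> (n - i)) & 1 for i in range(1, n + 1)]  # xi_1 .. xi_n
    return sign, frac

def multiply(x, y, n):
    """Signed product of x and y following the stepwise rule of 5.10."""
    x_sign, x_frac = digits(x, n)
    y_sign, y_frac = digits(y, n)
    y_unsigned = sum(Fraction(d, 2**i) for i, d in enumerate(y_frac, start=1))
    p = Fraction(0)
    for k in range(1, n + 1):
        xi = x_frac[n - k]  # multiplier digits, taken from right to left
        addend = y_unsigned if xi == 1 else Fraction(y_sign)
        p = (p + addend) / 2  # add, then shift right
    if y_sign == 1:
        p += 1 + Fraction(1, 2**n)  # a 1 in each of the two extreme positions
    if x_sign == 1:
        p -= Fraction(y)  # correction for the sign digit of x
    return (p + 1) % 2 - 1  # reduce modulo 2 into [-1, 1)

print(multiply(Fraction(-3, 8), Fraction(5, 8), 3))   # -15/64
print(multiply(Fraction(5, 8), Fraction(-3, 8), 3))   # -15/64
print(multiply(Fraction(-3, 8), Fraction(-3, 8), 3))  # 9/64
```

The single excluded case is $x = y = -1$, whose product 1 lies outside the range $-1 \le xy < 1$.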
[Worked binary examples of addition, of multiplication with Correction 1† and Correction 2‡ (complement of the multiplicand), and of division: the digit layouts are not recoverable here. Footnotes: † for the sign of the multiplicand; ‡ for the sign of the multiplier; § quotient digit.]
Note that this deviates by 1/64, i.e. by one unit of the right-most position, from the correct result [value illegible]. This is a consequence of our round-off rule, which forces the right-most digit to be 1 under all conditions. This occasionally produces results with unfamiliar and even annoying aspects (e.g. when quotients like $0 : y$ or $y : y$ are formed), but it is nevertheless unobjectionable and self-consistent on the basis of our general principles.
6. The control
6.1. It has already been stated that the computer will contain an organ, called the control, which can automatically execute the orders stored in the Selectrons. Actually, for a reason stated in 6.3, the orders for this computer are less than half as long as a forty binary digit number, and hence the orders are stored in the Selectron memory in pairs.
Let us consider the routine that the control performs in directing a computation. The control must know the location in the Selectron memory of the pair of orders to be executed. It must direct the Selectrons to transmit this pair of orders to the Selectron register and then to itself. It must then direct the execution of the operation specified in the first of the two orders. Among these orders we can immediately describe two major types: An order of the first type begins by causing the transfer of the number, which is stored at a specified memory location, from the Selectrons to the Selectron register. Next, it causes the arithmetical unit to perform some arithmetical operations on this number (usually in conjunction with another number which is already in the arithmetical unit), and to retain the resulting number in the arithmetical unit. The second type order causes the transfer of the number, which is held in the arithmetical unit, into the Selectron register, and from there to a specified memory location in the Selectrons. (It may also be that this latter operation will permit a direct transfer from the arithmetical unit into the Selectrons.) An additional type of order consists of the transfer orders of 3.5. Further orders control the inputs and the outputs of the machine. The process described at the beginning of this paragraph must then be repeated with the second order of the order pair. This entire routine is repeated until the end of the problem.
6.2. It is clear from what has just been stated that the control must have a means of switching to a specified location in the Selectron memory, for withdrawing both numbers for the computation and pairs of orders. Since the Selectron memory (as tentatively planned) will hold $2^{12} = 4,096$ forty-digit words (a word is either a number or a pair of orders), a twelve-digit binary number suffices to identify a memory location. Hence a switching mechanism is required which will, on receiving a twelve-digit binary number, select the corresponding memory location.
The type of circuit we propose to use for this purpose is known as a decoding or many-one function table. It has been developed in various forms independently by J. Rajchman [Rajchman, 1943] and P. Crawford [Crawford, 19??]. It consists of $n$ flip-flops which register an $n$-digit binary number. It also has a maximum of $2^n$ output wires. The flip-flops activate a matrix in which the interconnections between input and output wires are made in such a way that one and only one of $2^n$ output wires is selected (i.e. has a positive voltage applied to it). These interconnections may be established by means of resistors or by means of non-linear elements (such as diodes or rectifiers); all these various methods are under investigation. The Selectron is so designed that four such function table switches are required, each with a three digit entry and eight ($2^3$) outputs. Four sets of eight wires each are brought out of the Selectron for switching purposes, and a particular location is selected by making one wire positive with respect to the remainder. Since all forty Selectrons are switched in parallel, these four sets of wires may be connected directly to the four function table outputs.
6.3. Since most computer operations involve at least one number located in the Selectron memory, it is reasonable to adopt a code in which twelve binary digits of every order are assigned to the specification of a Selectron location. In those orders which do not require a number to be taken out of or into the Selectrons these digit positions will not be used.
Though it has not been definitely decided how many operations will be built into the computer (i.e. how many different orders the control must be able to understand), it will be seen presently that there will probably be more than $2^5$ but certainly less than $2^6$. For this reason it is feasible to assign 6 binary digits for the order code. It thus turns out that each order must contain eighteen binary digits, the first twelve identifying a memory location and the remaining six specifying an operation. It can now be explained why orders are stored in the memory in pairs. Since the same memory organ is to be used in this computer for both orders and numbers, it is efficient to make the length of each about equivalent. But numbers of eighteen binary digits would not be sufficiently accurate for problems which this machine will solve. Rather, an accuracy of at least $10^{-10}$ or $2^{-33}$ is required. Hence it is preferable to make the numbers long enough to accommodate two orders.
As we pointed out in 2.3, and used in 4.2 et seq. and 5.7 et seq., our numbers will actually have 40 binary digits each. This allows 20 binary digits for each order, i.e. the 12 digits that specify a memory location, and 8 more digits specifying the nature of the
112   Part 2 | The instruction-set processor: main-line computers   Section 1 | Processors with one address per instruction
operation (instead of the minimum of 6 referred to above). It is convenient, as will be seen
in 6.8.2 and Chapter 9, Part II, to group these binary digits into tetrads, groups of 4 binary
digits. Hence a whole word consists of 10 tetrads, a half word or order of 5 tetrads, and of
these 3 specify a memory location and the remaining 2 specify the nature of the operation.
Outside the machine each tetrad can be expressed by a base 16 digit. (The base 16 digits are
best designated by symbols of the 10 decimal digits 0 to 9, and 6 additional symbols, e.g. the
letters a to f. Cf. Chapter 9, Part II.) These 16 characters should appear in the typing for
and the printing from the machine. (For further details of these arrangements, cf. loc. cit.
above.)

The specification of the nature of the operation that is involved in an order occurs in binary
form, so that another many-one or decoding function is required to decode the order. This
function table will have six input flip-flops (the two remaining digits of the order are not
needed). Since there will not be 64 different orders, not all 64 outputs need be provided.
However, it is perhaps worthwhile to connect the outputs corresponding to unused order
possibilities to a checking circuit which will give an indication whenever a code word
unintelligible to the control is received in the input flip-flops.

The function table just described energizes a different output wire for each different code
operation. As will be shown later, many of the steps involved in executing different orders
overlap. (For example, addition, multiplication, division, and going from the Selectrons to
the register all include transferring a number from the Selectrons to the Selectron register.)
For this reason it is perhaps desirable to have an additional set of control wires, each of
which is activated by any particular combination of different code digits. These may be
obtained by taking the output wires of the many-one function table and using them to operate
tubes which will in turn operate a one-many (or coding) function table. Such a function table
consists of a matrix as before, but in this case only one of the input wires is activated.
This particular table may be referred to as the recoding function table.

The twelve flip-flops operating the four function tables used in selecting a Selectron
position, and the six flip-flops operating the function table used for decoding the order, are
referred to as the Function Table Register, FR.

6.4. Let us consider next the process of transferring a pair of orders from the Selectrons to
the control. These orders first go into SR. The order which is to be used next may be
transferred directly into FR. The second order of the pair must be removed from SR (since SR
may be used when the first order is executed), but cannot as yet be placed in FR. Hence a
temporary storage is provided for it. The storage means is called the Control Register, CR,
and consists of 20 (or possibly 18) flip-flops, capable of receiving a number from SR and
transmitting a number to FR.

As already stated (6.1), the control must know the location of the pair of orders it is to get
from the Selectron memory. Normally this location will be the one following the location of
the two orders just executed. That is, until it receives an order to do otherwise, the control
will take its orders from the Selectrons in sequence. Hence the order location may be
remembered in a twelve stage binary counter (one capable of counting $2^{12}$) to which one
unit is added whenever a pair of orders is executed. This counter is called the Control
Counter, CC.

The details of the process of obtaining a pair of orders from the Selectron are thus as
follows: The contents of CC are copied into FR, the proper Selectron location is selected, and
the contents of the Selectrons are transferred to SR. FR is then cleared, and the contents of
SR are transferred to it and CR. CC is advanced by one unit so the control will be prepared to
select the next pair of orders from the memory. (There is, however, an exception from this
last rule for the so-called transfer orders, cf. 3.5. This may feed CC in a different manner,
cf. the next paragraph below.) First the order in FR is executed and then the order in CR is
transferred to FR and executed. It should be noted that all these operations are directed by
the control itself: not only the operations specified in the control words sent to FR, but
also the automatic operations required to get the correct orders there.

Since the method by means of which the control takes order pairs in sequence from the memory
has been described, it only remains to consider how the control shifts itself from one
sequence of control orders to another in accordance with the operations described in 3.5. The
execution of these operations is relatively simple. An order calling for one of these
operations contains the twelve digit specification of the position to which the control is to
be switched, and these digits will appear in the left-hand twelve flip-flops of FR. All that
is required to shift the control is to transfer the contents of these flip-flops to CC. When
the control goes to the Selectrons for the next pair of orders it will then go to the location
specified by the number so transferred. In the case of the unconditional transfer, the
transfer is made automatically; in the case of the conditional transfer it is made only if the
sign counter of the Accumulator registers zero.

6.5. In this report we will discuss only the general method by means of which the control will
execute specific orders, leaving the details until later. It has already been explained (5.5)
that when a circuit is to be designed to accomplish a particular elementary operation (such as
addition), a choice must be made between a
Chapter 4 | Preliminary discussion of the logical design of an electronic computing instrument   113
static type and a dynamic type circuit. When the design of the control is considered, this
same choice arises. The function of the control is to direct a sequence of operations which
take place in the various circuits of the computer (including the circuits of the control
itself). Consider what is involved in directing an operation. The control must signal for the
operation to begin, it must supply whatever signals are required to specify that particular
operation, and it must in some way know when the operation has been completed so that it may
start the succeeding operation. Hence the control circuits must be capable of timing the
operations. It should be noted that timing is required whether the circuit performing the
operation is static or dynamic. In the case of a static type circuit the control must supply
static control signals for a period of time sufficient to allow the output voltages to reach
the steady-state condition. In the case of a dynamic type circuit the control must send
various pulses at proper intervals to this circuit.

If all circuits of a computer are static in character, the control timing circuits may
likewise be static, and no pulses are needed in the system. However, though some of the
circuits of the computer we are planning will be static, they will probably not all be so, and
hence pulses as well as static signals must be supplied by the control to the rest of the
computer. There are many advantages in deriving these pulses from a central source, called the
clock. The timing may then be done either by means of counters counting clock pulses or by
means of electrical delay lines (an RC circuit is here regarded as a simple delay line). Since
the timing of the entire computer is governed by a single pulse source, the computer circuits
will be said to operate as a synchronized system.

The clock plays an important role both in detecting and in localizing the errors made by the
computer. One method of checking which is under consideration is that of having two identical
computers which operate in parallel and automatically compare each other’s results. Both
machines would be controlled by the same clock, so they would operate in absolute synchronism.
It is not necessary to compare every flip-flop of one machine with the corresponding flip-flop
of the other. Since all numbers and control words pass through either the Selectron register
or the accumulator soon before or soon after they are used, it suffices to check the
flip-flops of the Selectron register and the flip-flops of the accumulator which hold the
number registered there; in fact, it seems possible to check the accumulator only (cf. the end
of 6.6.2). The checking circuit would stop the clock whenever a difference appeared, or stop
the machine in a more direct manner if an asynchronous system is used. Every flip-flop of each
computer will be located at a convenient place. In fact, all neons will be located on one
panel, the corresponding neons of the two machines being placed in parallel rows so that one
can tell at a glance (after the machine has been stopped) where the discrepancies are.

The merits of any checking system must be weighed against its cost. Building two machines may
appear to be expensive, but since most of the cost of a scientific computer lies in
development rather than production, this consideration is not so important as it might seem.
Experience may show that for most problems the two machines need not be operated in parallel.
Indeed, in most cases purely mathematical, external checks are possible: Smoothness of the
results, behavior of differences of various types, validity of suitable identities, redundant
calculations, etc. All of these methods are usually adequate to disclose the presence or
absence of error in toto; their drawback is only that they may not allow the detailed
diagnosing and locating of errors at all or with ease. When a problem is run for the first
time, so that it requires special care, or when an error is known to be present, and has to be
located, only then will it be necessary as a rule, to use both machines in parallel. Thus they
can be used as separate machines most of the time. The essential feature of such a method of
checking lies in the fact that it checks the computation at every point (and hence detects
transient errors as well as steady-state ones) and stops the machine when the error occurs so
that the process of localizing the fault is greatly simplified. These advantages are only
partially gained by duplicating the arithmetic part of the computer, or by following one
operation with the complement operation (multiplication by division, etc.), since this fails
to check either the memory or the control (which is the most complicated, though not the
largest, part of the machine).

The method of localizing errors, either with or without a duplicate machine, needs further
discussion. It is planned to design all the circuits (including those of the control) of the
computer so that if the clock is stopped between pulses the computer will retain all its
information in flip-flops so that the computation may proceed unaltered when the clock is
started again. This principle has already demonstrated its usefulness in the ENIAC. This makes
it possible for the machine to compute with the clock operating at any speed below a certain
maximum, as long as the clock gives out pulses of constant shape regardless of the spacing
between pulses. In particular, the spacing between pulses may be made indefinitely large. The
clock will be provided with a mode of operation in which it will emit a single pulse whenever
instructed to do so by the operator. By means of this, the operator can cause the machine to
go through an operation step by step, checking the results by means of the indicating-lamps
connected to the flip-flops. It will be noted that this design principle does not exclude the
use of delay lines to obtain delays as long as these
are only used to time the constituent operations of a single step, and have no part in
determining the machine’s operating repetition rate. Timing coincidences by means of delay
lines is excluded since this requires a constant pulse rate.

6.6. The orders which the control understands may be divided into two groups: Those that
specify operations which are performed within the computer and those that specify operations
involved in getting data into and out of the computer. At the present time the internal
operations are more completely planned than the input and output operations, and hence they
will be discussed more in detail than the latter (which are treated briefly in 6.8). The
internal operations which have been tentatively adopted are listed in Table 1. It has already
been pointed out that not all of these operations are logically basic, but that many can be
programmed by means of others. In the case of some of these operations the reasons for
building them into the control have already been given. In this section we will give reasons
for building the other operations into the control and will explain in the case of each
operation what the control must do in order to execute it.

In order to have the precise mathematical meaning of the symbols which are introduced in what
follows clearly in mind, the reader should consult the table at the end of the report for each
new symbol, in addition to the explanations given in the text.

6.6.1. Throughout what follows S(x) will denote the memory location No. x in the Selectron.
Accordingly the x which appears in S(x) is a 12-digit binary, in the sense of 6.2. The eight
addition operations [S(x) → Ac+, S(x) → Ac−, S(x) → Ah+, S(x) → Ah−, S(x) → Ac + M,
S(x) → Ac − M, S(x) → Ah + M, S(x) → Ah − M] involve the following possible four steps:
First: Clear SR and transfer into it the number at S(x).
Second: Clear Ac if the order contains the symbol c; do not clear Ac if the order contains the
symbol h.
Third: Add the number in SR or its negative (i.e. in our present system its complement with
respect to $2^1$) into Ac. If the order does not contain the symbol M, use the number in SR or
its negative according to whether the order contains the symbol + or −. If the order contains
the symbol M, use the number in SR or its negative according to whether the sign of the number
in SR and the symbol + or − in the order do or do not agree.
Fourth: Perform a complete carry.

Building the last four addition operations (those containing the symbol M) into the control is
fairly simple: It calls only for one extra comparison (of the sign in SR and the + or − in the
order, cf. the third step above), and it requires, therefore, only a few tubes more than
required for the first four addition operations (those not containing the symbol M). These
facts would seem of themselves to justify adding the operations in question: plus and minus
the absolute value. But it should be noted that these operations can be programmed out of the
other operations of Table 1 with correspondingly few orders (three for absolute value and five
for minus absolute value), so that some further justification for building them in is
required. The absolute value order is frequently used in connection with the orders L and R
(see 6.6.7), while the minus absolute value order makes the detection of a zero very simple by
merely detecting the sign of $-|N|$. (If $-|N| \geq 0$, then $N = 0$.)

6.6.2. The operation of S(x) → R involves the following two steps:
First: Clear SR, and transfer S(x) to it.
Second: Clear AR and add the number in the Selectron register into it.

The operation of R → Ac merits more detailed discussion, since there are alternative ways of
removing numbers from AR. Such numbers could be taken directly to the Selectrons as well as
into Ac, and they could be transferred to Ac in parallel, in sequence, or in sequence
parallel. It should be recalled that while most of the numbers that go into AR have come from
the Selectrons and thus need not be returned to them, the result of a division and the
right-hand 39 digits of a product appear in AR. Hence while an operation for withdrawing a
number from AR is required, it is relatively infrequent and therefore need not be particularly
fast. We are therefore considering the possibility of transferring at least partially in
sequence and of using the shifting properties of Ac and of AR for this. Transferring the
number to the Selectron via the accumulator is also desirable if the dual machine method of
checking is employed, for it means that even if numbers are only checked in their transit
through the accumulator, nevertheless every number going into the Selectron is checked before
being placed there.

6.6.3. The operation S(x) × R → Ac involves the following six steps:
First: Clear SR and transfer S(x) (the multiplicand) into it.
Second: Thirty-nine steps, each of which consists of the two following parts: (a) Add (or
rather shift) the sign digit of SR into the partial product in Ac, or add all but the sign
digit of SR into the partial product in Ac, depending upon whether the right-most digit in AR
is 0 or 1, and effect the appropriate carries. (b) Shift Ac and AR to the right, fill the sign
digit of Ac with a 0 and the digit of AR immediately right of the sign digit (positional value
$2^{-1}$) with the previously right-most digit of Ac. (There are ways to save time by merging
these two operations when the right-most digit in AR is 0, but we will not discuss them here
more fully.)
Third: If the sign digit in SR is 1 (i.e. −), then inject a carry
into the right-most stage of Ac and place a 1 into the sign digit of Ac.
Fourth: If the original sign digit of AR is 1 (i.e. −), then subtract the contents of SR from
Ac.
Fifth: If a partial carry system was employed in the main process, then a complete carry is
necessary at the end.
Sixth: The appropriate round-off must be effected. (Cf. Chapter 9, Part II, for details, where
it is also explained how the sign digit of the Arithmetic register is treated as part of the
round-off process.)

It will be noted that since any number held in Ac at the beginning of the process is gradually
shifted into AR, it is impossible to accumulate sums of products in Ac without storing the
various products temporarily in the Selectrons. While this is undoubtedly a disadvantage, it
cannot be eliminated without constructing an extra register, and this does not at this moment
seem worthwhile. On the other hand, saving the right-hand 39 digits of the answer is
accomplished with very little extra equipment, since it means connecting the $2^{-39}$ stage
of Ac to the $2^{-1}$ stage of AR during the shift operation. The advantage of saving these
digits is that it simplifies the handling of numbers of any number of digits in the computer
(cf. the last part of 5.12). Any number of 39k binary digits (where k is an integer) and sign
can be divided into k parts, each part being placed in a separate Selectron position. Addition
and subtraction of such numbers may be programmed out of a series of additions or subtractions
of the 39-digit parts, the carry-over being programmed by means of Cc → S(x) and Cc' → S(x)
operations. (If the $2^0$ stage of Ac registers negative after the addition of two 39-digit
parts, a carry-over has taken place and hence $2^{-39}$ must be added to the sum of the next
parts.) A similar procedure may be followed in multiplication if all 78 digits of the product
of the two 39-digit parts are kept, as is planned. (For the details, cf. Chapter 9, Part II.)
Since it would greatly complicate the computer to make provision for holding and using a 78
digit dividend, it is planned to program 39k digit division in one of the ways described at
the end of 5.12.

6.6.4. The operation of division Ac ÷ S(x) → R involves the following four steps:
First: Clear SR and transfer S(x) (the divisor) into it.
Second: Clear AR.
Third: Thirty-nine steps, each of which consists of the following three parts: (a) Sense the
signs of the contents of Ac (the partial remainder) and of SR, and sense whether they agree or
not. (b) Shift Ac and AR left. In this process the previous sign digit of Ac is lost. Fill the
right-most digit of Ac (after the shift) with a 0, and the right-most digit of AR (before the
shift) with 0 or 1, depending on whether there was disagreement or agreement in (a). (c) Add
or subtract the contents of SR into Ac, depending on the same alternative as above.
Fourth: Fill the right-most digit of AR with a 1, and change its sign digit.

For the purpose of timing the 39 steps involved in division a six-stage counter (capable of
counting to $2^6 = 64$) will be built into the control. This same counter will also be used
for timing the 39 steps of multiplication, and possibly for controlling Ac when a number is
being transferred between it and a tape in either direction (see 6.8).

6.6.5. The three substitution operations [At → S(x), Ap → S(x), and Ap' → S(x)] involve
transferring all or part of the number held in Ac into the Selectrons. This will be done by
means of gate tubes connected to the registering flip-flops of Ac. Forty such tubes are needed
for the total substitution, At → S(x). The partial substitutions Ap → S(x) and Ap' → S(x)
require that the left-hand twelve digits of the number held in Ac be substituted in the proper
places in the left-hand and right-hand orders, respectively. This may be done by means of
extra gate tubes, or by shifting the number in Ac and using the gate tubes required for
At → S(x). (This scheme needs some additional elaboration, when the order directing and the
order suffering the substitution are the two successive halves of the same word; i.e. when the
latter is already in CR at the time when the former becomes operative in FR, so that the
substitution effected in the Selectrons comes too late to alter the order which has already
reached CR, to become operative at the next step in FR. There are various ways to take care of
this complication, either by some additional equipment or by appropriate prescriptions in
coding. We will not discuss them here in more detail, since the decisions in this respect are
still open.)

The importance of the partial substitution operations can hardly be overestimated. It has
already been pointed out (3.3) that they allow the computer to perform operations it could not
otherwise conveniently perform, such as making use of a function table stored in the Selectron
memory. Furthermore, these operations remove a very sizeable burden from the person coding
problems, for they make possible the coding of classes of problems in contrast to coding each
individual problem separately. Because Ap → S(x) and Ap' → S(x) are available, any program
sequence may be stated in general form (that is, without Selectron location designations for
the numbers being operated on) and the Selectron locations of the numbers to be operated on
substituted whenever that sequence is used. As an example, consider a general code for nth
order integration of m total differential equations for p steps of independent variable t,
formulated in advance. Whenever a problem requiring this rule is coded for the computer, the
general integration sequence can be inserted into the statement of the problem along with
coded instructions for telling the sequence where it will be located in the memory [so that
the proper S(x) designations will be inserted into such orders as Cu → S(x), etc.]. Whenever
this sequence is to be used by the computer it will automatically substitute the correct
values of m, n, p and Δt, as well as the locations of the boundary conditions and the
descriptions of the differential equations, into the general sequence. (For the details of
this particular procedure, cf. Chapter 13, Part II.) A library of such general sequences will
be built up, and facilities provided for convenient insertion of any of these into the coded
statement of a problem (cf. 6.8.4). When such a scheme is used, only the distinctive features
of a problem need be coded.

6.6.6. The manner in which the control shift operations [Cu → S(x), Cu' → S(x), Cc → S(x), and
Cc' → S(x)] are realized has been discussed in 6.4 and needs no further comment.

6.6.7. One basic question which must be decided before a computer is built is whether the
machine is to have a so-called floating binary (or decimal) point. While a floating binary
point is undoubtedly very convenient in coding problems, building it into the computer adds
greatly to its complexity and hence a choice in this matter should receive very careful
attention. However, it should first be noted that the alternatives ordinarily considered
(building a machine with a floating binary point vs. doing all computation with a fixed binary
point) are not exhaustive and hence that the arguments generally advanced for the floating
binary point are only of limited validity. Such arguments overlook the fact that the choice
with respect to any particular operation (except for certain basic ones) is not between
building it into the computer and not using it at all, but rather between building it into the
computer and programming it out of operations built into the computer. (One short reference to
the floating binary point was made in 5.13.)

Building a floating binary point into the computer will not only complicate the control but
will also increase the length of a number and hence increase the size of the memory and the
arithmetic unit. Every number is effectively increased in size, even though the floating
binary point is not needed in many instances. Furthermore, there is considerable redundancy in
a floating binary point type of notation, for each number carries with it a scale factor,
while generally speaking a single scale factor will suffice for a possibly extensive set of
numbers. By means of the operations already described in the report a floating binary point
can be programmed. While additional memory capacity is needed for this, it is probably less
than that required by a built-in floating binary point since a different scale factor does not
need to be remembered for each number.

To program a floating binary point involves detecting where the first zero occurs in a number
in Ac. Since Ac has shifting facilities this can best be done by means of them. In terms of
the operations previously described this would require taking the given number out of Ac and
performing a suitable arithmetical operation on it: For a (multiple) right shift a
multiplication, for a (multiple) left shift either one division, or as many doublings (i.e.
additions) as the shift has stages. However, these operations are inconvenient and
time-consuming, so we propose to introduce two operations (L and R) in order that this (i.e.
the single left and right shift) can be accomplished directly. These operations make use of
facilities already present in Ac and hence add very little equipment to the computer. It
should be noted that in many instances a single use of L and possibly of R will suffice in
programming a floating binary point. For if the two factors in a multiplication have no
superfluous zeros, the product will have at most one superfluous zero (if $1/2 \leq X < 1$ and
$1/2 \leq Y < 1$, then $1/4 \leq XY < 1$). This is similarly true in division (if
$1/4 \leq X < 1/2$ and $1/2 \leq Y < 1$, then $1/4 < X/Y < 1$). In addition and subtraction
any numbers growing out of range can be treated similarly. Numbers which decrease in these
cases, i.e. develop a sequence of zeros at the beginning, are really (mathematically) losing
precision. Hence it is perfectly proper to omit formal readjustments in this event. (Indeed,
such a true loss of precision cannot be obviated by any formal procedure, but, if at all, only
by a different mathematical formulation of the problem.)

6.7. Table 1 shows that many of the operations which the control is to execute have common
elements. Thus addition, subtraction, multiplication and division all involve transferring a
number from the Selectrons to SR. Hence the control may be simplified by breaking some of the
operations up into more basic ones. A timing circuit will be provided for each basic
operation, and one or more such circuits will be involved in the execution of an order. The
exact choice of basic operations will depend upon how the arithmetic unit is built.

In addition to the timing circuits needed for executing the orders of Table 1, two such
circuits are needed for the automatic operations of transferring orders from the Selectron
register to CR and FR, and for transferring an order from CR to FR. In normal computer
operation these two circuits are used alternately, so a binary counter is needed to remember
which is to be used next. In the operations Cu' → S(x) and Cc' → S(x) the first order of a
pair is ignored, so the binary counter must be altered accordingly.

The execution of a sequence of orders involves using the various
Table 1

      Symbolization
No.   Complete          Abbreviated   Operation
1     S(x) → Ac+        x             Clear accumulator and add number located at position x in the
                                      Selectrons into it.
2     S(x) → Ac-        x-            Clear accumulator and subtract number located at position x in
                                      the Selectrons into it.
3     S(x) → AcM        xM            Clear accumulator and add absolute value of number located at
                                      position x in the Selectrons into it.
4     S(x) → Ac-M       x-M           Clear accumulator and subtract absolute value of number located
                                      at position x in the Selectrons into it.
5     S(x) → Ah+        xh            Add number located at position x in the Selectrons into the
                                      accumulator.
6     S(x) → Ah-        xh-           Subtract number located at position x in the Selectrons into
                                      the accumulator.
7     S(x) → AhM        xhM           Add absolute value of number located at position x in the
                                      Selectrons into the accumulator.
8     S(x) → Ah-M       x-hM          Subtract absolute value of number located at position x in the
                                      Selectrons into the accumulator.
9     S(x) → R          xR            Clear register† and add number located at position x in the
                                      Selectrons into it.
10    R → A             A             Clear accumulator and shift number held in register into it.
11    S(x) × R → A      x×            Clear accumulator and multiply the number located at position x
                                      in the Selectrons by the number in the register, placing the
                                      left-hand 39 digits of the answer in the accumulator and the
                                      right-hand 39 digits of the answer in the register.
12    A ÷ S(x) → R      x÷            Clear register and divide the number in the accumulator by the
                                      number located in position x of the Selectrons, leaving the
                                      remainder in the accumulator and placing the quotient in the
                                      register.
13    Cu → S(x)         xC            Shift the control to the left-hand order of the order pair
                                      located at position x in the Selectrons.
14    Cu' → S(x)        xC'           Shift the control to the right-hand order of the order pair
                                      located at position x in the Selectrons.
15    Cc → S(x)         xCc           If the number in the accumulator is ≥ 0, shift the control as
                                      in Cu → S(x).
16    Cc' → S(x)        xCc'          If the number in the accumulator is ≥ 0, shift the control as
                                      in Cu' → S(x).
17    At → S(x)         xS            Transfer the number in the accumulator to position x in the
                                      Selectrons.
18    Ap → S(x)         xSp           Replace the left-hand 12 digits of the left-hand order located
                                      at position x in the Selectrons by the left-hand 12 digits in
                                      the accumulator.
19    Ap' → S(x)        xSp'          Replace the left-hand 12 digits of the right-hand order located
                                      at position x in the Selectrons by the left-hand 12 digits in
                                      the accumulator.
20    L                 L             Multiply the number in the accumulator by 2, leaving it there.
21    R                 R             Divide the number in the accumulator by 2, leaving it there.
timing circuits in sequence. When a given timing circuit has completed
its operation, it emits a pulse which should go to the timing circuit
to be used next. Since this depends upon the particular operation being
executed, these pulses are routed according to the signals received
from the decoding and recoding function tables activated by the six
binary digits specifying an order.
6.8. In this section we will consider what must be added to the
control so that it can direct the mechanisms for getting data into and
out of the computer and also describe the mechanisms themselves. Three
different kinds of input-output mechanisms are planned.
First: Several magnetic wire storage units operated by servomechanisms
controlled by the computer.
Second: Some viewing tubes for graphical portrayal of results.
Third: A typewriter for feeding data directly into the computer, not
to be confused with the equipment used for preparing and printing from
magnetic wires. As presently planned the latter will consist of
modified Teletypewriter equipment, cf. 6.8.2 and 6.8.4.
6.8.1. Since there already exists a way of transferring numbers between
the Selectrons and Ac, Ac may be used for transferring numbers from and
to a wire. The latter transfer will be done serially and will make use
of the shifting facilities of Ac. Using Ac for this purpose eliminates
the possibility of computing and reading from or writing on the wires
simultaneously. However, simultaneous operation of the computer and the
input-output
118 Part 2 I The instruction-set processor: main-line computers Section 1 I Processors with one address per instruction
organ requires additional temporary storage and introduces a
synchronizing problem, and hence it is not being considered for the
first model.
Since, at the beginning of the problem, the computer is empty,
facilities must be built into the control for reading a set of numbers
from a wire when the operator presses a manual switch. As each number
is read from a wire into Ac, the control must transfer it to its proper
location in the Selectrons. The CC may be used to count off these
positions in sequence, since it is capable of transmitting its contents
to FR. A detection circuit on CC will stop the process when the
specified number of numbers has been placed in the memory, and the
control will then be shifted to the orders located in the first
position of the Selectron memory.
It has already been stated that the entire memory facilities of the
wires should be available to the computer without human intervention.
This means that the control must be able to select the proper set of
numbers from those going by. Hence additional orders are required for
the code. Here, as before, we are faced with two alternatives. We can
make the control capable of executing an order of the form: Take
numbers from positions p to p + s on wire No. k and place them in
Selectron locations v to v + s. Or we can make the control capable of
executing some less complicated operations which, together with the
already given control orders, are sufficient for programming the
transfer operation of the first alternative. Since the latter scheme is
simpler we adopt it tentatively.
The computer must have some way of finding a particular number on a
wire. One method of arranging for this is to have each number carry
with it its own location designation. A method more economical of wire
memory capacity is to use the Selectron memory facilities to remember
the position of each wire. For example, the computer would hold the
number t₁, specifying which number on the wire is in position to be
read. If the control is instructed to read the number at position p₁ on
this wire, it will compare p₁ with t₁; and if they differ, cause the
wire to move in the proper direction. As each number on the wire passes
by, one unit is added to or subtracted from t₁ and the comparison
repeated. When p₁ = t₁ numbers will be transferred from the wire to the
accumulator and then to the proper location in the memory. Then both t₁
and p₁ will be increased by 1, and the transfer from the wire to
accumulator to memory repeated. This will be iterated until t₁ + s and
p₁ + s are reached, at which time the control will direct the wire to
stop.
Under this system the control must be able to execute the following
orders with regard to each wire: Start the wire forward, start the wire
in reverse, stop the wire, transfer from wire to Ac, and transfer from
Ac to wire. In addition, the wire must signal the control as each digit
is read and when the end of a number has been reached. Conversely, when
recording is done the control must have a means of timing the signals
sent from Ac to the wire, and of counting off the digits. The 2^6
counter used for multiplication and division may be used for the latter
purpose, but other timing circuits will be required for the former.
If the method of checking by means of two computers operating
simultaneously is adopted, and each machine is built so that it can
operate independently of the other, then each will have a separate
input-output mechanism. The process of making wires for the computer
must then be duplicated, and in this way the work of the person making
a wire can be checked. Since the wire servomechanisms cannot be
synchronized by the central clock, a problem of synchronizing the two
computers when the wires are being used arises. It is probably not
practical to synchronize the wire feeds to within a given digit, but
this is unnecessary since the numbers coming into the two organs Ac
need not be checked as the individual digits arrive, but only prior to
being deposited in the Selectron memory.
6.8.2. Since the computer operates in the binary system, some means of
decimal-binary and binary-decimal conversions is highly desirable.
Various alternative ways of handling this problem have been considered.
In general we recognize two broad classes of solutions to this problem.
First: The conversion problems can be regarded as simple arithmetic
processes and programmed as sub-routines out of the orders already
incorporated in the machine. The details of these programs together
with a more complete discussion are given fully in Chapter 9, Part II,
where it is shown, among other things, that the conversion of a word
takes about 5 msec. Thus the conversion time is comparable to the
reading or withdrawing time for a word (about 2 msec) and is trivial as
compared to the solution time for problems to be handled by the
computer. It should be noted that the treatment proposed there
presupposes only that the decimal data presented to or received from
the computer are in tetrads, each tetrad being the binary coding of a
decimal digit; the information (precision) represented by a decimal
digit is actually equivalent to that represented by 3.3 binary digits.
The coding of decimal digits into tetrads of binary digits and the
printing of decimal digits from such tetrads can be accomplished quite
simply and automatically by slightly modified Teletype equipment, cf.
6.8.4 below.
Second: The conversion problems can be regarded as unique problems and
handled by separate conversion equipment incorporated either in the
computer proper or associated with the
mechanisms for preparing and printing from magnetic wires. Such
converters are really nothing other than special purpose digital
computers. They would seem to be justified only for those computers
which are primarily intended for solving problems in which the
computation time is small compared to the input-output time, to which
class our computer does not belong.
6.8.3. It is possible to use various types of cathode ray tubes, and in
particular Selectrons for the viewing tubes, in which case programming
the viewing operation is quite simple. The viewing Selectrons can be
switched by the same function tables that switch the memory Selectrons.
By means of the substitution operations Ap → S(x) and Ap' → S(x),
six-digit numbers specifying the abscissa and ordinate of the point
(six binary digits represent a precision of one part in 2^6 = 64, i.e.
of about 1.5 per cent, which seems reasonable in such a component) can
be substituted in this order, which will specify that a particular one
of the viewing Selectrons is to be activated.
6.8.4. As was mentioned above, the mechanisms used for preparing and
printing from wire for the first model, at least, will be modified
Teletype equipment. We are quite fortunate in having secured the full
cooperation of the Ordnance Development Division of the National Bureau
of Standards in making these modifications and in designing and
building some associated equipment.
By means of this modified Teletype equipment an operator first prepares
a checked paper tape and then directs the equipment to transfer the
information from the paper tape to the magnetic wire. Similarly a
magnetic wire can transfer its contents to a paper tape which can be
used to operate a teletypewriter. (Studies are being undertaken to
design equipment that will eliminate the necessity for using paper
tapes.)
As was shown in 6.6.5, the statement of a new problem on a wire
involves data unique to that problem interspersed with data found on
previously prepared paper tapes or magnetic wires. The equipment
discussed in the previous paragraph makes it possible for the operator
to combine conveniently these data on to a single magnetic wire ready
for insertion into the computer.
It is frequently very convenient to introduce data into a computation
without producing a new wire. Hence it is planned to build one simple
typewriter as an integral part of the computer. By means of this
typewriter the operator can stop the computation, type in a memory
location (which will go to the FR), type in a number (which will go to
Ac and then be placed in the first mentioned location), and start the
computation again.
6.8.5. There is one further order that the control needs to execute.
There should be some means by which the computer can signal to the
operator when a computation has been concluded, or when the computation
has reached a previously determined point. Hence an order is needed
which will tell the computer to stop and to flash a light or ring a
bell.
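
The wire-positioning procedure of 6.8.1 (compare p₁ with t₁, move the
wire until they agree, then transfer and step both counters) can be
sketched as a short program. This is only an illustration under stated
assumptions: the function name, the list standing in for the wire, and
the dictionary standing in for the Selectron memory are hypothetical
and do not come from the report.

```python
def read_block_from_wire(wire, t, p, s, memory, v):
    """Copy wire positions p..p+s into memory locations v..v+s.

    wire   -- list standing in for the numbers recorded on one wire (assumed model)
    t      -- position currently in place to be read (the report's t1)
    p      -- first wanted position (the report's p1)
    memory -- dict standing in for the Selectron memory (assumed model)
    Returns the final wire position and the count of numbers passed over.
    """
    last = p + s
    passed = 0
    # Compare p with t; while they differ, move the wire in the proper
    # direction, adding or subtracting one unit per number passing by.
    while t != p:
        t += 1 if t < p else -1
        passed += 1
    # Transfer wire -> accumulator -> memory, then step t and p together
    # until position p + s has been transferred.
    while True:
        accumulator = wire[t]
        memory[v] = accumulator
        if p == last:
            break
        t, p, v = t + 1, p + 1, v + 1
    return t, passed


wire = [100 + n for n in range(20)]
memory = {}
final_t, passed = read_block_from_wire(wire, t=12, p=5, s=3, memory=memory, v=40)
```

Here the wire first runs in reverse from position 12 to position 5 and
the four numbers at positions 5 to 8 are then placed in locations 40 to
43. Holding t in memory, rather than recording a location designation
with every number, is the more economical alternative the text adopts.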
The DEC PDP-8

Introduction¹

The PDP-8 is a single-address, 12-bit-word computer of the second
generation. It is designed for task environments with minimum
arithmetic computing and small Mp requirements. For example, it can be
used to control laboratory devices, such as gas chromatographs or
sampling oscilloscopes. Together with special T's, it is programmed to
be a laboratory instrument, such as a pulse height analyzer or a
spectrum analyzer. These applications are typical of the laboratory and
process control requirements for which the machine was designed. As
another example, it can serve as a message concentrator by controlling
telephone lines to which typewriters and Teletypes are attached. The
computer occasionally stands alone as a small-scale general-purpose
computer. Most recently it was introduced as a small-scale
general-purpose time-sharing system, based on work at Carnegie-Mellon
University and DEC. It is used as a KT(display) when it has a
P(display; '338); this C is discussed in Chap. 25. The PDP-8 has
achieved a production status formerly reserved for IBM computers; about
5,000 have been constructed.
PDP-8 differs from the character-oriented 8-bit computer in Chap. 10;
it is not unlike the 16-bit computers, such as the IBM 1800 in Chap.
33. The PDP-8 is typical of several 12-bit computers: the early CDC-160
series (1960), CDC-6600 Peripheral and Control Processor (Chap. 39),
the SDS-92, M.I.T. Lincoln Laboratory's Laboratory Instrument Computer
LINC (1963), Washington University's Programmed Console (1967), and the
SCC 650 (1966).
The PDP-5 (transistor, 1963), PDP-8 (1965), PDP-8/S (serial, 1966),
PDP-8/I (integrated circuit, 1968), and PDP-8/L (integrated circuit,
1968) constitute a series of computers based on evolving technology.
All of these have identical ISP's. Their PMS structures are nearly
identical, and all components other than Pc and Mp are compatible
throughout the series. The LINC-8-338 PMS structure is presented in
Fig. 1. A cost performance tradeoff took place in the PDP-8
(parallel-by-word arithmetic) and PDP-8/S (serial-by-bit arithmetic)
implementations. A PDP-8/S has one-fifteenth the performance of a PDP-8
at one-half the cost. The performance factors can be attributed to
8/1.5 or 5.3 for Mp speed and a factor of about 3 for logical
organization, even though the same 2-megahertz logic clock is used in
both cases. The PDP-8 has about 6.7 times the performance of a PDP-5.

¹The initials in the title stand for Digital Equipment Corporation
Programmed Data Processor.

The ISP of the PDP-8 Pc is about the most trivial in the book. It has
only a few data operators, namely, ←, +, - (negate), ¬, ∧, /2, ×2,
(optional) ×, /, and normalize. It operates on words, integers, and
boolean vectors. However, there are microcoded instructions, which
allow compound instructions to be formed in a single instruction.
The computer is straightforward and illustrates the levels discussed in
Chap. 1. We can easily look at it from the "top down." The C in PMS
notation is

    C('PDP-8; technology:transistors; 12 b/w;
      descendants:'PDP-8/S, 'PDP-8/I, 'PDP-8/L;
      antecedents:'PDP-5;
      Mp(core; #0:7; 4096 w; tc:1.5 μs/w);
      Pc(Mps(2 ~ 4 w);
         instruction length:1|2 w;
         address/instruction:1;
         operations on data/od:(←, +, ¬, ∧, -(negate), ×2, /2, +1);
         optional operations:(×, /, normalize);
         data-types:word, integer, boolean vector;
         operations for data access:4);
      P(display; '338);
      P(c; 'LINC);
      S('I/O Bus; 1 Pc; 64 K);
      Ms(disk, 'DECtape, magnetic tape);
      T(paper tape, card, analog, cathode-ray tube))

ISP

The ISP is presented in Appendix 1 of this chapter (including the
optional Extended Arithmetic Element/EAE). The 2^12-word Mp is divided
into 32 fixed-length pages of 128 words each. Address calculation is
based on references to the first page, Page_0, or to the current page
of the Program Counter/PC. The effective-address calculation procedure
provides for both direct and indirect reference to either the current
page or the first page. This scheme allows a 7-bit address to specify
local page addresses.
A 2^15-word Mp is available on the PDP-8, but addressing greater than
2^12 words is comparatively inefficient. In the extended range, two
3-bit registers, the Program Field and Data Field Registers, select
which of the eight 2^12-word blocks are being actively addressed as
program and data.
There is an array of eight registers, called the Auto-index registers,
which resides in Page_0. This array (Auto_index[0:7]<0:11> :=
M[10₈:17₈]<0:11>) possesses the useful property that whenever an
indirect reference is made to it, a 1 is first added
Chapter 5 I The DEC PDP-8 121
[Fig. 1: PMS diagram of the LINC-8-338. The diagram is not recoverable
from the source. Surviving labels include T.console; Mp(#0:7); K-T
chains for paper tape (reader, 300 char/s; punch, 100 char/s; 8
b/char), incremental point plotter (300 point/s), card reader, line
printer (300 line/min; 120 col/line), and CRT display with light pen;
P(display; '338); and Ms(#0:1; LINCtape; addressable magnetic tape).]
to its contents. (That is, there is a side effect to referencing.)
Thus, address integers in the register can select the next member of a
vector or string for accessing.
The instruction-set-execution definition can also be presented as a
decoding diagram or tree (Fig. 2). Here, each block represents an
encoding of bits in the instruction word. A decoding diagram allows one
more descriptive dimension than the conventional, linear ISP
description, revealing the assignment of bits to the instruction.
Figure 2 still requires ISP descriptions for Mp, Mps, the instruction
execution, the effective-address calculation, and the interpreter.
Diagrams such as Fig. 2 are useful in the ISP design to determine which
instruction numbers are to be assigned to names and operations and
instructions which are free to be assigned (or encoded).
There are eight basic instructions encoded by 3 bits, that is
op<0:2> := i<0:2>, where instruction/i<0:11>. Each of the first six
instructions (where 0 ≤ op < 6) has the 4 address operand determination
modes (thus yielding essentially 24 instructions). The first six
instructions are:

    data transmission:  deposit and clear-accumulator/dca
                        two's complement add to the accumulator/tad
[Fig. 2: instruction decoding diagram (tree), not recoverable from the
source. Surviving labels: principal addressable instructions; operate
microcoded instructions; extended arithmetic (EAE); and the instruction
word format, instruction/i<0:11> := op, ib, p, page_address.]
    binary arithmetic:  two's complement add to the accumulator/tad
    binary boolean:     and to the accumulator/and
    program control:    jump/set program counter/jmp
                        jump to subroutine/jms
                        index memory and skip if results are zero/isz

Note that the add instruction, tad, is used for both data transmission
and arithmetic.
The subroutine-calling instruction, jms, provides a method for
transferring a link to the beginning (or head) of the subroutine. In
this way arguments can be accessed indirectly, and a return is executed
by a jump indirect instruction to the location storing the return
address. This straightforward subroutine-call mechanism, although
inexpensive to implement, requires reentrant and recursive subroutine
calls to be interpreted by software, rather than by hardware. A stack,
as in the DEC 338 (Chap. 25), would be nicer.
The input-output instruction/iot (:= op = 6) uses the remaining 9 bits
of the instruction to specify instructions to input/output devices. The
6 io_select bits select 1 of 64 devices. The 3 bits, io_p1_bit,
io_p2_bit, io_p4_bit, command the selected device by conditionally
providing three pulses in sequence. The instructions to a typical io
device are:

    io_p1_bit → (IO_skip_flag[io_select] → (PC ← PC + 1))   testing a condition of an IO device
    io_p4_bit → (Output_data[io_select] ← AC)               output to a device
    io_p2_bit → (AC ← Input_data[io_select])                input from a device

There are three microcoded instruction groups selected by op = 7. The
instruction decoding diagram (Fig. 2) and the ISP description (Appendix
1 of this chapter) show the microinstructions which can be combined in
a single instruction. These instructions are: operate group 1
(:= (op = 7) ∧ ¬i<3>) for operating on the processor state; operate
group 2 (:= (op = 7) ∧ (i<3,11> = 10₂)) for testing the processor
state; and the extended arithmetic element group
(:= ((op = 7) ∧ (i<3,11> = 11₂))) for multiply, divide, etc. Within
each instruction the remaining bits, <4:10> or <4:11>, are extended
instruction (or opcode) bits; that is, the bits are microcoded to
select instructions. In this way an instruction is actually programmed
(or microcoded). For example, the instruction set link (L ← 1) is
formed by coding the two microinstructions, clear link, next,
complement link.

    opr_1 → (i<5> → L ← 0; next
             i<7> → L ← ¬L)

Thus, in operate group 1, the instructions clear link, complement link,
and set link are formed by coding instruction<5,7> = 10, 01, and 11,
respectively. The operate group 2 instruction is used for testing the
condition of the Pc state. This instruction uses bits 5, 6, and 8 to
code tests for the accumulator. The AC skip conditions are coded
(0 ~ 7) as never, always, = 0, ≠ 0, < 0, ≥ 0, ≤ 0, and > 0.
If all the nonredundant and useful variations in the two operate groups
were available as separate instructions in the manner of the first
seven (dca, tad, etc.), there would be approximately
7 + 12(opr_1) + 10(opr_2) + 6(EAE) = 35 instructions in the PDP-8.
The optional Extended Arithmetic Element/EAE includes additional
Multiplier Quotient/MQ and Shift Counter/SC registers and provides the
hardwired operations multiply, divide, logical shift left, arithmetic
shift, and normalize. The EAE is defined on the last page of Appendix
1.

The interrupt scheme

External conditions in the input/output devices can request that Pc be
interrupted. Interrupts are allowed if (Interrupt_state = 1). A request
to interrupt clears Interrupt_state (Interrupt_state ← 0), and Pc
behaves as though a jump to subroutine 0 instruction, jms 0, had been
given. A special iot instruction (instruction = 6001₈) followed by a
jump indirect to 0 instruction (instruction = 5400₈) returns Pc to the
interruptable state with Interrupt_state = 1. The program time to save
M(processor state/ps) is 6 Mp accesses (9 microseconds), and the time
to restore Mps is 9 Mp accesses (13.5 microseconds).
Only one interrupt level is provided in the hardware. If multiple
priority levels are desired, programmed polling is required. Most io
devices have to interrupt because they do not have a program-controlled
enable switch for the interrupt. For multiple devices approximately 3
cycles (4.5 μs) are required to poll each interrupter.

PMS structure

The PMS structure of the LINC-8-338 consisting of a Pc('LINC),
Pc('PDP-8), and P.display('338) is shown in Fig. 1. The PDP-8 is just a
single Pc. The Pc('LINC) is a very capable Pc with more
instructions than the main Pc. It is available in the structure to
interpret programs written for the C('LINC), a computer developed by
M.I.T.'s Lincoln Laboratory as a laboratory instrument computer for
biomedical and laboratory applications. Because of the rather limited
ISP in Pc, one would hardly expect to find all the components present
in Fig. 1 in an actual configuration.
The S between the Mp and the Pc allows eight Mp's. This S is actually
S('Memory Bus; 8 Mp; 1 Pc; (P requests); time-multiplexed; 1.5 μs/w).
Thus the switch makes Mp logically equivalent to a single Mp(32768 w).
There are two other L's which are connected to the Pc, excluding the
T.console. They are L('I/O Bus) and L('Data Break; Direct Memory
Access). These links become switches when we consider the physical
structure. Associated with each device is a switch, and the bus links
all the devices; the L('I/O Bus) is really an S('I/O Bus). Each time a
K connects to it, the S is included in the K. A simplified PMS diagram
(Fig. 3) shows the structure and the logical-physical transformation.
Thus, the I/O Bus is

    S('I/O Bus; duplex; bus; time-multiplexed; 1 Pc; 64 K; Pc
      controlled, K requests; t:4.5 μs/w)

The S('I/O Bus) is the same for the PDP-5, 8, 8/S, 8/I, and 8/L. Hence,
any K can be used on any of the above C's. The I/O Bus is the link to
the K's for Pc-controlled data transfers. Each word transferred is
designated by a Pc instruction. However, the I/O Bus allows a K to
request Pc's attention via the interrupt request signal. The Pc polls
the K's to find the requesting K if multiple interrupt requests occur.
A detailed structure of the Pc-Mp (Fig. 4) shows these L('I/O Bus,
'Data Break) connections to the registers and control in the notation
used by DEC. This diagram is essentially a functional block diagram.

Fig. 3. DEC PDP-8 PMS diagram (simplified). [Diagram not recoverable
from the source.]

The S('I/O Bus) in Fig. 1 is only an abstract representation of the
structure. Since it is a bus structure, the S can be expanded into L's
and simple S's as shown in Fig. 3. The termination of the L in Pc is
given in Fig. 3. The corresponding logic at a K is given in Fig. 5 in
terms of logic design elements (AND's and OR's). (Fig. 5 also shows the
S('I/O Bus) structure of Figs. 1 and 3.) The operation of S('I/O Bus)
shown in Fig. 5 starts when Pc sends a signal to select (or address) a
particular K, using the IO_select<0:5> signals to form a 6-bit code to
which K responds. Each K is hardwired to respond to a unique code. The
local control, K[k], select signal is then used to form three local
commands when ANDed with the three iot command lines from Pc,
io_p1_bit, io_p2_bit, and io_p4_bit. Twelve data bits are transmitted
either to or from Pc, indirectly under K's control. This is
accomplished by using the AND-OR gates in K for data input to Pc, and
the AND gate for data input to K. The data lines are connected to AC as
shown in Fig. 4. A single skip input is used so that Pc can test a
status bit in K. A K communicates to Pc via the interrupt request line.
Any K wanting attention simply ORs its request signal into the
interrupt request signal. Program polling in Pc then selects the
specific interrupter. Normally, the K signal causing an interrupt is
also connected to the skip input.
The L('Data Break; Direct Memory Access) provides a direct access path
for a P or K to Mp via Pc. The number of access ports to memory can be
expanded to eight by using the S('DM01 Data Multiplexer). The S is
requested from a P or K. The P or K supplies an Mp address, a read or
write access request, and then either accepts or supplies data for the
Mp accessed word. In the configuration (Fig. 1), P('LINC) and P('338)
are connected to S('DM01) and make requests to Mp for both their
instructions and data in the same way as the Pc. The global control of
these processor programs is via the S('I/O Bus). The Pc issues start
and stop commands, initializes their state, and examines their final
state when a program in the other P halts or requires assistance.
When a K is connected to L('Data Break) or to S('DM01 Data
Multiplexer), the K only accesses Mp for data. The most complex
function these K's carry out is the transfer of a complete block of
data between the Mp and an Ms or a T, for example, K('DECtape, disk). A
special mode, the three-cycle data break, is controlled by Pc so that a
K may request the next word from a queue in Mp. In this mode the next
word is taken from the queue (block) in Mp, and a counter is reduced
each time K makes a request. With this scheme, a word transfer takes
three Mp cycles: one to add one to the block count, one to add one to
the address pointer, and one to transmit the word.
The DECtape was derived from M.I.T.'s Lincoln Laboratory LINCtape unit.
Data are explicitly addressed by blocks (variable
[Fig. 4: functional block diagram of the PDP-8 Pc-Mp in DEC notation,
showing the I/O Bus and Data Break connections to the registers and
control. The diagram is not recoverable from the source. Surviving
labels include the accumulator, link, program counter, memory buffer
register, memory address register, data switches, the Teletype model 33
ASR control, and the I/O Bus device select code (6) and data (12)
lines. The legend notes that the transfer direction is into the PDP-8
when -3 volts and out of the PDP-8 when ground, and that a data break
request is for a three-cycle break when ground or a one-cycle break
when -3 volts.]
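
The iot instruction and the K select logic described in the text can be
illustrated with a small decoder. This is a sketch under stated
assumptions, not DEC's logic: the Device class, the function names, and
the device codes 03 and 04 are hypothetical, and the three pulse bits
are assumed to occupy the low-order bits i<9:11> with the octal weights
their names suggest. The input transfer follows the simplified ISP of
the text (AC ← Input_data).

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical K with one skip flag and two data buffers."""
    flag: bool = False
    input_data: int = 0
    output_data: int = 0


def decode_iot(instruction):
    """Split a 12-bit iot instruction into io_select and the pulse bits."""
    op = (instruction >> 9) & 0o7            # op<0:2> := i<0:2>
    if op != 6:
        raise ValueError("not an iot instruction")
    return {
        "io_select": (instruction >> 3) & 0o77,  # i<3:8>, 1 of 64 devices
        "io_p1": bool(instruction & 0o1),        # assumed i<11>
        "io_p2": bool(instruction & 0o2),        # assumed i<10>
        "io_p4": bool(instruction & 0o4),        # assumed i<9>
    }


def execute_iot(instruction, ac, pc, devices):
    """Apply the three conditional pulses, in sequence, to the selected K."""
    fields = decode_iot(instruction)
    k = devices.get(fields["io_select"])     # each K responds to a unique code
    if k is None:
        return ac, pc
    if fields["io_p1"] and k.flag:           # test a condition of the device
        pc = (pc + 1) & 0o7777
    if fields["io_p2"]:                      # input from the device
        ac = k.input_data
    if fields["io_p4"]:                      # output to the device
        k.output_data = ac
    return ac, pc


keyboard = Device(flag=True, input_data=0o301)
printer = Device()
devices = {0o03: keyboard, 0o04: printer}
ac, pc = execute_iot(0o6031, ac=0, pc=0o201, devices=devices)
ac, pc = execute_iot(0o6032, ac=ac, pc=pc, devices=devices)
ac, pc = execute_iot(0o6044, ac=ac, pc=pc, devices=devices)
```

The three calls test the flag of device 03 (skipping because it is
set), read a word from device 03 into AC, and send AC to device 04.
Because each pulse is conditional, one instruction may combine several
of them, which is how the select signal ANDed with the three command
lines yields three local commands at each K.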
[Figure residue, apparently from Fig. 5 (the logic at a K on the I/O
Bus): gate labels for Output_data[k] ← AC and AC ← Input_data[k]. The
diagram is not recoverable from the source.]
about 0.3 per cent of Pc-Mp capacity per active line (each of 10 ~ 15
char/s). In general, the PDP-8 hardware controls are minimal; in turn
fairly elaborate control programs must be used as part of them.
Computer levels
In this section we describe all the systems levels in the PDP-8
computer from the top down. The reader should already have a
sketchy knowledge of the PDP-8 because the registers and ISP
have been exposed. Here, we wish to clarify how it operates. A
map of the hierarchy is given in Fig. 6, starting from PMS to ISP
and down through logic design to circuit electronics. These description
levels are subdivided to provide more organizational
detail. For example, the register-transfer level has the more de-
tailed registers, data operators, functional units, and macro logic
of the processor, whereas the next logic level below has sequential
and combinational networks, and the sequential and combinatorial
elements.
It should be apparent that the relationship of the various de-
[Fig. 8: PMS diagram of the Pc-Mp registers. The diagram is not
recoverable from the source. Surviving labels include the Link/L, the
data buffer MB<0:11>, the Instruction register IR<0:2>, the Program
counter PC<0:11>, the sense amplifiers, M('Core_stack; 12 b; 4096 w),
L('Memory Bus), L('I/O Bus) with IO_select<0:5>, the Data Break links
(request, direction, cycle_select; address_accepted; DB_address<0:11>;
DB_data<0:11>), T.console (data switches, lights), and T(clock).]
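
The registers of Fig. 8 are exercised by the effective-address
calculation described in the ISP section: a 7-bit page address, a
choice of Page_0 or the current page, optional indirection, and the
auto-index side effect at M[10₈:17₈]. The following is a minimal sketch
of that calculation, not the ISP of Appendix 1. It assumes the
instruction word format op, ib, p, page_address with ib at i<3> and p
at i<4>, and it takes pc to be the address of the instruction being
interpreted; the function name and the list standing in for Mp are
hypothetical.

```python
def effective_address(instruction, pc, memory):
    """Return the effective address of a memory-reference instruction.

    instruction -- 12-bit word; assumed layout op<0:2>, ib<3>, p<4>, page_address<5:11>
    pc          -- address of the instruction (selects the current page)
    memory      -- list of 4096 12-bit words standing in for Mp (assumed model)
    """
    indirect = bool(instruction & 0o400)      # ib: indirect bit
    current_page = bool(instruction & 0o200)  # p: 0 selects Page_0, 1 the current page
    page_address = instruction & 0o177        # 7 bits address 128 words in a page
    page = (pc & 0o7600) if current_page else 0
    address = page | page_address
    if not indirect:
        return address
    # Indirect reference to an auto-index register: a 1 is first added
    # to its contents, and the incremented value is the address used.
    if 0o10 <= address <= 0o17:
        memory[address] = (memory[address] + 1) & 0o7777
    return memory[address]


memory = [0] * 4096
memory[0o10] = 0o2000
direct_current = effective_address(0o1250, pc=0o377, memory=memory)
direct_page0 = effective_address(0o1050, pc=0o377, memory=memory)
first_indexed = effective_address(0o1410, pc=0o377, memory=memory)
second_indexed = effective_address(0o1410, pc=0o377, memory=memory)
```

Repeating the same indirect reference through location 10₈ yields
successive addresses, which is how the auto-index registers select the
next member of a vector or string.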
only registers, operations, and L's are important at this level. We
still lack information about the conditions under which operations
are evoked. Figure 8 is a PMS diagram of Pc-Mp registers. Here
we show considerably more detail (although we do not bother with
electrical pulse voltages and polarities) than in Fig. 4. We declare
the Pc state (including the temporary register) within Pc. The
figure also gives the permissible data operations, D, which are
permitted on the registers. It should be clear from this that the
logical design level for the registers and the operators can easily be
reached. The K logic design cannot be reached until we use the
programming level constraints (ISP), thus defining the conditions
for evoking the data operators.

The core memory. The Mp structure is given in Fig. 8. A more
detailed block diagram which shows the core stack with its twelve
64 x 64 1-bit core planes is needed. Such a diagram, though still
a functional block diagram, takes on some of the aspects of a
circuit diagram because a core memory is largely circuit-level
details. The Mp (Fig. 9) consists of the component units: the two
address decoders (which select 1 each of 64 outputs in the X and
Y axis directions of the coincident current memory); selection
switches (which transform a coincident logic address into a
high-current path to switch the magnetic cores); the 12 inhibit drivers
(which switch a high current or no current into a plane when
either a 0 or 1 is rewritten); 12 sense amplifiers (which take the
induced low sense voltage from a selected core from a plane being
switched or not switched and transform it into a 1 or 0); and the
core stack, an array M[0:7777₈]<0:11>. Since this is the only time
the Mp is mentioned, Fig. 9 also includes the associated
circuit-level hardware needed in the core-memory operation, such as
Chapter 5 I The DEC PDP-8 129
power supplies, timing, and logic signal level conversion amplifiers.
The timing signals are generated within Pc(K) and are shown
together with Pc's clock in Fig. 10.
The process of reading a word from memory is:

1 A 12-bit selection address is established on the MA<0:11>
  address lines, which is 1 of 10000₈ (or 4096₁₀) unique numbers.
  The upper 6 bits, <0:5>, select 1 of 64 groups of Y
  addresses and the lower 6 bits, <6:11>, select 1 of 64 groups
  of X addresses.

2 The read logic signal is made a 1.

3 A high-current path flows via the X and Y selection
  switches. In each of the X and Y directions 64 x 12 cores
  have selection current. Only one core in each plane is
  selected since Ix = Iy = Iswitching/2, and the current at
  the selected intersection = Ix + Iy = Iswitching.

4 If a core is switched to 0 (by having Iswitching amperes
  through it), then a 1 was present and is read at the output
  of the plane (bit) sense amplifiers. A sense amplifier receives
  an input from a winding that threads every core of every
  bit within a core plane [0:7777₈]. All 12 cores of the selected
  word are reset to 0. The sense time at which the sense
  amplifier is observed is tms (memory strobe), and the strobe
  in effect creates MB ← M[MA].

5 The read current is turned off.
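
The read steps above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the class name CoreMemory, the helper split_address, and the list-of-planes layout are ours, not DEC's, and currents are not modeled beyond the fact that only the core at the selected X-Y intersection switches.

```python
PLANES = 12   # one 64 x 64 plane per bit of the word
SIDE = 64     # 64 X lines and 64 Y lines per plane


def split_address(ma):
    """Upper 6 bits <0:5> select the Y line, lower 6 bits <6:11> the X line."""
    if not 0 <= ma <= 0o7777:
        raise ValueError("address must fit in 12 bits")
    return (ma >> 6) & 0o77, ma & 0o77


class CoreMemory:
    """Hypothetical model: planes[bit][y][x] is the state of one core."""

    def __init__(self):
        self.planes = [[[0] * SIDE for _ in range(SIDE)] for _ in range(PLANES)]

    def write(self, ma, word):
        y, x = split_address(ma)
        for bit in range(PLANES):
            # bit 0 is the most significant bit, as in MB<0:11>
            self.planes[bit][y][x] = (word >> (PLANES - 1 - bit)) & 1

    def read(self, ma):
        """Destructive read: return the word and leave the selected cores at 0."""
        y, x = split_address(ma)
        mb = 0
        for bit in range(PLANES):
            sensed = self.planes[bit][y][x]  # a core switching 1 -> 0 is sensed as 1
            self.planes[bit][y][x] = 0       # all 12 selected cores are reset to 0
            mb = (mb << 1) | sensed
        return mb
```

Because the read is destructive in this model, a second read of the same address returns 0 unless the word is written back first.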
Fig. 9. DEC PDP-8 four-wire coincident current (three dimensions) core-memory-logic block diagram.
130 Part 2 I The instruction-set processor: main-line computers Section 1 1 Processors with one address per instruction
[Fig. 11. PDP-8 Pc state diagram. The diagram labels (memory-cycle states, register transfers for jmp, jms, isz, and State_register updates) are not recoverable from the extracted text.]
for the description. The State_register values 0, 1, and 2 correspond
to fetching, deferring (indirect addressing, i.e., fetching an
operand address), and executing (fetching or storing data, then
executing) the instruction. The state diagram does not describe
the Extended Arithmetic Element/EAE operation, the interrupt
state, and the data break states (these add 12 more states). The
initialization procedure, including the T.console state diagram, is
also not given. One should observe that when t2 occurs at the
beginning of the memory cycle, a new State_register value is
selected. The State_register value is always held for the remainder
of the cycle; i.e., only the sequences (F0 → F1 → F2 → F3 or
D0 → D1 → D2 → D3 or E0 → E1 → E2 → E3) are permitted.

Logic design level (registers and data operations)
Proceeding from the register-transfer and ISP descriptions, the
next level of detail is the logic module. Typical of the level is the
1-bit logic module for an accumulator bit, AC<j>, illustrated in
Fig. 12. The horizontal data inputs in the figure are to the logic
module from AC<j>, MB<j>, IO Bus<j>, and Data_switch<j>. The
vertical control signal inputs command the register operations (i.e.,
the transfers); they are labeled by their respective ISP operations
(for example, AC ← MB ∧ AC, AC ← AC × 2 {rotate}). The
sequential network Pc(K) (Fig. 8) generates these control signal
inputs.

Logic design level (Pc control, Pc(K) sequential network)
Figure 8 alludes to Pc(K), that is, the sequential network used
for controlling Pc. The inputs and the present state (including
clocks) determine the operations to be issued on the registers.
The output signals from the Pc(K) (Fig. 8) can be generated in
a straightforward fashion by formulating the boolean expressions
Bus to each bit of AC
'AC ← 0 := (
    (t1 ∧ (IR = 111) ∧ (¬MB<3> ∧ MB<4> ∧ ¬MB<6>) ∧ (State_register = 0)) ∨
    (t1 ∧ (IR = 111) ∧ (MB<3> ∧ MB<11> ∧ MB<4>) ∧ (State_register = 0)) ∨
[The remaining terms and the logic diagram are not recoverable; surviving diagram labels: IR<0>, IR<1>, IR<2>, (State_register = 2).]
Figure labels: 'AC ← AC/2 {rotate}; 'AC ← Carry(AC, MB); 'AC ← AC × 2 {rotate}; 'AC ← AC + 1 is formed by the AC<12> carry input.
Fig. 13. DEC PDP-8 Pc(K) 'AC ← 0 signal-logic equations and diagram.
[Figure panels: flip-flop circuit; combinatorial logic equivalent of flip-flop; direct set-clear flip-flop sequential logic element.]

Table of circuit input-output (volts)
Outputs (at t)    Inputs                      Outputs (at t+)*
  1     0         Direct set   Direct clear     1     0
  0    -3           -3           -3             0    -3
 -3     0           -3           -3            -3     0
 -3     0           -3            0            -3     0
  0    -3           -3            0            -3     0
 -3     0            0           -3             0    -3
  0    -3            0           -3             0    -3

Table of flip-flop input-output (logic values)
Outputs (at t)    Inputs                      Outputs (at t+)*
  1     0         Direct set   Direct clear     1     0
  1     0            0            0             1     0
  0     1            0            0             0     1
  0     1            0            1             0     1
  1     0            0            1             0     1
  0     1            1            0             1     0
  1     0            1            0             1     0

*Note: This is not an "ideal" sequential circuit element, because there is no delay in the output.
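
The flip-flop table can be restated as a small function. The sketch below is a hedged reading of the six rows shown, not a circuit model; the function name next_state is hypothetical, and the case of set and clear applied together is rejected because the table does not list it.

```python
def next_state(q, direct_set, direct_clear):
    """Return the 1-output after the inputs are applied.

    q is the present 1-output (0 or 1); the 0-output is its complement.
    """
    if direct_set and direct_clear:
        raise ValueError("set and clear together are not listed in the table")
    if direct_set:
        return 1
    if direct_clear:
        return 0
    return q  # neither input active: the element holds its state


# The six rows of the logic-value table: (q, set, clear) -> next q
ROWS = {
    (1, 0, 0): 1,
    (0, 0, 0): 0,
    (0, 0, 1): 0,
    (1, 0, 1): 0,
    (0, 1, 0): 1,
    (1, 1, 0): 1,
}
```

Note that the function has no notion of time: like the element in the table, it responds immediately, which is exactly why the note calls it non-ideal.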
directly from the state diagram in Fig. 11. For example, the
AC ← 0 control signal is expressed algebraically and with a
combinatorial network in Fig. 13. Obviously these boolean output
control signals are functions which include the clock, the
State_register, and the states of the arithmetic registers (for
example, A = 0, L = 0, etc.). The expressions should be factored
and minimized so as to reduce the hardware cost of the control
for the interpreter. Although we are rather cavalier about
Pc(K), it constitutes about one-half the logic within Pc.

Circuit level
The final level of description is the circuits which form the logic
functions of storage (flip-flops) and gating (NAND gates). Figures
14 and 15 illustrate some of these logic devices in detail.
In Fig. 14 a direct set and direct clear flip-flop, a sequential-logic
element, is described in terms of circuit implementation,
combinational logic equivalent, a table of its behavior, and its
algebraic behavior. Note that this is not an ideal element, because
it has no delay and responds directly and immediately to
an input. Some idealized sequential logic elements are used in
the PDP-8 (but not illustrated), including the RS (Reset-Set),
T (Trigger), JK, and D (Delay). A delay in the flip-flops makes them
behave in the same way as the ideal primitives in sequential-circuit
theory. The outputs require a series delay, Δt, such that,
if the inputs change at time t, the outputs will not change until
t + Δt. In fact, the PDP-8 uses capacitor-diode gates at the
flip-flop inputs to delay the inputs.
Figure 15 illustrates the combinatorial logic elements used in
the PDP-8. The circuit selection is limited to the inverter circuit
with single or multiple inputs. These are more familiarly called
NAND gates or NOR gates, depending on whether one uses positive
and/or negative logic-level definitions.

[Figure panels: multiple input inverter circuit; NAND logic element; NOR logic element.]

Table of circuit behavior (volts)   Table of NAND behavior   Table of NOR behavior
Input          Output               Input      Output        Input      Output
 1    2    3                        1  2  3                  1  2  3
 0    0    0     -3                 0  0  0      1           1  1  1      0
 0    0   -3     -3                 0  0  1      1           1  1  0      0
 0   -3    0     -3                 0  1  0      1           1  0  1      0
 0   -3   -3     -3                 0  1  1      1           1  0  0      0
-3    0    0     -3                 1  0  0      1           0  1  1      0
-3    0   -3     -3                 1  0  1      1           0  1  0      0
-3   -3    0     -3                 1  1  0      1           0  0  1      0
-3   -3   -3      0                 1  1  1      0           0  0  0      1

Fig. 15. DEC PDP-8 combinational element circuit and logic diagrams.

Conclusion
We could continue to discuss the behavior of the transistor as it
is used in these switching-circuit primitives but will leave that
to books on semiconductor electronics and physics. It is hoped
that the student has gained a grasp of how to think about the
hierarchical decomposition of computers into particular levels of
analysis (and synthesis).
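
The remark that the same inverter circuit is called a NAND gate or a NOR gate, depending on the logic-level definition, can be checked mechanically. The following sketch assumes only the voltage behavior given for Fig. 15 (the output is 0 volts when every input is -3 volts, and -3 volts otherwise); the function names and the one_level parameter are illustrative.

```python
from itertools import product


def inverter_circuit(*volts):
    """Multiple-input inverter: output is 0 V only when every input is -3 V."""
    return 0 if all(v == -3 for v in volts) else -3


def as_logic(volts, one_level):
    """Map a voltage to a logic value, given which voltage stands for 1."""
    return 1 if volts == one_level else 0


def truth_table(one_level):
    """Truth table of the 3-input circuit under one logic-level definition."""
    table = {}
    for volts in product((0, -3), repeat=3):
        inputs = tuple(as_logic(v, one_level) for v in volts)
        table[inputs] = as_logic(inverter_circuit(*volts), one_level)
    return table


nand_table = truth_table(one_level=-3)  # -3 V read as logic 1
nor_table = truth_table(one_level=0)    # 0 V read as logic 1
```

Under the first definition the table is 0 only for inputs (1, 1, 1), which is NAND; under the second it is 1 only for inputs (0, 0, 0), which is NOR.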
Appendix 1
Pc State
AC<0:11>                                  Accumulator
L                                         Link bit/AC extension for overflow and carry
PC<0:11>                                  Program Counter
Run                                       1 when Pc is interpreting instructions or "running"
Interrupt_state                           1 when Pc can be interrupted; under programmed control
IO_pulse_1; IO_pulse_2; IO_pulse_4        IO pulses to IO devices

Mp State
Extended memory is not included.
M[0:7777₈]<0:11>
Page_0[0:177₈]<0:11> := M[0:177₈]<0:11>           special array of directly addressed memory registers
Auto_index[0:7]<0:11> := Page_0[10₈:17₈]<0:11>    special array when addressed indirectly, is incremented by 1

Pc Console State
Keys for start, step, continue, examine (load from memory), and deposit (store in memory) are not included.
Data switches<0:11>                       data entered via console
Instruction Format
instruction/i<0:11>
op<0:2> := i<0:2>                   op code
indirect_bit/ib := i<3>             0, direct; 1, indirect memory reference
page_0_bit/p := i<4>                0 selects page 0; 1 selects this page
page_address<0:6> := i<5:11>
this_page<0:4> := PC'<0:4>
PC'<0:11> := (PC<0:11> - 1)
IO_select<0:5> := i<3:8>            selects a T or Ms device
io_p1_bit := i<11>                  these 3 bits control the selective generation of -3 volts,
io_p2_bit := i<10>                  0.4 μs pulses to I/O devices
io_p4_bit := i<9>
sma := i<5>                         μ bit for skip on minus AC, operate 2 group
sza := i<6>                         μ bit for skip on zero AC
snl := i<7>                         μ bit for skip on nonzero Link
Effective Address Calculation Process
z<0:11> := (
    ¬ib → z";
    ib ∧ (10₈ ≤ z" ≤ 17₈) → (M[z"] ← M[z"] + 1; next);
    ib → M[z"])
z'<0:11> := (¬ib → z"; ib → M[z"])
z"<0:11> := (page_0_bit → this_page□page_address;
    ¬page_0_bit → 0□page_address)
μ    microcoded instruction or instruction bit(s) within an instruction
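
As a reading aid, the effective-address process can be transcribed into Python. This is a sketch under assumptions: the function name and argument names are ours, memory is a plain list of 4096 integers, and instruction_address stands for PC' (the address of the instruction being interpreted). Bit i<0> is taken as the most significant of the 12 bits.

```python
def effective_address(instruction, instruction_address, memory):
    """Compute z for a 12-bit memory-reference instruction."""
    ib = (instruction >> 8) & 1                    # i<3>: indirect bit
    page_bit = (instruction >> 7) & 1              # i<4>: 0 = page 0, 1 = this page
    page_address = instruction & 0o177             # i<5:11>
    this_page = (instruction_address >> 7) & 0o37  # PC'<0:4>

    # z": concatenate the page number (or 0) with the 7-bit page address
    z2 = ((this_page if page_bit else 0) << 7) | page_address

    if not ib:
        return z2
    if 0o10 <= z2 <= 0o17:
        # auto-index registers are incremented before being used
        memory[z2] = (memory[z2] + 1) & 0o7777
    return memory[z2]
```

For example, an indirect reference through location 10₈ first increments that location and then uses its new contents as the address.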
Instruction Interpretation Process
Run ∧ ¬(Interrupt_request ∧ Interrupt_state) → (
    instruction ← M[PC]; PC ← PC + 1; next
    instruction_execution);
Run ∧ Interrupt_request ∧ Interrupt_state → (
    M[0] ← PC; Interrupt_state ← 0; PC ← 1)
iot (:= op = 6) → (
    io_p1_bit → IO_pulse_1 ← 1; next
operate instructions: operate group 1, operate group 2, and extended arithmetic are defined as a separate
Operate_execution := (
    cla (:= i<4> = 1) → (AC ← 0);                              μ. clear AC. Common to all operate instructions.
    opr_1 (:= i<3> = 0) → (                                     operate group 1
        cll (:= i<5> = 1) → (L ← 0); next                       μ. clear link
        cma (:= i<6> = 1) → (AC ← ¬AC);                         μ. complement AC
        cml (:= i<7> = 1) → (L ← ¬L); next                      μ. complement L
        iac (:= i<11> = 1) → (L□AC ← L□AC + 1); next            μ. increment AC
        ral (:= i<8:10> = 2) → (L□AC ← L□AC × 2 {rotate});      μ. rotate left
        rtl (:= i<8:10> = 3) → (L□AC ← L□AC × 2² {rotate});     μ. rotate twice left
        rar (:= i<8:10> = 4) → (L□AC ← L□AC / 2 {rotate});      μ. rotate right
        rtr (:= i<8:10> = 5) → (L□AC ← L□AC / 2² {rotate}));    μ. rotate twice right
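
The four rotate microinstructions treat L□AC as a single 13-bit register. A minimal Python sketch of that reading follows; the function names mirror the mnemonics, but the (link, ac) tuple interface is an assumption made for illustration.

```python
MASK13 = 0o17777  # 13 bits: L concatenated with AC<0:11>


def _join(link, ac):
    return ((link & 1) << 12) | (ac & 0o7777)


def _split(lac):
    return lac >> 12, lac & 0o7777


def ral(link, ac):
    """Rotate L□AC left one place; the old L re-enters at AC<11>."""
    lac = _join(link, ac)
    return _split(((lac << 1) | (lac >> 12)) & MASK13)


def rar(link, ac):
    """Rotate L□AC right one place; the old AC<11> moves into L."""
    lac = _join(link, ac)
    return _split((lac >> 1) | ((lac & 1) << 12))


def rtl(link, ac):
    """Rotate twice left."""
    return ral(*ral(link, ac))


def rtr(link, ac):
    """Rotate twice right."""
    return rar(*rar(link, ac))
```

Thirteen successive single rotations in either direction return L and AC to their starting values.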
KT and Ms State
Each K may have any or all of the following registers. There can be up to 64 optional K's.
Input_data[0:77₈]<0:11>               64 input buffers
Output_data[0:77₈]<0:11>              64 output buffers
IO_skip_flag[0:77₈]                   64 test conditions
IO_interrupt_request[0:77₈]           signifies a request. If Interrupt_state = 1, then an
                                      interrupt occurs.
Interrupt_request := (                "or" of all requests from each IO device
    max(IO_interrupt_request[0:77₈]))
MQ<0:11>                              Multiplier Quotient
SC<0:4>                               Shift counter

Instruction Format and Data
mds<0:11>
s<0:4> := mds<7:11>                   shift count parameter

Instruction Set for EAE
EAE_instruction_execution := ( ... next
    (... = 5) → (L□AC□MQ ← L□AC□MQ × 2^(s+1); SC ← 0);             shift left
    (... 6) → (L□AC□MQ ← L□AC□MQ / 2^(s+1); SC ← 0);               shift right
    (... 7) → (L□AC□MQ ← L□AC□MQ / 2^(s+1) {logical}; SC ← 0)      logical shift
)                                     end instruction execution
Chapter 6
R . R . Everett
¹AIEE-IRE Conf., 70-74 (1951)

Project Whirlwind is a high-speed computer activity sponsored
at the Digital Computer Laboratory, formerly a part of the
Servomechanisms Laboratory, of the Massachusetts Institute of
Technology (M.I.T.) by the Office of Naval Research (O.N.R.) and the
United States Air Force. The project began in 1945 with the
assignment of building a high-quality real-time aircraft simulator.
Historically, the project has always been primarily interested in
the fields of real-time simulation and control; but since about the
[...]7 most of its efforts have been devoted to the
design and construction of the digital computer known as
Whirlwind I (WWI). This computer has been in operation for about
1 year and an increasing proportion of project effort now is going
into application studies.
Applications for digital computers are found in many branches
of science, engineering, and business. Although any modern
general-purpose digital computer can be applied to all these fields,
a machine is generally designed to be most suited to some particular
area. Whirlwind I was designed for use in control and simulation
work such as air traffic control, industrial process control, and
aircraft simulation. This does not mean that Whirlwind will not
be used on applications other than control. About one-half the
available computing time for the next year will be assigned to
engineering and scientific calculation including research in such
uses supported by the O.N.R. through the M.I.T. Committee on
Machine Methods for Computation.
These control and simulation problems result in a specialized
emphasis on computer design.

Short register length
WWI has 16 binary digits and the control problems are usually
very simple mathematically. Furthermore, the computer is almost
always part of a feedback rather than an open-ended system.
Consequently, roundoff errors are seldom troublesome and the
register length can be shortened to something comparable to the
sensitivity of the physical quantities involved, perhaps five decimal
places or less.
WWI has a register length of 16 binary digits including sign
or about four and one-half decimals. The register length was
chosen as the minimum that would provide a usable single-address
order, in this case five binary digits for instruction and 11 binary
digits for address. In a future machine we would probably increase
this register length to 20 or 24 binary digits to get additional order
flexibility; the increased numerical precision is less important.
For scientific and engineering calculation, greater than 16-digit
precision is often required. There is available a set of
multiple-length and floating point subroutines which make the use of
greater precision very easy. It is true that these subroutines are
slow, bringing effective machine speed down to about that obtained
by acoustic memory machines. It is much more efficient
occasionally to waste computing time this way than continuously
to waste a large part of the storage and computing equipment of
the machine by providing an unnecessarily long register.

High operating speed
WWI performs 20,000 single-address operations per second.
Control and simulation problems require very high speeds. The
necessary calculations must be carried out in real time; the more
complex the controlled system is, the faster the computer must be.
There is no practical upper limit to the computing speed that
could be used if available.
Where the problems are large enough, and these problems are,
one high-speed machine is much better than two simpler machines
of half the speed. Communication between machines presents
many of the same problems that communication between human
beings presents.
Great effort was put into WWI to obtain high speed. The target
speed was 50,000 single-address operations per second, and all
parts of the machine except storage meet this requirement. The
actual WWI present operating speed of 20,000 single-address
operations per second is on the lower edge of the desired speed
range.

Large internal storage
WWI now has 1,280 registers. A large amount of high-speed
internal storage is needed since it is not in general possible to use
slow auxiliary storage because of the time factor. In many cases
a magnetic drum can be useful since its access time is short
compared to the response times of real systems. Even with a drum
there is considerable loss of computing and programming efficiency
due to shuffling information back and forth between drum and
computer.
WWI is designed for 2,048 registers of storage. Until recently
there has been available only about 300 registers. This number,
while small, has been adequate for much useful work. Very
recently a second bank of new-model storage tubes has been added.
These new tubes operate at 1,024 spots per tube bringing the total
WWI storage to 1,280 registers. These tubes have been in the
computer and under test for 2 months and in active use for about
2 weeks. In the next few months the tubes in the first bank will
be replaced by new-model storage tubes bringing the total storage
to 2,048. This number is on the lower end of what the project
considers desirable. What the computer business needs, has
needed, and will probably always need is a bigger, better, and
faster storage device.

Extreme reliability
In a system where much valuable property and perhaps many
human lives are dependent on the proper operation of the
computing equipment, failures must be very rare. Furthermore,
checking alone, however complete, is inadequate. It is not enough
merely to know that the equipment has made an error. It is very
unlikely that a man, presumably not too well suited to the work
during normal conditions, can handle the situation in an
emergency. Multiple machines with majority rule seem to be the best
answer. Self-correcting machines are a possibility but appear to
be too complicated to compete, especially as they provide no
standby protection.
The characteristics of the Whirlwind I computer may be
recapitulated as follows:

Register length          16 binary digits, parallel
Speed                    20,000 single-address operations per second
Storage capacity         Originally 256 registers
                         Recently 320 registers
                         Presently 1,280 registers
                         Target 2,048 registers
Order type               Single-address, one order per word
Numbers                  Fixed point, 9's complement
Basic pulse repetition   1 megacycle
frequency                2 megacycles (arithmetic element only)
Tube count               5,000, mostly single pentodes
Crystal count            11,000

There are 32 possible operations, of which about 27 are assigned.
They are of the usual types: addition, subtraction,
multiplication, division, shifting by an arbitrary number of columns,
transfer of all or parts of words, subprogram, and conditional
subprogram. There are terminal equipment control orders and
there are some special orders for facilitating double-length and
floating-point operations.
One way to increase the effective speed of a machine is to
provide built-in facilities for operations that occur frequently in
the problems of interest. An example is an automatic co-ordinate
transformation order. The addition of such facilities does not affect
the general-purpose nature of the machine. The machine retains
its old flexibility but becomes faster and more suited to a certain
class of problems.
From March 14, 1951, at which time we began to keep detailed
records, until November 22, 1951 a total of 950 hours of computer
time were scheduled for applications use. The machine has been
running on two shifts or a total of about 3,000 hours during this
interval. The two-thirds time not used for applications has been
used for machine improvement, adding equipment, and preventive
maintenance.
Of the 950 hours available, 500 have been used by the scientific
and engineering calculation group, the rest for control studies. The
limited storage available until recently has been admittedly a
serious handicap to the scientific and engineering applications
people. There has not been room in storage for the lengthy
subroutines necessary for convenient use of the machine. The largest
part of their time has been spent in training, in setting up
procedures, and in preparing a library of subroutines.
A partial list of the actual problems carried out by the group
includes:

An industrial production problem for the Harvard Economics School
Magnetic flux density study for our magnetic storage work
Oil reservoir depletion studies
Ultra-high frequency television channel allocation investigation for Dumont
Optical constants of thin metal films
Computation of autocorrelation coefficients
Tape generation for a digitally-controlled milling machine
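
The register layout described earlier in the chapter (five binary digits for the instruction, 11 for the address) fixes the figures of 32 possible operations and 2,048 addressable registers. The sketch below illustrates that arithmetic; placing the operation field in the high-order five bits, and the names encode_order and decode_order, are assumptions made for the example.

```python
OP_BITS = 5
ADDRESS_BITS = 11
WORD_BITS = OP_BITS + ADDRESS_BITS  # 16 binary digits


def encode_order(op, address):
    """Pack an operation number and an address into one 16-bit order."""
    if not 0 <= op < (1 << OP_BITS):
        raise ValueError("operation number must fit in 5 bits")
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address must fit in 11 bits")
    return (op << ADDRESS_BITS) | address


def decode_order(word):
    """Split a 16-bit order into (operation number, address)."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("order must fit in 16 bits")
    return word >> ADDRESS_BITS, word & ((1 << ADDRESS_BITS) - 1)
```

With these field widths, every one of the 2,048 target registers is directly addressable by a single order, which is one way to see why 11 address digits were a natural minimum.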
Chapter 6 1 The Whirlwind I computer 139
Control
The WW control is divided into several parts, as shown in Fig. 3.
Central control
The central control of the machine is the master source of control
pulses. When necessary the central control allows one of the other
controls to function. In general there is no overlapping of control
operation; except for terminal equipment control, only one of the
controls is in operation at any one time.
Storage control
Storage control generates the sequence of pulses and gates that
operate the storage tubes. Central control instructs the storage
control either to read or to write.
Arithmetic control
Arithmetic control carries out the details of the more complex
arithmetic operations such as multiplication and division. The

Fig. 1. Sample computer output.
combined with the central control. The major reason they are not
is that they were designed at different times. The arithmetic
element and its control came first, followed by central control. At
the time central control was designed, the necessary characteristics
of storage control were unknown. In fact, the machine was
designed so that any parallel high-speed storage could be used. The
form of terminal equipment control was also unknown at this time.
Since flexibility was a prime specification, it was felt preferable
to build separate flexible controls for the various parts of the
computer rather than to try to combine all the needed flexibility in one
central control.
In a new machine we would attempt to combine control
functions where possible, hoping to have enough prior knowledge

Fig. 4. Operation control.
Central control
The Central Control of the machine is shown in Fig. 5. The control
switch is in the foreground with the operation matrix to the right.
Electrostatic storage
The electrostatic storage shown in Fig. 6 consists of two banks
of 16 storage tubes each. There is a pair of 32-position decoders
viewing wave forms on the fly anywhere in the machine.
Arithmetic element
The arithmetic element (see Fig. 7) consists of three registers, a
counter, and a control.
The first register is an accumulator (AC) which actually consists
of a partial-sum or adding register and a carry register. The accu-
mulator holds the product during multiplication.
The second or A-register holds the multiplicand during multi-
plication. All numbers entering the arithmetic element do so
through AR.
The third or B-register holds the multiplier during multiplication.
The accumulator and B-register shift right or left. A high-speed
carry is provided for addition. Subtraction is by 9’s complement
and end-around-carry. Multiplication is by successive additions,
division by successive subtractions, and shift orders provide for
shifting right or left by an arbitrary number of steps, with or
without roundoff.
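
Subtraction by complement with end-around carry can be illustrated briefly. The sketch below assumes the binary form of the scheme, that is, ones' complement on 16-bit words, which is our reading of what the text calls 9's complement for a binary machine; the function names are hypothetical.

```python
WORD = 16
MASK = (1 << WORD) - 1


def complement(x):
    """Bitwise complement of a 16-bit word (negation in ones' complement)."""
    return ~x & MASK


def add(a, b):
    """Add two 16-bit words, feeding any carry out back into the low end."""
    total = a + b
    if total > MASK:
        total = (total & MASK) + 1  # end-around carry
    return total & MASK


def subtract(a, b):
    """Subtract by adding the complement of b."""
    return add(a, complement(b))


def encode(n):
    """Signed integer -> 16-bit ones' complement word."""
    return n if n >= 0 else complement(-n)


def decode(w):
    """16-bit ones' complement word -> signed integer."""
    return w if w < (1 << (WORD - 1)) else -complement(w)
```

In this representation the all-ones word also decodes to zero, so a sum such as (-1) + 1 yields "negative zero" rather than the all-zeros word.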
The arithmetic element is straightforward except for a few
special orders and the high speed at which it operates. Addition
takes 3 microseconds complete with carry; multiplication, 16
microseconds average including sign correction.
In Fig. 8 are shown several digits of the arithmetic element.
The large panels are accumulator digits. Above the accumulator
is the B-register, below it the A-register.
Test control
Test control, shown in Fig. 9, is used at present both for operating
and for trouble shooting the computer. The control includes:

Fig. 8. View of arithmetic element.
This great complexity of terminal equipment requires a flexible
switching system. There is a single in-out register (IOR) through
which most of the data passes.
There is a switch which is set up by an order to select the
desired piece of terminal equipment. Other orders put data into
IOR or remove data from IOR. The in-out control provides the
necessary control pulses to go with each type of equipment. In
general the computer continues to run during terminal equipment
wait times; suitable interlocks are provided to prevent trouble.
This complete equipment has not yet been fully installed.

References
Whirlwind: EverR51; SerrR62; TaylN51.
EDSAC: SamuA57; WilkM56.
Note: In operations mr, mh, dv, slr, srr, srh, sf, the C(BR) is assumed to be
the magnitude of the least significant part of AC + BR. For the ab and dm
operations, the BR is treated just as any storage register.
Whirlwind I Instruction Code came from "Comprehensive System Manual, A
System of Automatic Coding for the Whirlwind Computer," published by
Massachusetts Institute of Technology, Digital Computer Laboratory, Cambridge, Mass.
Some aspects of the logical design of
a control computer: a case study¹

Summary Some logical aspects of a digital computer for a space vehicle
are described, and the evolution of its logical design is traced. The intended
application and the characteristics of the computer's ancestry form a
framework for the design, which is filled in by accumulation of the many decisions
made by its designers. This paper deals with the choice of word length,
number system, instruction set, memory addressing, and problems of
multiple precision arithmetic.
The computer is a parallel, single address machine with more than
10,000 words of 16 bits. Such a short word length yields advantages of
efficient storage and speed, but at a cost of logical complexity in connection
with addressing, instruction selection, and multiple-precision arithmetic.

reason for a given choice is that it is the same as, or the logical
next step to, a choice that was made once before.
A recent conference on airborne computers [Proc. Conf. Spaceborne
Computer Eng., Anaheim, Calif., Oct. 30-31, 1962] affords
a view of how other designers treated two specific problems: word
length and number system. All of these computers have word
lengths of the order of 22 to 28 bits, and use a two's complement
system. The AGC stands in contrast in these two respects, and
our reasons for choosing as we did may therefore be of interest
as a minority view.
Chapter 7 I Some aspects of the logical design of a control computer: a case study 147
[Figure: AGC block diagram. Recoverable labels: sequence generator; instruction microprogram; arithmetic unit; adder.]
The only logical difference between the two memories is the
inability to change the contents of the fixed part by program steps.
Each word in memory is 16 bits long (15 data bits and an odd
parity bit). Data words are stored as signed 14 bit words using
a one's complement convention. Instruction words consist of 3
order code bits and 12 address code bits.
The contents of the address register S uniquely determine the
address of the memory word only if the address lies between octal
0000 and octal 5777, inclusive. If the address lies between octal
6000 and octal 7777, inclusive, the address in S is modified by the
contents of the memory bank register MB. The modification
consists in adding some integral multiple of octal 2000 to the address
in S before it is interpreted by the decoding circuitry. The memory
bank register MB is itself addressable; its address, however, is not
modified by its own contents.
Transfers in and out of memory are made by way of a memory
local register G. For certain specific addresses, the word being
transferred into G is not sent directly, but is modified by a special
gating network. The transformations on the word sent to G are
right shift, left shift, right cycle, and left cycle.

Central section
The middle part of Fig. 1 shows the central section in block form.
It consists of the address register S and the memory bank register
MB, both of which were mentioned above. There is also a block
of addressable registers called "central and special registers,"
which will be discussed later, an arithmetic unit, and an
instruction decoder register SQ.
The arithmetic unit has a parity generating register and an
adder. These two registers are not explicitly addressable.
The SQ register bears the same relation to instructions as the
S register bears to memory locations; neither S nor SQ is
explicitly addressable.
The central and special registers are A, Q, Z, LP, and a set of
input and output registers. Their properties are shown in Table 1.

Sequence generator
The sequence generator provides the basic memory timing, the
sequences of control pulses (microprograms) which constitute an
instruction, the priority interrupt circuitry, and a number of
scaling networks which provide various pulse frequencies used by the
computer and the rest of the navigation system.
Instructions are arranged so as to last an integral number of
memory cycles. The list of 11 instructions is treated in detail in
Sec. 6. In addition to these there are a number of "involuntary"
sequences, not under normal program control, which may break
into the normal sequence of instructions; these are triggered either
by external events, or by certain overflows within the AGC, and
Table 1 Special and central registers

Register(s)   Octal address   Purpose and/or properties
A             0000            Central accumulator. Most instructions refer to A.
Q             0001            If a transfer of control (TC) occurred at L, (Q) = L + 1.

rather than between memory cycles. An interruption consists of
storing the contents of the program counter and transferring
control to a fixed location. Each interrupt line has a different location
associated with it. Interrupting programs may not be interrupted,
but interrupt requests are not lost, and are processed as soon as
the earlier interrupted program is resumed. Calling the resume
sequence, which restores the program counter, is initiated by
referencing a special address.
3 Instruction word format. Division of instruction words into two fields, one for operation code and one for address.

As a start, the choice of word length (15 bits) for two previous machines in this series was kept in mind as a satisfactory word length from the point of view of mechanization; i.e., the number of sense amplifiers, inhibit drivers, the carry propagation time, etc., were all considered satisfactory. The act of “choosing” word length really meant whether or not to alter the word length, at the time of change from MOD 3C to the AGC, and in particular whether to increase it. The influence of the three principal factors will be taken up in turn.

Precision of data words
The data words used in the AGC may be divided roughly into two classes: data words used in elaborate navigational computations, and data words used in the control of various appliances in the system. Initial estimates of the precision required by the first class ranged from 27 to 32 bits, O(10^8). The second class of variables could almost always be represented with 15 bits. The fact that navigational variables require about twice the desired 15-bit word length means that there is not much advantage to word sizes between 15 and 28 bits, as far as precision of representation of variables is concerned, because double-precision numbers must be used in any event. Because of the doubly signed number representation for double-precision words, the equivalent word length is 29 bits (including sign), rather than 30, for a basic word length of 15 bits.
The initial estimates for the proportion of 15-bit vs 29-bit quantities to be stored in both fixed and erasable memories indicated the overwhelming preponderance of the former. It was also estimated that a significant portion of the computing had to do with control, telemetry and display activities, all of which can be handled more economically with short words. A short word length allows faster and more efficient use of erasable storage because it reduces fractional word operations, such as packing and editing; it also means a more efficient encoding of small integers.

Range of input variables
As a control computer, the AGC must make analog-to-digital conversions, many of which are of shaft angles. Two principal forms of conversion exist: one renders a whole number, the other produces a train of pulses which must be counted to yield the desired number. The latter type of conversion is employed by the AGC, using the counter incrementing feature.
When the number of bits of precision required is greater than the computer’s word length, the effective length of the counter must be extended into a second register, either by programmed scanning of the counter register, or by using a second counter register to receive the overflows of the first. Whether programmed scanning is feasible depends largely on how frequently this scanning must be done. The cost of using an extra counter register is directly measured in terms of the priority circuit associated with it.
In the AGC, the equipment saved by reducing the word length below 15 bits would probably not match the additional expense incurred in double-precision extension of many input variables. The question is academic, however, since a lower bound on the word length is effectively placed by the format of the instruction word.

Instruction word format
An initial decision was made that instructions would consist of an operation code and a single address. The straightforward choices of packing one or two such instructions per word were the only ones seriously considered, although other schemes, such as packing one and a half instructions per word, are possible [England, 1962]. The previous computers MOD 3S and MOD 3C had a 3-bit field for operation codes and a 12-bit field for addresses, to accommodate their 8 instruction order codes and 4096 words of memory. In the initial core-transistor version of the AGC (i.e., MOD 3C), the 8 instruction order codes were in reality augmented by the various special registers provided, such as shift right, cycle left, edit, so that a transfer in and out of one of these registers would accomplish actions normally specified by the order code (see Sec. 6). These registers were considered to be more economical than the corresponding instruction decoding and control pulse sequence generation. Hence the 3 bits assigned to the order code were considered adequate, albeit not generous. Furthermore, as will be seen, it is possible to use an indexing instruction so as to increase to eleven the number of explicit order codes provided for.
The address field of 12 bits presented a different problem. At the time of the design of MOD 3C we estimated that 4000 words would satisfy the storage requirements. By the time of redesign it was clear that the requirement was for 10^4 words, or more, and the question then became whether the proposed extension of the address field by a bank register (see Sec. 7) was more economical than the addition of 2 bits to the word length. For reasons of modularity of equipment, adding 2 more bits to the word length would result in adding 2 more bits to all the central and special registers, which amounts to increasing the size of the nonmemory portion of the AGC by 10 per cent.
In summary, the 15-bit word length seemed practical enough so that the additional cost of extra bits in terms of size, weight, and reliability did not seem warranted. A 14-bit word length was thought impractical because of the problems with certain input variables, and it would further restrict the already somewhat cramped instruction word format. Word lengths of 17 or 18 bits would result in certain conceptual simplicities in the decoding of instructions and addresses, but would not help in the representation of navigational variables. These require 28 bits, and so they must be represented to double precision in any event.

for input conversions from such devices as pattern generators, geared encoders, or binary scalers. Sign reversal is awkward, however, since a full addition is required in the process.
The choice in the case of the AGC was to use one’s complement arithmetic in general processing, and two’s complements for certain input angle conversions. Since the only arithmetic done in the latter case is the addition of plus or minus one, the two’s complement facility is provided simply by suppressing end around carry and using the proper representation of minus one. The latter is stored as a fixed constant, so that no sign reversal is required.
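
The relationship between the two number systems can be illustrated with a minimal Python sketch. It is not the AGC adder: the 15-bit width is taken from the text, while the function and constant names are hypothetical, and the all-ones pattern is assumed to be the "proper representation of minus one" for the two's complement case.

```python
WORD_BITS = 15
MASK = (1 << WORD_BITS) - 1   # octal 77777


def encode_ones_complement(value: int) -> int:
    """Bit pattern of a signed integer in one's complement."""
    if value >= 0:
        return value & MASK
    return ~(-value) & MASK   # negative: complement every bit of the magnitude


def add_ones_complement(x: int, y: int) -> int:
    """Add two patterns; a carry out of the top bit is added back in
    at the low end (end-around carry)."""
    total = x + y
    if total > MASK:
        total = (total & MASK) + 1
    return total


def add_end_around_suppressed(x: int, y: int) -> int:
    """Same adder with the end-around carry suppressed: the carry out
    is simply dropped, which gives two's complement behaviour."""
    return (x + y) & MASK


TWOS_MINUS_ONE = MASK   # assumed stored constant: all ones is -1 in two's complement

# One's complement: (-1) + (+3) = +2, the case shown in Example 5.
print(oct(add_ones_complement(encode_ones_complement(-1), 3)))

# Two's complement counting by plus or minus one with carry suppressed.
counter = 5
counter = add_end_around_suppressed(counter, TWOS_MINUS_ONE)   # count down to 4
counter = add_end_around_suppressed(counter, 1)                # count up to 5
print(counter)
```

Because the minus-one pattern is a fixed constant, counting down needs no sign reversal, which is the point made in the paragraph above.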
EXAMPLE 5: Operands have opposite sign; sum positive. Identical results in both systems.

  STANDARD       MODIFIED
  1 1 1 1 0      1 1 1 1 1 0
  0 0 0 1 1      0 0 0 0 1 1
  0 0 0 0 1      0 0 0 0 0 1
  1 carry        1 carry
  0 0 0 1 0      0 0 0 0 1 0
5. Multiple precision arithmetic
A short word computer can be effective only if the multiple-precision routines are efficient corresponding to their share of the computer’s word load. In the AGC’s application there is enough use for multiple-precision arithmetic to warrant consideration in the choice of number system and in the organization of the instruction set. Although the limited number of order codes prohibits multiple-precision instructions, special features are associated with the conventional instructions to expedite multiple-precision operations.

Independent sign representation
A variety of formats for multiple-precision representation are possible; probably the most common of these is the identical sign representation in which the sign bits of all component words agree. The method used in the AGC allows the signs of the components to be different.
Independent signs arise naturally in multiple-precision addition and subtraction, and the identical sign representation is costly because sign reconciliation is required after every operation. For example, (+6, +4) + (-4, -6) = (+2, -2), a mixed sign representation of (+1, +8). Since addition and subtraction are the most frequent operations, it is economical to store the result as it occurs and reconcile signs only when necessary. When overflow occurs in the addition of two components, a one with the sign of the overflow is carried to the addition of the next higher components. The sum that overflowed retains the sign of its operands. This overflow is termed an interflow to distinguish it from an overflow
that arises when the maximum multiple-precision number is exceeded.
The independent sign method has a pitfall arising from the fact that every number has two representations, either one of which may occur as a sum. There are some numbers for which one of the representations exceeds the capacity of the most significant component. The overflow is false in the sense that the double-precision capacity is not exceeded, only the single word capacity of the upper component. Sign reconciliation can be used in this case to yield an acceptable representation. This problem can be avoided if all numbers are scaled so that none are large enough to produce false overflows. Such a restriction is not necessary, however, since the false overflow condition arises infrequently and can be detected at no expense in time. The net cost of reconciliation is therefore very low.

Multiplication and division
For triple and higher orders of precision, multiplication and division become excessively complex, unlike addition and subtraction where the complexity is only linear with the order of precision. The algorithm for double-precision multiplication is directly applicable to numbers in the independent sign notation. False overflow does not arise, and the treatment of interflow is simplified by an automatic counter register which is incremented when overflow occurs during an add instruction. The sign of the counter increment is the same as the sign of the overflow; and the increment takes place while one of the product components of next higher order is stored in that counter.
Double-precision division is exceptional in that the independent sign notation may not be used; both operands must be made positive in identical sign form, and the divisor normalized so that the left-most nonsign bit is one.

Triple precision
A few triple-precision quantities are used in the AGC. These are added and subtracted using independent sign notation with interflow and overflow features the same as those used for double-precision arithmetic.

6. Instruction set

Basic design criteria
The implicit requirements for any von Neumann-type machine demand that facilities exist for:

1 Fetching from memory
2 Storing in memory
3 Negating (complementing)
4 Combining two operands (e.g., addition)
5 Address modification (more generally, executing as an instruction the result of arithmetic processing)
6 Normal sequencing (to each location from which an instruction can be executed there corresponds one location whose contents are the next instruction)
7 Conditional sequence changing, or transfer of control
8 Input
9 Output

An instruction can, of course, provide several of these facilities. For instance, some computers have an instruction that subtracts the contents of a memory location from an accumulator and leaves the result in that memory location and in the accumulator; this instruction fulfills all of requirements 1-4 above. Requirement 5 is met in a somewhat primitive manner if instructions can be executed from erasable memory, and is met elegantly by the use of index registers. Still another scheme, somewhat similar to one used in the Bendix G-20, is employed in the AGC. Requirement 6 is usually fulfilled by having an instruction location counter which contains the address of the next instruction to be executed, and is incremented by one when an instruction is fetched. Alternatively, each instruction may include the address of the next instruction, as is often done in machines having drum memories. In the AGC, as in most short-word computers, the former method, with one single-address instruction per word, is clearly the simplest and cheapest. Requirement 7 is generally met by examining a condition such as the sign of an accumulator and, if the condition is satisfied, either incrementing the instruction location counter (skipping), or using an address included in the instruction as that of the next instruction (conditional transfer of control). An unconditional transfer of control is usual but not necessary, since any desired condition can be forced. Most machines have special input-output instructions to satisfy requirements 8 and 9. In the AGC, however, since input and output is through addressable registers, input is subsumed under fetching from memory, and output under storing in memory. Counter incrementing and program interruption aid these functions also.

Further criteria
The major goals in the AGC were efficient use of memory, reasonable speed of computing, potential for elegant programming, efficient multiple precision arithmetic, efficient processing of input and output, and reasonable simplicity of the sequence generator. The constraints affecting the order code as a whole were the word length, one’s complement notation, parallel data transfer, and the characteristics of the editing registers. The ground rules governing the choice of instructions arose from these goals and constraints.

a Three bits of an instruction word are devoted to operation code.
b Address modification must be convenient and efficient.
c There should be a multiply instruction yielding a double length product.
d Treatment of overflow on addition must be flexible.
e A Boolean combinatorial operation should be available.
f No instruction need be devoted to input, output, or shifting.

This list is by no means complete, but gives a good indication of what kind of computer the AGC has to be. In the following paragraphs the ways in which the instructions fulfill the above requirements are described.

Details of the instruction set
In the listing that follows, L denotes the location of the instruction; K denotes the data address contained in the instruction. Parentheses mean “content of,” and the leftward arrow means that the register named at the arrowhead is set to the quantity named to the right.

L: TC K ; Transfer Control
Q ← L + 1; go to K.
This is the primary method of transferring control to any stated location, and thus meets part of requirement 7. The setting of the return address register Q renders complex subroutines feasible. TC Q may be used to return from a subroutine (with no other TC’s) because the binary number “L + 1” is the same as the binary word “TC L + 1,” by virtue of the TC code being all zeros. TC A behaves like an “execute” instruction, executing whatever instruction is in A, because Q follows A in the address pattern, see Table 1.

L: CCS K ; Count, Compare, and Skip
If (K) > +0, A ← (K) - 1, no skip; if (K) = +0, A ← +0, skip to L + 2; if (K) < -0, A ← -1 - (K), skip to L + 3; if (K) = -0, A ← +0, skip to L + 4.
This instruction fulfills the remainder of requirement 7 and provides several features. It is clear that in a machine with a 3-bit operation code there should be only one code devoted entirely to branching, if at all possible. It is inefficient to program a zero test using only a sign-testing code; it is even more inefficient to program a sign test using only a zero-testing code. This instruction was therefore designed to test both types of conditions simultaneously. It has to be a four-way branch, and since there is only one address per instruction, it follows that CCS must be a skipping-type branch.
The function of (K) delivered to A is the diminished absolute value (DABS). It serves two primary purposes: to do most of the work in generating an absolute value, and to apply a negative increment to the contents of a loop-counting register, so that CCS has some of the properties of TIX in the IBM 704.

L: INDEX K ; Index using K
Use (L + 1) + (K) as the next instruction.
In a short-word machine where there is no room in the instruction word to specify indexing or indirect addressing, this code meets requirement 5 in a way far superior to forming an instruction and placing it in A or in erasable memory for execution. INDEX operates on whole words, so that the operation code as well as the address may be modified. It may be used recursively (consider the implications of several INDEX’s in succession, assuming that no operation codes are modified). Finally, it permits more than 8 operation codes to be specified in 3 bits, since overflow of the indexing addition is detectable.

L: XCH K ; Exchange
(A) ↔ (K).
This instruction meets requirements 1, 2, and 8. When K is in fixed memory, it is simply a data-fetching (clear and add) code. Its use with erasable memory aids efficiency by reducing the need for temporary storage. XCH is also an important input instruction in a machine where addressable counters, incremented in response to external events, are an input medium, because a counter can be read out and reset (to zero or any desired value) by XCH with no chance of missing a count.

L: CS K ; Clear and Subtract
A ← -(K).
CS is the primary means of sign-changing and logical negation, and so fulfills requirements 1 and 3. Since there is no clear and add instruction, it is the usual operation for nondestructive readout of erasable memory in simple data transfers, that is, when no addition or other arithmetic is required. Usually the programming can be arranged so that complementing during transfer is acceptable; otherwise the CS can be followed by CS A before storing.

L: TS K ; Transfer to Storage
K ← (A); if (A) includes ± overflow, A ← ±1, skip to L + 2.
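
Two of the definitions listed above, CCS and TS, can be read as small functions. The Python sketch below is an interpretation, not the AGC's logic. It assumes 15-bit one's complement memory words and a 16-bit accumulator with two sign positions, as the paper's discussion of TS describes; it further assumes that overflow shows up as disagreement between the two sign bits (01 positive, 10 negative), which the paper does not spell out. All names are hypothetical.

```python
PLUS_ZERO = 0o00000
MINUS_ZERO = 0o77777
SIGN_BIT = 0o40000


def ccs(k: int) -> tuple[int, int]:
    """CCS on a 15-bit one's complement word (K).
    Returns (new A, offset of the next instruction from L)."""
    if k == PLUS_ZERO:
        return PLUS_ZERO, 2              # skip to L + 2
    if k == MINUS_ZERO:
        return PLUS_ZERO, 4              # skip to L + 4
    if k & SIGN_BIT:                     # negative and nonzero
        magnitude = ~k & MINUS_ZERO      # one's complement negation
        return magnitude - 1, 3          # diminished absolute value, L + 3
    return k - 1, 1                      # positive: no skip, continue at L + 1


def ts(a: int) -> tuple[int, int, int]:
    """TS on a 16-bit accumulator with two sign bits.
    Returns (15-bit word stored in K, new A, offset of next instruction)."""
    upper_sign = (a >> 15) & 1
    lower_sign = (a >> 14) & 1
    # Overflow correction: the lower-order sign bit is not transmitted.
    stored = (upper_sign << 14) | (a & 0o37777)
    if upper_sign == lower_sign:         # no overflow indication
        return stored, a, 1
    new_a = 0o000001 if upper_sign == 0 else 0o177776   # +1 or -1
    return stored, new_a, 2              # skip over the next instruction


print(ccs(0o00005))    # a loop counter holding +5
print(ts(0o040001))    # accumulator showing positive overflow
```

Used as a loop counter, CCS hands back the count diminished by one and falls through to L + 1 until the count reaches zero, at which point it skips.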
This instruction is the primary means of transfers to memory and output, satisfying requirements 2 and 9. It is also the most convenient method of testing for overflow. Since A and the other central registers have two sign positions, overflow indication is retained in a central register. TS always stores (A) and tests whether overflow is present. If K is in erasable memory and is not a central register, the lower-order sign bit S1 is not transmitted; this is the process of overflow correction. If positive overflow indication is present in A, TS skips over the next instruction and sets A ← +1 (+1 denotes octal 000001); if negative overflow is present, TS skips over the next instruction and sets A ← -1 (-1 denotes octal 177776); otherwise (A) are unchanged. The sequence

TS K
XCH ZERO (ZERO in fixed memory)

suffices to store in K an overflow-corrected word of a multiple-precision sum and leave in A the interflow to the next higher-order part. TS A skips if either type of overflow is present, but leaves all 16 bits of (A) unchanged.
Finally, a computed transfer of control may be achieved by TS Z because Z is the program counter; only the low-order 12 bits of (A) are significant, being the address of the instruction to which control is transferred. Overflow in (A) in this case does not affect the transfer but sets A ← ±1.

L: AD K ; Add
A ← (A) + (K); if the final (A) includes ± overflow, OVCTR ← (OVCTR) ± 1.
Addition is the most frequently used combinatorial operation (requirement 4). The property of OVCTR is used chiefly in developing double-precision products and quotients, partly because the additions in these processes are less susceptible to false overflow than are multiple-precision additions.

L: MASK K ; Mask
A ← (A) ∩ (K).
This is the only combinatorial Boolean instruction, and may be used with CS to generate any Boolean function.

Extracodes
The AGC instruction set was carried over in large part from its ancestor, MOD 3C [Alonso et al., 1961]. All instructions of MOD 3C were retained in the AGC, modifications and additions being adopted where a substantial increase in computing power could be obtained at small cost. The MOD 3C instruction set was like the one described above for the AGC with two major exceptions: first, instead of a mask instruction, MOD 3C had a multiply instruction. Second, the transfer to storage instruction did not include the property of skipping on overflow, although it did have properties which aided masking.
After the design of MOD 3C was completed, it was discovered that the INDEX instruction could be used to expand the instruction set beyond eight instructions by producing overflow in the instruction word following the INDEX. For example, the addition of octal 47777 to the instruction word “CS K” in the course of an INDEX instruction will cause negative overflow, producing MP K, a multiply instruction with operand address K.
In order to implement the extracodes in the AGC, it was necessary to provide a path from the high-order 4 bits of the adder to the unaddressable sequence selection register SQ. Part of this path is the unaddressable buffer register B; these requirements helped to suggest the benefits of retaining two sign bit positions in all the central registers.
In principle, eight additional instruction codes can be obtained by causing overflow, but we did not feel obliged to use them all. Because every extracode must be indexed, the instructions chosen for this class had two properties to some degree: they are normally indexed, or they take long enough so that the cost of indexing without address modification is small. All the extracodes are combinatorial, and therefore relate to requirement 4.

L: MP K ; Multiply
A ← upper part, LP ← lower part, of (A) · (K); the two words of the product agree in sign, which is determined strictly by the sign bits of the operands.
Experience with MOD 3C showed that it was worthwhile making a completely algebraic, self-contained multiply instruction, especially in doing double-precision multiplication whose operands have independent signs. The AGC multiply is much faster than that of MOD 3C, being limited by adder carry propagation time rather than core-switching time.

L: DV K ; Divide
A ← quotient, Q ← -|remainder|, of (A)/(K); LP ← nonzero number with the sign of the quotient.
Many facets of AGC design originally adopted for other reasons combined to make a divide instruction inexpensive. The foremost of these is the nature of the editing registers, which are in the standard erasable memory and have no special wiring. The special properties of these registers are supplied by a shift or cycle of the word being written into the memory local register G, when the address of an editing register is selected. The central loop of DV selects such an address and inhibits memory operations, so that all the left shifts required in division are accomplished in the G register while the editing register itself remains unchanged. The microprogrammed nature of order construction makes a restoring
algorithm more efficient than a nonrestoring one. The quotient delivered to A has a sign determined according to normal algebraic rules by the signs of (A) and (K); the same sign is available in LP to aid in determining the correct sign of the remainder from those of the divisor and quotient in case the quotient has been absorbed by subsequent processing. DV is not usually indexed, but it pays such large benefits in space and time, especially in double-precision division, that the cost of extracode indexing is negligible. If the divisor is less in magnitude than the dividend, or is zero, the quotient has correct sign and, in general, maximum magnitude. No infinite loop results in any case.

L: SU K ; Subtract
A ← (A) - (K); if the final (A) includes ± overflow, OVCTR ← (OVCTR) ± 1.
The primary justification for this instruction is that it allows multiple-precision addition subroutines to be changed into multiple-precision subtract subroutines merely by changing the indexing quantity. There are occasions in the middle of involved calculations where it is clumsy to construct a subtraction out of complementations and additions, especially when the sign of an overflow is of interest. Since SU differs from AD only in that the operand from memory is read out of the complement side of the buffer register B rather than the direct side, its cost is virtually zero. This last is not necessarily true when using core-transistor logic, or two’s complement notation.

7. Expansion of memory addressing
The AGC’s 12-bit address field is insufficient for specifying directly all the registers in its memory. This predicament seems increasingly to afflict most computers, either because indirect addressing is assumed as a necessary evil from the start or, as was our case, because our earliest estimates of memory requirements were wrong by a factor of two or three. The method of indirect addressing we arrived at uses a bank register MB, but with an important modification: the 5-bit number stored in MB has no effect unless the address is in the range (octal) 6000 to 7777. The MB register contents are not interpreted as higher-order bits of the address; they are interpreted as integers which specify which bank of 1024 words is meant in the event of the address part of the instruction being in the ambiguous range. The over-all map of memory is shown in Table 2. The unambiguous, fixed memory addresses domain has come to be known as “fixed-fixed.”

Table 2 Address part of an instruction word

(Decimal)
0-3071       Fixed and erasable memory: unambiguous addresses.
3072-4095    Fixed memory, ambiguous address. Contents of MB used to resolve the ambiguity. Up to 32 such banks are possible.

It is interesting that this method of extending the addressing capability was not the result of trying to improve upon more conventional methods, but was almost a consequence of the physical difference between fixed and erasable memory. Since all data other than constants are concentrated in the erasable memory, these had to be exempt from modification by the MB register. An alternative arrangement, whereby only the addresses of instructions (as opposed to the addresses within an instruction word) are modified, would be deficient in that it would allow only instructions to be stored in banks; there would be no way to refer to constants stored in banks, or to use bank addresses to store arguments of arithmetic operations. The possibility of using two bank registers is worthy of serious consideration [Casale, 1962], but it did not occur to us.
In addition to the addresses in erasable, it is necessary to exempt the addresses of interrupting programs (i.e., the addresses to which a program interrupt transfers control) from the influence of the MB register. It was clear that it would be valuable to have a large body of unambiguous addresses for use in executive and dispatcher programs.
The most frequent and critical applications of bank changing are in the AGC’s interpretive mode. Most of the programs relevant to navigation are written in a parenthesis-free pseudocode notation for economy of storage. An interpretive program executes these pseudocode programs by performing the indicated data accesses and subroutine linkages.
The format of the notation permits two macrooperators (e.g., “double-precision vector dot product”) or one data address to be stored in one AGC word. Thus data addresses appear as full 15-bit words, potentially capable of addressing up to 32,768 registers. Each such address is examined in the interpreter and the contents of the bank register are changed if necessary; preparation is also made for subsequent return if a subroutine call is being made.
The structure of the interpretive program, and its relationship to the computer characteristics discussed in this paper will not be taken up here except to point out that parenthesis-free notation is particularly valuable in a short-word computer such as the AGC. It permits a very substantial expansion of the address and pseudo-operation fields without sacrificing efficiency in program storage [Muntz, 1962].
The conversion of a 15-bit address into a bank number and an ambiguous 12-bit address is as follows: the top 5 bits correspond directly to the desired bank number. The remaining lower-order 10 bits, logically added to octal 6000, form the proper ambiguous address. If the 15-bit address is less than octal 6000, however, the address is in erasable or fixed-fixed memory. In this case the logical addition of octal 6000 is suppressed.
It is possible to have a program in one bank call a closed subroutine in another bank, and then have control returned to the proper place in the bank of origin. This is done by means of a short bank switching routine which is in fixed-fixed memory.
One potential awkwardness about this method of extending memory addresses is the possible requirement for a routine in one bank to have access to large amounts of data stored in another. There are many programming solutions to this problem, obviously at a cost in operating speed; a better solution would be to have two bank registers. No problems of this nature have yet materialized, however.

References
AlonR63; AlonR60; AlonR61; AlonR62; ReckF61; CasaC62; EnglW62; HopkA63; MuntC62; RichR55; WaleW62; Proc. Conf. Spaceborne Computer Eng.; Anaheim, Calif., Oct. 30-31, 1962.
MOD 1, 1960 | F: 448; E: 64 | 11 and parity | 4 plus involuntary | Feasibility Prototype | Counter increments; interrupts; core-transistor logic; pulse rate outputs; editing registers; wired-in fixed memory; interpretive programs.
MOD 2, not built | about 4000 total | 23 and parity | 16 plus indirect | Unmanned Space Probe | “Extended Operation” subroutine linkages (only instance).
MOD 3S, 1962 | F: 3584; E: 512 | 15 and parity | 8 | Earth Satellite | Modified one’s complement; parallel adder; addressable central registers.
MOD 3C, 1962 | F: greater than 10^4; E: greater than 10^3 | 15 and parity | 8 and involuntary | Apollo Guidance | CCS, INDEX, MULTIPLY instructions; overflow counter; bank switching.
AGC, 1963 | F: greater than 10^4; E: greater than 10^3 | 15 and parity | 11 and involuntary | Apollo Guidance | DV, SU, MSK instructions; editing memory buffer; all transistor NOR logic instead of core-transistor logic; extracodes; parenthesis-free interpreter.
The UNIVAC system1
Organization of the UNIVAC system
In March 1951, the first UNIVAC² system formally passed its acceptance tests and was put promptly into operation by the Bureau of the Census. Since the UNIVAC is the first computer which can handle both alphabetic and numerical data to reach full-scale operation so far, its operating record and a review of the types of problems to which it has been applied provide an interesting milestone in the ever-widening field of electronic digital computers.
The organization of the UNIVAC is such that those functions which do not directly require the main computer are performed by separate auxiliary units each having its own power supply. Thus the keyboard to magnetic tape, punched card to magnetic tape and tape to typewritten copy operations are delegated to auxiliary components.
The main computer assembly includes all of those units which are directly concerned with the main or central computer operations. A block diagram of this arrangement is shown in Fig. 1. All of the elements shown are contained within the central computer casework except the supervisory control desk (SC) and the Uniservos,² to which the lines in the upper right section of the diagram connect.
The supervisory control, in addition to all the necessary control switches and indicator lights, contains an input keyboard. Also cabled to the supervisory control is a typewriter which is operable by the main computer. By means of these two units, limited amounts of information can be inserted or removed either at the will of the operator or by the programmed instructions.
The input-output circuits operate on all data entering or leaving the computer. The input and output synchronizers properly time the incoming or outgoing data for either the Uniservos (tape devices) or the supervisory control devices. The input and output registers (I and O) are each 60 word (720 characters) temporary storage registers which are intermediate between the main computer and the input-output devices.
The high-speed bus amplifier is a switching central through which all data must pass during transfer between any arithmetic register and the main memory or between the memory and the input-output registers. The arithmetic registers are shown along the bottom of diagram each connected to the high speed bus system.
The L-, F-, X-, and A-registers are each of one word or 12-character capacity and are directly concerned with the arithmetic operations. The V- and Y-registers are of 2- and 10-word capacity, respectively. They are used solely for multiple word transfers within the main memory. Associated with the arithmetic registers are the algebraic adder (AA), the comparator (CP), and the multiplier-quotient counter (MQC).
Addition-subtraction instructions
The addition-subtraction operations are performed in conjunction with the comparator since all numerical quantities are absolute magnitudes with an algebraic sign attached. Before either an addition or subtraction is performed, the two quantities, one already in the A-register and the other either from the memory or from the X-register, depending upon the particular instruction, are compared for magnitude and sign. The adder inputs can then be switched so as always to produce a noncomplemented result for any operation. The choice of adder input arrangement is therefore under the control of the comparator. The comparator also determines the proper sign for the result according to the usual algebraic rules.
One additional function performed by the comparator for addition and subtraction is to control the complementer. This determination is based upon which operation (+, or -) is indicated, and, whether the signs are like or unlike. For a subtract instruction, the sign of the subtrahend is reversed before entering the comparator. The comparator then compares the signs of the quantities in order to determine whether the two quantities are subtracted or added.
¹ AIEE-IRE Conf., 6-16, December, 1951.
² Registered trade mark.
Part 2 | The instruction-set processor: main-line computers    Section 1 | Processors with one address per instruction
[Fig. 1. Block diagram of the main computer assembly (drawing not reproduced). Labels legible in the original: standard pulses to all units; cycling unit; to and from Uniservos; input-output control circuits; control signals to gates; memory (1000 words); instruction digits and memory location digits; static register; distributor line; time out (TO); check circuits; input from registers and other units. Legend: information signals; control signals and pulses.]
Multiplication instruction
The multiplication process requires the services of the adder, the comparator, the multiplier-quotient counter and the four arithmetic registers. During the first step of multiplication the X-register receives the multiplier from the memory and the comparator determines the sign of the final product by comparing the signs of the multiplier and multiplicand. During the next three steps the multiplicand, which has been stored in the L-register by some previous instruction, is transferred three times to the A-register through the algebraic adder. The result, three times the multiplicand, is then stored in the F-register. During the next 11 steps of multiplication, the successive multiplier digits, beginning with the least significant, are transferred from the X-register to the multiplier-quotient counter. The multiplier-quotient counter then determines whether each particular multiplier digit is less than three, or greater than or equal to three.
If the former, the L-register releases the multiplicand to the A-register via the adder, and the multiplier-quotient counter is stepped downward one unit. If the multiplier digit is equal to or greater than three, the multiplier-quotient counter sends a signal to the F-register which releases three times the multiplicand to the A-register and the multiplier-quotient counter is stepped three times. Thus a multiplier digit of seven would be processed as two transfers from the F-register to the A-register and one transfer from the L-register to the A-register, or a total of three transfers.
When the multiplier-quotient counter reaches zero, the next multiplier digit is brought in from the X-register, while the A-register, containing the first partial product, is shifted one position to the right.
During the final step of multiplication, the sign is attached to the product which has been built up in the A-register. One of the several available multiplication instructions causes the least significant digits, as they are shifted beyond the limits of the A-register, to be transferred to the X-register where they replace the multiplier digits as they are moved to the multiplier-quotient counter. Thus 22 place products can be obtained as well as 11 place.
Division instruction
The division operation is performed by a nonrestoring method. The divisor is stored in the L-register by some previous instruction and the dividend is brought from the memory and put in the A-register during the first step of the division instruction. As in multiplication, the signs of the two operands are compared in the comparator at this time and the sign of the quotient is then stored in the comparator pending completion of the division operation. The principal stages of division consist of transferring the divisor from the L-register to the A-register through the complementer and adder as many times as required to produce a quantity less than zero in the A-register, the dividend having been first shifted one position to the left. The multiplier-quotient counter counts each transfer, thereby building up the first quotient digit. As soon as the quantity in the A-register, (neglecting its original sign) goes negative, the digit in the multiplier-quotient counter, not counting the transfer which causes the remainder to go negative, is transferred to the X-register and the remainder in the A-register is shifted one place to the left. The divisor is then added to the A-register until the quantity becomes positive. This time the multiplier-quotient counter must give the complement of the number of transfers for the real quotient digit. Special complementing read-out gates provide this method of interpreting the multiplier-quotient counter.
The X-register therefore collects the quotient, digit by digit, from the multiplier-quotient counter until the full 11 digits have been obtained. The quotient is then transferred to the A-register and the sign from the comparator (CP) is affixed during the final stage of the divide instruction.
The other internal operations of the UNIVAC include many transfer instructions by which words may be moved among the registers and memory with and without clearing, the extraction instruction by which certain digits of a word may be extracted into another word according to the parity of the corresponding digits of an extractor word; shift instructions; and special control instructions such as breakpoint, transfer of control, (explained in subsequent paragraphs) and stop.
Basic operating cycle
The basic operating cycle of the UNIVAC is founded upon single address instructions which specify the memory location of one word. In the case of the arithmetic instructions which require two operands, one of the operands must be moved into the proper register by some previous instruction. In order to control the sequence of instructions, a special counter, called the control counter (CC), retains the memory location from which the succeeding instruction word is to be obtained. Each time a new instruction word is received from the memory, the quantity in the control counter is passed through the adder where a unit is added to it. Therefore the normal sequence is to refer to successive memory locations for successive instruction words. Initially the control counter is cleared to zero and the first group of instructions must, therefore, be placed in memory locations from zero upward. A transfer of control instruction enables the programmer to change the control counter reading whenever desired and thus shift from one sequence to another. After a transfer of control takes place, the new number in the control counter is increased by unity each time a new instruction word is obtained from the memory.
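The nonrestoring division described earlier under "Division instruction" alternates between subtracting the divisor until the remainder goes negative and adding it back until the remainder is no longer negative. The Python sketch below models that alternation on plain integers. It is a hedged illustration only: the function name is invented, the operands are assumed to be non-negative with the dividend smaller than the divisor, zero is treated as "positive" when the add phase stops, and "complement" is taken to mean ten minus the number of add transfers. None of these details is stated in the text.

```python
def nonrestoring_divide(dividend, divisor, digits=11):
    """Return the first `digits` quotient digits of dividend/divisor as an integer.

    Assumes 0 <= dividend < divisor, so that every quotient digit lies in 0..9.
    """
    if not 0 <= dividend < divisor:
        raise ValueError("sketch assumes 0 <= dividend < divisor")
    a_reg = dividend      # A-register: running remainder
    x_reg = 0             # X-register: collects the quotient digit by digit
    for _ in range(digits):
        a_reg *= 10       # shift the remainder one position to the left
        count = 0         # multiplier-quotient counter
        if a_reg >= 0:
            # Subtract the divisor until the remainder goes negative.
            while a_reg >= 0:
                a_reg -= divisor
                count += 1
            digit = count - 1     # the transfer that went negative is not counted
        else:
            # Add the divisor until the remainder is no longer negative.
            while a_reg < 0:
                a_reg += divisor
                count += 1
            digit = 10 - count    # complement of the transfer count (assumed form)
        x_reg = x_reg * 10 + digit
    return x_reg


print(nonrestoring_divide(1, 3))
```

The point of the method is that a negative remainder is never restored by an extra addition; the overshoot is simply carried into the next digit position, where the counter is read through its complement.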
Transfer of control instructions
The transfer of control instructions are of three types, the unconditional transfer which changes the control counter reading without question, and two conditional instructions which require that either equality or a specific inequality exists between the words in the A-register and the L-register. In the former case the quantities must be identical for transfer of control to occur and in the latter the quantity in the A-register must be greater than the quantity in the L-register for the control counter reading to be changed.
Since the UNIVAC can handle alphabetic as well as numerical data, these conditional transfer instructions are as useful for alphabetizing as they are to determine if a certain iterative arithmetic process has been performed often enough to come within specified numerical tolerances.
Control register
Since six characters (intermixed alphabetic and numerical) are sufficient to specify an instruction and there are 12 characters per word, each instruction word can represent two independent instructions. A 1-word register, called the control register (CR), has been provided which stores each instruction word as it comes from the memory. Thus one memory referral is sufficient for a pair of instructions and the control register stores both halves so that the second instruction is available as soon as the first has been completed.
The general term control circuits includes all those elements which work together to process the instruction routine. As each instruction word reaches the control register, the first half of it is passed immediately into the static register (SR). The static register drives the main function table and memory switch. The instruction digits are translated by the function table into the appropriate control signals for the instruction called for. The memory switch selects the location called for by the memory location digits and opens the proper memory channel to the high-speed bus system at the proper time. Since the memory is constructed of 100 channels, each holding ten words, the memory switch is a combination of spatial and temporal selection.
Cycle counter
Implicit within each instruction, as translated by the function table, is an ending signal which causes the computer to move on to the next instruction. The key to this sequence is the cycle counter (CY), which is advanced by the ending pulse. The cycle counter is a 2-stage 4-position counter, which is connected into the function table. By virtue of this relation, CY develops signals in addition to those developed by the instruction, which, for example, can cause the control register to transfer the second half of the instruction word into the static register when the first half has been completed. Similarly, after the second half instruction is finished the cycle counter causes the reading of the control counter to pass into the memory location section of the static register and thus cause the next instruction word to be transferred from the memory to the control register. When the word reaches the control register, the cycle counter also causes the control counter reading to be increased by unity. The four cycles are designated by the first four Greek letters α (transfer CC to SR), β (transfer memory to CR), γ (perform first instruction), and δ (perform second instruction).
Program counter
The multistage instructions, such as multiplication, are guided through their various steps by the program counter (PC). The program counter has four stages or 16 positions. All multistage instructions can be performed within this number of steps.
Checking circuits
The checking circuits of the UNIVAC are of two main types, odd-even checkers and duplicated equipment with comparison circuits. The odd-even checker depends upon the design of the pulse code used within the computer. This code provides seven pulse positions for every character. Six of the seven positions are significant as the actual code while the seventh is the odd-even channel. If the number of pulses or ones within the first six channels of any character is even, a one is placed in the seventh channel to make the total odd. Thus, the total number of ones across the seven channels is always odd. By means of a binary counter and a few gates, an odd-even checker has been constructed which examines every seven pulse group which passes through the high speed bus amplifier. In this connection, mention must be made of the periodic memory check which interrupts operation every five seconds to pass the entire contents of the memory over the high speed bus system and, consequently, through the odd-even checker. Any discrepancy is immediately signalled to the supervisory control and further operation ceases.
The duplicated equipment type of checking consists of duplicating the most essential part of the arithmetic circuits and their controls and producing simultaneously independent results, which can then be compared for equality. For this type of checking, the A-, F-, X-, and L-registers, algebraic adder, comparator, multiplier-quotient counter, and the high speed bus amplifier are duplicated.
The memory is not duplicated, but is checked by the periodic memory check mentioned previously. Various sections of the control circuits are duplicated such as the program counter and cycle counter.
Timing pulse generator and cycling unit
The timing pulse generator and cycling unit (CU) are the source of the basic timing signals throughout the computer. The timing pulses occur at 2.25 megacycles per second. The cycling unit subdivides this rate into the character rate and word rate. The character rate is one seventh of the basic pulse rate since there are seven pulses for each character. There are 12 characters per word but space for a 13th character is included in a word time and is called the space between words. This time is used for switching purposes.
The cycling unit, therefore, develops the word signals at the 1/7 × 1/13 or 1/91 of the basic pulse rate. Within the cycling unit (CU) are numerous duplications and comparisons to ensure complete reliability.
Input-output circuits
The operation of the input-output system is dovetailed as efficiently as possible with the operation of the arithmetic circuits. Whenever possible, parallel operations are allowed to proceed so as to minimize the time lost on internal operation while the slower input-output operations are taking place.
The principal input-output instructions are handled in a manner identical to that for the internal operations, except that now the function table develops signals which bring the input-output control circuits into operation. The information supplied to the input-output control circuits by the function table includes the following:
1 Which of the ten possible Uniservos is being called on
2 Whether it is a read or write, that is, an input or output operation
3 If it is "read," the direction in which the tape is to move
The input-output control circuits, therefore, begin by testing whether or not the Uniservo indicated now is in use or not. If it is already in use, everything else waits until that Uniservo is free. Next, the input-output control circuits test to determine whether the Uniservo selected last moved backward or forward. If the previous direction does not agree with the new direction called for, the input-output control circuits generate the proper signals to prepare the Uniservo to move in the opposite direction. If the instruction is to rewind a Uniservo, the input-output control circuits then direct the center drive of the selected Uniservo to rewind the tape to the beginning and stop.
As soon as the instruction has proceeded to the point where the input-output control circuits need no further information from the function table, the instruction ending signal is generated and the internal circuits proceed to the next instruction, even while the reading, writing or rewinding continues. The UNIVAC can process an input, an output and several rewind operations while simultaneously carrying on internal computation.
So far the method by which the words are transferred from the I-register to the memory has not been mentioned. This operation is combined with certain read instructions in a manner not immediately obvious. There are two instructions which read from tape to the I-register, one causing the tape to move forward, the other causing it to move backward. There are two other input instructions similar to those just mentioned, but they have the additional operation of first reading from the I-register to the memory and then reading a new group of 60 words from tape into the I-register. Thus the first type of input instruction reads from tape to the I-register only. It must be followed by the second type of instruction in order first to clear the I-register and then read in the second block of 60 words.
The output instructions do not operate in this way but instead read directly from memory to the O-register and then to the tape as one instruction.
A third type of checking circuit occurs in the input-output control circuits which counts the number of characters transferred from the tape in each block. Since there must always be 720 characters per block, the 720 checker signals any discrepancy to the supervisory control.
One other phase of the input-output operation concerns the two supervisory control input-output instructions. One of them permits a single word to be typed in from the input keyboard and the other causes a single word to be typed out automatically.
Auxiliary equipment
The two principal auxiliary devices mentioned earlier were the Unityper,¹ which converts keyboard operations to tape recording, and the Uniprinter,¹ which converts magnetic recording to typewritten copy.
¹ Registered trade mark.
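The rate subdivision described earlier under "Timing pulse generator and cycling unit" is simple enough to check by direct arithmetic. The short Python calculation below uses only the figures given in the text (2.25 megacycles per second, seven pulses per character, 13 character times per word); the variable names are chosen for the example.

```python
from fractions import Fraction

PULSE_RATE = Fraction(2_250_000)   # basic timing pulses per second
PULSES_PER_CHARACTER = 7           # six code pulses plus the odd-even pulse
CHARACTER_TIMES_PER_WORD = 13      # 12 characters plus the space between words

character_rate = PULSE_RATE / PULSES_PER_CHARACTER
word_rate = character_rate / CHARACTER_TIMES_PER_WORD
word_time_microseconds = 1_000_000 / word_rate

print(float(character_rate))          # characters per second
print(float(word_rate))               # words per second
print(word_rate / PULSE_RATE)         # fraction of the basic pulse rate
print(float(word_time_microseconds))  # duration of one word time
```

The exact fraction confirms the 1/91 figure: one word time spans 7 × 13 = 91 basic pulses, or a little over 40 microseconds.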
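The pairing of the two kinds of read instruction described earlier under "Input-output circuits" can be pictured with a toy model. The Python sketch below is an assumption-laden illustration, not a description of the hardware: the class and method names are invented, tape blocks are represented as strings, tape direction is ignored, and the idea that the second instruction names a starting memory address is assumed for the example. Raising an exception stands in for the 720 checker signalling the supervisory control.

```python
WORDS_PER_BLOCK = 60
CHARACTERS_PER_WORD = 12
CHARACTERS_PER_BLOCK = WORDS_PER_BLOCK * CHARACTERS_PER_WORD   # 720


class InputModel:
    """Toy model of the I-register and the two kinds of read instruction."""

    def __init__(self, tape_blocks):
        self.tape = list(tape_blocks)   # each block is a string of characters
        self.position = 0               # index of the next block on the tape
        self.i_register = None          # holds one block, or None when empty
        self.memory = {}                # address -> 12-character word

    def _tape_to_i(self):
        block = self.tape[self.position]
        self.position += 1
        if len(block) != CHARACTERS_PER_BLOCK:
            # Stand-in for the 720 checker reporting a discrepancy.
            raise RuntimeError("720 check: block has %d characters" % len(block))
        self.i_register = block

    def read_to_i(self):
        """First type of read: tape to I-register only."""
        self._tape_to_i()

    def i_to_memory_then_read(self, address):
        """Second type: I-register to memory, then the next block into I."""
        for n in range(WORDS_PER_BLOCK):
            start = n * CHARACTERS_PER_WORD
            self.memory[address + n] = self.i_register[start:start + CHARACTERS_PER_WORD]
        self._tape_to_i()


model = InputModel(["A" * 720, "B" * 720, "C" * 720])
model.read_to_i()                          # block A waits in the I-register
model.i_to_memory_then_read(address=100)   # A goes to memory, B is read in
print(model.memory[100], model.i_register[:12])
```

In this picture the I-register acts as a one-block buffer: the program can work on the block already in memory while the next block is being read from tape.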
Unityper. A simple block diagram of the Unityper is shown in Fig. 2. Each keyboard operation pulses the input to an encoding function table which, in turn, drives the appropriate heads for recording the particular combination on the tape. Simultaneously, the same pulse triggers a motor delay flop which operates the tape motor for an interval sufficient to move the tape across the head for the distance required to record one character. However, there is a punched paper loop system associated with the Unityper for the purpose of providing the typist with various guideposts individually set up for each problem. The loop control system serves three distinct control functions. First, it allows the programmer to set up various numbers of characters for the individual items being entered for a given problem. If the typist ever enters other than the specified number of characters, the loop control signals an error. Although the basic word length is 12 characters, the programmer may subdivide or group the words to suit any length of item. The loop can then be punched with what are called "force check" punches. Whenever the typist completes a correctly entered item, she must operate a release key before entering the next item. If the forced check is released too early an error is created, or if an additional character is typed after the forced check should have been released, an error is similarly indicated.
The second function of the loop is to control the erase operation. The erase operation is the only way in which an error can be recalled. When the erase key is operated, the loop and tape are both stepped backward until a stop punch (usually associated with each forced check) is encountered. Thus the entire erroneous item is erased, and at a much higher rate than that at which the backspace key can be operated. The backspace, incidentally, cannot cancel an error indication, but it can be used to correct a wrongly typed character if the typist recognizes it.
The third function of the loop system is to enter, automatically, various fill-in characters. Under one such system of operation, the loop control records the characters only at the behest of the operator. This function is useful where individual entries, such as personal names, do not fill out all of the space allotted. The other operation is fully automatic in which the loop assumes full control to record, for example, a group of fill-in characters later to be replaced by computed data within the central computer.
The block diagram therefore shows the loop motor connected to the same delay flop that steps the tape motor. The same signal which moves the two motors also sets a second delay flop (DF2) which produces a delayed probing pulse. The probing pulse examines the paper loop photoelectrically for the new combination. A third delay flop (DF3) produces another probing pulse after the relays associated with the loop photocells have had time to set up. If any automatic function is indicated by the photocells, the probing pulse passes through the interpreting relays, enters the encoding function table to generate the fill-in characters, and thus starts the cycle over again. All automatic functions take place at about 22 characters per second.
Numerous odd-even checks are introduced in the Unityper to provide checks on tape and loop motion and on the recorded code combination.
[Fig. 2. Simplified block diagram of Unityper (drawing not reproduced). Labels legible in the original: keyboard; encoding function table; recording head; relays.]
Uniprinter. The Uniprinter is shown in simplified block diagram in Fig. 3. Its operation is a simple cycle which is initiated by a start button. The start button triggers the motor flip-flop (MFF). The motor pulls the tape across the reading head until a combination is detected. The presence of pulses on any of the seven lines between the reading head and the relay decoding function table is sufficient to restore the motor flip-flop (MFF) and stop the tape motion. Simultaneously a print delay flop (DF1) is triggered. During the delay flop interval, the decoding relays are given time to set up. When the delay flop recovers, a pulse is sent through the relay table which reappears at one of the typewriter magnetic actuators. As the typebar reaches the platen, a printer action switch (PAS) is operated which pulses the motor flip-flop and starts a new search for the next character on the tape. The odd-even properties of the UNIVAC pulse code are utilized for checking purposes.
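The odd-even rule used by the pulse code, as stated under "Checking circuits", can be written out as a minimal Python sketch. The function names and the sample bit pattern are illustrative assumptions; the pattern is arbitrary and is not claimed to be a real UNIVAC character code.

```python
def odd_even_channel(code_bits):
    """Return the seventh-channel bit for a six-bit character code.

    A one is placed in the seventh channel when the six code channels hold
    an even number of ones, so the seven-channel total is always odd.
    """
    if len(code_bits) != 6 or any(b not in (0, 1) for b in code_bits):
        raise ValueError("expected six bits, each 0 or 1")
    return 1 if sum(code_bits) % 2 == 0 else 0


def passes_odd_even_check(seven_bits):
    """A seven-pulse group is accepted only if it contains an odd number of ones."""
    return len(seven_bits) == 7 and sum(seven_bits) % 2 == 1


code = [0, 1, 0, 1, 1, 0]            # arbitrary six-bit pattern for illustration
group = code + [odd_even_channel(code)]
print(group, passes_odd_even_check(group))

corrupted = group.copy()
corrupted[2] ^= 1                    # flip a single pulse
print(corrupted, passes_odd_even_check(corrupted))
```

Any single dropped or added pulse changes the count of ones from odd to even and is caught, while two errors in the same character cancel out and pass. Because the total is always odd, every valid character also contains at least one pulse, which is consistent with the Uniprinter detecting a character by the presence of pulses on any of the seven lines.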
Chapter 8 I The UNIVAC system 163
2 Arranging these groups by cities, counties
3 Assembling from the tabulations the data required for each table
The raw data were prepared in the form of a punched card for each individual in the United States. The data from these enumeration cards are then transcribed onto magnetic tape. From these tapes, the computer processes the data sequentially through the three steps, producing output tapes from which the tables are printed on Uniprinters. The only manual operations encountered in this entire procedure are the handling of the original punched cards, mounting and demounting tape reel (the equivalent of 9,700 cards), and the removal of the printed tables from the Uniprinters. The most important feature of the present procedure is the elimination of handling and sorting tremendous quantities of punched cards. Each handling of the card stacks is a source of potential error and delay. The UNIVAC memory permits the simultaneous accumulation of the 580 tallies which describe our population for each local area being studied by the UNIVAC system.
Commercial problems
In the commercial field, the UNIVAC system has handled premium billing for a life insurance company. This program produces premium notices, dividends, and commissions. In a particular example worked out, approximately 1,000,000 bills, 340,000 dividends, and 100,000 commissions have to be produced monthly. The necessary information for processing a particular policy is contained in 240 digits, or, in special cases, 480. This compactness is made possible by a logical system of 40 symbols, comprising both alphabetic and numeric characters, which denote over 90 definitions. The UNIVAC processes the policies as directed by the symbols, policy dates, and policy numbers.
The problem includes inserting over 250,000 changes each month before further handling is done. After this step, the policies to be processed are selected from a file of 1,500,000 items. Next, a list is produced of the cases which have symbols indicating that special notices must be sent to the policyholders. Following the calculation of dividends and commissions, additional lists are produced: one group contains information pertaining to commissions and agents; another contains information regarding dividends; and finally, there is a listing of option changes for later insertion into the policy files. Policies requiring premium notices are then edited and the notices are automatically printed from the data contained on magnetic tapes.
The UNIVAC time needed for a program of this proportion is about 135 hours a month. The average computer time per policy processed is less than 0.5 second. The average time for all change insertions, printing, calculations, and unityping is 9 seconds per item.
Logistical problems
In the field of logistics, five major studies have been conducted, four of these resulting in actual problems executed on the computer.
The first is the type of computation in which the basic purpose is to determine quantitively whether a given operational or mobilization plan can be logistically supported. The ultimate desire is to find, by calculation, the optimum program for carrying out such plans. At the time of writing, only a small model has been actually run on UNIVAC, but full size models will be run within the next few weeks. Two computations have been executed, one a set of three tables of thousands of lines each, giving a detailed breakdown of machine deployment, fuel requirements, and overhaul requirements. The other problem was a computation of the amounts of critical raw materials required to construct a given number of each type of equipment, these requirements being phased by quarters over a 2-year period. The fourth problem, which was actually computed, was a sample of a similar calculation in which every pound of critical raw material required each month for the ultimate construction of a complete building program was computed.
The UNIVAC program which was prepared is capable of accommodating every type of equipment, individually tailored construction schedules, detailed bills of materials running into the millions of items and of determining the actual amounts of alloy elements based on thousands of tables of percentages for the many alloys employed. The demonstration showed that this computation for 400 pieces of equipment of a given type could be executed in three hours of computer time. The last problem in this field has not yet been run, but the study has shown that the entire gamut of stock control for a large supply office can be covered by the computer in approximately 3 weeks time.
This program involves the maintenance of stock balances of hundreds of thousands of stock items for many service points and provides for the preparation of stock transfer orders, purchase requisitions, critical lists and summary reports.
Performance record of the UNIVAC
Acceptance tests
The Acceptance Tests, prepared jointly by the Bureau of Standards and Bureau of Census, are fully discussed in the following paper by Dr. Alexander and Mr. McPherson.¹ However, a few comments concerning them from the engineering point of view are appropriate.
¹ Paper not included in this book. See McPherson and Alexander [1951].
The Census computer was given two tests; the first, a test of its computational ability; the second, a test of its input-output system which particularly stressed the tape reading and recording abilities.
The Central Computer Acceptance Test A consisted of two parts. During Part 1, every available internal operation, except input-output operations, was performed. Among these operations were addition, subtraction, comparisons, division, and three different types of multiplication operations. Each of the arithmetic operations handled a pair of 11-decimal digit quantities. Altogether there were about 2,500 operations in the routine, yet the entire routine required only 1.26 seconds to do. The routine was performed 808 times in 17 minutes making a total of about 2,000,000 operations in all.
The second part of Test A included the solution of a heat distribution equation, a short routine involving the input-output device and a sorting routine. The sorting routine arranged ten numerical quantities each containing 12 decimal digits in correct numerical order in about 0.2 second. All three routines took a total of 1½ minutes to perform. They were performed twice for each test and when added to Part 1 made a total of 20 minutes for unit test A.
The Acceptance Test B examined the input-output tape devices (Uniservos). During the first part of Test B, 2,000 blocks or about 1.4 million digits, which included every available character (numeric and alphabetic) were recorded on a tape and then read back into the computer with the tape moving backward. The information read back was then compared with the original data read out. The recording operation required about 4 minutes while reading back and comparison required about 8 minutes. The second part of Test B consisted of recording and reading over one spot of tape for 700 passes in order to determine the readability of tape as it wears. This test required 13 minutes and when combined with Part 1, made a total of approximately 25 minutes for Test B. This test was repeated 19 times.
accuracy and other criteria of reproducing ability which defined "good" reels. In 10 hours, the converter had prepared over 15 reels, 14 reels had been tested, 11 of the 14 were found satisfactory and the converter was accepted for payment.
Although the test was run on only one of two converters, the Bureau of Census put both card-to-tape machines into operation and after six months of use, the acceptance test was run on the second card-to-tape converter. This test differed to some extent from the first test in that the Census Bureau was satisfied with the reading ability of the machines and did not require a digit-by-digit verification of the information. However, a new stipulation was added that, after the engineers had checked the converter out preparatory to running the test, the converter was to be used in actual operation for eight hours before doing the remainder of the test with no engineering intervention between the two portions of the test. The first part was run on Friday, October 5, 1951; the device remained idle Saturday and Sunday and was turned on Monday morning to complete the test. It passed with flying colors, preparing ten acceptable reels (out of ten reels) plus two decks of check cards in slightly less than 7 hours. Both card-to-tape converters now are in Washington and the remainder of the system is in operation by the Bureau of the Census on the Eckert-Mauchly premises in Philadelphia.
Reliability and factors affecting performance
The first UNIVAC system now has been operating for approximately 8 months. In that time, much has been learned about how UNIVACs should be operated and maintained. The situation has been somewhat complicated by having to shake down the equipment while in the customer's possession; that is, there were certain faults in the system from both engineering and production standpoints which could only become apparent in the course of time and under actual operation conditions. For example, weak tubes or faulty solder joints did not reveal their presence at the time of installation. Another type of difficulty only became apparent under certain duty cycle conditions imposed by various types of problems. Because only certain problems present this particular
The first test run passed in 6.6 hours (minimum theoretical duty cycle, these troubles remained in the machine causing inter-
time: 6.0 hours) and the second test was passed in 9.47 hours mittent stoppages until they could be tracked down.
(minimum theoretical time: 7.45 hours). Of the 2.02 hours down Patient isolation and elimination of such problems, most of
time, 1.45 hours were accumulated at one time with the remaining which have occurred only with conditions of operation infre-
0.58 hours spread over the rest of the test. quently encountered, is a powerful, though sometimes painful
The Uniprinter test required that a block of information (60 proving ground for the engineering group charged with such re-
words) be printed 200 times in tabular form. The minimum time sponsibility. The experience and depth of judgment acquired by
for printing was five hours. The test was passed in 6.16 hours. such a group in the course of performing such work have become
The card-to-tape test required that ten good reels of tape be unmistakably apparent in the already noted improved performance
produced in 12 hours. There were certain restrictions as to reading of following UNIVACs and generally advanced ability to predict
and realize performance in any large scale and complex apparatus of the same
character.
Some of the troubles encountered are interesting to study in detail. On a rather
complicated routine requiring the use of a number of Uniservos, all ran smoothly for
15 minutes. At that time, one of the Uniservos executing a backward read somewhere in
the middle of the reel, did not stop at the end of the block but continued to run
until it ran off the end of the tape. After much work, it was shown that a cycling
unit signal was being overloaded because it was being used both by a multiplication
instruction and the backward read which were occurring simultaneously. The input
precessor loop was cleared as a result and the count of the pulses coming off the tape
was thereby lost. Once the trouble was found, it was simple to remedy.
Another rather interesting case occurred intermittently over an extended period.
Normally when reading out of the memory, the contents should not be cleared.
Occasionally, however, reading from the memory also caused the contents to be cleared.
As the trouble only remained for a period of seconds or, at most, a few minutes, it
was somewhat difficult to localize. Of course, parasitic oscillations of some sort
were suspected and, in fact, the trouble was traced to the actual source on a logical
basis; but the source, a high power cathode follower, showed no evidence of
oscillation. Before the problem was remedied, various combinations of parasitic
suppressors were tried; the trouble would vanish for perhaps a week and then return.
The oscillation finally cropped up during a maintenance shift, was found to be in the
suspect tube at 100 megacycles and was eliminated rather easily.
Other types of troubles that have occurred include intermittent parasitic oscillations
in other circuits, bounce in Uniservo relay circuits, various mechanical problems in
Uniservos, time constants not consistent with the longest duty cycle signals, and
various types of noise in the input circuits. The tubes, which initially were
bothersome, have now stabilized to the point where two tubes per week (on the average)
stop the computer during computation.
All of the above troubles and others not discussed here have contributed to lost
computing time on the UNIVAC. However, they cannot influence future operation because
the reasons for them have been found and eliminated. The fact that these troubles will
not occur in future UNIVACs cannot be emphasized too strongly.
Under a contract with the Bureau of Census, Eckert-Mauchly Computer Corporation
maintains the Census installation. This system is operated 24 hours a day, seven days
a week, except for four 8-hour preventive maintenance shifts each week. This allows
approximately 32 hours for regular maintenance and 136 hours for operation or 21 and
79 per cent respectively. Table 1 shows the engineering time spent on the computer
system during typical weeks of operation. The figures are given both in hours and
percentages. Both nonscheduled engineering time as well as preventive maintenance time
are shown. The sum of the two gives the total engineering time spent on the computer
per week. It should be noted that this is actual engineering time and does not include
time that the computer may have been shut down while waiting for an engineer to
report. According to our maintenance contract, this must be within a half hour during
regular working hours and within two hours at all other times. Attention should be
given to the fact that the preventive maintenance time does not total exactly 32 hours
each week. This is due in part to a half-hour period each morning devoted to checking
and cleaning the mechanical portions of Uniservos. It is expected that this work will
be taken over by the UNIVAC operators since the procedures and the techniques involved
are quite simple.

Table 1
Week       Nonscheduled       Preventive         Total              Percentage of
ending     engineering        maintenance        engineering time   nonscheduled
1951       Hours   Per Cent   Hours   Per Cent   Hours   Per Cent   engineering
June   3   18.9    11.3       40      23.8       58.9    35.1       14.8
      26   20.5    12.2       34      20.2       54.5    32         15.3
July  14   14.7    8.8        33      19.6       47.7    28         10.9
      21   19.4    11.6       34.5    20.5       53.9    32         14.5
      28   39.2    23.3       34.5    20.5       73.7    43.8       29.4
Aug.   4   26.2    15.6       33      19.6       59.2    35.2       19.4
Sept.  2   28.8    17.1       34.5    20.5       63.3    37.7       21.6
       9   16.1    9.6        34.5    20.5       50.6    30         12.1
      16   22.6    13.5       33      19.6       55.6    33         16.7
      23   42.3    25.2       34.5    20.5       76.8    45.7       31.7
      30   21.8    13.0       34.5    20.5       56.3    33.5       16.3
Oct.   7   15.9    9.5        56      33.3       71.9    42.8       14.2
      14   14.0    8.3        34.5    20.5       48.5    28.9       10.5
      21   10.4    6.2        34.5    20.5       44.9    26.7       7.8
      28   20.8    12.4       33      19.6       53.8    32         15.4
Nov.   4   40.4    24.0       34.5    20.5       74.9    44.6       30.3
      11   10.1    6.0        34.5    20.5       44.6    26.5       7.6
      18   30.5    18.2       34.5    20.5       65      38.7       22
      25   13.7    8.2        34.5    20.5       48      28.6       10
Dec.   2   14.8    8.7        34.5    20.5       49.3    29.3       12.6
       9   19.6    11.7       34.5    20.5       54.1    32.2       14.7

In addition, one extra shift was required the week ending June 3 and three extra
shifts the week ending October 7, 1951. These shifts were required to incorporate
engineering changes which had been developed over a period of time and could not be
incorporated in the equipment during the normal preventive maintenance time. The
nonscheduled engineering time has varied from as little as 10.1 hours or 6 per cent to
42.3 hours or 25 per cent. The last column in the Table shows the amount of
nonscheduled engineering time as compared to the allowable operating time (total time
less preventive maintenance time). Here there is a variation of from 7.6 to 31.7 per
cent and an average for the weeks shown of 16.9 per cent. It is believed that these
figures, while good for the first months of operation of a new piece of equipment,
will show definite improvement over the next year.
Although the opportunity to prove or disprove the following theory of operation has
not presented itself, it is believed logical that optimum use of the UNIVAC equipment
might be obtained by means of scheduling preventive maintenance only at such times as
it is indicated in the judgment of competent operators. In other words, there are many
occasions preceding a scheduled maintenance shift when the system is performing very
well. At such times, it is extremely inefficient to shut down the operation in order
to provide maintenance. For many reasons, however, it has been impossible to operate
and maintain the first system in this way. It is hoped that such operation will be
possible in following installations.
It should be realized that the UNIVAC system requires a supervisor of the same caliber
as the one required for a large punched card installation. However, the large group of
operating personnel would be replaced by a small group of well-trained extremely
competent people thoroughly familiar with the details of the computer and associated
equipment. The time spent in providing a high degree of training for these people is
more than repaid in increased operating efficiency and consequently higher work
output. For example, situations arise in the course of running a problem where a
correct operational decision can save hours of elapsed computation. Also, a competent
operator will recognize malfunctions sufficiently early to prevent serious delays. He
is capable of deciding whether to continue with machine operation or to stop to
diagnose. The second UNIVAC system which is ready for installation in Washington, will
be operated by a group of engineers who have been trained in operation and
maintenance. This procedure, it is believed, will result in the UNIVAC system being of
maximum benefit to the Air Comptroller’s Office.
Evaluation of UNIVAC design
Checking features
Maintenance of the UNIVAC has been vastly simplified by use of duplicate arithmetic
and control equipment and other checking methods. Many factors which would have led to
undetected errors have, by virtue of duplication, immediately stopped the computer.
Although checking by means of inverse operations can provide operational checks on the
arithmetic circuits, there is some question as to whether it provides as good a check
as duplication. However, in connection with odd-even codes, it may conceivably be
comparable. It should be remembered, however, that this is from an operational
standpoint and not a maintenance standpoint. When the control equipment is considered
it is difficult to visualize a check that is as good as duplicated equipment. Other
checks used in UNIVAC include the periodic memory check, intermediate line function
table checker, function table output checker, memory switch checker, and 720 checker.
As explained earlier in the paper, the periodic memory check is accomplished by
reading out of all memory channels sequentially and performing an odd-even check on
each digit as it passes through the high speed bus amplifier. The period at which the
check is repeated may be varied over a large interval. At present, it is set at 5
seconds, the check taking 52 milliseconds or about 1 per cent of the computing time.
The function table has a check at the very input by bringing in the check pulse in
each character so that if an odd-even error occurs between the control register and
the static register, no order will be set up and the computer will grind to a halt! If
the input sets up properly but an error occurs farther on in the table, but not ahead
of the intermediate lines (the linear set into which the input combinations are
decoded), the error is caught at this point. The intermediate lines are broken into
groups in such a way that an error is indicated when more than one line is set up in
one group or the entire set. There is an exception to this in some groups where no
error is indicated by this checker if more than one line is set up within the group.
This has been allowed only in those cases where it has been shown that setting up two
or more lines will cause some other checker or checkers to indicate the trouble.
If the error occurs beyond the intermediate lines, the output checker then comes into
play. This checker makes an odd-even count on the number of gates used on each
instruction: dummy lines having been added so that the count is normally always odd.
The memory switch or tank selector checker ensures that one and only one memory
channel is selected on any instruction. It checks each of the two digit positions
separately indicating which if either, is in error.
The 720 checker counts the digits coming off the tape and if there are either more or
less than 720 in one block, the computer stops; by examining the indicators on the
supervisory control console, the operator can determine the number of digits actually
read. By means of some rather simple manipulations, the operator can then reread the
block without losing his place in the routine; and if the information is then read
correctly, he may again start the computer on the routine. The same procedure may be
followed if an odd-even error is made in reading from the tape.
Many checks other than those mentioned before have been built into the UNIVAC. On the
basis of operating experience, the engineers cannot recommend too strongly the use of
built-in checking facilities. All in all, the faith that can be put into results
obtained from an unchecked computer comparable in size to UNIVAC is in the writers’
opinion exceedingly low.
More than this, however, the methods by which the UNIVAC is checked have been of
extreme usefulness in trouble shooting. The duplication of circuits has amply repaid
the increase of space and the number of components required by this checking system.
General comments
After evaluating UNIVAC performance over a period of eight months, the over-all
picture of the UNIVAC design, in the minds of its designers, is extremely good.
Certain phases of its design exceeded expectations, while of course, other phases were
somewhat disappointing. The first eight months of actual operation have taught more
than years of experimentation with laboratory models. Many improvements have already
been conceived of this experience and are continuing daily to increase reliability.
The other major factor influencing computer design, cost, has been duly considered in
the UNIVAC design; and it is being met with plans for a continuing full-scale
production of UNIVAC systems. As the production techniques are developed concurrently
with the engineering design details, the UNIVAC becomes the realization of a hope
which has long been in the minds of its designers: An economical, completely reliable
commercial computer for performing the routine mental work of the world much as
automatic machinery has taken over the routine mechanical work of the manufacturer.
References
McPhJ51.
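The percentage columns of Table 1 in this chapter follow from the two hours columns and the 168-hour week (32 hours of scheduled maintenance plus 136 hours of operation). The sketch below is a modern Python illustration added for the reader, not part of the original paper; the function name and the dictionary keys are assumptions introduced here.

```python
HOURS_PER_WEEK = 24 * 7  # 168 hours: the system is scheduled around the clock


def table1_percentages(nonscheduled_hours, preventive_hours):
    """Recompute the derived columns of Table 1 from the two hours columns."""
    total_hours = nonscheduled_hours + preventive_hours
    # Allowable operating time is total time less preventive maintenance time.
    allowable_hours = HOURS_PER_WEEK - preventive_hours
    return {
        "nonscheduled_pct": 100 * nonscheduled_hours / HOURS_PER_WEEK,
        "preventive_pct": 100 * preventive_hours / HOURS_PER_WEEK,
        "total_hours": total_hours,
        "total_pct": 100 * total_hours / HOURS_PER_WEEK,
        "nonscheduled_vs_allowable_pct": 100 * nonscheduled_hours / allowable_hours,
    }


# Week ending June 3: 18.9 nonscheduled hours and 40 preventive hours.
for name, value in table1_percentages(18.9, 40).items():
    print(f"{name}: {value:.2f}")
```

For the week ending June 3 the computed values agree, to within rounding, with the tabulated 11.3, 23.8, 35.1 and 14.8 per cent.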
Section 2
Chapter 9
¹Proc. IEE, pt. B, vol. 103, supp. 2, pp. 188-196, 1956.
Summary The paper gives an historical account of the development of the packaged
method of construction of computers, and the advantages of this method are discussed.
The packages used in the computer Pegasus are described from both an electronic and a
mechanical point of view. The specification of the machine is given and the arguments
which led to this specification are discussed. The detailed logical design procedure
leading from the specification to the wiring lists is described. The method of
maintenance and some reliability figures are given.
Introduction
The development of standard plug-in unit circuits (‘packages’) for digital computers
began in this country [England] in 1947, and some of the advantages of the method have
been discussed in earlier papers [Elliott, 1951; Johnston, 1952; Elliott et al., 1952;
Elliott et al., 1953]. The advantages start in the design stage of a new computer
project and follow through production and commissioning to maintenance.
In the design stage, what is known as ‘logical’ design is separated from engineering
design. Once the packages have been designed by electronic engineers and the rules for
their interconnection have been laid down, the ‘logical designers’ (usually, but not
necessarily, mathematicians) can begin organizing the packages into various computers
to carry out different functional requirements. The electronic and mechanical design
work invested in the packages is thus drawn on for more than one computer design, and
each computer can be assembled from stock parts without further engineering effort.
Design time and cost are therefore much reduced.
In production, whether we consider one design of computer or several designs using the
same packages, costs and time are also much reduced. Quantity production lines for the
relatively few types of standard package are set up, and are common to different
computer designs, thus reducing inspection and planning costs. Standard cabinet work
has been designed for Pegasus, and this too can be taken from stock or established
production lines to make other computers.
In commissioning a computer, because all the packages have been pretested, when power
is first applied to the complete machine it is known that a large part is already
fault-free. It remains to detect a few errors which may have been made in the
interconnections.
Perhaps an even more important consideration is ease and speed of maintenance. Test
programmes will usually indicate the part of the machine in which a fault is
occurring. Several monitor sockets are located on the front of each package, and by
inspection the faulty package is speedily found and replaced.
The package method has been criticized on the grounds of the cost and questionable
reliability of plugs and sockets, and some redundancy of components.
The authors believe that the many advantages far outweigh the cost of plugs and
sockets. The present trend is to use copper-etched printed circuits, and these fall
naturally into the plug-in unit idea, the plug contacts being part of the printed
wiring; there has been no trouble in Pegasus from plugs and sockets. Component
redundancy in Pegasus is about 10% of the diodes and a few resistors, the cost of
redundant components being about £150.
Electrical design of the packages
Circuits used for arithmetic and switching operations
Historical. A previous data-processing machine [Elliott et al., 1952; Elliott et al.,
1956b] used 330 kc/s serial-digital circuits; they had originally been designed for
1 Mc/s operation, but 330 kc/s was chosen to suit an anticipation-pulse
cathode-ray-tube store. This frequency has been retained to the present time because
it suits the magnetostriction delay-line store [Fairclough, 1956] and the
magnetic-drum store [Merry and Maudsley, 1956]. Experience with the data processor led
to work (commenced in 1951) on a new set of circuits [Elliott et al., 1952],
particular emphasis being
172 Part 2 1 The instruction-set processor: main-line computers Section 2 I Processors with a general register state
laid on flexibility of use and ability to work without error in high electrical
interference fields. These circuits form the basis of those in Pegasus.
Operations to be carried out. The following well-known operations are used to build up
the logical structure of the computer:
a ‘And.’ This operation, which may be carried out between two or more input serial
trains of pulses, produces an output train in which pulses occur only when pulses are
present at the same time on all inputs.
b ‘Or.’ This operation produces an output train in which pulses occur at all times
when a pulse is present on any of a number of inputs.
c ‘Not.’ 1’s are changed into 0’s and 0’s into 1’s; this is achieved by inverting the
pulse train.
B is inverted (forming B̄, or ‘not B’) and is used to gate pulse A and prevent its
passage. The inverted pulse will be a little late on B, which also may have been later
than A, as shown in Fig. 1c; thus when A and B̄ are ‘anded’ together a spike may be
produced, as shown in Fig. 1e. This spike, however, lies between clock pulses and so
will be rejected on clocking.
The pulse system used allows several logical operations to be performed in cascade
without any loss in nominal timing, so easing the problem of logical design
(particularly by permitting afterthoughts). The maximum number of logical operations
performed
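The ‘And’, ‘Or’ and ‘Not’ operations described above can be modelled on serial pulse trains. The following Python sketch is a modern illustration and is not taken from the paper: it assumes a pulse train may be represented as a list of 0s and 1s, one entry per digit period, and the function names are introduced here only for the example.

```python
def and_trains(*trains):
    """'And': a pulse appears only when every input has a pulse in that period."""
    return [int(all(bits)) for bits in zip(*trains)]


def or_trains(*trains):
    """'Or': a pulse appears whenever any input has a pulse in that period."""
    return [int(any(bits)) for bits in zip(*trains)]


def not_train(train):
    """'Not': 1's are changed into 0's and 0's into 1's."""
    return [1 - bit for bit in train]


a = [1, 1, 0, 0]
b = [1, 0, 1, 0]
print(and_trains(a, b))             # [1, 0, 0, 0]
print(or_trains(a, b))              # [1, 1, 1, 0]
print(and_trains(a, not_train(b)))  # [0, 1, 0, 0]: the inverted B gates A
```

The model is purely logical; it ignores the pulse timing and clocking that the circuits themselves must respect.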
current in L. When V1 is cut off at the end of the digit, this current flows through
diodes D, and charges up a storage condenser, C, which is discharged at the end of the
next clock pulse by a ‘reset’ pulse applied through D,. The reset pulse supply is a
common computer supply whose amplitude and phasing relative to the clock pulse is
shown in Fig. 3.
It will be noted that the reset pulse is also present at a time, just after V1 is cut
off, when the current in the inductor is about to charge the storage condenser. This
merely has the effect of deferring the charging of C until the end of the reset pulse,
the
The logical circuits. Each of the logical packages has more than one circuit unit. A
circuit unit is defined as that part of a package which has input and output pins, and
no connections to other parts of the package other than supplies. We may make the
following generalizations:
The staticizor. The function of a staticizor is to remember the fact that a digit
occurred at a particular time, for an indefinite period, the method generally used in
Pegasus being shown in Fig. 4. A digit delay with a twin ‘and’ gate input has its
output connected to one of its inputs. It is turned on by gate 1, which causes a digit
to circulate as long as the inputs to gate 2 remain positive.
[Fig. 4 labels: ‘Staticizor is set if these leads are positive’; ‘Staticizor is turned
off if either of these leads is negative’.]
The adder/subtracter. Figure 5 shows an adder/subtracter unit with inputs X and Y and
an output X + Y for the sum or X - Y for the difference. There are two further input
control leads marked ‘add’ and ‘subtract’. If the ‘add’ lead is held positive while
the ‘subtract’ lead is held negative, the unit acts as an adder. If the ‘subtract’
lead is held positive and the ‘add’ lead negative, the unit acts as a subtracter.
Carry suppression is controlled by the lead marked ‘carry suppression’. Carries are
allowed to propagate when this lead is held positive, so that a negative signal on
this lead will suppress carry.
Table 1 gives the digits appearing at the outputs of logical elements in the
adder/subtracter unit for all combinations of input and carry digits when the unit is
operating as an adder.
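The external behaviour of the adder/subtracter unit can be sketched in Python as below. This is a modern illustration under stated assumptions, not the circuit of Fig. 5: digits are assumed to arrive least significant first, the separate ‘add’ and ‘subtract’ leads are collapsed into one `subtract` flag, the ‘carry suppression’ lead becomes `carry_allowed`, and all names are introduced here for the example.

```python
def serial_add_sub(x_bits, y_bits, subtract=False, carry_allowed=True):
    """Bit-serial adder/subtracter; digits are supplied least significant first."""
    out = []
    carry = 0  # a carry when adding, a borrow when subtracting
    for x, y in zip(x_bits, y_bits):
        out.append(x ^ y ^ carry)
        if subtract:
            next_carry = ((1 - x) & y) | ((1 - (x ^ y)) & carry)
        else:
            next_carry = (x & y) | ((x ^ y) & carry)
        # A negative signal on the carry-suppression lead stops propagation.
        carry = next_carry if carry_allowed else 0
    return out


def to_bits(value, width):
    """Least-significant-first list of binary digits."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


x, y = to_bits(11, 6), to_bits(6, 6)
print(from_bits(serial_add_sub(x, y)))                       # 17
print(from_bits(serial_add_sub(x, y, subtract=True)))        # 5
print(from_bits(serial_add_sub(x, y, carry_allowed=False)))  # 13
```

With carries suppressed the output reduces to the digit-by-digit exclusive-or of the two inputs, which is why the last line prints 13 rather than 17.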
Table 1 Digits at various internal points of the adder/subtracter unit when set to
add, for all combinations of the input and carry digits
Columns: inputs X, Y; present carry digit Z; digits at internal points A, B, C, D, E,
F (including the sum and the next carry)
0 0 0 1 0 1 0 0
0 0 1 1 0 1 1 0
0 1 1 1 0 1 1 0
0 1 0 1 1 0 1 0
1 0 1 1 0 1 0 1
1 0 0 0 1 1 1 1
1 1 0 0 1 1 1 1
1 1 1 1 1 0 1 1
Note.-A and C are at the grids of the digit delay units.
problem then was to arrange the various circuits in such a way as to enable a computer
to be designed using a minimum total number of packages without too many types. Five
types were arrived at and these are shown in Fig. 6.
As an example of the factors involved, consider package types 1 and 2. The circuit
units based on package type 1 can perform all the functions of those on type 2.
However, there are many uses for a digit-delay circuit with a single ‘and’ gate input
(package type 2), and since three units of this kind (instead of two for a
2-‘and’-gate input delay) can be based on one package, a saving can be effected. In
Pegasus this saving amounts to 32 packages, which is considered to be well worth an
extra package type.
In addition to the five logical packages, a further 16 types (three of which are
peculiar to each computer) are required. The numbers used for the various functions
are given below:

                                            Number
Logical types    Type 1                     113
                 Type 2                     64
                 Type 3                     55
                 Type 4                     45
                 Type 5                     37
Nickel line 1 word store                    61
Drum-store packages (8 types)               38
Input/output packages (3 types)             17
Clock and reset waveforms (3 types)         14
Total                                       444

Fig. 6. Contents of logical packages. The arrowhead on an output lead denotes the
presence of an OR crystal connection. Note: clock connections are not shown; they are
implied whenever a delay symbol is used.
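The saving of 32 packages quoted above is consistent with the package numbers given in the table. The Python sketch below is an illustration added here, not part of the paper; it assumes that all 64 type-2 packages are fully used (three single-‘and’-gate delay units each) and that the same units would otherwise be built two to a package on type 1.

```python
import math

TYPE2_PACKAGES = 64   # from the table of package numbers
UNITS_PER_TYPE2 = 3   # single 'and'-gate delay units per type-2 package
UNITS_PER_TYPE1 = 2   # twin 'and'-gate delay units per type-1 package

units_needed = TYPE2_PACKAGES * UNITS_PER_TYPE2
type1_equivalent = math.ceil(units_needed / UNITS_PER_TYPE1)
saving = type1_equivalent - TYPE2_PACKAGES
print(units_needed, type1_equivalent, saving)  # 192 96 32

# Package numbers in the order tabulated: five logical types, nickel-line store,
# drum-store, input/output, and clock and reset waveform packages.
package_numbers = [113, 64, 55, 45, 37, 61, 38, 17, 14]
print(sum(package_numbers))  # 444
```

Under these assumptions the 192 delay units would need 96 type-1 packages, so the extra package type saves 32, and the tabulated numbers sum to the stated total of 444.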
The magnetic-drum store and the circuit packages used with it are described in another
paper [Merry and Maudsley, 1956], as is the nickel-line store [Fairclough, 1956].
This combination of plug and socket has a consistently low contact resistance (0.003
ohm at 1 amp); the insertion and withdrawal force is about 4 oz per contact.
Fig. 8. Allocation of addresses in store. Labels recoverable from the figure
(programmers' notation): register 0, always zero; register 1, word transfers; the
accumulators (X registers), which are the registers used for modification, with 6 and
7 marked double length; special registers for the hand switches (20 digits), checked
input/output (5 digits) and unchecked input/output (5 digits), and the constants -1.0
(address 32), ½ (address 33), 2⁻¹⁰ and 2⁻¹³; blocks 0 to 5, used for block transfers
to and from the main store.
obeyed, the part chosen depending on the function of the order to be modified. Figure
9 gives a schematic representation of the modification process. The effect of
modifying an order depends on the function of the order and can be to make the
effective order length 22 digits. This extension is necessary when specifying an
address in the main store.
[Fig. 9 legend: the shaded portion is added to the order. The full 13 digits always
appear in X registers in significance such that the most significant digit corresponds
to 2⁻¹ (and the least significant to 2⁻¹³).]
Transfers of information can take place between the computing store and the main
store, and vice versa, either in single words or in blocks of eight words. For
single-word transfers, only the register with address 1 in the computing store is
involved. For block transfers the address on the drum of the first word of the block
must be divisible by eight, and the registers in the computing store that are involved
will be one of the discrete blocks indicated in Fig. 8.
Input and output is by means of punched paper tape. An ‘exter-
characters on tape can be checked by a similar process.
The considerations which led to the specification and the logical design
The main features of the design are
a The use of a computing store from which all orders and numbers are taken while
computing
b The provision of multiple accumulators
c The provision of special orders and facilities for dealing easily with ‘red tape’¹
The computing store. The use of a fast-access store from which all numbers and orders
are taken increases the speed of the machine and eliminates the need for optimum
programming. It is this computing store which makes it possible to use an inexpensive
magnetic drum (with a relatively long access time) as the main store, and yet have a
machine which is fast and relatively simple to programme. On the other hand,
programmes have more ‘red tape’ and are not as simple as with single-level storage.
Transfer between levels is in blocks of eight words; this is a simplification and
saves time. One block holds a reasonable amount of programme and other blocks hold
data. Four blocks in all (32 words) would be just sufficient, and Pegasus was
originally designed with this number. The design was subsequently modified to six
blocks, which is quite adequate, in conjunction with the seven accumulators. Any
further increase in the size of the computing store would be achieved by increasing
the size, not the number, of blocks. As it is there is an economic balance between the
usefulness and the cost of the computing store.
¹‘Red tape’ is an expression for the non-arithmetic orders in a programme.
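The transfer rules described above (single words through register 1, blocks of eight words starting at a drum address divisible by eight, six blocks in the computing store) can be sketched as follows. This Python model is only an illustration: the class and method names are hypothetical, and the drum size used in the example is arbitrary rather than the size of the Pegasus drum.

```python
WORDS_PER_BLOCK = 8
COMPUTING_STORE_BLOCKS = 6


class TwoLevelStore:
    """Toy model of a drum main store with a block-organized computing store."""

    def __init__(self, main_store_words):
        self.main = [0] * main_store_words
        self.blocks = [[0] * WORDS_PER_BLOCK for _ in range(COMPUTING_STORE_BLOCKS)]
        self.register_1 = 0  # the only register involved in single-word transfers

    def _check_block_address(self, drum_address):
        if drum_address % WORDS_PER_BLOCK != 0:
            raise ValueError("block transfers must start at a multiple of eight")

    def read_block(self, drum_address, block):
        """Copy eight words from the drum into one computing-store block."""
        self._check_block_address(drum_address)
        self.blocks[block] = self.main[drum_address:drum_address + WORDS_PER_BLOCK]

    def write_block(self, block, drum_address):
        """Copy one computing-store block back to the drum."""
        self._check_block_address(drum_address)
        self.main[drum_address:drum_address + WORDS_PER_BLOCK] = self.blocks[block]

    def read_word(self, drum_address):
        self.register_1 = self.main[drum_address]

    def write_word(self, drum_address):
        self.main[drum_address] = self.register_1


store = TwoLevelStore(main_store_words=64)  # arbitrary drum size for the example
store.main[16:24] = list(range(100, 108))
store.read_block(16, block=2)
print(store.blocks[2])   # [100, 101, 102, 103, 104, 105, 106, 107]
store.read_word(19)
print(store.register_1)  # 103
```

Requesting a block transfer from an address such as 12 raises an error in this model, mirroring the rule that the first word of a block must sit at a drum address divisible by eight.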
The provision of several accumulators. This is the most novel feature of the logical
design of Pegasus. It is generally agreed that the simplest order code from the user’s
aspect is the 3-address code with orders of the form, A + B → C. An examination of
this form of code, however, shows that in many cases two of the addresses are the
same, so that the order takes the 2-address form, A + B → A. A further examination
shows that in a large proportion of cases the address A is confined to a very few
addresses. This leads to the suggestion of a code of the form N + X → X, where X
covers only a small part of the store while N covers the whole store. This will have
the advantage of yielding a reasonably short order. In Pegasus two such orders are
incorporated in one word, leaving sufficient digits to specify a modification register
(a Mancunian B-line) in each order.
The extreme case of this code is, of course, the single-address code, where X is
confined to one address, the accumulator. However, experience had convinced the
programmers collaborating in the design of Pegasus that, with single-address codes, a
large number of orders are concerned solely with transfers of numbers from one
register to another; the single accumulator is a restriction through which all numbers
must pass and in which all operations have to be performed.
In the Manchester University computer the B-lines serve two very valuable but distinct
purposes: they allow order modification and rudimentary arithmetic (such as counting)
to be done without disturbing the accumulator. It was felt that fuller arithmetic and
logical facilities on these B-lines would have been extremely valuable. The seven
accumulators in Pegasus, used for modification and arithmetic, are a development of
the B-line concept.
Special facilities for dealing with ‘red tape’. The difficulties associated with the
2-level storage system have been greatly reduced by having an order-modification
procedure which depends on the function of the order (Fig. 9). This method of
modifying orders, used in conjunction with order 66 of the code (the unit-modify
order), enables the counting through blocks of information to be done with relative
ease.
The use of the group-4 orders of the code enables counters to
Having a large number of jump instructions greatly helps in organizing a programme. In
particular, one order enables a jump to be made depending on the condition of an
accumulator (being zero, for example), and another order on the complementary
condition (being not zero). When only one of these orders is available it is necessary
to think ahead to see whether or not the correct condition will be satisfied. Although
the eight jump instructions included in the code were felt initially to be enough, it
is now suggested by programmers that even more such orders would be helpful.
The logical shift orders, 52 and 53, are also included to simplify ‘red tape’. In
particular, they are used for packing and unpacking words holding several items of
information.
As a result of including these various orders, the order code of Pegasus is quite
large. It is worth remarking, however, that by a sensible grouping of the orders in
the code the remembering of the code is a very simple task. A sensible arrangement of
the code tends to reduce the amount of equipment needed to engineer it. For example,
when the equipment for dealing with group 0 of the code has been allocated, groups 1
and 4 require the addition of only three gates.
Facilities for checking programmes. The features mentioned above make the computer
easier to programme, and there are other facilities in Pegasus that make it easier to
check out and develop new programmes. These include causing the machine to stop
obeying orders, either under programme control or when the programme is in error. In
particular, the machine stops if an order for writing in the main store is reached and
an overflow indicator is set. A further aid when testing new programmes is the
automatic punching out of all main-store addresses appearing in block-transfer orders.
When this information is examined an indication of the course of a programme is
readily obtained. The punching can be inhibited by a switch when a return to
full-speed running is needed.
Machine rhythm
The logical design of Pegasus is built around a nucleus that deals with the simple
arithmetic orders, groups 0, 1 and 4, of the code.
be set conveniently and a constant (up to 127) to be placed in This nucleus contains the control section, i.e. the order register
an accumulator, the constant being the value of the N-digits of and order decoding equipment, and the mill in which these orders
the order. Order 67 (the unit-count order) enables the counting are executed. The design of this nucleus could not begin until a
of cycles of operations to be dealt with in a simple way. A jump basic rhythm for dealing with the extraction from the computing
to another part of the programme can be programmed to take store and the execution of such a pair was determined. When the
place automatically when the required number of cycles has been outline of this nucleus was clear, the equipment for dealing with
performed. the remaining orders in the code was designed to fit it.
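As a rough illustration of what such a basic rhythm amounts to, the sketch below tallies word times for order pairs. It assumes the figures reached in the argument that follows in the text: one word time to extract an order pair and two word times for each simple arithmetic order. The function and constant names are hypothetical, and longer orders are modelled only by passing a larger count.

```python
# Sketch of the Pegasus beat structure, under the assumptions stated above.
EXTRACT_WORD_TIMES = 1        # beat (a): extract the order pair
SIMPLE_ORDER_WORD_TIMES = 2   # beats (b) and (c) for a simple arithmetic order


def pair_word_times(first=SIMPLE_ORDER_WORD_TIMES, second=SIMPLE_ORDER_WORD_TIMES):
    """Word times to extract and obey one order pair, with no overlapping of beats."""
    return EXTRACT_WORD_TIMES + first + second


def programme_word_times(pairs):
    """Total word times for a list of (first, second) order durations."""
    return sum(pair_word_times(first, second) for first, second in pairs)
```

For example, `pair_word_times()` gives the figure for a pair of simple orders, and `programme_word_times([(2, 2), (2, 6)])` shows how one longer order stretches a single beat without disturbing the others.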
Chapter 9 I The design philosophy of Pegasus, a quantity-production computer 179
The following arguments led to the basic rhythm. Since the orders of groups 0, 1 and 4 are similar in many respects, for definiteness, it will be sufficient to consider a particular order, 11 of the code, say. This is an order which takes two numbers from the computing store and replaces one of them by their sum. It would take a prohibitive amount of equipment to extract these numbers, add them together and have the least significant digit of the sum available for replacing in the store in the same digit time as the least significant digits of the two components taken out of the store. In practice, some four digit times at least would be needed for this sequence of operations. Thus, it would be impossible to return the sum to the store in the same word as the operands are extracted without having an entry point to each register which is in a different timing from the normal circulation entry. To produce two such entry points to each register would mean more equipment associated with each register, which was considered an uneconomical use of extra equipment. Instead, it was decided to delay the sum so that it could enter the register in the computing store in the next word time in standard timing. This involves one common delaying circuit instead of one for every register. Such an order therefore takes two word times to execute.
It may be argued that this second word time could be made to overlap with the first word time for the next order. Two reasons oppose this: the new contents of the register being changed might be required by the next order; and two different sets of equipment for selecting a storage register would be needed if numbers were to be extracted from one and replaced in another register in the same word time.
Thus, the execution of a pair of orders taken from the computing store requires four word times. The reasons for opposing the overlapping of the execution of two orders also oppose the extraction of an order pair while the previous pair is being dealt with. Five word times are therefore needed for the process of extracting and obeying a pair of simple arithmetic orders. More time may be needed for some of the other orders in the code.
The basic 3-beat rhythm is thus established:
a Extract the order pair from the computing store.
b Obey the first order of the pair.
c Obey the second order.
The duration of beat (a) is one word time; beats (b) and (c) are each two word times long for orders in groups 0, 1, 4 and 6 of the code, but may be longer for other orders.
Times for typical operations
The times for the various arithmetic operations are:

                                  millisec
Addition and subtraction          0.3
Multiplication                    2.0
Division                          5.4

These times include an allowance for the time to extract the orders.
Some times for standard subroutines are:

                                  millisec
Exponential function              29
Sine function                     24
Logarithmic function              34

Finally, to give some indication of the time for a typical problem, a set of 50 simultaneous equations (with a single right-hand side) takes about 10¾ min. Of this time, 3 min 8 sec is for input, 7 min 17 sec is for calculation and 18 sec is for output.
Realizing the specification
The detailed logical design
It would take too long to describe fully the detailed logical design. One aspect is worth mentioning, however, namely the avoidance of all 'exceptions' in the results of orders. As an example of an exception consider the overflow indicators, which should be set whenever the final result of an order is outside the permissible range of numbers. In multiplication this can occur only when both the multiplier and the multiplicand are -1, and this is likely to occur very infrequently. Rather than provide equipment to sense this infrequent case, it is easier to put a footnote in the programming manual, where the overflow indicator is described, pointing out the exception. It was felt, however, that such exceptions should be avoided even at the expense of extra equipment or extra complication. For this and other reasons concerned with facilitating machine use, the logic of Pegasus is quite complicated.
The end-product of the detailed logical design is a series of diagrams with symbols corresponding to the circuit units of the packages, as shown, for example, in Fig. 5. The inputs and outputs of the units on these diagrams correspond to the pins of the sockets into which the packages plug. Thus, the wiring lists of connections of these pins can be produced from these logical diagrams. The first step in the production of these lists is to allocate a position
180 Part 2 1 The instruction-set processor: main-line computers Section 2 I Processors with a general register state
in the cabinets to each logical circuit in such a way as to reduce the amount of wire needed. When the layout has been completed, the last stage of producing the wire lists can proceed.
General construction of machine
The main units are shown in Fig. 10.
[Fig. 10, labels only: Bay 1, logic packages; Bay 2, logic packages; Bay 3, input equipment.]
The package frame. This unit is a simple light-alloy frame supporting diecast light-alloy frame racks to which the back socket panels are fixed. The packages slide into grooves in the rack and plug into sockets at the back, a polarizing feature preventing the insertion of a package upside down. If electrical or magnetic screening is necessary between any packages, a special metal plate is inserted in slots in the cast rack and is fixed by a single screw in the back panel. Coded aluminium strips containing coloured plastic studs which identify the position of each package are fixed to the front of each casting.
Arrangement of the packages. There are 200 packages per cabinet, arranged in ten horizontal rows of 20 units per row. The metal valve panels are placed so that the edges almost touch. The component panel of each unit is in register with the unit in the corresponding position in each of the other rows, thereby providing vertical chimneys for cooling the components secured to these
panels. Warm air from the main source of heat, the valves, is prevented by the valve panels from reaching the more temperature-sensitive components, such as diodes, secured to the component panel.
The back panel wiring. For locating long signal wires between sockets a system of plastic strips is used, which hold the wires at definite positions given by the instructions on the wiring lists. The exact route of every wire is predetermined, thus making wiring and inspection more reliable and fault finding and maintenance easier.
Final assembly. The completely wired frame is assembled in its cabinet, which has already been fitted with the control and auxiliary supply circuit unit, heater transformers, fuses, cooling assembly and cableforms. The work of connecting the cableforms, heaters and earths can be done by relatively unskilled labour working to clearly written instructions and diagrams.
The cooling system. Each cabinet has its own cooling system as an integral part of the construction; there is therefore no difficulty in cooling cabinets added to existing computers. Two axial-flow turbo blowers are mounted in the base beneath an airtight pressure chamber, each providing 300 ft³/min of air at a total pressure head of 1 in (water gauge). The maximum temperature rise is 10° C.
The power supply. A separate cubicle houses metal rectifiers, shunt stabilizing valves and control circuits. The power is obtained from the mains through a motor-alternator set, the output of which is stabilized to 2%, the main purpose of this set being to act as a buffer against switching surges and other mains voltage variations. The valve heaters in the computer are energized from the stabilized alternator output, which is expected to extend the valve life.
Maintenance
General
All digital computers so far have a fault rate which cannot be ignored. When the best has been done in the choice of components, circuits and mechanical construction, attention must be paid to the following points to get the best out of a machine:
a Rapid fault location
b Getting the machine working again as soon as possible after locating a fault
c Preventive maintenance
Fault location
There are parity-checking circuits on both the main and the high-speed stores. Errors of a single digit in the stores stop the machine. The fault can then be quickly located by examination of the monitors.
For other faults the general method is to run a test programme (assuming the fault is not in the main control) which will indicate the area of the fault. Detailed examination can then be carried out with the monitors.
All outputs of circuit units are readily accessible at monitoring sockets on the front of each package, and in addition about 80 points can be directly selected by switches from the monitoring position: these include all store lines and a number of key waveforms. Fault-finding is normally a matter of tracing 0's and 1's through the machine with reference to logical diagrams rather than electronic circuit diagrams.
A variety of triggers can be selected for the monitor time-bases, these including
a Trigger at any word position within a drum revolution (128 different times selectable by switches)
b Trigger at any word time of any selected order
These triggers and some other monitoring facilities are produced by 19 standard packages and are found to be well worth the extra equipment.
Fault repair
Once a faulty package has been located, the machine can be got working again immediately by replacement of the package with a spare; repair of the faulty package can be done at leisure with the aid of a package tester. With this equipment a package can quickly be given a series of standard tests; each is selected by switches, and the performance is measured either by observation of meters or a built-in oscillograph.
During commissioning not one case was found of the first machine doing other than what one would expect from the logical diagram (except for a very few cases of incorrect wiring).
Preventive maintenance
The machine h.t. supplies are reduced while the test programmes are being run. This marginal testing shows up incipient faults such as deterioration in valves, crystal diodes or resistors. The machine is at present kept in good running order down to 10% margins
(the supplies are normally controlled to about 1% of nominal), although correct running at about 20% reduction has been observed.
Conclusions
The first machine has been computing regularly for only a few months and has been on regular preventive maintenance (about 1 hour per day) for a few weeks. Error-free runs of over 30 hours are common, and at the time of writing there has been no error for over 55 hours' running. The majority of package replacements are done during routine maintenance.
The packaged method of construction of computers has proved to have great advantages in design, construction and operation.
References
ElliW56a; ~lbo~53; ElliW51, 52, 53, 56b; FairJ56; JohnD52; MerrI56; Pegasus Programming Manual, Ferranti Ltd., London; Pegasus Maintenance Manuals, Ferranti Ltd., London.
APPENDIX

00   x' = n
01   x' = x + n
02   x' = -n
03   x' = x - n
04   x' = n - x
05   x' = x & n
06   x' = x ≢ n
07   Not allocated

10   n' = x
11   n' = n + x
12   n' = -x
13   n' = n - x
14   n' = x - n
15   n' = n & x
16   n' = n ≢ x
17   Not allocated

23   (nq)' = n + 2-3xy
     0 ≤ p'/n < 1 (unrounded
     this order assumes that any overflow is due to operations in 7. Clears overflow unless n' overflows
26   q' + 2^-38(:) = x; n     -½ ≤ p'/n < ½ (rounded single-length division
27   Not allocated

30 to 37   Not allocated

40   x' = c
41   x' = x + c
42   x' = -c
43   x' = x - c               c = N × 2^-38
44   x' = c - x
45   x' = x & c
46   x' = x ≢ c

50   x' = 2^N x               single-length arithmetical shifts
51   x' = 2^-N x (rounded)
     Note: x' = x
52   Shift x up N places      single-length logical shifts
53   Shift x down N places

77   Stop
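The grouping visible in groups 0, 1 and 4 of the table (the second octal digit selects the operation, the first selects which operands take part) can be sketched as follows. This is an illustrative model only: word length, overflow and rounding are ignored, the symbol ≢ is taken to mean a digit-by-digit not-equivalent (exclusive or), c is treated simply as the integer held in the N-digits, and the function names are hypothetical.

```python
# Sketch of Pegasus order groups 0, 1 and 4, under the assumptions stated above.
OPERATIONS = {
    0: lambda a, b: b,        # replace a by b
    1: lambda a, b: a + b,
    2: lambda a, b: -b,
    3: lambda a, b: a - b,
    4: lambda a, b: b - a,
    5: lambda a, b: a & b,    # digit-by-digit "and"
    6: lambda a, b: a ^ b,    # digit-by-digit "not equivalent"
}


def obey(order, x, n, c=0):
    """Obey a two-digit octal order such as '11'; return the new (x, n).

    x is the accumulator operand, n the store operand, c the constant from the N-digits.
    """
    group, digit = int(order[0]), int(order[1])
    if digit not in OPERATIONS:
        raise ValueError(f"order {order} is not allocated")
    operation = OPERATIONS[digit]
    if group == 0:            # result replaces x, other operand is n
        return operation(x, n), n
    if group == 1:            # result replaces n, other operand is x
        return x, operation(n, x)
    if group == 4:            # result replaces x, other operand is the constant c
        return operation(x, c), n
    raise ValueError(f"group {group} is not modelled in this sketch")
```

For instance, `obey('11', x=5, n=7)` leaves x unchanged and replaces n by the sum, while `obey('44', x=5, n=7, c=100)` replaces x by c - x. The single table of operations shared by the three groups mirrors the remark in the text that groups 1 and 4 need very little equipment beyond group 0.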
An 8-bit-character computer
[Instruction-set and formats table; only part is legible. Op 100: ad (A' ← A + R), adc (A' ← A + R + C), sb (A' ← A - R), sbc (A' ← A - R - C). Op 101: mu (A' ← A × R, integer), muf (A' ← A × R, fraction), di (A' ← A / R, integer), dif (A' ← A / R, fraction). Op 110: and (A ← A ∧ R), or (A ← A ∨ R), xor (A ← A ⊕ R), cmpr (N,Z ← A - R). Op 111: ld, st, shift. Formats: direct address (op, r, d; 3 characters); immediate data (2 to 5 characters). Notes: ( ) encloses instruction length in characters shown in formats table; see State diagram, Fig. 2.]
... addressable Mp to 2^16 (or 65,536) characters. An alternative design might allow the maximum addressable Mp to be 2^24 words, or, alternatively, it could be variable. Although 24-bit operations are defined, their implementation might be expensive. Aligning the 24-bit words on 32-bit-word boundaries would simplify the address calculation hardware.
Instructions and data formats are of variable length, instructions being 1, 2, 3, 4, and 5 characters long, and data being 1, 2, 3, and 4 characters long. The Pc state contains ~35 characters, which are organized to be dealt with as eight 8-, 16-, 24-, or 32-bit registers (shown
Chapter 10 I An 8-bit-character computer 185
[Figure residue: instruction lengths (1 character, 2 characters, 3 characters).]
Appendix 1
An 8-Bit-Character Computer ISP Description
Pc State
The following array of 8 general registers, R, are mapped into the first 8 × (L+1) cells. The register length is <0:8 × (L+1) - 1>. The first register of each array, R[0], is an accumulator, and has special properties.
R[0:7]<0:(8 × L') - 1> := M[0:7][0:L]<0:7>        General Registers of length (L+1) × 8 bits
A<0:(8 × L') - 1> := R[0]<0:(8 × L') - 1>          Accumulator (generally)
RQ[0:7]<0:31> := M[0:7][0:3]<0:7>                  Quadruple Registers
AQ<0:31> := RQ[0]<0:31>                            Quadruple Accumulator
RT[0:7]<0:23> := M[0:7][0:2]<0:7>                  Triple Registers
AT<0:23> := RT[0]<0:23>                            Triple Accumulator
RD[0:7]<0:15> := M[0:7][0:1]<0:7>                  Double Registers
AD<0:15> := RD[0]<0:15>                            Double Accumulator
RS[0:7]<0:7> := M[0:7][0:0]<0:7>                   Single Registers
AS<0:7> := RS[0]<0:7>                              Single Accumulator
Mp State
M[0:177777₈]<0:7>                                  primary memory
Instruction Format
i[0:4]<0:7>                                        1 to 5 character instruction
op<0:4> := i[0]<0:4>                               Op Code
r<0:2> := i[0]<5:7>                                register address
s<0:7> := i[1]                                     signed integer for shifts
d<0:15> := i[1:2]                                  address integer
id<0:(8 × L') - 1> := i[1:L']<0:7>                 variable length immediate data
Instruction Interpretation Process
(instruction[0:4]<0:7> ← M[P:P+4]; P ← P + 1); next        fetch
((op = Oil*) ∨ (op = 1@11) ∨ (op = 1001)) → (P ← P + 2);
((op = 1 M O) ∨ (op = 1010)) → (P ← P + 1);
(op = 010$) → (P ← P + L + 1); next
Instruction-execution)                                      execute
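The field definitions in the Instruction Format section above can be read as a small decoding routine. The following is a minimal sketch, not a definitive reading of the ISP: it assumes bit 0 is the most significant bit of each character, that s is held in two's complement, and that i[1] is the high-order character of the 16-bit address d. The function name and the dictionary layout are hypothetical.

```python
# Sketch of decoding the op, r, s and d fields, under the assumptions stated above.
def decode(instruction: bytes) -> dict:
    """Split a 1 to 5 character instruction into its fields."""
    if not 1 <= len(instruction) <= 5:
        raise ValueError("an instruction is 1 to 5 characters long")
    first = instruction[0]
    fields = {
        "op": first >> 3,      # op<0:4> := i[0]<0:4>, the five high-order bits
        "r": first & 0b111,    # r<0:2>  := i[0]<5:7>, the three low-order bits
    }
    if len(instruction) >= 2:
        raw = instruction[1]
        # s<0:7> := i[1], interpreted here as a signed (two's complement) integer
        fields["s"] = raw - 256 if raw >= 128 else raw
    if len(instruction) >= 3:
        # d<0:15> := i[1:2], with i[1] taken as the high-order character
        fields["d"] = (instruction[1] << 8) | instruction[2]
    return fields
```

Calling `decode(bytes([0b10010011, 0x12, 0x34]))` yields the op code 0b10010, register 3 and address 0x1234. Which of s and d is meaningful depends on the op code, which this sketch does not attempt to interpret.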
Section 1
which it influenced include the nearly identical English Electric Deuce, the Bendix G-15, and the Packard Bell PB-250.¹ The PMS structure does not strictly follow our lattice model (page 65). The Deuce PMS structure is given in Fig. 1. A 32-word block in Mp.delay-line can be transferred to Ms.drum in one
[Fig. 1. English Electric Deuce PMS diagram. Legible labels: (32 b/w; 16 tracks/position; 32 w/track; 16 positions); Mp(delay line; cyclic; 32 ~ 1024 µs/w; 32 w; 32 b/w).]
¹H. D. Huskey was involved in the design of ACE, G-15, and PB-250; he was undoubtedly the idea carrier.
192 Part 3 I The instruction-set processor level: variations in the processor Section 1 I Processors with greater than 1 address per instruction
coded. The LGP-30 (Chap. 16), by contrast, has only a basic instruction set. Hence a problem can be coded only one or two ways. ZEBRA's performance of 60 percent memory-cycle utilization is rather outstanding and raises the possibility that random-access primary memories may not be necessary.
UNIVAC scientific (1103A) instruction logic
The UNIVAC 1103A (Chap. 13) is a two-address computer. The computer was designed initially by Engineering Research Associates (ERA) of St. Paul.¹ UNIVAC acquired ERA in 1952 as a scientific-computer division. The evolution of the 1103A later yielded the 1107 and 1108 general register processors. The reader should compare the 1103A with the IBM 704 series (Chap. 41). At the time both were used, it was not clear which computer was better.
The RW-400: a new polymorphic data system
The RW-400 in Chap. 38 is a two-address, binary computer. It is discussed in Part 5, Sec. 4, page 470.
Instruction logic of the MIDAC
The University of Michigan's MIDAC (Michigan Digital Automatic Computer) is based on the National Bureau of Standards' SEAC (Standards' Electronic Automatic Computer). MIDAC, a three-address, binary computer, is presented in Chap. 14.
Instruction logic of the Soviet Strela (Arrow)
The Russian Strela is presented in Chap. 15. Since it is used only to illustrate a three-address organization, the chapter consists of only the instruction set.
¹As the third in a series that started with the ERA 1101 and 1102
Chapter 11
The Pilot ACE
J. H. Wilkinson
source is the next to be obeyed. The structure of the instruction word is as follows:
Simplest among the sources and destinations are those associated with the short delay lines. The six one-word delay lines are each given numbers and these for reasons associated with the history of the machine are 11, 15, 16, 20, 26 and 27. They are usually referred to as Temporary Stores or TS's because they are used to store temporarily those numbers which are being operated upon most frequently at each stage of a computation. In general TSn has associated with it a source, source n, and a destination, destination n. An instruction of the type
15-16
in the preliminary stage of the coding represents the transfer of a copy of the contents of TS15 via source 15 to TS16 via the destination 16. After it has taken place both stores contain the number originally in TS15. The period of the transfer is not mentioned in the coding because a transfer of more than one minor cycle is irrelevant. Most transfers are for one minor cycle and hence the period of transfer is not specified unless it is greater than one minor cycle. Associated with the TS's are a number of functional sources and destinations. TS16 for instance has two other destinations 17 and 18 associated with it, in addition to destination 16. Any number transferred to destination 17 is added to the contents of TS16 while any number transferred to destination 18 is subtracted from the contents of TS16. TS16 may be said to have some of the functions associated with the accumulator on an orthodox machine. The period of transfer to destinations 17 and 18 is very important. Thus
[...]
thus has the effect of dividing the contents of TS26 by 2^n, that is a right shift of n places. Similarly
19-26 (n mc)
gives a left shift of n places.
There are two functional sources which give composite functions of the numbers in TS26 and TS27. These are Source 21 which gives the number
TS26 & TS27
and Source 22 which gives the number
TS26 ≢ TS27
There are a number of sources which give constant numbers which are of frequent use in computation. These are Source 23 which gives the number which has a zero everywhere except in the 17th position, usually known as P17, Source 24 which gives P32, Source 25 which gives P1, Source 28 which gives zero and Source 29 which gives a number consisting of 32 consecutive ones. These sources are valuable because they provide numbers with an access time of one minor cycle and are thus almost as useful as several extra TS's.
The use of a number of TS's with the arithmetic facilities distributed among them makes it possible to take advantage of the placing of instructions in appropriate positions in the long
Chapter 11 I The Pilot ACE 195
storage units so that they emerge as required. The coding of a trivial example will illustrate the uses of the TS's and their associated sources. It is required to build up the successive natural numbers, their squares and their cubes simultaneously. It is natural to store the values in TS's and we may suppose TS15 contains n, TS20, n² and TS26, n³.

Instruction           Description
 1. 28-15             zero to TS15, i.e. 0      (these 3 instructions set the initial values)
 2. 28-20             zero to TS20, i.e. 0²
 3. 28-26             zero to TS26, i.e. 0³
 4. 26-16             TS16 contains n³
 5. 20-17 (3 mc)      TS16 contains n³ + 3n²
 6. 15-17 (3 mc)      TS16 contains n³ + 3n² + 3n
 7. 25-17             TS16 contains n³ + 3n² + 3n + 1
 8. 16-26             TS26 contains (n + 1)³
 9. 20-16             TS16 contains n²
10. 15-17 (2 mc)      TS16 contains n² + 2n
11. 25-17             TS16 contains n² + 2n + 1
12. 16-20             TS20 contains (n + 1)²
13. 15-16             TS16 contains n
14. 25-17             TS16 contains (n + 1)
15. 16-15             TS15 contains (n + 1)     Next instruction (4)

The instructions (1) to (3) set the initial conditions. The instructions (4) to (15) have the effect of changing the contents of 15, 20, 26 from n, n², n³ to (n + 1), (n + 1)², (n + 1)³. As remarked earlier, each instruction selects the next instruction and here instruction (15) selects instruction (4) as the next instruction. In the preliminary coding this is usually denoted by using an arrow; it must be catered for in the detailed coding by the correct choice of the timing number, as will be shown below.
The branching of a programme is achieved by the use of two destinations, destination 24 and destination 25. If a transfer is made from any source to destination 24 then the next instruction is one or other of two according as the number transferred is positive or negative. Similarly if a transfer is made to destination 25 then the next instruction is one or other of two according as the number transferred is zero or non-zero. In the preliminary coding the bifurcation is denoted by the use of arrows, thus:
In the detailed coding the effect is that if the number transferred to destination 24 is negative then the timing number is increased by 1. Similarly for destination 25; the two possible next instructions are consecutive in the store.
The two double word stores are numbered DS12 and DS14. DS12 has only source 12 and destination 12 associated with it, but DS14 has, in addition to source 14 and destination 14, a number of functional sources and destinations. Source 13 gives the contents of DS14 divided by 2, while transfers to destination 13 have the effect of adding the numbers transferred to DS14. In specifying transfers from, and to, the double length stores, the time of the transfer must be specified, i.e. whether it takes place in an even or an odd minor cycle or both. Thus the transfer
12-14 (odd minor cycle) usually written
12-14 (o)
represents the transfer of the word in the odd positions of DS12 to the odd position in DS14 while
12-14 (2 minor cycles)
represents the transfer of both words in 12 to the corresponding positions in 14. The operation
13-14 (2n)
gives us a method of shifting the contents of DS14 n places to the right while
14-13 (2n)
produces a shift of n places to the left.
The machine is not equipped with a fully automatic multiplier. To multiply two numbers, a and b, together, a must be sent to TS20, b to DS14 odd, zero to DS14 even and a transfer (source irrelevant) made to destination 19. The product is then produced in DS14 in 2 milliseconds, but a and b are treated as positive numbers. Corrections must be made to the answer if a and b are signed numbers. To make multiplication fast, it has been made possible to perform other operations while multiplication is proceeding. Thus the corrections necessary if a and b are signed numbers may be built up in TS16 during multiplication, and signed multiplication takes only a little over two millisecs. It is, of course, therefore, a subroutine but a very fast one. The amount of equipment associated with the multiplier is very small. The main part of the store consists of the long storage units known as DL1, DL2, . . . , DL11. Each of these has a source and a destination with the same number as the DL number. The words in each DL are numbered 0 to 31 and the nth word in DLM is usually denoted by DLM_n. Transfers to and from long lines in the preliminary coding are denoted thus:
8,- 16 (transfer nth word of DL8 to TS16) The last column gives the position of the next instruction in DL1;
8,-,-17 (add all the words from 8, to 8, i.e. n - m + 1 con- it is given by (m T+ + 2). The first 4 instructions occupy minor
secutive words of DL8" to TS16) cycles, 0, 2 and 4, 6 and each takes two minor cycles, and gives
a transfer for one minor cycle only. The next instruction occupies
Detailed coding minor cycle number 8 and it requires a transfer lasting 3 minor
In the second stage of the coding the true instruction words are cycles. The simplest and fastest way of getting this is to have
derived from the preliminary coding. This is a fairly automatic W = 0 and T = 2 giving a transfer of (2 - 0 +
1) minor cycles.
process and recent experience has shown that it can be carried The next instruction is in position (8 2 + +
2), that is minor cycle
out satisfactorily by quite junior staff. The timing of each instruc- 12, and so on. When we reach the instruction in minor cycle 31,
tion is given relative to the position of that instruction in the store. viz. 25-17, a transfer for one minor cycle is required. The simplest
This is an incidental feature of the code which arose from the way is to have W = 0 T = 0 and this makes the next instruction
attempts to minimize equipment. It would be dropped in any occupy position (31 + 0 + 2) i.e. position 33 which is position 1.
future machine in favour of an absolute timing system. If an in- If position 1 had been already occupied, a value of T could have
struction occupies position m in a DL and has a wait number been chosen in order to land in an unoccupied position. In order
W and timing number T then the transfer always begins in minor to ensure that a transfer of one minor cycle only took place, the
cycle (m W + +
2) and the next instruction is always in minor characteristic could have been made 1. It should be appreciated
cycle (m + T + 2) of the selected next instruction source. The that the choice of C, W and T is far from unique. Whenever
period of transfer depends on the value of the characteristic. If the characteristic is zero then the transfer lasts for the whole period from (m + W + 2) to (m + T + 2), that is (T - W + 1) minor cycles. If the characteristic is one, then the transfer is for one minor cycle, that is minor cycle (m + W + 2). If the characteristic is three then the transfer is for two minor cycles (m + W + 2) and (m + W + 3). The characteristic value, two, is not used. The characteristic value zero gives a prolonged transfer which is peculiar to the Pilot ACE. The characteristics 1 and 3 are analogous to the facility on EDSAC whereby full length or half-length words may be transferred. On the Pilot ACE we transfer single or double length words. This facility is invaluable for double length, floating and complex arithmetic. In the above definitions the numbers (m + W + 2) etc. are to be interpreted modulo 32. In general, timing and wait numbers are simpler than they appear from the definitions because they are very frequently both zero, corresponding to a transfer for one minor cycle. The detailed coding of the problem given earlier will illustrate the procedure. All the instructions are in DL1 so that the next instruction source is always one. The key to the headings in the following table is:

m.c.    Minor cycle position of instructions in DL1
N.I.S.  Next instruction source
S       Source
D       Destination
C       Characteristic
W       Wait number
T       Timing number

If possible T = 0 and W = 0 are chosen because this gives the highest speed of operation besides being simplest. The instruction occupying position 1 is of special interest because this is the last instruction of the cycle needed to build up a square and cube and it must select as its next instruction the first of the cycle, which is in position number 6. This is achieved by making T = 3 (giving the next instruction in m.c. 1 + 3 + 2 = 6). This incidentally gives a transfer lasting four minor cycles but since it is a transfer from one TS to another and no functional source or destination is in use, the prolonged transfer produces no harmful effect. If a prolonged transfer had to be avoided then the characteristic could be taken as 1. It is seldom necessary to use any characteristic other than zero for transfers to and from TS's but when transfers are made to and from DL's, characteristic values of 1 or 3 are almost universal. All 12 instructions which comprise the repeated cycle of the computation take a total time of one major cycle exactly (32 minor cycles), the last instruction of the cycle having been specially designed to get back to the beginning of the cycle. This is in contrast to the position in a machine not using optimum coding, where 12 major cycles would be necessary quite apart from the fact that the multiplications by factors of 3 and 2, each of which uses one instruction, would normally need more than one instruction if a prolonged transfer were not available. Figure 1 gives a simplified diagram of the machine. The sequence of events in obeying the instruction

N    S    D    C    W    T
2    16 - 20   0    8    10

occupying DL1₂, for example, is as follows. Starting from the time when the last instruction was completed, the instruction from
Chapter 11 I The Pilot ACE 197
m.c.  N.I.S.   S    D    C   W   T
 0      1     28   15    0   0   0
 1      1     16   15    0   0   3
 2      1     28   20    0   0   0
 3
 4      1     28   16    0   0   0
 5
 6      1     26   16    0   0   0
 7
 8      1     20   17    0   0   2
 9
10
11
12      1     15   17    0   0   2
13
14
15
16      1     25   17    0   0   0
17
18      1     16   26    0   0   0
19
20      1     20   16    0   0   0
21
22      1     15   17    0   0   1
23
24
25      1     25   17    0   0   0
26
27      1     16   20    0   0   0
28
29      1     15   16    0   0   0
30
31      1     25   17    0   0   0
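
The timing rules stated in the text can be checked against the table with a short sketch. The Python below is an illustration only, not part of the original paper: the function names and the `TIMING` dictionary are hypothetical, and the only facts used are the rules quoted above (next instruction in minor cycle m + T + 2, transfer starting at m + W + 2, all taken modulo 32) and the T column of the table.

```python
MINOR_CYCLES = 32  # minor cycles in one major cycle of a delay line


def next_position(m, timing):
    """Minor cycle of the next instruction: (m + T + 2) modulo 32."""
    return (m + timing + 2) % MINOR_CYCLES


def transfer_cycles(m, characteristic, wait, timing):
    """Minor cycles during which the transfer takes place."""
    start = m + wait + 2
    if characteristic == 0:        # prolonged transfer, T - W + 1 minor cycles
        length = (timing - wait) % MINOR_CYCLES + 1
    elif characteristic == 1:      # single-length word
        length = 1
    elif characteristic == 3:      # double-length word
        length = 2
    else:
        raise ValueError("characteristic value 2 is not used")
    return [(start + k) % MINOR_CYCLES for k in range(length)]


# Timing number T for each coded position of the table (W = 0 and C = 0 throughout).
TIMING = {0: 0, 1: 3, 2: 0, 4: 0, 6: 0, 8: 2, 12: 2, 16: 0, 18: 0,
          20: 0, 22: 1, 25: 0, 27: 0, 29: 0, 31: 0}


def repeated_cycle(start=6):
    """Follow the next-instruction positions from `start` until they return to it."""
    order, m = [], start
    while True:
        order.append(m)
        m = next_position(m, TIMING[m])
        if m == start:
            return order


loop = repeated_cycle()
print("positions in the repeated cycle:", loop)
print("minor cycles per pass:", sum(TIMING[m] + 2 for m in loop))
print("example instruction transfers in:", transfer_cycles(2, 0, 8, 10))
```

Under these assumptions the loop starting at position 6 visits twelve positions and returns through position 1, which agrees with the statement that the repeated cycle occupies exactly one major cycle.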
DL1₂ will have passed into the special TS marked TS COUNT during minor cycle number 2. By the end of minor cycle number 3, S switch number 16 will be over and also N switch number 2. The contents of TS16 will be passing into HIGHWAY and those of DL2 into INSTRUCTION HIGHWAY. At the beginning of minor cycle number 12 (i.e. 2 + 8 + 2), D switch number 20 will go over, and TS20 will stop recirculating and the number on the HIGHWAY will pass into TS20. The transfer will continue until minor cycle 14 (i.e. 2 + 10 + 2) when the D switch number 20 will switch back. At the beginning of minor cycle 14, the switch X on COUNT will go over and the number on INSTRUCTION HIGHWAY during this minor cycle, DL2₁₄, will pass into COUNT. At the end of minor cycle 14, the X switch will close again and DL2₁₄ will be trapped in COUNT. The cycle of events is now complete. COUNT is associated with a counter and it is this counter which determines, from the wait, timing, and characteristic numbers of the trapped instruction, when the D and X switches go over and back.

Input and output
The only part of the instruction word not described is the GO digit. If the GO digit is a one, the instruction is carried out at high speed, but if it is a zero the machine stops and does not proceed until a manual switch is operated. The GO digit is omitted in strategic instructions when a programme is being tested. It also
198 Part 3 I The instruction-set processor level: variations in the processor Section 1 I Processors with greater than 1 address per instruction
serves a further purpose in synchronising the input and output facilities with the high speed computer. Input on the machine is by means of Hollerith punched cards. When cards are passed through the reader the numbers on the card may be read row by row as each passes under a set of 32 reading brushes. When a row of a card is under the reading brushes, the number punched on that row, regarded as a number of 32 binary digits, is available on source 0. In order to make certain that reading takes place when a row is in position and not between rows, transfers from source 0 have the GO digit omitted and it is arranged that the Hollerith reader has the same effect as operating the manual switch each time a row comes into position. The passage of a card through the reader is called for by a transfer from any source to destination 31. No transfer of information from the card takes place unless the appropriate instruction using source 0 is obeyed during the passage of the card. Output on the machine is also provided

sion is done between the rows of the card and up to 30 decimal digits per card may be translated. This speed of conversion is only possible because of the use of optimum coding. The facility for carrying out computation between rows of cards is used extensively, particularly in linear algebra when matrices exceeding the storage capacity of the machine are involved. The matrices are stored on cards in binary form with one number on each of the 12 rows of each card, all the computation being done either between rows when reading or when punching. Times comparable with those possible with the matrices stored in the memory are often achieved in this way, when the computation uses a high percentage of the available time between rows. Up to 80% of this time may be safely used.

Fig. 1. Simplified diagram showing some sources, destinations, and next-instruction sources.

Initial input
The initial input of instructions is achieved by choosing destination 0 in a special manner. When a transfer is made to destination 0, then the instruction transferred becomes the next to be obeyed and the next instruction source is ignored. Source 0 has already been chosen specially since it is provided from a row of a card. The instruction consisting of zeros has the effect of injecting the instruction punched on a row of a card into the machine as the next to be obeyed. The machine is started by clearing the store and starting the Hollerith reader which contains cards punched with appropriate instructions. Destination 0 is also used when an instruction is built up in an arithmetic unit ready to be obeyed.

Miscellaneous sources and destinations
Destination 29 controls a buzzer. If a non-zero number is transferred to destination 29 the buzzer sounds.
Source 30 is used to indicate when the last row of a card is in position in the reader or punch. This source gives a non-zero number only when a last row is in position. The operation of the arithmetic facilities on DS14 may be modified by a transfer to
destination 23. If a transfer with an odd characteristic is made from any source to destination 23 then, from then on, DS14 behaves as though it were two single length accumulators in series. This means that carries are suppressed at the end of each of the single words. This condition persists until a transfer is made to destination 23 using an even characteristic, when DS14 behaves as an accumulator for double length numbers with their least significant parts in even minor cycles and more significant parts in odd minor cycles.
The operation of TS20 is modified by transfers to destination 21. If a transfer with an odd characteristic is made to destination 21 then TS20 ceases to have an independent existence and from then on is fed continuously from DL10. Source 20 then gives the contents of DL10 one minor cycle later than from source 10. TS20 reverts to its former condition when a transfer with an even characteristic is made to destination 21. The facility is used to move the 32 words in DL10 round one position so that the word in minor cycle n is available in minor cycle (n + 1).

Assessment of optimum coding
A detailed assessment of the value of optimum coding is by no means simple. Roughly speaking, subroutines are on an average about 4 or 5 times as fast as on an orthodox machine using the same pulse repetition rate. In main tables a somewhat lower factor is usually achieved. The factor of 4 or 5 would be exceeded if less of the advantage given by optimum coding were used to overcome disadvantages due to the rudimentary nature of the arithmetic facilities on Pilot ACE. Even so, the bald statement of the average ratio of speeds does not do full justice to the value of optimum coding on the Pilot ACE. Its value springs as much from the fact that it has made possible the programmes in which computing is done between the rows of cards and also the high output speed of decimal numbers. The binary-decimal conversion routines for punching out several decimal numbers simultaneously on a card and also decimal-binary conversion routines for reading several numbers achieve a ratio of something like 14 to 1, and on a machine which is being used extensively for scientific computation on a commercial basis this is of immense importance.

Future programme
Engineered versions of the Pilot Model are now under construction by the English Electric Company. These machines will be similar to the Pilot Model but will have a little more high-speed store, an automatic divider, two quadruple length stores and a subtractive input on the double length accumulator besides several minor modifications including a rationalization of the numbering of the stores. In addition a magnetic drum intermediate store with the equivalent of 32 DL's storage capacity will be added. A full scale machine will probably soon be under development employing a 4 address code. Typical instructions will be of the form

A k B C

and will select the next source of instruction. This code is more economical in instruction storage space and since all single word stores will then become complete accumulators with all facilities except multiplication on them, it will be possible to take much fuller advantage of optimum coding.

Sources, destinations and next instruction sources

Sources                  Destinations                             Next instr. sources
 0. Input                 0. INSTRUCTION                           0. DL11
 1. DL1                   1. DL1                                   1. DL1
 2. DL2                   2. DL2                                   2. DL2
 3. DL3                   3. DL3                                   3. DL3
 4. DL4                   4. DL4                                   4. DL4
 5. DL5                   5. DL5                                   5. DL5
 6. DL6                   6. DL6                                   6. DL6
 7. DL7                   7. DL7                                   7. DL7
 8. DL8                   8. DL8
 9. DL9                   9. DL9
10. DL10                 10. DL10
11. DL11                 11. DL11
12. DS12                 12. DS12
13. DS14 ÷ 2             13. DS14 add
14. DS14                 14. DS14
15. TS15                 15. TS15
16. TS16                 16. TS16
17. TS26                 17. TS16 add
18. TS26 ÷ 2             18. TS16 subtract
19. TS26 × 2             19.† MULTIPLY
20. TS20                 20. TS20
21. TS26 & TS27          21. Modifies Source 20
22. TS26 ≢ TS27          22. -
23. P17                  23. Modifies Source 13, Destination 13
24. P32                  24. DISCRIMINATE on sign
25. P1                   25. DISCRIMINATE on zero
26. TS26                 26. TS26
27. TS27                 27. TS27
28. Zero                 28. Output
29. Ones                 29. BUZZER
30. Last row of card     30.† PUNCH
31. -                    31.† READ

† Independent of source used.

References
WilkJ53; TuriS59
Chapter 12
Summary The computer ZEBRA is a computer based on the following ideas:

1. The logical structure of the arithmetic and control units of the machine has been simplified as much as possible; there is not even a built-in multiplier nor a divider.
2. The separate bits in an instruction word are used functionally and can be put together in any combination.
3. Conventional two stage operation (set-up, execution) has been abandoned. Each unit time interval can be used for arithmetical operations.
4. A small number of fast access registers is used as temporary storage; at the same time these registers serve as modifier registers (B-lines).
5. Optimum programming is almost automatically done to a very great extent. The percentage of word times effectively used is usually greater than 60%.
6. An instruction can be repeated and modified while repeated by using an accumulator as next instruction source and the address counter as counter. This can be done without any special hardware.

This has resulted in a machine which has a very simple structure and hence contains only a very moderate number of components, giving high reliability and easy maintenance. Because of the functional bit coding, the programming is extremely flexible. In fact the machine code is a sort of micro-programming. Full-length multiplication or half-length multiplication in half the time are just as easy, and only require a different micro-programme. The minimum latency programming together with the effective use of word times lost in other systems results in a very high speed of operation compared to the basic clock pulse frequency.

Introduction
In the Dr. Neher Laboratory of the Dutch Postal & Telecommunications Services the logical design of a computer called ZEBRA has been developed, and this computer has been engineered and constructed by Standard Telephones & Cables Ltd, England. The logical system is so different from most computers that it is worth while to devote a special lecture to it. As time is limited, no technical details nor questions about dimensions or capacity will be discussed. They can all be found in the literature [van der Poel, 1956; van der Poel, 1952].
The main idea of the machine is to economise as far as possible on the number of components by simplifying the logical structure. For example, multiplication and division are not built in but must be programmed. Of course this system can only work with an appropriate internal code which has enough properties to execute basic arithmetic and logical routines effectively. In fact, the internal machine code is more or less a system of microprogramming [Wilkes and Stringer, 1953].

[Figure: instruction word layout, showing an operation part of 15 bits (function bits A K Q L R I B C D E followed by test bits) and a fast store address of 5 bits.]

D- and E-bits
The functional bits D and E control the direction of flow of information.

[Figure: block diagram showing the arithmetic unit and the control.]
It will be seen that A and K can have 4 possible combinations:

Case 1. A = 0, K = 0. This is called the adding jump (Fig. 2a). While a new instruction is coming into the control from the drum, the arithmetic unit can at the same time do an operation with the operand coming from the fast store. This is the fastest type of operation. When the following instruction is placed in the next location on the drum there is no waiting time, and 32 instructions of this type can be executed per revolution. (One revolution = 10 ms, one word time = 312 µs.)

Case 2. A = 0, K = 1. This is called the double jump (Fig. 2b). Both stores are now used for giving information to the control, i.e., making a jump. Since the fast store is used for the control, the instruction coming in from the drum is modified by the contents of a fast register. In this way the B-line facility, as it is often called, is realised.

Case 3. A = 1, K = 0. This is called the double addition (Fig. 2c). Both stores are now connected to the arithmetic unit. The control must take care of itself using the address counter which is stepped up by 2 at a time, thus enabling this type of instruction to reach the number lying between the two successive instructions without any waiting time. Constants in particular will always be taken from optimum places on the drum.

A200.5    Add (200) (the contents of address 200) and (5) to the accumulator. Step the address counter by 2.
X200E5    Take next instruction from 200 (= jump to 200) and store contents of accumulator in 5.
X200KE5   Jump to 200 and store previous contents of address counter in 5. This amounts to placing a link instruction for return from a sub-routine.
X200K5    Take next instruction from 200 but modify it with (5), thus making a variable instruction.

Arithmetic bits
The remainder of the function bits have arithmetic meanings. We shall only briefly indicate their different actions.

B: Do not use the A accumulator (most significant accumulator) but the B accumulator.
C: Clear the accumulator specified by B after storing, or before addition. (In a serial machine like ZEBRA this is automatically the case, cf. Fig. 3.)
I: Subtract instead of add.
Q: Add one (unit in the least significant place) to the B-accumulator.
L: Shift both accumulators one place to the left.
R: Shift both accumulators one place to the right. The accumulators are always coupled together in shifting except when C is present.

A few more examples will be given.

X100                This will take (100) into C. (C) + 2 → D.
X101E5    X102      Another jump comes into C taking in (101) and storing (A) → 5. (C) + 2 → D gives X103E5. Note that the operational part is kept in the counter. The necessary constant from 102 is just becoming available.
const.    X103E5    The next instruction is taken from 103 which is immediately following. The constant in A is stored to 5 by E5, and is still active after coming back from D.

As can be seen, many complicated operations can be composed by the elementary possibilities of the separate bits.

The accumulator
A simplified block diagram of one of the accumulators is shown in Fig. 3.
Shifting is effected by looping the accumulator over one place less or one place more. In a double addition the contents of the drum store and the fast store are first added together in the pre-adder (possibly augmented by unity in the B accumulator, if Q is present) and this result is added into the accumulator (or subtracted in case of I). A clearing gate controlled by C interrupts the recirculation of the previous contents.

Fig. 3. Accumulator. [Diagram labels: Drum store, Fast store, To store.]
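
The A/K combinations and the timing figures quoted for Case 1 can be tabulated in a few lines. The following Python is a minimal sketch, not part of the original lecture: the names are hypothetical, the case names and the figures of 10 ms per revolution and 32 instructions per revolution are taken from the text, and the fourth combination (A = 1, K = 1) is deliberately left unnamed because it is not described in this excerpt.

```python
REVOLUTION_MS = 10.0        # one drum revolution, as quoted in the text
WORDS_PER_REVOLUTION = 32   # adding-jump instructions executable per revolution

# Names of the A/K combinations described in the text.
CASES = {
    (0, 0): "adding jump",
    (0, 1): "double jump",
    (1, 0): "double addition",
}


def classify(a, k):
    """Name the instruction type selected by the A and K bits."""
    return CASES.get((a, k), "not described in this excerpt")


def word_time_us():
    """Time for one word to pass the drum head, in microseconds."""
    return REVOLUTION_MS * 1000.0 / WORDS_PER_REVOLUTION


def adding_jumps_per_second():
    """Upper bound on adding jumps per second when no waiting time occurs."""
    return WORDS_PER_REVOLUTION * 1000.0 / REVOLUTION_MS


print(classify(0, 0), word_time_us(), adding_jumps_per_second())
```

The computed word time of 312.5 µs agrees with the rounded figure of 312 µs given for Case 1.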
Chapter 12 I ZEBRA, a simple binary computer 203
By ending the sub-routine:

220  X221K5
221  -1

we can return not two but one location further on, i.e., X221K5 takes as next instruction (5) - 1 = X101. Here 5 contains the instruction and the drum modifier.

The drum address in D is counted up but is not active. The register address remains the same. Hence the instruction in 5 is repeated.
2-2   a        X8188K5RW    The repeating instruction as well as the repeated instruction are both shifted one place to the right.
2-3 * a  ARW   X8190K5RW
2-4   a        X8190K5RW
2-5 * a  ARW   X000K6RW     As the drum address overflows into the fast store address the repeating instruction becomes X8192K5RW = X000K6RW, taking the next instruction from 6.
2-6   a        X000K6RW
2-7   a        X102         As (6) = X102 the repetition returns to the main programme and the A accumulator is shifted over 7 places.

The instruction ARW has thus been repeated p times when the drum address of the repeating instruction is 8192 - 2p. This way of repeating an instruction has made it possible to do multiplication, division, block transfers, table look up and many other small basic repetitive processes in a very simple way. There is no special hardware present in the machine to do the counting necessary for the repetition, as this counting is done by the normal address counter.

As a last example we shall give a programme for the summation of a block of locations from 200 to 300 in the store. This involves 101 locations. The programme reads:

100  A101BC      Put A200Q in B (B has address 3).
101  A200Q
102  X103KE4C    Put return jump X104 in 4. Clear A in advance.
103  X7990K3W    Repeat A200Q 101 times. Because A200Q is standing in B the Q augments the instruction itself at every repetition. Hence successively (200), (201) etc. are added to A. At the end the sum is left in A and the programme proceeds at 104.
104  etc.

It is left to the reader to work out the action diagram. This example is not programmed for minimum waiting, but by supplying the repeating instruction X7990K3W with a Q it will step up the repeated instruction A200Q by 2 every time. Now, once the first instruction has been located, all even locations following are emerging from the drum just at the right time. The odd numbered locations must be summed in a second, similar repetition.

References
VandW59; VandW52, 56; WilkM53a.
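
The counting trick used by the repeating instruction in this chapter can be illustrated with a small simulation. This is a sketch under stated assumptions, not van der Poel's own notation: the function names are hypothetical, and the model assumes only what the text states, namely that the address counter is stepped up by 2 after each pass and that a drum address of 8192 overflows into the fast store address (X8192K5 = X000K6).

```python
DRUM_ADDRESS_RANGE = 8192   # a drum address of 8192 carries into the fast store address
STEP = 2                    # the address counter is stepped up by 2


def repeating_drum_address(p):
    """Drum address for the repeating instruction so the repeated one runs p times."""
    return DRUM_ADDRESS_RANGE - STEP * p


def simulate(drum, register):
    """Run the repetition; return (executions, final drum address, final register)."""
    executions = 0
    while drum < DRUM_ADDRESS_RANGE:
        executions += 1      # the instruction held in the fast register is obeyed once
        drum += STEP         # the repeating instruction returns from the counter, stepped by 2
    # The overflow carries into the register address, ending the repetition.
    return executions, drum - DRUM_ADDRESS_RANGE, register + 1


print(repeating_drum_address(101))   # address used in the summation programme
print(simulate(8188, 5))             # the shift example: X8188K5RW
print(simulate(7990, 3))             # the summation example: X7990K3W
```

With these assumptions X8188K5RW yields two executions and ends as X000K6, and X7990K3W yields 101 executions and ends at register 4, where the summation programme has placed its return jump.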
Chapter 13
John W. Carr III
44 Q-Jump QJuv: If $Q_{35}$ = 1, take (u) as NI. If $Q_{35}$ = 0, take (v) as NI. Then, in either case, left circular shift (Q) by one place.

One-way conditional jump instructions
41 Index Jump IJuv: Form in A the difference D(u) minus 1. Then if $A_{71}$ = 1, continue the present sequence of instructions; if $A_{71}$ = 0, replace (u) with ($A_R$) and take (v) as NI.
42 Threshold Jump TJuv: If D(u) is greater than (A), take (v) as NI; if not, continue the present sequence. In either case, leave (A) in its initial state.
43 Equality Jump EJuv: If D(u) equals (A), take (v) as NI; if not, continue the present sequence. In either case leave (A) in its initial state.

One-way unconditional jump instructions
45 Manually Selective Jump MJjv: If the number j is zero, take (v) as NI. If j is 1, 2, or 3, and the correspondingly numbered MJ selecting switch is set to “jump,” take (v) as NI; if this switch is not set to “jump,” continue the present sequence.
37 Return Jump RJuv: Let y represent the address from which CI was obtained. Replace the right-hand 15 bits of (u) with the quantity y plus 1. Then take (v) as NI.
14 Interpret IP: Let y represent the address from which CI was obtained. Replace the right-hand 15 bits of ($F_1$) with the quantity y + 1. Then take ($F_2$) as NI.

Stop instructions
56 Manually Selective Stop MSjv: If j = 0, stop computer operation and provide suitable indication. If j = 1, 2, or 3 and the correspondingly numbered MS selecting switch is set to “stop,” stop computer operation and provide suitable indication. Whether or not a stop occurs, (v) is NI.
57 Program Stop PS-: Stop computer operations and provide suitable indication.

76 External Read ERjv: If j = 0, replace the right-hand 8 bits of (v) with (IOA); if j = 1, replace (v) with (IOB).
77 External Write EWjv: If j = 0, replace (IOA) with the right-hand 8 bits of (v); if j = 1, replace (IOB) with (v). Cause the previously selected unit to respond to the information in IOA or IOB.
61 Print PR-v: Replace (TWR) with the right-hand 6 bits of (v). Cause the typewriter to print the character corresponding to the 6-bit code.
63 Punch PUjv: Replace (HPR) with the right-hand 6 bits of (v). Cause the punch to respond to (HPR). If j = 0, omit seventh level hole; if j = 1, include seventh level hole.

Arithmetic instructions
71 Multiply MPuv: Form in A the 72-bit product of (u) and (v), leaving in Q the multiplier (u).
72 Multiply Add MAuv: Add to (A) the 72-bit product of (u) and (v), leaving in Q the multiplier (u).
73 Divide DVuv: Divide the 72-bit number (A) by (u), putting the quotient in Q, and leaving in A a non-negative remainder R. Then replace (v) by (Q). The quotient and remainder are defined by: $(A)_i = (u) \cdot (Q) + R$, where $0 \le R < |(u)|$. Here $(A)_i$ denotes the initial contents of A.
74 Scale Factor SFuv: Replace (A) with D(u). Then left circular shift (A) by 36 places. Then continue to shift (A) until $A_{34} \ne A_{35}$. Then replace the right-hand 15 bits of (v) with the number of left circular shifts, k, which would be necessary to return (A) to its original position. If (A) is all ones or zeros, k = 37. If u is A, (A) is left unchanged in the first step, instead of being replaced by D($A_R$).

Sequenced instructions
75 Repeat RPjnw: This instruction calls for the next instruction, which will be called NIuv, to be executed n times, its u and v addresses being modified or not according to the value of j. Afterwards the program is continued by the execution of the instruction stored at a fixed address $F_1$. The exact steps carried out are:
c If j = 0, do not change u and v.
  If j = 1, add one to v after each execution.
  If j = 2, add one to u after each execution.
  If j = 3, add one to u and v after each execution.

  The modification of the u address and v address is done in program control registers. The original form of the instruction in storage is unaltered.

d On completing n executions, take ($F_1$) as the next instruction. $F_1$ normally contains a manually selective jump whereby the computer is sent to w for the next instruction after the repeat.

e If the repeated instruction is a jump instruction, the occurrence of a jump terminates the repetition. If the instruction is a Threshold Jump or an Equality Jump, and the jump to address v occurs, (Q) is replaced by the quantity j, (n - r), where r is the number of executions that have taken place.

Floating point instructions
64 Add FAuv: Form in Q the normalized rounded packed floating point sum (u) + (v).
65 Subtract FSuv: Form in Q the normalized rounded packed floating point difference (u) - (v).
66 Multiply FMuv: Form in Q the normalized rounded packed floating point product (u) · (v).
67 Divide FDuv: Form in Q the normalized rounded packed floating point quotient (u) ÷ (v).
01 Polynomial Multiply FPuv: Floating add (v) to the floating product $(Q)_i$ · (u), leaving the packed normalized rounded result in Q.
02 Inner Product FIuv: Floating add to $(Q)_i$ the floating product (u) · (v) and store the rounded normalized packed result in Q. This instruction uses MC location $F_4$ = 00003 for temporary storage, where $(F_4)_f = (Q)_i$. The subscripts i and f represent “initial” and “final.”
03 Unpack UPuv: Unpack (u), replacing (u) with $(u)_M$ and replacing $(v)_C$ with $(u)_C$ or its complement if (u) is negative. The characteristic portion of $(u)_M$ contains sign bits. The sign portion and mantissa portion of $(v)_C$ are set to zero. Note. The subscripts M and C denote the mantissa and characteristic portions.
04 Normalize Pack NPuv: Replace (u) with the normalized rounded packed floating point number obtained from the possibly unnormalized mantissa in $(u)_i$ and the biased characteristic in $(v)_C$. Note. It is assumed that $(u)_i$ has the binary point between $u_{27}$ and $u_{26}$; that is, that $(u)_i$ is scaled by $2^{-27}$.
05 Normalize Exit NEj-: If j = 1 normalize without rounding until a master clear or until the instruction is again executed with j = 0.

References
Univac Scientific Electronic Computing System Model 1103A, Form EL 338
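
Two of the definitions in this chapter lend themselves to a short illustration: the Divide convention with a non-negative remainder, and the address modification selected by j in the Repeat instruction. The Python below is a sketch only. The function names are hypothetical, ordinary Python integers stand in for the machine's 36-bit and 72-bit registers, and no attempt is made to model the machine's number representation or address wrap-around.

```python
def divide(a, u):
    """Return (Q, R) with a = u * Q + R and 0 <= R < |u|, as defined for Divide."""
    if u == 0:
        raise ZeroDivisionError("divisor (u) is zero")
    q, r = divmod(a, u)      # Python gives r the sign of u
    if r < 0:                # only possible when u is negative
        r -= u               # shift the remainder into the range 0 <= r < |u|
        q += 1
    return q, r


def repeat_addresses(j, n, u, v):
    """List the (u, v) address pairs used on each of the n executions of NIuv."""
    pairs = []
    for _ in range(n):
        pairs.append((u, v))
        if j in (2, 3):      # j = 2 or 3: add one to u after each execution
            u += 1
        if j in (1, 3):      # j = 1 or 3: add one to v after each execution
            v += 1
    return pairs


for a, u in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    q, r = divide(a, u)
    print(a, u, q, r, u * q + r == a)

print(repeat_addresses(3, 3, 100, 200))
```

The adjustment step in `divide` is needed because floor division alone would give a negative remainder for a negative divisor, which the definition $0 \le R < |(u)|$ excludes.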
Chapter 14
The MIDAC, Michigan Digital Automatic Computer [Carr, 1956], was constructed on the basis of the design of the SEAC at the National Bureau of Standards. Its instruction code is particularly of interest because it incorporates the index register concept into a three-address binary instruction. Numbers in this machine are (44, 0, 0)² fixed points. The word length is 45 binary digits with serial operation.

¹In E. M. Grabbe, S. Ramo, and D. E. Wooldridge (eds.), “Handbook of Automation, Computation, and Control,” vol. 2, chap. 2, pp. 115-121, John Wiley & Sons, Inc., New York, 1959.
²Carr's triplet notation for: fractional significant digits, digits in exponent, and digits to left of radix point.

Word structure
The data or address positions of an instruction are labeled the α, β, and γ positions. Each contains twelve binary digits represented externally as three hexadecimal digits. Four binary digits, or one hexadecimal digit, are used to convey the instruction modification or relative addressing information. The next four binary digits or single hexadecimal digit represents the operation portion of the instruction. The final binary digit is the halt or breakpoint indicator for use with the instruction.
For example, the 45-binary-digit word

000001100100000011001000000100101100000001011

considered as an instruction would be interpreted as

α             β             γ             abcd  Op    halt
000001100100  000011001000  000100101100  0000  0101  1

In external hexadecimal form this would be written

064  0c8  12c  0  5  -

The above binary word is the equivalent machine representation of the following instruction: “Take the contents of hexadecimal address 064, add to it the contents of hexadecimal address 0c8, and store the result in hexadecimal address 12c. There is no modification of the 12-binary-digit address locations given by the instruction. Upon completion of the operation, stop the machine if the proper external switches are energized.” The binary combination represented by 5 is the operation code for addition.

Data or addresses
The addresses given by the twelve binary digits in each of the three locations designate in the machine the individual acoustic storage cells and blocks of eight magnetic drum storage cells. The addresses from 0 to 1023 (decimal) or 000 to 3FF (hexadecimal) correspond to acoustic storage cells. The addresses from 1024 to 4095 (decimal) or 400 to FFF (hexadecimal) correspond to magnetic drum storage blocks. In certain operations, however, the addresses 0 to 15 (decimal) or 0 to F (hexadecimal) represent input-output stations rather than storage locations.
These twelve-binary-digit groups will in some cases be modified by the machine in order to yield a final twelve-binary-digit address. The method of processing will depend on the values of the instruction modification digits. After modification, the final result will then be interpreted by the control unit as a machine address.
In some instructions, namely those that perform change of control operations, which involve cycling and counting rather than simple arithmetic operations on numbers, the α and β positions in an instruction are not considered as addresses. In those cases, they are used instead as counters or tallies. In other instructions, which do not require three addresses, but only one or two, the β position is not considered as an address. In these cases, the oddness or evenness of the β address is used to differentiate between two operations having the same operation code digits. That is, the parity of binary digit P22 is used as an extra function designator.

Instruction modification digits
The four binary digits P9-P6 are used as instruction modification or relative addressing digits. Their normal function is relatively simple; nevertheless, the possible exceptions to the general rule can make their behavior complicated. These four digits are labeled
the a, b, c, and d digits. Ordinarily the a digit is associated with the α position, the b digit with the β position, and the c digit with the γ position in an instruction.
When binary digit P22 (or the β position) is used in an instruction to represent extra operation information, the instruction modification digit b is ignored. In the case of input and output instructions, when the various address positions represent machine address locations on the drum, input-output stations, or block

The contents of the two counters will be designated by $C_d$ (d = 0, 1).

$C_0$ = contents of the instruction counter
$C_1$ = contents of the base counter

Then the modified addresses α', β', and γ' are related to the α, β, and γ addresses appearing in the instruction by the following:

$\alpha' = \alpha + aC_d, \qquad \beta' = \beta + bC_d, \qquad \gamma' = \gamma + cC_d \qquad (a, b, c, d = 0, 1)$

In certain instructions addresses relative to one of the two counters may be prohibited. Thus, if in a particular instruction N may be relative only to the instruction counter, then for that

1 It may contain the address of the initial word in a group, thus serving as a base address to which integers representing the relative position of a given word in the group of words may be added by using the address modification digits.
Chapter 14 I Instruction logic of the MIDAC 211
2 It may contain a counter or tally which can be increased by a base instruction. This instruction makes use of the address modification digits to change the counter so as to count the number of traversals of a particular cycle of instructions.
Instruction types
Instructions used in MIDAC can be divided into three categories: change of information, change of control, and transfer of information. The first category can be further subdivided into arithmetic and logical instructions. In the arithmetic instructions are included addition, subtraction, division, various forms of multiplication, power extraction, number shifting, and number conversion instructions. The sole logical instruction is extract, which modifies information in a nonarithmetic fashion.
The transfer of information or data transfer instructions include transfers of individual words or blocks of words into and out of the acoustic storage and drum and magnetic tape control.
The possible change of control instructions includes two comparisons that provide different future sequences dependent on the differences of two numbers. In the compare numbers or algebraic comparison, the difference is an algebraic, signed one. In the compare magnitudes or absolute comparison, the difference is one between absolute values. Two other instructions, file and base, perform other tasks beside transferring control. The file instruction transfers control unconditionally. The file instruction files or stores the contents of the base or instruction counter in a specific address position of a particular word in the storage. The base or tally instruction provides a method for referring addresses automatically relative to the address given by the base counter, irrespective of its contents. The base instruction also gives a conditional transfer of control.
The nineteen MIDAC instructions can be described functionally as follows:
Change of information
1 Add. (α') + (β') is placed in γ'. Result must be less than 1 in absolute value.
2 Subtract. (α') − (β') is placed in γ'. Result must be less than 1 in absolute value.
3 Multiply, Low Order. The least significant 44 binary digits of (α') × (β') are placed in γ'.
4 Multiply, High Order. The most significant 44 binary digits of (α') × (β') are placed in γ'.
5 Multiply, Rounded. The most significant 44 binary digits of $(\alpha') \times (\beta') \pm 1 \cdot 2^{-45}$ are placed in γ'. The $1 \cdot 2^{-45}$ is added if (α') × (β') is positive, and subtracted if (α') × (β') is negative.
6 Divide. The most significant 44 binary digits of (β')/(α') are placed in γ'. (Note the inversion of order of α and β.) Result must be less than 1 in absolute value.
7 Power Extract. The number $n \cdot 2^{-44}$ is placed in γ' where n is the number of binary 0's to the left of the most significant binary 1 in (α'). The b digit is ignored; β may be any even number. If (α') is all zeros, zero is placed in γ'.
8 Shift Number. The 44 binary digits immediately to the right of the radix point in $(\alpha') \cdot 2^{(\beta') \cdot 2^{44}}$ are placed in γ'. The result, in γ', is the equivalent of shifting (α') n places, where $n \cdot 2^{-44} = (\beta')$ and n positive indicates a shift left, n negative a shift right. If |n| ≥ 44, zero is placed in γ'.
9 Extract or Logical Transfer. Those binary digits in (γ'), including the sign digit, whose positions correspond to 1's in (β') are replaced by the digits in the corresponding positions of (α').
10 Decimal to Binary Conversion. This operation may be interpreted in two ways: (a) (α') is considered as a binary-coded-decimal integer times $2^{-44}$. It is converted to the equivalent binary integer times $2^{-37}$ and the result is placed in γ', or (b) (α') is considered as a binary-coded-decimal fraction, D. It is converted into an intermediate binary fraction, $B_i$, such that $B_i = D \times 10^{11} \times 2^{-37}$ and the result placed in γ'. To obtain B, the true binary equivalent of D, $B_i$ must be multiplied by $(10^{-11} \times 2^{37})$. However, since this factor is greater than 1 and therefore cannot be represented in the machine, two operations must be performed. For example,
$$B_i \times (10^{-11} \times 2^{37} - 1) = B_j$$
$$B = B_i + B_j$$
Here the b digit is ignored, and β may be any even number.
11 Binary-to-Decimal Conversion. (α'), considered as a binary fraction, is converted into the equivalent eleven-digit binary-coded-decimal fraction. The result is placed in γ'. The b digit is ignored, and β may be any odd number.
Change of control
12 Compare Numbers. γ can be relative only to the instruction counter. If (α') ≥ (β'), the contents of the instruction counter are increased by one as is normally done at the end of each instruction. If (α') < (β'), the contents of the instruction counter are set to γ'.
212 Part 3 I The instruction-set processor level: variations in the processor Section 1 I Processors with greater than 1 address per instruction
13 Compare Magnitudes. γ can be relative only to the instruction counter. If |(α')| ≥ |(β')|, the contents of the instruction counter are increased by one as is normally done at the end of each instruction. If |(α')| < |(β')|, the contents of the instruction counter are set to γ'.
14 Base or Tally. The d digit is ignored. α and β may be relative only to the base counter, γ only to the instruction counter. If α' ≥ β', the contents of the base counter are set to zero and the contents of the instruction counter increased by one as usual. If α' < β', the contents of the base counter are set to α' and the contents of the instruction counter to γ'. (Note. The comparisons made here are of addresses themselves, not their contents.)
15 File. β may be any odd number. α and γ may be relative only to the instruction counter.
If d = 0, the contents of the instruction counter increased by one is placed in the γ position of (α'), and the instruction counter is set to γ'.
If d = 1, the contents of the base counter is placed in the α position of (α'), and the instruction counter is set to γ'. In addition, if b = 1, the contents of the base counter is set to zero; if b = 0, the contents of the base counter is not changed.
Transfer of information
16 Read In. The a digit must be 0; the b digit is ignored. If β is in the range 0 to 7 (decimal) or 000 to 007 (hexadecimal) α words are read into the acoustic storage from input-output station β. The first word read in is placed in γ', the second in γ' + 1, etc. If β is in the range 1024 to 1791 decimal (400 to 6FF hexadecimal), α words are read into the acoustic storage from the drum starting with the first word in the drum block whose address is β. The first word is placed in γ', the second in γ' + 1, etc.
17 Read Out. The a digit must be 0; the c digit is ignored. Starting with (β'), read out α consecutive words from the acoustic storage to input-output station γ, if γ is in the range 0 to 7 decimal (000 to 007 hexadecimal), or to the drum starting at the beginning of the drum block whose address is γ, if γ is in the range 1024 to 1791 decimal (400 to 6FF hexadecimal).
16 Alphanumeric Read In. The a digit must be 1; the b digit is ignored. If β is in the range 0 to 7 (decimal) or 000 to 007 (hexadecimal) α characters are read into the acoustic storage from input-output station β. The first character read in is placed in γ', the second in γ' + 1, etc. Each character occupies the six most significant digit positions of the register into which it is read; the other positions are set to zero. This operation may not be used to read words from the drum into the acoustic storage.
17 Alphanumeric Read Out. The a digit must be 1; the c digit is ignored. Starting with (β'), read out α consecutive characters from the acoustic storage to input-output station γ; γ must be in the range 0 to 7 (decimal) or 000 to 007 (hexadecimal). This operation may not be used to read words from the acoustic storage onto the drum.
18 Move Tape Forward. (a, b, c, and d digits are ignored.) β may be any even number; γ must be in the range 0 to 15 decimal (000 to 00F hexadecimal). The magnetic tape at input-output station γ is moved forward n blocks where
$$n = \left[\frac{\alpha - 1}{8}\right] + 1$$
that is, one plus the integral part of (α − 1)/8, or the number of blocks that include α words.
19 Move Tape Backward. (a, b, c, and d digits are ignored.) β may be any odd number; γ must be in the range 0 to 15 decimal (000 to 00F hexadecimal). The magnetic tape at input-output station γ is moved backward n blocks where
$$n = \left[\frac{\alpha - 1}{8}\right] + 1$$
that is, one plus the integral part of (α − 1)/8, or the number of blocks that include α words.
References
CarrJ56. SEAC computer references: AinsE52; AlexS51; ElboR53; GreeS52, 53; HaueR52; PikeJ52; SerrR62; ShupP53; SlutR51. DYSEAC computer references: LeinA54.
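Two of the MIDAC rules above are easy to check numerically: the two-step correction used after Decimal to Binary Conversion (instruction 10, case b) and the block count used by the Move Tape instructions (18 and 19). The following is a minimal sketch in Python, not MIDAC code. The function names are hypothetical, exact rational arithmetic stands in for the 44-digit machine fractions, and the block size of eight words is an assumption inferred from the drum block address range.

```python
from fractions import Fraction

# The correction factor 10^-11 x 2^37 is about 1.374, so it cannot be held
# as a machine fraction (which must be less than 1 in absolute value).
SCALE = Fraction(2**37, 10**11)


def bcd_fraction_to_binary(digits: str) -> Fraction:
    """Convert an 11-digit decimal fraction D to its binary value B in two steps.

    Step 1 mimics the conversion instruction: Bi = D * 10^11 * 2^-37.
    Step 2 multiplies by (SCALE - 1), which is less than 1, and adds Bi back:
        Bj = Bi * (SCALE - 1),  B = Bi + Bj.
    """
    if len(digits) != 11 or not digits.isdigit():
        raise ValueError("expected exactly 11 decimal digits")
    d = Fraction(int(digits), 10**11)          # the decimal fraction D
    b_i = d * 10**11 * Fraction(1, 2**37)      # intermediate fraction Bi
    b_j = b_i * (SCALE - 1)                    # factor is now below 1
    return b_i + b_j


def tape_blocks(alpha: int, words_per_block: int = 8) -> int:
    """Number of tape blocks that include alpha words (block size is assumed)."""
    if alpha < 1:
        raise ValueError("alpha must be at least 1")
    return (alpha - 1) // words_per_block + 1


if __name__ == "__main__":
    print(bcd_fraction_to_binary("50000000000"))   # decimal 0.5
    print([tape_blocks(a) for a in (1, 8, 9, 16, 17)])
```

Because Bi + Bi × (SCALE − 1) equals Bi × SCALE, the two-step form gives the same result as the single forbidden multiplication while keeping every multiplier below 1.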
Chapter 15
Instruction logic of the Soviet Strela (Arrow)¹
John W. Carr III
A typical general purpose digital computer using three-address instruction logic is the Strela (Arrow) constructed in quantity under the leadership of Iu. Ia. Basilewskii of the Soviet Academy of Sciences, and described in detail by Kitov [1956]. This computer uses a (35, 6, 0)² binary floating point number system. Its instruction word, of 43 digits, contains a six-digit operation code, and three 12-digit addresses, with one breakpoint bit. In octal notation, two digits represent the operation, four each the addresses, and one bit the breakpoint. This machine operates with up to 2048 words of high-speed cathode ray tube storage.
Input-output is ordinarily via punched cards and punched paper tape. A "standard program library" is attached to the computer as well as magnetic tape units (termed "external accumulators" below). Note. This computer is different from both the BESM described by Lebedev [1956] and the Ural reported by Basilewskii [1957]. Apparently, it is somewhat lower in performance than BESM.
Since all arithmetic is ordinarily in floating point, "special instructions" perform fixed point computations for instruction modifications.
Ordinarily instructions are written in an octal notation, but external to the machine operation symbols are written in a mnemonic code. The two-digit numerals are the octal instruction equivalent.
¹In E. M. Grabbe, S. Ramo, and D. E. Wooldridge (eds.), "Handbook of Automation, Computation, and Control," vol. 2, chap. 2, pp. 111-115, John Wiley & Sons, Inc., New York, 1959.
²Carr's triplet notation for: fractional significant digits, digits in exponent, and digits to left of radix point.
Arithmetic and logical instructions
01. + α β γ. Algebraic addition of (α) to (β) with result in γ.
02. + α β γ. Special addition, used for increasing addresses of instructions. The command (α) or (β) is added to the number (β) or (α) and the result sent to the cell with address γ. As a rule, the address of the instruction being changed corresponds to the address γ.
03. − α β γ. Subtraction with signed numbers. From the number (α) is subtracted the number (β) and the result sent to γ.
04. − α β γ. Difference of the absolute value of two numbers |(α)| − |(β)| = (γ).
05. × α β γ. Multiplication of two numbers (α) and (β) with result sent to γ.
06. ∧ α β γ. Logical multiplication of two numbers in cells α and β. This instruction is used for extraction from a given number or instruction a part defined by the special number (β).
07. ∨ α β γ. Logical addition of two numbers (α) and (β) and sending the result to cell γ. This instruction is used for forming numbers and commands from parts.
10. Sh α β γ. Shift of the contents of cell α by the number of steps equal to the exponent of the (β). If the exponent of the (β) is positive then the shift proceeds to the left, in the direction of increasing value; if negative, then the shift is right. In addition, the sign of the number, which is shifted out of the cell, is lost.
11. − α β γ. Special subtraction, used for decreasing the addresses of instructions. In the cell α is found the instruction to be transformed, and in cell β the specially selected number. Ordinarily addresses α and γ are identical.
12. ≠ α β γ. Comparison of two numbers (α) and (β) by means of digital additions of the numbers being compared modulo two. In the cell γ is placed a number possessing ones in those digits in which inequivalence results in the numbers being compared.
Control instructions
13. C α β 0000. Conditional transfer of control either to instruction (α) or to instruction (β), depending on the results of the preceding operation. With the operations of addition, subtraction, and subtraction of absolute values, it appraises the sign
of the result: for a positive or zero result it transfers control to the command (α), for negative results to the command (β).
The result of the operation of multiplication is dependent on the relationship to unity. Transfer is made to the command (α) in the case where the result is greater than or equal to one, and to command (β), if it is smaller than one.
For conditional transfer after the operation of comparison, transfer to the instruction (α) is made in the case of equality of binary digits, and to (β) when there is any inequivalence.
After the operation ∧ (logical sequential multiplication) the conditional transfer command jumps to the instruction (α) when the result is different from zero, and to instruction (β) when it is equal to zero.
A forced comparison is given by
C α α 0000
The third address in this command is not used and in its place is put zero.
14. I-O α 0000 0000. This instruction is executed parallel with the code of the other operations, and guarantees bringing into working position in good time the zone of the external accumulator (magnetic tape unit) with the address α.
15. H 0000 0000 0000. This instruction executes an absolute halt.
Group transfer instructions
Special instructions for group transfer serve for the accomplishment of a transfer of numbers to and from the accumulators. In the second address in these instructions stands an integer, designating the quantity of numbers in the group which must be transferred. Group transfers always are produced in increasing sequence of addresses of cells in the storage.
16. T, 0000 n γ. The instruction T, guarantees transfer from a given input unit (with punched cards, perforated tape, etc.) into the storage. In the third address γ of the instruction is indicated the initial address of the group of cells in the storage where numbers are to be written. With punched paper tape or punched cards the variables are written in sequence, beginning with the first line.
17. T, 0000 n γ. The instruction T, guarantees transfer of a group of n numbers from an input unit into the external accumulator in zone γ.
20. T, α n γ. This instruction guarantees a line-by-line sequence of transfers of n numbers from zone α of the external accumulator into the cells of the storage beginning with the cell with address γ.
21. T, α n 0000. This instruction guarantees the transfer to the input-output unit (to punched paper tape or punched cards) of a group of n numbers from the storage, beginning with address α. The record on punched paper tape or punched cards as a rule will begin with the first line and therefore a positive indication of the addresses of the record is not required.
22. T, α n γ. Instruction T, guarantees transfer of a group of n numbers from one place in the storage with initial address α into another place in the storage with initial address γ.
23. T, α n γ. Instruction T, guarantees transfer of a group of n numbers from the storage with initial address α into the external accumulator with address γ.
24. T, α n 0000. Instruction T, serves for transfer of n numbers from the zone of the external accumulator with address α into the input-output unit.
Instructions T, and T, cannot be performed concurrently with other machine operations.
Standard subroutine instructions
Certain instructions in the Strela, although written as ordinary instructions, are actually "synthetic" instructions which call on a subroutine for computation of the function involved. The amount of machine time (number of basic instruction cycles) for an iterative process depends on the required precision of the computed function. The figures given below are based on approximately ten-digit decimal numbers with desired precision one in the tenth place.
25. D α β γ. This standard subroutine serves for execution of the operation of division: The number (α) is divided into the number (β) and the quotient is sent to cell γ.
The actual operation of division is executed in two steps: the initial obtaining of the value of the inverse of the divisor, by which the dividend is then multiplied. The computation of the inverse is given by the usual Newton formula, originally used with the EDSAC [Wilkes et al., 1952].
$$y_{n+1} = y_n (2 - y_n x)$$
For $x = d \cdot 2^p$, where $1/2 \le d < 1$, the first approximation is taken as $2^{-p}$. The standard subroutine takes 8 to 10 instructions and can be executed in 18-20 machine cycles (execution time for one typical command).
26. √ α 0000 γ. This instruction guarantees obtaining the value √x from the value x = (α) and sending the result to cell γ. Initially 1/√x is computed by the iteration formula
where the first approximation is taken as
$$y_0 = 2^{-[p/2]}$$
the bracket indicating "integral part of." After this the result is multiplied by x to obtain √x. This standard subroutine contains 14 instructions and is executed in 40 cycles.
27. e^x α 0000 γ. This instruction guarantees formation of e^x for the value x = (α) and sending the result to cell γ. The computation is produced by means of expansion of e^x in a power series. The standard subroutine contains 20 instructions and is executed in 40 cycles.
30. ln x α 0000 γ. This instruction guarantees formation of the function ln x for the value x = (α) and sending the result to location γ. The computation is produced by expansion of ln x in series. The subprogram contains 15 instructions and is executed in 60 cycles.
31. sin x α 0000 γ. This instruction guarantees execution of the function sin x and sending the result to location γ. The computation is produced in two steps: initially the value of the argument is translated into the first quadrant, then the value of the function is obtained by a series expansion. The subroutine contains 18 instructions and is executed in 25 cycles.
32. DB α n γ. This instruction performs conversion of a group of n numbers, stored in locations α, α + 1, . . . from binary-coded decimal into binary and sending of the result to locations γ, γ + 1, . . . . The subroutine contains 14 instructions and is executed in 50 cycles (for each number).
33. BD α n γ. This instruction performs the conversion of a group of n numbers stored in locations α, α + 1, . . . from the binary system into binary-coded decimal and sends them to locations γ, γ + 1, . . . . The subroutine contains only 30 instructions and is executed with 100 cycles (for each number).
34. MS α n γ. This is an instruction for storage summing. This instruction produces the formal addition of numbers, stored in locations beginning with address α, and the result is sent to location γ. Numbers and instructions are added in fixed point. This sum may be compared with a previous sum for control of storage accuracy.
References
BasiI57; KitoA56; LebeS56; WilkM52.
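The division subroutine (instruction 25) can be illustrated with a short Python sketch of the Newton reciprocal iteration. This is only an illustration under stated assumptions, not the Strela routine itself: the names `reciprocal` and `divide` are hypothetical, Python floats stand in for the machine's floating point words, only positive divisors are handled, and the fixed count of six iterations is an assumption chosen because the relative error starts at no more than 1/2 and is squared on every pass.

```python
import math


def reciprocal(x: float, iterations: int = 6) -> float:
    """Approximate 1/x by the Newton formula y <- y * (2 - y * x).

    With x = d * 2**p and 1/2 <= d < 1, the first approximation is 2**-p,
    so the initial relative error 1 - d is at most 1/2.
    """
    if x <= 0.0:
        raise ValueError("this sketch handles positive divisors only")
    _, p = math.frexp(x)          # frexp returns d in [0.5, 1) and the exponent p
    y = math.ldexp(1.0, -p)       # first approximation 2**-p
    for _ in range(iterations):
        y = y * (2.0 - y * x)     # one Newton step: two multiplies, one subtract
    return y


def divide(dividend: float, divisor: float) -> float:
    """Divide in two steps: form the inverse of the divisor, then multiply."""
    return dividend * reciprocal(divisor)


if __name__ == "__main__":
    print(reciprocal(3.0))
    print(divide(10.0, 4.0))
```

Each pass costs three basic operations, so six passes come to 18, which is at least in line with the 18-20 machine cycles quoted for the subroutine; that correspondence is an observation, not something the source states.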
Section 2
Chapter 16
Since there is only one address/instruction, a method is needed for the optimal allocation of operands. Otherwise, each instruction might have to wait a complete drum (or disk) revolution each time a data reference is made. The LGP-30 provides for operand-location optimization by interlacing the logical addresses on the drum so that two adjacent addresses (e.g., 00 and 01) are separated by nine physical locations.¹ These spaces allow for operands to be located next to the instructions which use them. There are 64 tracks, each with 64 words (sectors). Each word is accessed by a track address of 6 bits and a word address of 6 bits. The sequence of words (sectors) within a track is 00, 57, 50, 43, 36, 29, 22, 15, 08, 01, 58, 51, 44, 37, . . . , 06, 63, 56, 49, 42, 35, 28, 21, 14,
operations: (+, −, ×, /, ∧, × 2))
Mp(drum; t.cycle: 260 µs/w; t.access: (.260 - 16.6) ms; i.rate: 2.34 ms/w contiguous addresses: 4096 w; (31, 1 space) b/w)
T(Flexowriter, paper tape)
LGP-21; technology: (460 transistors), (375 diodes); power: 300 watts; weight: 90 pounds; number produced: ~150; t.delivery: December 1962;
Mp(fixed head disk; cyclic; t.cycle: 400 µs/w; t.access: (0 - 52) ms; i.rate: 7.26 ms/w contiguous addresses: 4096 w; (31, 1 space) b/w)
T(#1:32; Flexowriter, paper tape, analog, CRT, card),
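The interlaced sector sequence quoted for the LGP-30 can be regenerated with a few lines of Python. This is a sketch rather than anything taken from the machine's documentation: the function name is hypothetical, and the step of 57 is inferred from the listed sequence (each successive physical position holds the logical address 57 higher, modulo 64, which is the same as 7 lower).

```python
def interlace_order(sectors: int = 64, step: int = 57) -> list[int]:
    """Logical sector addresses in the physical order they pass the head.

    The step of 57 (mod 64) is inferred from the quoted sequence
    00, 57, 50, 43, ...; it is not stated explicitly in the text.
    """
    return [(position * step) % sectors for position in range(sectors)]


if __name__ == "__main__":
    order = interlace_order()
    print(order[:14])          # start of the sequence as listed in the text
    print(order.index(1))      # physical distance from address 00 to address 01
    print(order[-9:])          # tail of the sequence
```

Since 57 × 9 = 513 = 8 × 64 + 1, logical address 01 lands nine physical positions after 00, which is the separation the text describes. At 260 µs per word that spacing is 9 × 260 µs = 2.34 ms, matching the i.rate figure given for contiguous addresses.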
Section 2 Processors constrained by a cyclic, primary memory
Appendix 1
Pc State
  A<0:30>                       Accumulator
  C<18:23,24:29>                Program Counter register
  OV                            Overflow, LGP-21 only; on LGP-30 machine stops if an overflow
  Run
Pc Console State
  BP<4,8,16,32>                 Break Point switches
  TC                            Transfer Control switch
Mp State
  M[0:77₈][0:77₈]<0:30>         primary memory; 2^12 w; track and sector (word)
State
  The following Input Output devices do not have synchronization description variables. LGP-21 only. LGP-30 has a Flexowriter.
Instruction Format
  i<0:30>                       instruction
  op<0:3> := i<12:15>           operation code
  t<0:5> := i<18:23>            track select bit on Mp
  t'<0:4> := t<1:5>             input-output select, LGP-21 only
  s<0:5> := i<24:29>            sector select bit of Mp
  skip condition := ((t<0:3> ∧ BP) ≠ 0)
Instruction Interpretation Process
  Run → (i ← M[C]; C ← C + 1; next                      fetch
    Instruction_execution)                              execute
Instruction_execution := (
  Z (:= op = 0) → (
    (t = 000000) → (Run ← 0);                           stop
    skip condition → (C ← C + 1);                       sense BP and transfer
    i<0> → (OV → (OV ← 0; C ← C + 1)));                 sense overflow and transfer
  B (:= op = 1) → (A ← M[t][s]);                        bring from memory
  Y (:= op = 2) → (M[t][s]<18:29> ← A<18:29>);          store address
  R (:= op = 3) → (M[t][s]<18:29> ← C + 1);             set return address
  I (:= op = 4) → (                                     shifts, and input
    ¬i<0> ∧ (t = 62) → (A ← A × 2^6 {logical});
    i<0> ∧ (t = 62) → (A ← A × 2^4 {logical});
    ¬i<0> ∧ (t ≠ 62) → (input_6_bit);
    i<0> ∧ (t ≠ 62) → (input_4_bit));
Chapter 16 I The LGP.30 and LGP-21 219
  input_6_bit := (A ← A × 2^6 {logical}; next           input processes
    A<25:30> ← Input_device[t']; next
    ¬(A<0> ∨ stop code) → input_6_bit)                  wait
  input_4_bit := (A ← A × 2^4 {logical}; next
    A<27:30> ← Input_device[t']<1:4>; next
    ¬(A<0> ∨ stop code) → input_4_bit)
  D (:= op = 5) → (OV□A ← round(A / M[t][s]));          divide
  N (:= op = 6) → (A ← A × M[t][s] {s.integer});        multiply, save right
  M (:= op = 7) → (A ← A × M[t][s] {s.fraction});       multiply, save left
  P (:= op = 10₈) → (
    ¬i<0> → (Output_device[t']<1:6> ← A<0:5>);          print 6 bit
    i<0> → (Output_device[t']<1:6> ← A<0:3>□10));       print 4 bit
  E (:= op = 11₈) → (A ← A ∧ M[t][s]);                  extract
  U (:= op = 12₈) → (C ← t□s);                          unconditional transfer
  T (:= op = 13₈) → (i<0> → ((A<0> ∨ TC) → (C ← t□s));  transfer control
    ¬i<0> → (A<0> → (C ← t□s)));                        conditional transfer
  H (:= op = 14₈) → (M[t][s] ← A);                      hold and store
  C (:= op = 15₈) → (M[t][s] ← A; next A ← 0);          clear
  A (:= op = 16₈) → (OV□A ← A + M[t][s]);               add
  S (:= op = 17₈) → (OV□A ← A − M[t][s])                subtract
  ) end Instruction_execution
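The instruction format in this appendix (op in bits 12 to 15, track in bits 18 to 23, sector in bits 24 to 29 of a 31-bit word) can be exercised with a small Python decoder. This is a sketch under one explicit assumption: bit 0 is taken to be the leftmost (most significant) bit of the word, as the <0:30> notation suggests. The helper names `field`, `encode`, and `decode` are hypothetical; the mnemonic letters and their order follow the operation codes listed above.

```python
WORD_BITS = 31                      # instruction word i<0:30>
MNEMONICS = "ZBYRIDNMPEUTHCAS"      # operation codes 0 through 17 (octal)


def field(word: int, first: int, last: int) -> int:
    """Extract bits first..last, assuming bit 0 is the leftmost bit."""
    width = last - first + 1
    shift = (WORD_BITS - 1) - last
    return (word >> shift) & ((1 << width) - 1)


def encode(op: int, track: int, sector: int, sign: int = 0) -> int:
    """Pack sign (bit 0), op (12:15), track (18:23) and sector (24:29)."""
    return (sign << 30) | (op << 15) | (track << 7) | (sector << 1)


def decode(word: int) -> dict:
    """Split an instruction word into the fields named in the format table."""
    op = field(word, 12, 15)
    return {
        "sign": field(word, 0, 0),
        "op": op,
        "mnemonic": MNEMONICS[op],
        "t": field(word, 18, 23),
        "s": field(word, 24, 29),
    }


if __name__ == "__main__":
    word = encode(op=0o16, track=5, sector=9)   # an Add referring to track 5, sector 9
    print(decode(word))
```

The sign bit is decoded separately because several instructions (I, P, T, and Z) use i<0> to select a variant.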
Chapter 17
The basic IBM 650 is a magnetic drum (10, 0, 0)² decimal computer with one-plus-one address instruction logic. It has a storage of 1000 or 2000 10-digit words (plus sign) with addresses 0000-0999 or 0000-1999. More extended versions of the equipment have built-in floating point arithmetic and index accumulators, but the basic machine will be described here. There are three arithmetic registers in addition to the standard program register and program counter. All information from the drum to the arithmetic unit passes through a signed 10-digit distributor. A twenty-digit accumulator is divided into a lower and upper part, each of 10 digits with sign. Each of these is addressable (distributor 8001, lower accumulator 8002, and upper accumulator 8003). Each accumulator may be cleared to zero separately (in IBM 650 terminology, "reset"). The entire 20-digit register can be considered as a unit, or each part separately (but affecting the other in case of carries). The 10-digit instruction is broken down into the following form:
  Digit positions:  10 9       8 7 6 5         4 3 2 1                     0
  Field:            Op. Code   Data Address    Next Instruction Address    Sign
One particular instruction, Table Look-Up, allows automatic table search for one particular element in a table, which can be stored with a corresponding functional value. Input-output is via 80-digit numerical punched cards. An "alphabetic device" allows limited alphabetical entry on cards. Only certain 10-word groups on the magnetic drum are available for input and output. The following information is taken from an IBM 650 manual [Type 650, Magnetic Drum Data-Processing Machine Manual of Operations]. Much of the input-output is handled via board wiring, which is not described in detail below. The two-digit pair represents the machine code. The BRD (Branch on Digit) operation is used with special board wiring to tell when certain specific card punches exist.
¹In E. M. Grabbe, S. Ramo, and D. E. Wooldridge (eds.), "Handbook of Automation, Computation, and Control," vol. 2, chap. 2, pp. 93-98, John Wiley & Sons, Inc., New York, 1959.
²Carr's triplet notation for: fractional significant digits, digits in exponent, and digits to left of radix point.
Input-output instructions
70 RD (Read). This operation code causes the machine to read cards by a two-step process. First, the contents of the 10 words of read buffer storage are automatically transferred to one of the 20 (or 40) possible 10-word groups of read general storage. The group selected is determined by the D address of the Read instruction. Secondly, a card is moved under the reading brushes, and the information read is entered into buffer storage for the next Read instruction.
71 PCH (Punch). This operation code causes card punching in two steps. First the contents of one of the 20 (or 40) possible 10-word groups of punch storage are transferred to punch buffer storage. The group selected is specified by the D address of the Punch instruction. Secondly, the card is punched with the information from buffer storage.
69 LD (Load Distributor). This operation code causes the contents of the D address location of the instruction to be placed in the distributor.
24 STD (Store Distributor). This operation code causes the contents of the distributor with the distributor sign to be stored in the location specified by the D address of the instruction. The contents of the distributor remain undisturbed.
Addition and subtraction instructions
10 AU (Add to Upper). This operation code causes the contents of the D address location to be added to the contents of the upper half of the accumulator. The lower half of the accumulator will remain unaffected unless the addition causes the sign of the accumulator to change, in which case the contents of the lower half of the accumulator will be complemented. Also, the units position of the upper half of the accumulator will be reduced by one.
15 AL (Add to Lower). This operation code causes the contents of the D address location to be added to the contents of the lower half of the accumulator. The contents of the upper half of the accumulator could be affected by carries.
11 SU (Subtract from Upper). This operation code causes the contents of the D address location to be subtracted from the
Chapter 17 1 IBM 650 instruction logic 221
contents of the upper half of the accumulator. The contents of the lower half of the accumulator will remain unaffected unless the subtraction causes a change of sign in the accumulator, in which case the contents of the lower half of the accumulator will be complemented. Also, the units position of the upper half of the accumulator will be reduced by one.
16 SL (Subtract from Lower). This operation code causes the contents of the D address location to be subtracted from the contents of the lower half of the accumulator. The contents of the upper half of the accumulator could be affected by carries.
60 RAU (Reset and Add into Upper). This operation code resets the entire accumulator to plus zero and adds the contents of the D address location into the upper half of the accumulator.
65 RAL (Reset and Add into Lower). This operation code resets the entire accumulator to plus zero and adds the contents of the D address location into the lower half of the accumulator.
61 RSU (Reset and Subtract into Upper). This operation code resets the entire accumulator to plus zero and subtracts the contents of the D address location into the upper half of the accumulator.
66 RSL (Reset and Subtract into Lower). This operation code resets the entire accumulator to plus zero and subtracts the contents of the D address location into the lower half of the accumulator.
Accumulator store instructions
20 STL (Store Lower in Memory). This operation code causes the contents of the lower half of the accumulator with the accumulator sign to be stored in the location specified by the D address of the instruction. The contents of the lower half of the accumulator remain undisturbed.
It is important to remember that the D address for all store instructions must be 0000-1999. An 8000 series D address will not be accepted as valid by the machine on any of the store instructions.
21 STU (Store Upper in Memory). This operation code causes the contents of the upper half of the accumulator with the accumulator sign to be stored in the location specified by the D address of the instruction. If STU is performed after a division operation, and before another division, multiplication, or reset operation takes place, the contents of the upper accumulator will be stored with the sign of the remainder from the divide operation (Op-Code 14). The contents of the upper half of the accumulator remain undisturbed.
22 STDA (Store Lower Data Address). This operation code causes positions 8-5 of the distributor to be replaced by the contents of the corresponding positions of the lower half of the accumulator. The modified word in the distributor with the sign of the distributor is then stored in the location specified by the D address of the instruction.
23 STIA (Store Lower Instruction Address). This operation code causes positions 4-1 of the distributor to be replaced by the contents of the corresponding positions of the lower half of the accumulator. The modified word in the distributor with the sign of the distributor is then stored in the location specified by the D address of the instruction. The contents of the lower half of the accumulator remain unchanged, and the sign of the accumulator is not transferred to the distributor. The modified word remains in the distributor upon completion of the operation.
Absolute value instructions
17 AABL (Add Absolute to Lower). This operation code causes the contents of the D address location to be added to the contents of the lower half of the accumulator as a positive factor regardless of the actual sign. When the operation is completed, the distributor will contain the D address factor with its actual sign.
67 RAABL (Reset and Add Absolute into Lower). This operation code resets the entire accumulator to zeros and adds the contents of the D address location into the lower half of the accumulator as a positive factor regardless of its actual sign. When the operation is completed, the distributor will contain the D address factor with its actual sign.
18 SABL (Subtract Absolute from Lower). This operation code causes the contents of the D address location to be subtracted from the contents of the lower half of the accumulator as a positive factor regardless of the actual sign. When the operation is completed, the distributor will contain the D address factor with its actual sign.
68 RSABL (Reset and Subtract Absolute into Lower). This operation code resets the entire accumulator to plus zero and subtracts the contents of the D address location into the lower half of the accumulator as a positive factor, regardless of the actual sign. When the operation is completed, the distributor will contain the D address factor with its actual sign.
Multiplication and division
19 MULT (Multiply). This operation code causes the machine to multiply. A 10-digit multiplicand may be multiplied by
a 10-digit multiplier to develop a 20-digit product. The multiplier must be placed in the upper accumulator prior to multiplication. The location of the multiplicand is specified by the D address of the instruction. The product is developed in the accumulator beginning in the low-order position of the lower half of the accumulator and extending to the left into the upper half of the accumulator as required.
14 DIV (Divide). This operation code causes the machine to divide without resetting the remainder. A 20-digit dividend may be divided by a 10-digit divisor to produce a 10-digit quotient. In order to remain within these limits, the absolute value of the divisor must be greater than the absolute value of that portion of the dividend that is in the upper half of the accumulator. The entire dividend is placed in the 20-position accumulator. The location of the divisor is specified by the D address of the divide instruction.
64 DIV RU (Divide and Reset Upper). This operation code causes the machine to divide as explained under operation code 14 (DIV). However, the upper half of the accumulator containing the remainder with its sign is reset to zeros.

Branching instructions (decision operations)
44 BRNZU (Branch on Non-Zero in Upper). This operation code causes the contents of the upper half of the accumulator to be examined for zero. If the contents of the upper half of the accumulator is nonzero, the location of the next instruction to be executed is specified by the D address. If the contents of the upper half of the accumulator is zero, the location of the next instruction to be executed is specified by the I address. The sign of the accumulator is ignored.
45 BRNZ (Branch on Non-Zero). This operation code causes the contents of the entire accumulator to be examined for zero. If the contents of the accumulator is nonzero, the location of the next instruction to be executed is specified by the D address. If the contents of the accumulator is zero, the location of the next instruction to be executed is specified by the I address. The sign of the accumulator is ignored.
46 BRMIN (Branch on Minus). This operation code causes the sign of the accumulator to be examined for minus. If the sign of the accumulator is minus, the location of the next instruction to be executed is specified by the D address. If the sign of the accumulator is positive, the location of the next instruction to be executed is specified by the I address. The contents of the accumulator are ignored.
47 BROV (Branch on Overflow). This operation code causes the overflow circuit to be examined to see whether it has been set. If the overflow circuit is set, the location of the next instruction to be executed is specified by the D address. If the overflow circuit is not set, the location of the next instruction to be executed is specified by the I address.
90-99 BRD 1-10 (Branch on 8 in Distributor Position 1-10). This operation code examines a particular digit position in the distributor for the presence of an 8 or 9. Codes 91-99 test positions 1-9, respectively, of the test word; code 90 tests position 10. If an 8 is present, the location of the next instruction to be executed is specified by the D address. If a 9 is present, the location of the next instruction to be executed is specified by the I address. The presence of other than an 8 or 9 will stop the machine.

Shift instructions
30 SRT (Shift Right). This operation code causes the contents of the entire accumulator to be shifted right the number of places specified by the units digit of the D address of the shift instruction. A maximum shift of nine positions is possible. A data address with units digit of zero will result in no shift. All numbers shifted off the right end of the accumulator are lost.
31 SRD (Shift Round). This operation causes the contents of the entire accumulator to be shifted right the number of places specified by the units digit of the D address of the instruction. A 5 is added (-5 if the accumulator is negative) in the twenty-first (blind) position of the amount in the accumulator. A data address units digit of zero will shift 10 places right with rounding.
35 SLT (Shift Left). This operation code causes the contents of the entire accumulator to be shifted left the number of places specified by the units digit of the D address of the instruction. A maximum shift of nine positions is possible. A data address with a units digit of zero will result in no shift. All numbers shifted off the left end of the accumulator are lost. However, the overflow circuit will not be turned on.
36 SCT (Shift Left and Count). This operation code causes (1) the contents of the entire accumulator to be shifted to the left until a nonzero digit is in the most significant place, (2) a count of the number of places shifted to be inserted in the two low-order positions of the accumulator. This instruction is to aid fixed-point scaling.

Table look-up instructions
84 TLU (Table Look-up). This operation code performs an automatic table look-up using the D address as the location of
Chapter 17 I IBM 650 instruction logic 223
the first table argument and the I address as the address of the next instruction to be executed. The argument for which a search is to be made must be in the distributor. The address of the table argument equal to, or higher than (if no equal exists) the argument given is placed in positions 8-5 of the lower accumulator. The search argument remains, unaltered, in the distributor.

refers to the location specified by the instruction address of the No-Op instruction.
01 Stop. This operation code causes the program to stop provided the programmed switch on the control console is in the stop position. When the programmed switch is in the run position the 01 code will be ignored and treated in the same manner as 00 (NO-Op).
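
The right-shift instructions described in this chapter (30 SRT and 31 SRD) can be illustrated with a short sketch. It is only a model under stated assumptions: the 20-digit accumulator is represented as a signed Python integer (so the machine's distinction between plus zero and minus zero is lost), and the function names srt and srd are chosen here for illustration. Adding 5 in the blind twenty-first position is modeled as adding 5 to the last digit shifted off.

```python
def _sign(value: int) -> int:
    """Return -1 for negative values and +1 otherwise (signed zero is not modeled)."""
    return -1 if value < 0 else 1


def srt(accumulator: int, d_address: int) -> int:
    """Model of 30 SRT: shift right by the units digit of the D address.

    A units digit of zero gives no shift; digits shifted off are lost.
    """
    places = d_address % 10
    return _sign(accumulator) * (abs(accumulator) // 10 ** places)


def srd(accumulator: int, d_address: int) -> int:
    """Model of 31 SRD: shift right and round.

    A units digit of zero is taken as a shift of 10 places. The rounding 5
    is added to the last digit shifted off, with the sign of the accumulator.
    """
    places = d_address % 10 or 10
    magnitude = abs(accumulator) + 5 * 10 ** (places - 1)
    return _sign(accumulator) * (magnitude // 10 ** places)
```

For example, a D address of 0003 shifts three places: srt drops the digits 567 from 1234567, whereas srd rounds the result up because the first lost digit is 5.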
Chapter 18
IBM 1401 ISP. The ISP of the H-200 is more complex and increases performance by organizing Mp by both characters and words.
The IBM 1401, 1440, and 1460 are the only IBM computers

[Figure: computers related to the IBM 1401. Legible entries include the Honeywell H-200 (1401 compatible), the IBM 1410 and 7010 (1401 compatible), the 1401, 1460, and 1440, and the 7070, 7074, and 7072.]

tions and data are stored in variable-length character strings; these strings are addressed by a pointer register to the string. The address integer is fixed at three characters. The encoding process for addresses is given in Appendix 1 of this chapter. The 3-character address (3 x 6 bits) is assigned as 3 x 4 bcd characters for encoding addresses 0:999; 2 x 2 bits for selecting 16 x 1,000 addresses; and 2 bits for selecting one of the three index registers.
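
The division of the 18 address bits can be made concrete with a small sketch. The text states only how many bits serve each purpose. The assignment used below (zone bits of the hundreds and units characters select the thousands, zone bits of the tens character select the index register) and the weights 1,000 and 4,000 are assumptions made for illustration, as are the names AddrChar and decode_address. The bcd digit is given as a decimal value rather than as its 8-4-2-1 bit pattern.

```python
from typing import NamedTuple, Tuple


class AddrChar(NamedTuple):
    zone: int   # the two zone bits (B', A') as an integer 0..3
    digit: int  # decimal value 0..9 carried by the 8-4-2-1 bits


def decode_address(hundreds: AddrChar, tens: AddrChar, units: AddrChar) -> Tuple[int, int]:
    """Return (address, index_register) for a 3-character address.

    Assumed layout: hundreds.zone has weight 1,000, units.zone has weight
    4,000, and tens.zone selects an index register (0 means no indexing).
    """
    for ch in (hundreds, tens, units):
        if not (0 <= ch.zone <= 3 and 0 <= ch.digit <= 9):
            raise ValueError(f"invalid address character: {ch}")
    low = 100 * hundreds.digit + 10 * tens.digit + units.digit  # 0:999
    thousands = hundreds.zone + 4 * units.zone                  # 0:15
    return 1000 * thousands + low, tens.zone
```

Under these assumptions the three digits cover 0:999, the four zone bits extend the range to 16 x 1,000 addresses, and the remaining two bits name one of the three index registers or none.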
226 Part 3 I The instruction-set processor level: variations in the processor Section 3 I Processors for variable-length-string data
instructions are necessary for subroutines - the Store Address Register Feature; Indexing Feature; Multiply-Divide Feature; High-Low-Equal Compare Feature; Read Release and Punch Release Feature; the Column Binary Feature; Early-Card-Read Feature; Processing Overlap Feature, etc.

PMS structure
The 1401 PMS structure (Fig. 2) is an early 1 Pc structure. The diagram does not show the S(fixed) Pc interconnection structure with the Ms and T. The Pc-(Ms|T) interconnection restricts the concurrency of T and Ms. The optional processing overlap feature provides a link to Mp to allow the T(card; read, punch) to be run concurrently with Pc processing. When any of the peripheral devices are operating without the processing overlap feature, the Pc is dedicated to be a data transmission link or K (as in earlier computers). The device K is connected directly to Pc. For example, Ms(disk, magnetic tape) data transfers use the main registers of the Pc and can tie it up full time during data transmission. By careful programming, several devices can be synchronized and thus run concurrently for communicating with Pc from a K. The Pc does not have an interrupt system. Thus the peripherals have no way of communicating with Pc. Subsequent models, the 1440 and 1460, added interrupt capability and made it easier to control multiple simultaneous data transfers among the peripheral K's and Pc.

[Fig. 2: IBM 1401 PMS diagram. Legible components: Mp; Pc; T.console; T('1402; card; reader, punch); T('1403 | '1404; line; printer); T('1407 Console Inquiry Station; typewriter); T(paper tape; reader); Ms(#1:6; magnetic tape); Ms('1405; disk).]

ISP structure
The IBM 1401 ISP is given in Appendix 1 of this chapter. Instruction strings and data strings are delimited by the special F bit in a character. A character in Mp is of the form¹
C(check, F, B', A', 8, 4, 2, 1)
An n-character string is C[0], C[1], . . . , C[n - 1] and would be stored in Mp[j:j + n - 1].
The first character (or head) of an instruction must contain the word-mark flag or F bit. The head of the instruction, which is to be interpreted next, is held at Mp[I], and succeeding characters of the instruction are at Mp[I + 1], Mp[I + 2], etc. Correctly defined instructions are 1, 2, 4, 5, 7, and 8 characters long. Undefined instruction lengths of up to 8 characters are also interpreted without an error condition. The interpretation algorithm presented in the ISP description does not explain the action of instructions which have an incorrect length. Actually, the 1401 Reference Manual does not go into details of general instruction interpretation but dwells on "correct" operation. Table 1 presents the correct instruction lengths and formats. If we take the instructions in the table, the set is not variable in length but is fixed at these six sizes. The instruction set (not including the input/output instructions) is presented in Table 2. This table also provides a hint of the implementation, since the execution times are given in terms of memory cycles.
The ISP state, unlike that of more conventional processors, has no temporary operand storage (e.g., accumulators). The ISP state has registers which point to operands. The state of the machine (see Appendix 1) is basically: Mp, the Instruction Location Counter, Indicators or miscellaneous bits, three 3-character blocks of Mp reserved for Index registers, and the two registers A-address and B-address which point to data operands.

Instruction interpretation
There are three principal state types in processing an instruction: o.q., when the instruction is being formed; o.v., when the operands are being accessed or the results are being stored in Mp; and 0,
character strings, {charstring}, and the state diagram accounts for strings on a character-at-a-time basis. For an add instruction Fig. 3 oversimplifies the execution because it implies that each character of the A and B operand is accessed, the addition is performed, and the result is restored according to the B-address register. A more complex description must account for A and B strings of unequal length, and the case of getting a number which must be recomplemented because it is the wrong sign. The recomplementation process requires a reverse scan to find the end of the B string and then a forward scan to recomplement each character of B. Figure 4 is a detailed state diagram of the add execution process.
The states in the ISP description (Appendix 1) within the instruction-interpretation process correspond to the three state types just described: the single-instruction character-fetch operation, the fetch-operand-addresses for the remainder of the instruction, and Instruction-execution. Instruction-execution is not given in any detail. For example, the execution of add is defined as "A" (:= op = 110001) → (Ov,M[B] ← M[B] + M[A] {charstring});. The state diagram (Fig. 4) presents this execution in detail. Note that in the ISP description we omit telling the reader that the A and B address registers point to the next lowest variable-length string in M after an operation is performed. We allow the definition of a variable-string operation, for example, + {charstring}, to imply the action on the processor state.
Some instructions can be defined with a single character, and these are called chained instructions. Chained instructions take the previous values of the pointer registers, the A and B address registers, as the operand addresses. The add instruction, for example, can be either 1 (chained), 4, or 7 characters; the forms of all instructions appear in Table 1. The 4-character add instruction places the A address field in both the A and B address registers; thus the effect is an instruction to double a string (add it to itself).

Data
An n-decimal-digit numeric data string is represented as
C[n - 1], C[n - 2], . . . , C[1], C[0], C[M]
The underlined characters, C[n - 1] and C[M], have the flag bit present, that is, (C[n - 1](F) = 1) and (C[M](F) = 1). The n characters are stored in locations Mp[j], Mp[j + 1], . . . , Mp[j +
Fig. 5. IBM 1401 system data flow (registers structure). (Courtesy of International Business Machines Corporation.)
transfer level primitives of the complete computer together with several options. The options, of course, increase the complexity (and concurrency). Without the overlap feature, for example, all data are accessed in Mp via Pc's address registers.
There are register pairs consisting of a 3-character memory address (access) register, and a 1-character data register. The memory-address, memory-data register pairs are A-address, A-data; B-address, B-data; I-address, Operation/Op; Overlap-address, Overlap-data/O.
The implementation is straightforward, and the instruction times (Table 2) show the implementation at the register-transfer level. For example, as an instruction is being read by Pc, prior to instruction execution, each new character is taken in and examined for the instruction-terminating flag bit. When the flag bit is present, the instruction is complete and ready to be executed. The character of the next instruction is not saved but is picked up again after the previous instruction has been executed.
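
The fetch just described can be sketched as follows. This is an illustration under stated assumptions, not a description of the hardware: memory is modeled as a list of (flag, character) pairs, the names fetch_instruction and split_fields are hypothetical, and only the six correct lengths given in this chapter (1, 2, 4, 5, 7, and 8 characters) are accepted.

```python
from typing import Dict, List, Tuple

Char = Tuple[bool, str]  # (word-mark flag F, character)


def fetch_instruction(mp: List[Char], i: int) -> Tuple[str, int]:
    """Read one instruction starting at mp[i].

    Characters are taken until the next flagged character is seen. That
    character is not consumed; its location is returned so that it can be
    picked up again as the head of the next instruction.
    """
    flag, head = mp[i]
    if not flag:
        raise ValueError("the head of an instruction must carry the flag bit")
    chars = [head]
    nxt = i + 1
    while nxt < len(mp) and not mp[nxt][0]:
        chars.append(mp[nxt][1])
        nxt += 1
    return "".join(chars), nxt


def split_fields(instruction: str) -> Dict[str, str]:
    """Split a correct-length instruction into op, A address, B address, d char."""
    if len(instruction) not in (1, 2, 4, 5, 7, 8):
        raise ValueError("not one of the six correct instruction lengths")
    fields = {"op": instruction[0]}
    rest = instruction[1:]
    for name in ("A", "B"):          # each address is three characters
        if len(rest) >= 3:
            fields[name], rest = rest[:3], rest[3:]
    if rest:                         # a single remaining character is the d char
        fields["d"] = rest
    return fields
```

A 7-character add, for instance, yields the operation character followed by a 3-character A address and a 3-character B address; a 1-character (chained) instruction yields the operation character alone.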
Chapter 18 I The IBM 1401 231
Appendix 1
IBM 1401 ISP Description
Indicator[010001] := Unequal_compare
Indicator[011001] := Overflow
Mp State
Instruction Format
op<B',A',8,4,2,1>        instruction register specifying the operation
d_char<B',A',8,4,2,1>    additional character used in some instructions
d_char_present           indicates a d_char is used in the current instruction
active                   indicates an instruction string is still being fetched
A_address_present        indicates there is an A address part of an instruction
B_address_present        indicates there is a B address part of an instruction
Move, load, and store instruction types control the initialization of A and B.
move or load or store A or B/mls := ((move characters and edit = op) ∨ (load characters to A word mark = op) ∨ (move characters to A or B word mark = op) ∨ (move characters and suppress zeros = op) ∨ (move numerical = op) ∨ (move zone = op) ∨ (store A address register = op) ∨ (store B address register = op))
Instruction Interpretation Process
Run → (op ← M[I]; I ← I + 1; next      fetch operation
    Fetch_operand_addresses; next      fetch addresses for A and B
    Instruction_execution)             execute
Address Calculation Process
The 1401 calculates explicit effective addresses by first setting up the A and B address registers. Operands are not fetched in Instruction_execution. There are 1, 2, 4, 5, 7 and 8 character instructions which have the op and the following operands (respectively): no char, d char, the I or A address, the I or A address and d char, the A and B address, and the I or A address and B address and d char. The following process defines the operation for correct length instructions.
Fetch-operand-addresses := (
d ~ h a r - p r e s e n t+ 0 :
-. Bdddress,present
B,address,present
100 address
--f :
+ I -A):
c l e a r storage
c l e a r storage ant'bnanch
character string, {ch.s}, arithmetic:
"A" (:= op = 110001) → (Ov,M[B] ← M[B] + M[A] {ch.s});    add
"S" (:= op = 010010) → (Ov,M[B] ← M[B] - M[A] {ch.s});    subtract
"!" (:= op = 101010) → (M[B] ← 0 - M[A] {ch.s});    zero and subtract
"?" (:= op = 111010) → (M[B] ← 0 + M[A] {ch.s});    zero and add
"@" (:= op = 001100) → (Ov,M[B] ← M[B] x M[A] {ch.s});    multiply; full length product in M[B], special hardware option
"%" (:= op = 011100) → (Ov,M[B] ← M[B] / M[A] {ch.s});    divide; quotient and remainder both end up in M[B]
"#" (:= op = 001011) → (M[B] ← M[B] + M[A] {3.ch};    modify address
    B ← B - 3; A ← A - 3);
branches, halt, no-operation:
"N" (:= op = 100101) → ;    no operation
"." (:= op = 111011) → (Run ← 0;
    ¬ A_address_present → ;    halt
    A_address_present → I ← A);    halt and branch
"B" (:= op = 110010) → (
    (¬ B_address_present ∧ ¬ d_char_present) → I ← A;    branch
    (¬ B_address_present ∧ d_char_present) → (    branch if indicator on
        Indicator[f(d_char)] → (I ← A);
        Indicator[f(d_char)] ← 0);
    (B_address_present ∧ d_char_present) → (    branch if char equal
        B ← B - 1;
        (M[B] = d_char) → I ← A));
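
The three forms of the branch instruction "B" listed above can be restated as a short sketch. Several points are assumptions made for illustration: the State class and the function execute_branch are hypothetical names, indicators are keyed directly by the d character (standing in for f(d_char)), and actions separated by semicolons in the ISP are taken to use the old value of B, so the comparison reads M[B] before B is decremented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class State:
    I: int                       # instruction location counter
    A: int                       # A-address register
    B: int                       # B-address register
    M: List[str]                 # primary memory, one character per cell
    indicators: Dict[str, bool] = field(default_factory=dict)


def execute_branch(s: State, b_address_present: bool, d_char: Optional[str]) -> None:
    """Apply the 'B' instruction to the state according to which fields are present."""
    if not b_address_present and d_char is None:
        s.I = s.A                                # unconditional branch
    elif not b_address_present:
        if s.indicators.get(d_char, False):      # branch if indicator on
            s.I = s.A
        s.indicators[d_char] = False             # the indicator is reset
    elif d_char is not None:
        if s.M[s.B] == d_char:                   # branch if character equal
            s.I = s.A
        s.B -= 1
    # A B address without a d char is not covered by the listing above.
```

The sketch makes the decoding rule explicit: the same operation code selects among three behaviors purely by the length of the instruction that was fetched.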
machines are cleverly designed and make efficient use of the hardware they possess. Eventually there may be more of these computers than conventional stored program computers. The reader should note that not all "electronic desk calculators" are computers; most are electronic versions of their mechanical or electromechanical ancestors.

The OLIVETTI UNDERWOOD PROGRAMMA 101 desk calculator
The Programma 101 (Chap. 19) is at the limit of what we call a stored program computer. It has a sufficient instruction set to be classified as a computer, but the storage for temporary data, constants, and programs is limited. The machine's instruction set is interesting because memory is not addressed explicitly. A jump, for example, is executed by scanning the program for a particular marker which was named in the jump instruction. The Programma 101 uses an Mp.cyclic.
The program library for the Programma 101 is extensive and provides an indication of its capability.

Fig. 1. Hewlett-Packard Model 9100A Computing Calculator PMS diagram.
[Legible annotations from the diagram: data: (scalar, rectangular co-ordinate vector, polar co-ordinate vector); fixed, floating; decimal; operations: (+, -, x, /, cos, sin, tan, sin⁻¹, cos⁻¹, tan⁻¹, sinh, cosh, tanh, sinh⁻¹, cosh⁻¹, tanh⁻¹, ln, log10, abs, e, sqrt, integer part, {rectangular co-ordinate vector} ← {polar co-ordinate vector}, {polar co-ordinate vector} ← {rectangular co-ordinate vector}); T.numeric_printer; T.plotter; L.external device; T-M magnetic card (2 programs; 6 b/program_step; 196 program_steps/program); Mp(read only; 512 w; 64 b/w); P.microprogrammed (M.processor state (40 b)); Mp(control; read only; 800 ns/w; 64 w; 29 b/w).]
236 Part 3 I The instruction-set processor level: variations in the processor Section 4 I Desk calculator computers: keyboard processors with small memories
The implementation has approximately 36.2 kb of memory, including the read-only and read-write parts. The design is physically outstanding, and its use of microprogramming is superb. The reader should note there are two levels of M(read only). We could draw the PMS structure of Pc as a P.microprogrammed within a P.microprogrammed. HP rightfully regards the two ISP's (29-bit and 64-bit word) as proprietary and carefully avoids discussing these points in the article (Chap. 20).
It might be noted that an IBM System/360 Model 30 requires about 2.9 milliseconds for a floating-point square root, whereas the HP 9100A requires 19 milliseconds. By way of evidence of its outstanding packaging, its cost is about five-eighths that of a PDP-8/I for about the same amount of physical hardware. The cost difference, though truly difficult to compare, is partially the result of a design from an instrument maker (Hewlett-Packard) versus a design from a computer manufacturer (DEC). The TV-like construction of the HP 9100A is an important lesson that computer manufacturers have not learned. In other words, a Henry Ford has yet to emerge from the computer field. (Our guess is that he may come from Japan.)
Whereas many computers in this book are included because they are typical of points in the computer space, the HP 9100A is included because it is innovative. It is worthy of note that only one of the engineers had some computer design experience; Cochran, who did the programming, had prior experience with circuitry and instrumentation. Had he been a programmer by training, a larger Mp might have been required. By way of comparative evidence, the IBM 1800 floating-point arithmetic functions +, -, x, /, sin, cos, tan⁻¹, √, log, exponential, tanh, binary to decimal, and decimal to binary require approximately 1,425 16-bit words, or 23 kb. On the other hand, the FOCAL¹ interactive calculator program for a 4,096-word PDP-8 (49 kb) provides the user with all but polar-rectangular coordinates and hyperbolic functions, but it does have a complete program editing capability, text handling, control structure, and 1,600-character Mp.
The Programma 101 is manufactured by the Olivetti Underwood Corporation. The cost of Programma 101 is about $3,500 (in 1968). Several thousand are currently in use. Unlike conventional stored program computers it has instructions which can be executed directly as commands from a keyboard or instructions which can be stored in a program and interpreted by the processor. The processor uses the decimal representation for mixed numbers. The decimal point location is controlled manually. Although information is stored in character strings, the maximum length is 22 digits or 24 instructions for a register. A program can be up to 120 characters long and is stored as a continuous string. The internal encoding of a character is 8 bits. There are no absolute addresses for instructions, and jump instructions are programmed by placing labels or references in the string to transfer to. The Programma 101 is composed of the following elements.

Memory. The memory stores numeric data and program instructions.

Keyboard. The keyboard has four functions: It is used for operator control of the calculator (power on, off, etc); in manual mode the instructions are executed immediately as in a conventional desk calculator (e.g., add); the keys write a program's instructions in the memory, and the instructions are executed when the program is run; and numeric data may be entered to a running program.

Printing unit. Serial printing is from right to left, at 30 characters per second; this unit prints all keyboard entries, programmed output, and instructions.

Magnetic-card reader/recorder. This device permits instructions and constants for a program to be stored and retrieved from magnetic cards.

Control and arithmetic units. The control unit is the administrative section of the computer. It receives the incoming information, determines the computation to be performed, and directs the arithmetic unit where to find the information and what operation to perform.

The PMS diagram shown below is, of course, very simple. It conforms closely to the classic diagram of what a digital computer looks like:

[PMS diagram: Mp and Pc, with links to T-M.magnetic_card, T.printer, and T.keyboard.]

Primary memory and processor memory
The memory has 10 registers; eight are for general storage and two are used exclusively for instructions. A character can have several meanings, depending on the register and its use.
The two instruction registers, 1 and 2, each store 24 instructions. An instruction is one character long.
The eight storage registers, M, A, R, B, C, D, E, and F, have a capacity of 22 decimal digits, plus decimal point and sign. The sign and decimal point do not require character space. Alternatively, D, E, and F hold 24 instructions. M, A, and R are operating registers and take part in all arithmetic operations. They are considered to be the arithmetic unit.
The M register is the Median (or distributive) register. All keyboard figure entries are held in the M register and distributed to the other registers as instructed.
The A register functions with the arithmetic unit to form the Accumulator. Arithmetic results are developed and retained in the A register. A result of up to 23 digits can be produced in the A register.
The R register retains the complete results in addition and subtraction, the complete product in multiplication, the remainder in division, and a remainder in square root. B, C, D, E, and F are storage registers. Each can be split into two registers, each with a capacity of 11 digits, plus decimal point and sign. When storage registers are split, the right portion of the split register retains its original designation, and the left side is identified with the corresponding lowercase letter. Thus these registers become

¹The description is partially taken from the Programma 101 Programming Manual.
Structure
The calculator parts are described briefly below. The parts correspond to both the numbers (Fig. 2) and the lettered keyboard (Fig. 3). The following parts are, in effect, the console. Some of the keys
are used for control of the calculator, and some can be used either
as programmed instructions or as commands which are executed
directly. The following section discusses their instruction function.
The on-off key (1). This is a dual-purpose switch for both the
on and off positions. (Note: The OFF position automatically clears
all stored data and instructions.)
The error (red) light (2). This lights when the computer is turned
on and whenever the computer detects an operational error, e.g.,
exceeding capacity, division by zero.
The general reset key (3). This key erases all data and instructions from the computer and turns off the error light.
The correct-performance (green) light (4). This light indicates
the computer is functioning properly. A steady light indicates that
the computer is ready for an operator decision; a flickering light
indicates that the computer is executing programmed instructions
and that the keyboard is locked.
The decimal wheel (5). This determines the number of decimal places (0, 1, . . . , 15) to which computations will be carried out in the A register and the decimal places in the printed output, except for results from the R register. Up to 22 decimal digits may be developed in, and printed from, the R register.

Fig. 1. Programma 101 functional block diagram. (Courtesy of Olivetti Underwood Corporation.)
Chapter 19 | The OLIVETTI Programma 101 desk calculator
record program switch is on, the keys specify the instruction t o register while retaining the entire contents in A. The original
b e recorded in the program memory. Finally, the descriptions contents of the M register are destroyed. The R register is not
specify the instruction's behavior as it is executed within a pro- affected by this instruction. (M t fraction,part(A))
gram.
Start S. The instruction S (used in creating a program) directs Arithmetic operations
the computer to stop and release the keyboard for the entry of
figures or the selection of a subroutine. After figure entry, the All arithmetic operations are performed in the operating registers
program is restarted by touching the start key (S). M, A, and R. An arithmetic operation is performed in two phases:
The program can also be restarted by touching a routine selec-
1 The contents of the selected register are automatically
tion key. When the S instruction stops the program, the computer
transferred to the M register. The M register is selected
may also be operated in the manual mode without disturbing the automatically if no other register is indicated.
program instructions in the memory. Any figures entered on the
keyboard before depression of start or an operation key will be 2 The operation is carried out in the M, A, and R registers.
printed automatically.
Clear *. The clear operation ' directs the computer to clear Programma 101 can perform these arithmetic operations: +,
the selected register. The M and R registers cannot he cleared -, X , i,fl,and absolute value. Figures are accepted and
with this instruction.
When the computer is operated manually this key will cause
it to print the contents of the selected register, r. (r t o )
Data-transfer operations
To A ↓. An instruction containing the operation ↓ directs the
computer to transfer contents of the addressed register, r, to A
while retaining them in the original register. The contents of M
and R are not affected. The previous contents of A are destroyed.
(A ← r)
From M ↑. An instruction containing the operation ↑ directs
the computer to transfer the contents of M to the addressed
register while retaining them in M. The contents of registers A and
R are unaffected by this instruction. The original contents of the
addressed register are destroyed. (r ← M)
Exchange ↕. An instruction containing the operation ↕ directs
the computer to exchange the contents of the A register with the
contents of the addressed register. The contents of M are not
affected except by the exchange between A and M. The contents
of the R register are not affected. (A ← r; r ← A)
D-R exchange RS. The instruction RS directs the computer to
exchange the contents of D (both D and d registers) with the
contents of the R register. (D ← R; R ← D)
This instruction has a special use in multicard programs to store
temporarily the contents of the D (d,D) register in R, when a new
card has to be read to continue the program. During this
temporary storage no instruction affecting the R register should be
executed.
Decimal part to M /↕. The instruction /↕ directs the computer
to transfer the decimal portion of the contents of A to the M
computed algebraically. A negative value is entered by depressing
the negative key at any time during the entry of a figure. If there
is no negative indication, the computer will accept the figure as
positive.
The subtract operation key is separate from the numeric
keyboard and is used exclusively for subtraction (not negation).
Addition +. An instruction containing the operation + directs
the computer to add the contents of the selected register (addend)
to the contents of the A register (augend). Addition is executed
in two phases:
1 Transfer the contents of the selected register (addend)
to M.
2 Add the contents of M to the contents of A (augend),
obtaining in A the sum truncated according to the setting of
the decimal wheel. The complete sum is in R. M contains
the addend. (M ← r; next R ← A + M; next A ← f(R,
decimal-wheel))
Multiplication ×. An instruction containing the operation ×
directs the computer to multiply the contents of the selected
register (multiplicand) by the contents of the A register
(multiplier).
1 Transfer the contents of the addressed register to M.
2 Multiply the contents of M by the contents of A, obtaining
in A the product truncated according to the setting of the
decimal wheel. The complete product is in R. M contains
the multiplicand. (M ← r; next R ← A × M; next A ← f(R,
decimal-wheel))
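The two-phase arithmetic described above (transfer the operand to M, form the complete result in R, then place the truncated result in A) can be sketched in Python. This is a minimal illustration under stated assumptions, not a model of the actual machine: the class name `P101Sketch`, the register dictionary, the use of `Decimal`, and the reading of the decimal wheel as a count of retained decimal places are all hypothetical choices made for the example.

```python
import operator
from decimal import Decimal, ROUND_DOWN


class P101Sketch:
    """Toy model of the M, A, and R register behaviour described in the text."""

    def __init__(self, decimal_wheel=2):
        # Assumption: the decimal wheel is the number of decimal places kept in A.
        self.decimal_wheel = decimal_wheel
        self.reg = {"M": Decimal(0), "A": Decimal(0), "R": Decimal(0)}

    def _truncate(self, value):
        # f(R, decimal-wheel): drop (not round) digits beyond the wheel setting.
        quantum = Decimal(1).scaleb(-self.decimal_wheel)
        return value.quantize(quantum, rounding=ROUND_DOWN)

    def _two_phase(self, r, op):
        self.reg["M"] = self.reg[r]                       # M <- r
        self.reg["R"] = op(self.reg["A"], self.reg["M"])  # R <- A op M
        self.reg["A"] = self._truncate(self.reg["R"])     # A <- f(R, decimal-wheel)

    def add(self, r):
        self._two_phase(r, operator.add)

    def subtract(self, r):
        self._two_phase(r, operator.sub)

    def multiply(self, r):
        self._two_phase(r, operator.mul)


calc = P101Sketch(decimal_wheel=2)
calc.reg["B"] = Decimal("1.337")   # "B" is an illustrative storage register name
calc.reg["A"] = Decimal("2.5")
calc.multiply("B")
print(calc.reg["A"], calc.reg["R"], calc.reg["M"])
```

After the multiplication, A holds the truncated product, R the complete product, and M the multiplicand, mirroring the expression (M ← r; next R ← A × M; next A ← f(R, decimal-wheel)).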
Subtraction −. An instruction containing the operation −
directs the computer to subtract the contents of the selected
register (subtrahend) from the contents of the A register (minuend).
1 Transfer the contents of the selected register (subtrahend)
to M.
2 Subtract the contents of M from the contents of A
(minuend), obtaining in A the difference truncated according to
the setting of the decimal wheel. The complete difference is
in R. M contains the subtrahend. (M ← r; next R ← A − M;
next A ← f(R, decimal-wheel))
2 Divide the contents of M into the contents of A, obtaining
in A the quotient truncated according to the setting of the
decimal wheel. The decimally correct fractional remainder
is in R. M contains the divisor. (M ← r; next A ← A ÷ M;
R ← A mod M)
Square Root √. An instruction containing the operation √
directs the computer to:
1 Transfer the contents of the selected register to M
2 Extract the square root of the contents of M, as an absolute
value, obtaining in A the result truncated according to the
setting of the decimal wheel. The R register contains
a nonfunctional remainder. At the end of the operation,
M contains double the square root. (M ← r; next
M,R ← sqrt(abs(M)) × 2; next A ← f(M/2, decimal-wheel))
Absolute Value A↕. The absolute-value instruction A↕ changes
the contents of the A register, if negative, to positive. (A ← abs(A))
Jump operations
The jump operation directs the computer to depart from the
normal sequence of step-by-step instructions and jump to a
preselected point in the program.
These instructions provide both internal and external (manual)
decision capability and are useful to create “loops” that allow
repetitive sequences in a program to be executed; routines or
subroutines to be performed at the discretion of the operator;
and automatically to “branch” to alternate routines or subroutines
according to the value in the A register.
The jump process consists of two related instructions or
characters:
1 The reference point or label, l, is where the program begins
or where the jump is to start. The sequence is restarted at
this point. This label has no effect when interpreted.
2 The jump instruction specifies the label for the instruction
sequence.
There are two types of jump instructions: unconditional jumps
and conditional jumps.
(AV,V), (AW,W), (AY,Y), (AZ,Z), (BV,CV), . . . ,
(BZ,CZ), (EV,DV), . . . , (EZ,DZ), (FV,RV), . . . , (FZ,RZ)
All programs must begin with reference parts of an
unconditional jump instruction. Reference points AV, AW, AY, AZ are
used so that these program sequences can be started by touching
the routine selection keys V, W, Y, or Z.
Conditional Jumps. If the contents of the A register are:
Greater than zero: the program jumps to the corresponding
reference point (label).
Zero or less: the program continues with the next
instruction in sequence.
The labels or reference points for conditional jumps, L, and
the corresponding conditional jump instruction, cj, are given as
(L,cj). The permissible jump labels and jump instructions are
(aV,/V), . . . , (aZ,/Z), (bV,cV), . . . ,
(bZ,cZ), (eV,dV), . . . , (eZ,dZ), (fV,rV),
. . . , (fZ,rZ)
Constants as instructions A/↑. A one-digit constant can be
generated by a special instruction. The results of the instruction place
the digit in M. The digit value of the constant must follow A/↑.
Instructions and data in the same register. An instruction can be
considered to be data and, therefore, used as both a constant and
an instruction. Another technique allows the computer to interpret
data as null instructions so that both data (for reading and writing)
and instructions can be stored in the same register.
Examples. A program to take values for the numbers A, B, C, and
D from the keyboard and then print the value of the expression
[(A + B) × C]/D would be written as follows:
instruction   comments
+AV           label to allow the program to be started by key, V
S             wait; enter A from keyboard into M
↓ or ↓M¹      A value goes to A register
S             wait, enter B from keyboard
+M            A register contains A + B
S             wait, enter C from keyboard
×M            A register × C or (A + B) × C
S             wait, enter D from keyboard
÷M            A register has expression
A◊            print A register
-V            jump back to beginning label to recalculate
              expression for new variables
¹M is implied if left blank.
The following program computes and prints n!. n is entered
from the keyboard, where n ≥ 1, and an integer. The program is
started by pressing key Z.
comments
program start, label
stop, enter n from keyboard into M
D ← n; D holds n! or n × (n − 1) × . . .
A ← n; A holds n, n − 1, n − 2, . . . , 1
label
generate 1 in M
A ← A − 1; (n ← n − 1)
test if n ≥ 0
print result
get next n from keyboard
begin to update n!, label
A holds n!; D holds n − 1 after execution
A holds n × (n − 1) × . . .
D holds n!; A holds n − 1 after execution
return to compute n − 2
Conclusion
Many algorithms have been written for Programma 101, being
coded in impressively small space. The techniques have sometimes
been borrowed from conventional computer programming. For
example, multiple card programs operate by using chains in the
same way as large FORTRAN programs. The significant fact to
the reader is that the Programma 101 calculator is a nicely
designed stored program computer.
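The control flow of the n! program described in this chapter can be paraphrased in Python. This is only a sketch reconstructed from the comments column of the listing; the function name `factorial_p101_style` and the variables `a` and `d` (standing in for the A and D registers) are illustrative assumptions, and the keyboard entry and printer are replaced by a function argument and a return value.

```python
def factorial_p101_style(n):
    """Follow the register roles in the comments: D accumulates n!, A counts down."""
    if n != int(n) or n < 1:
        raise ValueError("n must be an integer >= 1")
    d = n              # D <- n; D holds the running product
    a = n              # A <- n; A holds n, n - 1, n - 2, ..., 1
    while True:
        a = a - 1      # A <- A - 1 (the constant 1 is generated in M)
        if a > 0:      # conditional jump: taken only when A is greater than zero
            a, d = d, a    # exchange: A holds the product, D the counter
            a = a * d      # A <- product x counter
            a, d = d, a    # exchange back: D holds the product, A the counter
        else:
            return d       # fall through: print result


print(factorial_p101_style(5))
```

The two exchanges around the multiplication reflect the fact that arithmetic results always form in A, while the running product is kept in D between iterations.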
Chapter 20
A new electronic calculator with computerlike capabilities¹
¹This chapter is a compilation of three articles [Monnier, 1968; Osborne,
1968; Cochran, 1968], reprinted from Hewlett-Packard Journal, vol. 20,
no. 1, pp. 3-9, 10-13, 14-16, September, 1968.
Many of the day-to-day computing problems faced by scientists
and engineers require complex calculations but involve only a
moderate amount of data. Therefore, a machine that is more than
a calculator in capability but less than a computer in cost has a
great deal to offer. At the same time it must be easy to operate
and program so that a minimum amount of effort is required in
the solution of typical problems. Reasonable speed is necessary
so that the response to individual operations seems nearly
instantaneous.
The HP Model 9100A Calculator, Fig. 1, was developed to fill
this gap between desk calculators and computers. Easy interaction
between the machine and user was one of the most important
design considerations during its development and was the prime
guide in making many design decisions.
Fig. 1. This new HP Model 9100A calculator is self-contained and is
capable of performing functions previously possible only with larger
computers.
CRT display
One of the first and most basic problems to be resolved concerned
the type of output to be used. Most people want a printed record,
but printers are generally slow and noisy. Whatever method is
used, if only one register is displayed, it is difficult to follow what
is happening during a sequence of calculations where numbers are
moved from one register to another. It was therefore decided that
a cathode-ray tube displaying the contents of three registers would
provide the greatest flexibility and would allow the user to follow
problem solutions easily. The ideal situation is to have both a CRT
showing more than one register, and a printer which can be
attached as an accessory.
Figure 2 is a typical display showing three numbers. The X
register displays numbers as they are entered from the keyboard
one digit at a time and is called the keyboard register. The Y
register is called the accumulator since the results of arithmetic
operations on two numbers, one in X and one in Y, appear in the
Y register. The Z register is a particularly convenient register to
use for temporary storage.
Numbers
One of the most important features of the Model 9100A is the
tremendous range of numbers it can handle without special
attention by the operator. It is not necessary to worry about where
to place the decimal point to obtain the desired accuracy or to
avoid register overflow. This flexibility is obtained because all
numbers are stored in ‘floating point’ and all operations performed
using ‘floating point arithmetic.’ A floating point number is
expressed with the decimal point following the first digit and an
exponent representing the number of places the decimal point
should be moved: to the right if the exponent is positive, or to
the left if the exponent is negative.
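The floating point form described under Numbers (decimal point after the first digit, plus an exponent giving the number of places to move it) can be illustrated in Python. This is a hedged sketch rather than the calculator's internal algorithm: the function name `to_floating_point` and the use of Python's `decimal` module are assumptions made for the example.

```python
from decimal import Decimal


def to_floating_point(text):
    """Split a decimal string into (mantissa, exponent), one digit before the point."""
    value = Decimal(text)
    if value == 0:
        return Decimal(0), 0
    exponent = value.adjusted()          # power of ten of the leading digit
    mantissa = value.scaleb(-exponent)   # decimal point now follows the first digit
    return mantissa, exponent


for sample in ("602000000000000000000000", "0.00042", "-1500"):
    mantissa, exponent = to_floating_point(sample)
    print(sample, "->", mantissa, "x 10^", exponent)
```

A positive exponent moves the decimal point to the right and a negative exponent moves it to the left, as the text states.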
explained and key codes are listed. Some simple examples are
provided to assist those using the machine for the first time or
to refresh the memory of an infrequent user. Most questions re-
garding the operation of the Model 9100A are answered on the
card.
Data entry
The calculator keyboard is shown in Fig. 4. Numbers can be
entered into the X register using the digit keys, the π key or the
ENTER EXP key. The ENTER EXP key allows powers of 10 to
be entered directly which is useful for very large or very small
numbers. 6.02 × 10^23 is entered @ @ @ @ 0.
If the
ENTER EXP key is the first key of a number entry, a 1 is auto-
Fig. 2. Display in fixed point with the decimal wheel set at 5. The Y
register has reverted to floating point because the number is too large
to be properly displayed unless the digits called for by the DECIMAL-
DIGITS setting are reduced.
them. Special keys located in a block to the left of the digit keys
are used to identify the lettered registers.
To store a number from the X register the key @ is used. The
parenthesis indicates that another key depression, representing the
storage register, is necessary to complete the transfer. For example,
storing a number from the X register into register 8 requires two
key depressions: @ @ . The X register remains unchanged. To
store a number from Y register the key @ is used.
The contents of the alpha registers are recalled to X simply
by pressing the keys a, b, c, d, e, and f. Recalling a number from
a numbered register requires the use of the @ key to distinguish
the recall procedure from digit entry. This key interchanges the
number in the Y register with the number in the register indicated
by the following keystroke, alpha or numeric, and is also useful
in programs since neither number involved in the transfer is lost.
The CLEAR key sets the X, Y, and Z display registers and the
f and e registers to zero. The remaining registers are not affected.
The f and e registers are set to zero to initialize them for use with
the @ and @ keys as will be explained. In addition the CLEAR
key clears the FLAG and the ARC and HYPER conditions, which
often makes it a very useful first step in a program.
Coordinate transformation and complex numbers
Vectors and complex numbers are easily handled using the keys
in the column on the far left of the keyboard. Figure 5 defines
the variables involved. Angles can be either in degrees or radians.
To convert from rectangular to polar coordinates, with y in Y and
x in X, press @. Then the display shows θ in Y and R in X. In
converting from polar to rectangular coordinates, θ is placed in
Y, and R in X, @ is pressed and the display shows y in Y and
x in X.
Fig. 5. Variables involved in conversions between rectangular and polar
coordinates. [Figure label: y = R sin θ]
ACC+ and ACC- allow addition or subtraction of vector
components in the f and e storage registers. ACC+ adds the
contents of the X and Y register to the numbers already stored
in f and e respectively; ACC- subtracts them. The RCL key
recalls the numbers in the f and e registers to X and Y.
Illegal operations
A light to the left of the CRT indicates that an illegal operation
has been performed. This can happen either from the keyboard
or when running a program. Pressing any key on the keyboard
will reset the light. When running a program, execution will
continue but the light will remain on as the program is completed.
The illegal operations are:
Division by zero
√x where x < 0
ln x where x ≤ 0; log x where x ≤ 0
sin⁻¹ x where |x| > 1; cos⁻¹ x where |x| > 1
cosh⁻¹ x where x < 1; tanh⁻¹ x where |x| > 1
Accuracy
The Model 9100A does all calculations using floating point
arithmetic with a twelve digit mantissa and a two digit exponent. The
two least significant digits are not displayed and are called guard
digits.
The algorithms used to perform the operations and generate
the functions were chosen to minimize error and to provide an
extended range of the argument. Usually any inaccuracy will be
contained within the two guard digits. In certain cases some
inaccuracy will appear in the displayed number. One example is
where the functions change rapidly for small changes in the
argument, as in tan x where x is near 90°. A glaring but insignificant
inaccuracy occurs when an answer is known to be a whole number,
but the least significant guard digit is one count low:
2.000 000 000 ≈ 1.999 999 999.
Accuracy is discussed further in the 'Internal Programming'
section in this chapter. But a simple summary is: the answer
resulting from any operation or function will lie within the range of
true values produced by a variation of ±1 count in the tenth digit
of the argument.
Programming
Problems that require many keyboard operations are more easily
solved with a program. This is particularly true when the same
Chapter 20 I The HP Model 9100A computing calculator 247
operations must be performed repeatedly or an iterative technique
must be used. A program library supplied with the Model 9100A
provides a set of representative programs from many different
fields. If a program cannot be found in the library to solve a
particular problem, a new program can easily be written since
no special experience or prior knowledge of a programming
language is necessary.
Any key on the keyboard can be remembered by the calculator
as a program step except STEP PRGM. This key is used to ‘debug’
a program rather than as an operation in a program. Many
individual program steps, such as ‘sin x’ or ‘to polar’ are comparatively
powerful, and avoid the need of sub-routines for these functions
and the programming space such sub-routines require. Registers
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d can store 14 program steps
each. Steps within the registers are numbered 0 through d just
as the registers themselves are numbered. Programs can start at
any of the 196 possible addresses. However 0-0 is usually used for
the first step. Address d-d is then the last available, after which
the program counter cycles back to 0-0.
Registers f and e are normally used for storage of constants only,
one constant in each register. As more constant storage is required,
it is recommended that registers d, then c, then b, etc., are used
starting from the bottom of the list. Lettered registers are used
first, for the frequently recalled constants, because constants stored
in them are more easily recalled. A register can be used to store
one constant or 14 program steps, but not both.
Branching
The bank on the far right of the keyboard, Fig. 4, contains program
oriented keys. @ is used to set the program counter. The two
sets of parentheses indicate that this key should be followed by
two more key depressions indicating the address of the program
step desired. As a program step, ‘GO TO’ is an unconditional
branch instruction, which causes the program to branch to the
address given by the next two program steps. The ‘IF’ keys in this
group are conditional branch instructions. With @ @ , and @
the numbers contained in the X and Y registers are compared.
The indicated condition is tested and, if met, the next two program
steps are executed. If the first is alphameric, the second must be
also, and the two steps are interpreted as a branching address.
When the condition is not met, the next two steps are skipped
and the program continues. @ is also a very useful conditional
branching instruction which tests a ‘yes’ or ‘no’ condition
internally stored in the calculator. This condition is set to ‘yes’ with
the SET FLAG from the keyboard when the calculator is in the
display mode or from a program as a program step. The flag is
set to a ‘no’ condition by either asking IF FLAG in a program
or by a CLEAR instruction from the keyboard or from a program.
Data input and output
Data can be entered for use in a program when the machine is
in the display mode. (The screen is blank while a program is
running.) A program can be stopped in several ways. The @ key
will halt the machine at any time. The operation being performed
will be completed before returning to the display mode. As a
program step, STOP stops the program so that answers can be
displayed or new data entered. END must be the last step in a
program listing to signal the magnetic card reader; when
encountered as a program step it stops the machine and also sets the
program counter to 0-0.
As a program step, PAUSE causes a brief display during
program execution. Nine cycles of the power line frequency are
counted; the duration of the pause will be about 150 ms for a 60
Hz power line or 180 ms for a 50 Hz power line. More pauses
can be used in sequence if a longer display is desired. While a
program is running the PAUSE key can be held down to stop the
machine when it comes to the next PAUSE in the program. PAUSE
provides a particularly useful way for the user and the machine
to interact. It might, for instance, be used in a program so that
the convergence to a desired result can be observed.
Other means of input and output involve peripheral devices
such as an X-Y Plotter or a Printer. The PRINT key activates the
printer, causing it to print information from the display register.
As a program step, PRINT will interrupt the program long enough
for the data to be accepted by the printer and then the program
will continue. If no printer is attached, PRINT as a program step
will act as a STOP. The FMT key, followed by any other keystroke,
provides up to 62 unique commands to peripheral equipment. This
flexibility allows the Model 9100A to be used as a controller in
small systems.
Sample program-N!
A simple program to calculate N! demonstrates how the Model
9100A is programmed. Figure 6 (top) shows a flow chart to
compute N! and Fig. 6 (bottom) shows the program steps. With this
program, 60! takes less than ½ second to compute.
Program entry and execution
After a program is written it can be entered into the Model 9100A
from the keyboard. The program counter is set to the address of
Fig. 7. Program step address and code are displayed in the X register
as steps are entered. After a program has been entered, each step can
be checked using the STEP PRGM key. In this display, step 2-d is 36,
the code for multiply.
Specifications of HP Model 9100A*
The HP Model 9100A is a programmable,
electronic calculator which performs
operations commonly encountered in scientific
and engineering problems. Its log, trig and
mathematical functions are each performed
with a single key stroke, providing fast,
convenient solutions to intricate
equations. Computer-like memory enables the
calculator to store instructions and
constants for repetitive or iterative solutions.
The easily-readable cathode ray tube
instantly displays entries, answers and
intermediate results.
Operations
Direct keyboard operations include:
Arithmetic: addition, subtraction,
multiplication, division and square-root.
Logarithmic: log x, ln x and e^x.
Trigonometric: sin x, cos x, tan x,
sin⁻¹x, cos⁻¹x and tan⁻¹x (x in
degrees or radians).
Hyperbolic: sinh x, cosh x, tanh x,
sinh⁻¹x, cosh⁻¹x, and tanh⁻¹x.
Coordinate transformation: polar-to-
rectangular, rectangular-to-polar,
cumulative addition and subtraction
of vectors.
Miscellaneous: other single-key
operations include taking the absolute
value of a number, extracting the
integer part of a number, and
entering the value of π. Keys are also
available for positioning and storage
operations.
Programming
The program mode allows entry of
program instructions, via the keyboard,
into program memory. Programming
consists of pressing keys in the proper
sequence, and any key on the keyboard
is available as a program step. Program
capacity is 196 steps. No language or
code-conversions are required. A
self-contained magnetic card reader/recorder
records programs from program
memory onto wallet-size magnetic
cards for storage. It also reads programs
from cards into program memory for
repetitive use. Two programs of 196
steps each may be recorded on each
reusable card. Cards may be cascaded
for longer programs.
Speed
Average times for total performance of
typical operations, including decimal-
point placement:
add, subtract: 2 milliseconds
multiply: 12 milliseconds
divide: 18 milliseconds
square-root: 19 milliseconds
sin, cos, tan: 280 milliseconds
ln x: 50 milliseconds
e^x: 110 milliseconds
These times include core access of
1.6 microseconds.
General
Weight: Net 40 lbs. (18,1 kg.); shipping
65 lbs. (29,5 kg.).
Power: 115 or 230 V ± 10%, 50 to 60 Hz,
400 Hz, 70 watts.
Dimensions: 8¼" high, 16" wide, 19"
deep.
[Fig. 10 diagram labels: clock (825 ns); control logic ROM, 64 words, 29 bits/word; program ROM, 512 words, 64 bits/word; coincident current core memory, 368 words, 6 bits/word; address flip flops; control logic.]
Fig. 10. Arithmetic processing unit block diagram. This system is a marriage of conventional, reliable diode-resistor logic to a 32,000-bit read-only
memory and a coincident current core memory.
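The memory sizes and clock figures quoted in this chapter can be cross-checked with a few lines of arithmetic. The Python sketch below uses only numbers stated in the text (512 64-bit program ROM words, 64 29-bit control ROM words, a 6 x 16 x 23 bit core memory, an 825 ns clock cycle, and the 2 millisecond add time from the specifications). The variable names are illustrative, and the cycle count is a rough upper estimate that assumes the whole operation time is spent in clock cycles.

```python
# Read-only memories
program_rom_bits = 512 * 64      # program ROM: 512 words of 64 bits
control_rom_bits = 64 * 29       # control ROM: 64 words of 29 bits

# Coincident current core memory
core_bits = 6 * 16 * 23          # as given in the text
core_words = core_bits // 6      # 6-bit words

# Clock cycles available during one add (2 ms at 825 ns per cycle)
clock_cycle_ns = 825
add_time_ns = 2_000_000
cycles_per_add = add_time_ns // clock_cycle_ns

print(program_rom_bits, control_rom_bits, core_bits, core_words, cycles_per_add)
```

The program ROM total is what the caption rounds to "32,000-bit", and the cycle estimate is consistent with the statement that a keyboard function may involve thousands of clock cycles.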
operations are synchronized by the clock shown at the top center
of Fig. 10.
The clock is connected to the control read only memory (ROM)
which coordinates the operation of the program read only memory
and the coincident current core read/write memory. The former
contains information for implementing all of the keyboard
operations while the latter stores user data and user programs.
All internal operations are performed in a digit by digit serial
basis using binary coded decimal digits. An addition, for example,
requires that the least significant digits of the addend and augend
be extracted from core, then added and their sum replaced in core.
This process is repeated one BCD digit at a time until the most
significant digits have been processed. There is also a substantial
amount of ‘housekeeping’ to be performed such as aligning decimal
points, assigning the proper algebraic sign, and floating point
normalization. Although the implementation of a keyboard
function may involve thousands of clock cycles, the total elapsed time
is in the millisecond region because each clock cycle is only 825
ns long.
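The digit-serial addition described above can be sketched as follows. This Python example is an illustration only: the function name `bcd_serial_add` and the list-of-digits representation are assumptions, and the 'housekeeping' mentioned in the text (sign handling, decimal point alignment, and normalization) is omitted.

```python
def bcd_serial_add(augend, addend):
    """Add two equal-length lists of decimal digits, most significant digit first.

    Digits are processed one at a time, starting with the least significant,
    and each digit sum is written back before the next digit is fetched.
    """
    if len(augend) != len(addend):
        raise ValueError("operands must have the same number of digits")
    result = [0] * len(augend)
    carry = 0
    for i in range(len(augend) - 1, -1, -1):   # least significant digit first
        total = augend[i] + addend[i] + carry
        result[i] = total % 10                 # decimal digit kept in this position
        carry = total // 10                    # carry into the next position
    return result, carry


print(bcd_serial_add([1, 2, 9], [0, 7, 3]))
```

Each pass through the loop corresponds to one extract, add, and replace step; a carry left over after the most significant digit would be handled by the normalization that this sketch leaves out.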
The program ROM contains 512 64-bit words. When the
program ROM is activated, signals (micro-instructions) corresponding
to the bit pattern in the word are sent to the hard wired logic
gates shown at the bottom of Fig. 10. The logic gates define the
changes to occur in the flip flops at the end of a clock cycle. Some
of the micro-instructions act upon the data flip flops while others
change the address registers associated with the program ROM,
control ROM and coincident current core memory. During the
next clock cycle the control ROM may ask for a new set of
micro-instructions from the program ROM or ask to be read from or
written into the coincident current core memory. The control
ROM also has the ability to modify its own address register and
to issue micro-instructions to the hard wired logic gates. This
flexibility allows the control logic ROM to execute special
programs such as the subroutine for unpacking the stored constants
required by the keyboard transcendental functions.
Fig. 11. Arithmetic unit assembly removed from the calculator.
Control logic
The control logic uses a wire braid toroidal core read only memory
containing 64 29-bit words. Magnetic logic of this type is extremely
reliable and pleasingly compact.
The crystal controlled clock source initiates a current pulse
having a trapezoidal waveform which is directed through one of
64 word lines. Bit patterns are generated by passing or threading
selected toroids with the word lines. Each toroid that is threaded
acts as a transformer to turn on a transistor connected to the
output winding of the toroid. The signals from these transistors
operate the program ROM, coincident current core, and selected
micro-instructions.
Coincident current core read/write memory
The 2208 (6 x 16 x 23) bit coincident current memory uses wide
temperature range lithium cores. In addition, the X, Y, and inhibit
drivers have temperature compensated current drive sources to
make the core memory insensitive to temperature and power
supply variations.
The arithmetic processing unit includes special circuitry to
guarantee that information is not lost from the core memory when
power is turned off and on.
Power supplies
The arithmetic processing unit operates from a single -15 volt
supply. Even though the power supply is highly regulated, all
circuits are designed to operate over a voltage range of -13.5
to -16.5.
Display
The display is generated on an HP electrostatic cathode ray tube
only 11 inches long. The flat rectangular face plate measures
3¼ x 4 13/16 inches. The tube was specifically designed to
generate a bright image. High contrast is obtained by using a low
transmissivity filter in front of the CRT. Ambient light that usually
tends to 'wash out' an image is attenuated twice by the filter, while
the screen image is only attenuated once.
All the displayed characters are 'pieces of eight.' Sixteen
different symbols are obtained by intensity modulating a figure 8 pattern
as shown in Fig. 12. Floating point numbers are partitioned into
groups of three digits and the numeral 1 is shifted to improve
readability. Zeros to the left of the most significant digit and
insignificant zeros to the right of the decimal point are blanked
to avoid a confusing display. Fixed point numbers are
automatically rounded up according to the decimal wheel setting. A fixed
point display will automatically revert to floating point notation
if the number is too large to be displayed on the CRT in fixed
point.
Fig. 12. Displayed characters are generated by modulating these figures.
The digit 1 is shifted to the center of the pattern.
Multilayer instruction logic board
All of the hard wired logic gates are synthesized on the instruction
logic board using time-proven diode-resistor logic. The diodes and
resistors are located in separate rows, Fig. 13. All diodes are
oriented in the same direction and all resistors are the same value.
The maze of interconnections normally associated with the back
plane wiring of a computer are located on the six internal layers
of the multilayer instruction logic board. Solder bridges and
accidental shorts caused by test probes shorting to leads beneath
components are all but eliminated by not having interconnections
on the two outside surfaces of this multilayer board. The
instruction logic board also serves as a motherboard for the control logic
board, the two coincident core boards and the two flip flop boards,
the magnetic card reader, and the keyboard. It also contains a
connector, available at the rear of the calculator, for connecting
peripherals.
Flip flops
The Model 9100A contains 40 identical J-K flip flops, each having
a threshold noise immunity of 2.5 volts. Worst case design
techniques guarantee that the flip flops will operate at 3 MHz even
though 1.2 MHz is the maximum operating rate.
Fig. 13. Printed-circuit boards which make up the arithmetic unit are, left to right at top, side board, control logic, flip flop, core and drivers, core
sense amplifiers and inhibit, flip flop, and side board. Large board at the lower left is the multilayer instruction board, and the program ROM is at
the right. The magnetic card reader and its associated circuitry are at the bottom.
Program read only memory multiplexed, three-bit codes and recorded on three of the four
The 32,768 bit read only program memory consists of 512 64-bit tracks. The fourth track provides the timing strobe.
words. These words contain all of the operating subroutines, stored Information is read from the card and recombined into six bit
constants, character encoders, and CRT modulating patterns. The codes for entry into the core memory. The magnetic card reading
512 words are contained in a 16 layer printed-circuit board having circuitry recognizes the ‘END’ program code as a signal to end
drive and sense lines orthogonally located. A drive line consists the reading process. This feature makes it possible to enter sub-
of a reference line and a data line. Drive pulses are inductively routines within the body of a main program or to enter numeric
coupled from both the reference line and data line into the sense constants via the program card. The END code also sets the
lines. Signals from the data line either aid or cancel signals from program counter to location 0-0, the most probable starting loca-
the reference line producing either a 1 or 0 on the output sense tion. The latter feature makes the Model 9100A ideally suited to
lines. The drive and sense lines are arranged to achieve a bit ‘linking’ programs that require more than 196 steps.
density in the ROM data board of 1000 bits per square inch.
Packaging and servicing
The program ROM decoder/driver circuits are located directly
above the ROM data board. Thirty-two combination sense ampli- The packaging of the Model 9100A began by giving the HP indus-
fier, gated-latch circuits are located on each side of the ROM data trial design group a volume estimate of the electronics package,
board. The outputs of these circuits control the hard wired logic the CRT display size and the number of keys on the keyboard.
gates on the instruction logic board. Several sketches were drawn and the best one was selected. The
electronics sections were then specifically designed to fit in this
Side boards case. Much time and effort were spent on the packaging of the
arithmetic processing unit. The photographs, Figs. 11 and 14,
The program ROM printed circuit board and the instruction logic
attest to the fact that it was time well spent.
board are interconnected by the side boards, where preliminary
The case covers are die cast aluminum which offers durability,
signal processing occurs.
effective RFI shielding, excellent heat transfer characteristics, and
convenient mechanical mounts. Removing four screws allows the
The keyboard
case to be opened and locked into position, Fig. 14. This procedure
The keyboard contains 63 molded plastic keys. Their markings will exposes all important diagnostic test points and adjustments. The
not wear off because the lettering is imbedded into the key body keyboard and arithmetic processing unit may be freed by removing
using a double shot injection molding process. The key and switch four and seven screws respectively.
assembly was specifically designed to obtain a pleasing feel and Any component failures can be isolated by using a diagnostic
the proper amount of tactile and aural feedback. Each key operates routine or a special tester. The faulty assembly is then replaced
a single switch having gold alloy contacts. A contact closure acti- and is sent to a service center for computer assisted diagnosis and
vates a matrix which encodes signals on six data lines and generates repair.
an initiating signal. This signal is delayed to avoid the effects of
contact bounce. An electrical interlock prevents errors caused by Reliability
pressing more than one key at a time. Extensive precautions have been taken to insure maximum relia-
bility. Initially, wide electrical operating margins were obtained
Magnetic card reader by using ‘worst case’ design techniques. In production all transis-
Two complete 196 step programs can be recorded on the credit tors are aged at 80% of rated power for 96 hours and tested before
card size magnetic program card. The recording process erases being used in the Model Y100A. Subassemblies are computer tested
any previous information so that a card may be used over and and actual operating margins are monitored to detect trends that
over again. A program may be protected against accidental erasure could lead to failures. These data are analyzed and corrective
by clipping off the corner of the card, Fig. 9, page 249. The missing action is initiated to reverse the trend. In addition, each calculator
corner deactivates the recording circuitry in the magnetic card is operated in an environmental chamber at 55°C for 5 days prior
reader. Program cards are compatible among machines. to shipment to the customer. Precautions such as these allow
Information is recorded in four tracks with a bit density of 200 Hewlett-Packard to offer a one year warranty in a field where 90
bits per inch. Each six-bit program step is split into two time- days is an accepted standard.
254 Part 3 1 The instruction-set processor level: variations in the processor Section 4 1 Desk calculator computers: keyboard processors with small memories
Trigonometric operations
Sin x, cos x, tan x
Arcsin x, arccos x, arctan x
Sinh x, cosh x, tanh x
Arcsinh x, arccosh x, arctanh x
Polar to rectangular and rectangular to
polar coordinate transformation
Miscellaneous
Enter π
Absolute value of y
Integer value of x
the ‘Polar to Rectangular’ function uses the sin routine which uses
multiply which uses add, etc.
Fig. 15. Flow chart of a simple digit entry. Some of these flow paths are used by other calculator operations for greater hardware efficiency.

Multiplication
Multiplication is successive addition of the multiplicand as determined by each multiplier digit. Offset in the digit position flip-flops is increased by one after completion of the additions by each multiplier digit. Exponents are added after completion of the product. Then the product is normalized to justify a carry digit which might have occurred.

Division
Division involves repeated subtraction of the divisor from the dividend until an overdraft occurs. At each subtraction without overdraft, the quotient digit is incremented by one at the digit position of iteration. When an overdraft occurs, the dividend is restored by adding the divisor. The division digit position is then incremented and the process continued. Exponents are subtracted after the quotient is formed, and the quotient normalized.

$\sum_{i=1}^{n} (2i - 1) = n^2$

In square root, the divisor digit is incremented at each iteration, and shifted when an overdraft and restore occurs. This is a very fast algorithm for square root and is equal in speed to division.

Circular routines
The circular routines (sin, cos, tan), the inverse circular routines (arcsin, arccos, arctan) and the polar to rectangular and rectangular to polar conversions are all accomplished by iterating through a transformation which rotates the axes. Any angle may be represented as an angle between 0 and 1 radian plus additional information such as the number of times π/2 has been added or subtracted, and its sign. The basic algorithm for the forward circular function operates on an angle whose absolute value is less than 1 radian, but prescaling is necessary to indicate quadrant.
To obtain the scaling constants, the argument is divided by 2π, the integer part discarded and the remaining fraction of the circle multiplied by 2π. Then π/2 is subtracted from the absolute value until the angle is less than 1 radian. The number of times π/2 is subtracted, the original sign of the argument, and the sign upon completion of the last subtraction make up the scaling constants. To preserve the quadrant information the scaling constants are stored in the core memory.
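The repeated-subtraction division and the odd-number square root described above can be sketched in a few lines of Python. This is only an illustration on plain integers: the function names are invented here, and the exponent handling, normalization and digit-position flip-flops of the Model 9100A are not modelled.

```python
def divide_digits(dividend, divisor, ndigits):
    """Quotient digits of dividend/divisor by repeated subtraction with restore.

    Assumes 0 <= dividend < 10 * divisor so that every digit is 0-9.
    """
    digits = []
    remainder = dividend
    for _ in range(ndigits):
        digit = 0
        while True:
            remainder -= divisor          # trial subtraction
            if remainder < 0:             # overdraft
                remainder += divisor      # restore the dividend
                break
            digit += 1                    # subtraction without overdraft
        digits.append(digit)
        remainder *= 10                   # move to the next digit position
    return digits


def isqrt_by_odd_numbers(n):
    """Integer square root, one decimal digit at a time.

    Uses the identity 1 + 3 + ... + (2n - 1) = n**2: successive odd
    numbers are subtracted until an overdraft occurs.
    """
    pairs = []
    while n > 0:                          # split n into base-100 digit pairs
        pairs.append(n % 100)
        n //= 100
    root = 0
    remainder = 0
    for pair in reversed(pairs):
        remainder = remainder * 100 + pair
        subtrahend = 20 * root + 1        # the shifted "divisor"
        digit = 0
        while True:
            remainder -= subtrahend
            if remainder < 0:             # overdraft
                remainder += subtrahend   # restore, then shift to the next digit
                break
            subtrahend += 2               # next odd number
            digit += 1
        root = root * 10 + digit
    return root
```

For example, `divide_digits(22, 7, 6)` yields the leading digits of 22/7, and `isqrt_by_odd_numbers(144)` returns 12. Both loops have the same subtract, test, restore shape, which is consistent with the remark that square root runs at about the speed of division.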
The algorithm produces tan θ. Therefore, in the Model 9100A, cos θ is generated as

$\cos\theta = \dfrac{1}{\sqrt{1 + \tan^2\theta}}$

and sin θ as

$\sin\theta = \dfrac{\tan\theta}{\sqrt{1 + \tan^2\theta}}$

Sin θ could be obtained from the relationship $\sin\theta = \sqrt{1 - \cos^2\theta}$, for example, but the use of the tangent relationship preserves the 12 digit accuracy for very small angles, even in the range of $\theta < 10^{-12}$. The proper signs of the functions are assigned from the scaling constants.
For the polar to rectangular functions, cos θ and sin θ are computed and multiplied by the radius vector to obtain the X and Y coordinates. In performing the rectangular to polar function, the signs of both the X and Y vectors are retained to place the resulting angle in the right quadrant.
Prescaling must also precede the inverse circular functions, since this routine operates on arguments less than or equal to 1. The inverse circular algorithm yields arctangent functions, making it necessary to use the trigonometric identity

$\sin^{-1}(x) = \tan^{-1}\left(\dfrac{x}{\sqrt{1 - x^2}}\right)$

If $\cos^{-1}(x)$ is desired, the arcsin relationship is used and a scaling constant adds π/2 after completion of the function. For arguments greater than 1, the arccotangent of the negative reciprocal is found which yields the arctangent when π/2 is added.

Exponential and logarithms
The exponential routine uses a compound iteration algorithm which has an argument range of 0 to the natural log of 10 (ln 10). Therefore, to be able to handle any argument within the dynamic range of the calculator, it is necessary to prescale the absolute value of the argument by dividing it by ln 10 and saving the integer part to be used as the exponent of the final answer. The fractional part is multiplied by ln 10 and the exponential found. This number is the mantissa, and with the previously saved integer part as a power of 10 exponent, becomes the final answer.
The exponential answer is reciprocated in case the original argument was negative, and for use in the hyperbolic functions. For these hyperbolic functions, the following identities are used:

$\sinh x = \dfrac{e^{x} - e^{-x}}{2}$

$\cosh x = \dfrac{e^{x} + e^{-x}}{2}$

$\tanh x = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$

Natural logarithms
The exponential routine in reverse is used as the routine for natural logs, with only the mantissa operated upon. Then the exponent is multiplied by ln 10 and added to the answer. This routine also yields these log10 and arc hyperbolic functions:

$\log_{10} x = \dfrac{\ln x}{\ln 10}$

$\sinh^{-1}(x) = \ln\left(x + \sqrt{x^2 + 1}\right)$

$\cosh^{-1}(x) = \ln\left(x + \sqrt{x^2 - 1}\right)$

$\tanh^{-1}(x) = \ln\sqrt{\dfrac{1 + x}{1 - x}}$

The $\sinh^{-1}(x)$ relationship above yields reduced accuracy for negative values of x. Therefore, in the Model 9100A, the absolute value of the argument is operated upon and the correct sign affixed after completion.

Accuracy
It can be seen from the discussion of the algorithms that extreme care has been taken to use routines that have accuracy commensurate with the dynamic range of the calculator. For example, the square root has a maximum possible relative error of 1 part in $10^{10}$ over the full range of the machine.
There are many algorithms for determining the sine of an angle; most of these have points of high error. The sine routine in the Model 9100A has consistent low error regardless of quadrant.
Marrying a full floating decimal calculator with unique mathematical algorithms results in accuracy of better than 10 displayed digits.
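Two of the ideas in the routines above lend themselves to a short numerical sketch: deriving sin and cos from the tangent, and prescaling the exponential argument by ln 10. The Python below is an illustration only. The function names are invented, `math.exp` stands in for the compound iteration algorithm (which the text does not spell out), and binary floating point is used in place of the calculator's decimal arithmetic.

```python
import math

LN10 = math.log(10.0)


def sin_cos_from_tan(t):
    """Return (sin, cos) of the angle whose tangent is t, for |angle| < pi/2."""
    c = 1.0 / math.sqrt(1.0 + t * t)
    return t * c, c


def exp_prescaled(x):
    """e**x via the prescaling scheme: split |x| / ln 10 into integer and fraction."""
    quotient = abs(x) / LN10
    exponent = int(quotient)                 # becomes the power-of-10 exponent
    fraction = quotient - exponent           # 0 <= fraction < 1
    mantissa = math.exp(fraction * LN10)     # stand-in for the core routine, in [1, 10)
    result = mantissa * 10.0 ** exponent
    return 1.0 / result if x < 0 else result # reciprocate for negative arguments


# For a very small angle the tangent route keeps full precision, while
# sqrt(1 - cos**2) collapses because cos rounds to exactly 1.
s, c = sin_cos_from_tan(1e-12)
naive = math.sqrt(1.0 - c * c)
```

Here `s` stays close to 1e-12 whereas `naive` loses every significant digit, which is the same effect the text gives as the reason for preferring the tangent relationship.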
Section 5
a First edition of manual, or a paper, or the appearance in Adams Computing Characteristics Quarterly.
b Still evolving. B 8501 was discontinued in 1968.
c George, University of New South Wales, interpreter using Polish notation and a stack. Circa 1957 [Hamblin, 1962].
d Produced for command and control (military) applications.
e B 8500 is a system name: the Pc is a B 8501.
f Reported. Actual delivery unknown.
g Dual processor.
258 Part 3 I The instruction-set processor level: variations in the processor Section 5 1 Processors with stack memories (zero addresses per instruction)
K-T(#1:2; card; reader)
K-T(#1:2; paper tape; reader)
K-T(card; punch)
K-T(#1:2 line; printer)
K-Ms(#1:2; drum)
K-Ms(#1:16; magnetic tape)
'Mp(core; 4 µs/w; 4096 w; (48.3) b/w)
'Pc(stack; 12 b/syllable; 6 b/char; data: si,sf,bv,w,char.string; (1 - 2) syllable/instruction; Mps(- 4 w) antecedents: 'ALGOL language; descendants; ' 6 5000, B 6500, B 7500; technology: transistor; -41961 . 1963))
'S(from: 2 Pc, ~ K; to: 8 Mp; concurrency: 4)
4S(from: 4 Kio; to: KT,KMs; concurrency: 4)
Fig. 1. Burroughs B 5000 PMS diagram.

when interpreting the compiled program. However, the language-based machines can still be studied profitably with the stack in mind.
The following comments will be directed to the P.stack computers manufactured by both English Electric and Burroughs. There are three basic P.stack computer families: B 5000 → B 5500 → B 6500/B 7500; D825 → D830 → B 8500; and KDF9. Each root member was made available at about the same time by Burroughs (Pasadena, Calif.), Burroughs (Paoli, Pa.), and English Electric. The IBM Corporation later responded with a proposed Pc.stack, but the machine never entered the production phase.
The Pc.stack is a major alternative to the main line organization of 1 address per instruction (augmented with index registers or general registers). It tries to capitalize on the hierarchical character of computation to avoid having to give memory shuffling instructions explicitly. In Chap. 3, page 64, we gave a comparison of a trivial computation using a stack and a general-register organization, in order to make clear the case
ory in the Pc, thus that it is dominated by the general-register
program execution times for the Pc.stack are indeed impressive. However, no definitive analysis has been published, as far as we know. Pc.stack is certainly an organization that rates serious study by any computer designer.

The PMS structure of the examples
The PMS structure diagram of the B 5000 and B 6500/B 7500 (Figs 1 to 5) should be compared with Burroughs own structure representation (Chap. 22, page 268). The D825 structure is similar; it is given in Chap. 36, page 447. All the Burroughs
Burroughs was probably the first computer company to take

.P~(#A)~-T.console-
-Pc(#B)3-S,console-
L('Real Time Device)-
K(#1:4)-C4-S-K(#1:4)-S-K(#1
a. 1 K for 8 Ms(magnetic tape)
b. 2 K for 10 Ms(magnetic tape): S(2 K; 10 Ms)-Ms(#0:9; magnetic tape)-
c. 4 K for 16 Ms(magnetic tape): S(4 K; 16 Ms)-Ms(#0:15; magnetic tape)-
'L(to: Kio('Input/Output Multiplexor))
3S(1 K; 8 Ms; bus)
-L-K-T(line; printer)
'Mp(core;
2S(16 Mp; 16(P,K);
6 ps/w; 4 - 32 kw; 48
concurrency:
b/w)
1)
n e g a t e , -abs)
instruction-size:
operation-code-size:
:
(I - 7) syllable;
5/12 s y l l a b l e ;
3Pc(stack; 8 b/syllable; 0 - 1 a d d r e s s / i n s t r u c t i o n ; 6 b/char; address-size:
o p e r a t i o n forms:
(7/12 + 0
(d3
- 6) s y l l a b l e ;
d l b d2, d2 t u d l ) ;
technology: transistor; d a t a : s y l l a b l e , c h a r , w , bv, s i , t
Fig. 6. English Electric KDF9 PMS diagram. Fig. 7. Burroughs D825 PMS diagram.
R. H. Allmark / J. R. Lucking

Summary This paper describes the arithmetic unit of a computer whose order code is based on the Reverse Polish algebraic notation. The order code has been realised by causing the arithmetic unit to operate on data stored in the most accessible registers of a nesting store; these registers are of the transistor flip-flop type but are backed up by sixteen fast magnetic core registers. The functions are performed as micro-programmes of transfers between the registers in the arithmetic unit, and the necessary arrangement of transfer paths, logical gates and arithmetic circuits is described. The number system is binary, using the two's-complement representation of negative numbers. Automatic floating-point operations are included which use an autonomous unit to perform the shifts required.

on the Reverse Polish algebraic notation, and contains four groups of operations:
a Transfers between the arithmetic unit and the main store.
b Arithmetic, logical and manipulative functions on data in the nesting store.
c Conditional and unconditional jump instructions used to interrupt the normal sequencing of instructions.
d Instructions for controlling the operation of the various peripheral devices which may be attached to the machine.
Chapter 21 I Design of an arithmetic unit incorporating a nesting store 263
or 8 places, and a left shift of 8 places; the paths from the B to the W registers provide the same shifts in the reverse direction. The two sets of shift paths are used alternately, those from the W registers being used first; all shifts are terminated using a path into the W registers. Shifts of a large number of places are accomplished by a series of shifts of eight places in the appropriate direction until the number of places remaining is less than eight; if necessary the number is then transferred back into the W registers: the remaining shifts, or the whole shift if the number of places is less than eight, is then completed by a transfer to the B registers and back again using two appropriate paths. With the shifts available, extension of the B registers by two bits at the right-most end enables any shift to be performed without loss of accuracy. In double-length arithmetic shifts, the sign digit of the less significant word is by-passed. When a shift is to be performed, the number of places and the type of shift are transferred into a semi-autonomous unit, called the shift control, which is then supplied with a string of command pulses by the arithmetic unit control; shift control then re-routes these pulses to perform the transfers necessary to obtain the shift.
When performing floating-point addition and subtraction, shifts are required to equalize the characteristics of the two numbers; the amount of shift is calculated by a modified subtraction, operating on the characteristic positions of the two numbers. After the addition, the shift required to restore the result to standard form is determined by logical circuits which interpret the pattern of bits in W1 into shift information. The number of shifts performed during this standardising operation is made available to the arithmetic unit control for use in forming the correct characteristic of the result.
The character conversion operations to, and from, binary are accomplished by shift control, using a method involving successive shifting of the character word, and adding or subtracting portions of the radix word.

Examples of sequences
To illustrate the working of the arithmetic unit, two sequences are described.

iii Transfer the complement of W2 to B2 (but setting the sign of B2 positive), transfer W3 directly to B1 (W3 has by now been filled with fresh data), switch the adder’s output to W2, inserting a carry into the right-most adder stage, and read from the nesting store.
iv Add.
v Transfer the complement of W1 to B1 and N b to B2, switch the adder’s output to W1 and insert a carry into the right-most adder stage if W2 is negative.
vi Add, simultaneously clearing the sign of W2.

b +F (i.e. add the two single-length floating numbers in W1 and W2).
i Transfer the complement of W1 to B1, transfer W2 to B2 and switch the adder’s output to register CD.
ii Store the characteristic of W1 in the eight-bit register C and add.
iii Clear the characteristic positions of W1, simultaneously transferring CD into the shift number register in shift control. This latter operation is such that the shift register contains minus the difference in characteristics.
iv Clear the characteristic of W2, and if W1 is about to be shifted, determined by the sign digit of CD, replace the contents of C by the characteristic of B2; thus C contains the larger characteristic.
v Supply control pulses to shift control and thus perform the required right-shift of either W1 or W2.
vi Having completed the shift, transfer W1, W2 and W3 to B2, B1 and N b respectively, simultaneously switching the adder’s output to W1, clearing the carry into the right-most adder stage and reading from the core-nesting store.
vii Add the fractional parts, simultaneously transferring N b to W2.
viii Supply control pulses to shift control so as to cause it to enter the standardization procedure and perform the shifts required.
ix Store the complement of the number of left-shifts performed in (viii) in the characteristic position of B2, transfer C to the characteristic position of B1, switch the adder to W1.
x Perform a special add operation which only affects the characteristic positions of W1.
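The overall shape of the +F sequence (equalize characteristics by a right shift, add the fractional parts, then standardize and correct the characteristic) can be sketched as below. This is a simplified model and not the register-transfer sequence itself: the 39-bit fraction width, the signed-integer representation of the fraction, the unbiased characteristic and the names `fadd` and `to_float` are all assumptions made for illustration.

```python
FRAC_BITS = 39  # assumed fraction width; not stated in the text above


def to_float(number):
    """Interpret (characteristic, fraction) as fraction * 2**(characteristic - FRAC_BITS)."""
    characteristic, fraction = number
    return fraction * 2.0 ** (characteristic - FRAC_BITS)


def fadd(x, y):
    """Add two (characteristic, fraction) pairs and return a standardized pair."""
    (cx, fx), (cy, fy) = x, y
    difference = cx - cy
    if difference >= 0:
        fy >>= difference          # right-shift the operand with the smaller characteristic
        characteristic = cx        # keep the larger characteristic
    else:
        fx >>= -difference
        characteristic = cy
    total = fx + fy                # add the fractional parts
    if total == 0:
        return (0, 0)
    # Standardize: one right shift on overflow, otherwise left shifts until
    # the leading bit is significant; the characteristic tracks the shifts.
    while abs(total) >= 1 << FRAC_BITS:
        total >>= 1
        characteristic += 1
    while abs(total) < 1 << (FRAC_BITS - 1):
        total <<= 1
        characteristic -= 1
    return (characteristic, total)


one_and_a_half = (1, 3 << 37)      # 0.75 * 2**1
quarter = (-1, 1 << 38)            # 0.5 * 2**-1
minus_one = (1, -(1 << 38))        # -0.5 * 2**1
```

With these values, `fadd(one_and_a_half, quarter)` needs only the alignment shift, while `fadd(one_and_a_half, minus_one)` cancels a leading bit and so exercises the left-shift standardization, where the number of left shifts is subtracted from the characteristic.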
¹Datamation, vol. 7, no. 5, pp. 28-32, May, 1961.

Computing systems have conventionally been designed via the ‘hardware’ route. Subsequent to design, these systems have been handed over to programming systems people for the development of a programming package to facilitate the use of the hardware. In contrast to this, the B 5000 system was designed from the start as a total hardware-software system. The assumption was made that higher level programming languages, such as ALGOL, should be used to the virtual exclusion of machine language programming, and that the system should largely be used to control its own operation. A hardware-free notation was utilized to design a processor with the desired word and symbol manipulative capabilities. Subsequently this model was translated into hardware specifications at which time cost constraints were considered.

Design objectives
The fundamental design objective of the B 5000 system was the reduction of total problem through-put time. A second major objective was facilitation of changes both in programs and system configurations. Toward these objectives the following aspects of the total computer utilization problem were considered:
Statement of problems in higher-level machine-independent languages; efficiency of compilation of machine language; speed of compilation of machine language; program debugging in higher-level languages; problem set-up and load time; efficiency of system operation; ease of maintaining and making changes in existing programs, and ease of reprogramming when changes are made in a system configuration.

Design criteria
Early in the design phase of the B 5000 system the following principles were established and adopted:
Program should be independent of its location and unmodified as stored at object time; data should be independent of its location; addressing of memory within a program should take advantage of contextual addressing schemes to reduce redundancy; provisions should be made for the generalized handling of indexing and subroutines; a full complement of logical, relational and control operators should be provided to enable efficient translation of higher-level source languages such as ALGOL and COBOL; program syntax should permit an almost mechanical translation from source languages into efficient machine code; facilities should be provided to permit the system to largely control its own operation; input-output operations should be divorced from processing and should be handled by an operating system; multi-programming and true parallel processing (requires multiple processors) should be facilitated, and changes in system configuration (within certain broad limitations) should not require reprogramming.

System organization
The B 5000 system achieves its unique physical and operational modularity through the use of electronic switches which function logically like telephone crossbar switches. Figure 1 depicts the basic organization of the system as well as showing a maximum system.

Master control program
A master control program will be provided with the B 5000 system. It will be stored on a portion of the magnetic drum. During normal operations, a small portion of the MCP will be contained in core memory. This portion will handle a large percentage of recurrent system operations. Other segments of the MCP will be called in from the magnetic drum, from time to time, as they are required to handle less frequently-occurring events, or system situations.
Whenever the system is executing the master control program, it is said to be in the Control State. All entries to the Control State are made via ‘interrupts.’ A special operation is provided, which can only be executed when the system is in the Control State, to permit control to return to the object program it was executing at the time the ‘interrupt’ occurred.
The following are a few typical occurrences which cause an automatic ‘interrupt’ in the system: An input-output channel is
available, an input-output operation has been completed or an indexing operation was attempted which violated the storage protection features built into the system.
In addition to processing interrupt conditions, the master control program handles fundamental parts of the total system operation such as the initiation of all input-output operations, tanking of input-output areas when required, file control, allocation of memory, scheduling of jobs (priority ratings, system requirements of each object program, and the present system configuration are considered), maintenance of an operations log and maintenance of a system description.

Operating modes
The B 5000 can either operate with fixed-length words or with variable-length fields. These two modes of operation are called the word mode and the character mode. For certain operations, a processor operating on words is most desirable and for other operations, a variable field length mode of operation is most desirable. By combining both abilities in one processor, a processor can operate in the mode most desirable for the operation at hand. In a B 5000 system, it is even possible for one processor to be operating in the word mode and the other in the character mode.
When operating in the word mode, a standard format for the data word is used as illustrated in Fig. 2.
Note that the standard word is an octal floating point word. However, the mantissa is treated as an integer rather than as a fraction (heretofore the reverse has been common practice). This provides two benefits: first, an integer has the same internal representation as its unnormalized floating point correspondent; and, second, the range of numbers that can be expressed, rather than being from $8^{+64}$ to $8^{-63}$, is $8^{+76}$ to $8^{-51}$. The first feature eliminates
Chapter 22 1 Design of the B 5000 system 269
way around machine design, but they still must provide object coding to accomplish the storage and recall functions. In brief, conventionally designed computers, with or without automatic programming aids, require the wasteful expenditure of programming effort, memory capacity, and running time to overcome the limitations of their internal organization.
The problem is attacked directly in the B 5000 by incorporation of a “pushdown” stack, which completely eliminates the need for instructions (coded or compiled) to store or recall intermediate results.
In a B 5000 processor, the stack is composed of a pair of registers, the A and B registers, and a memory area. As operands are picked up by the programs, they are placed in the A register. If the A register already contains a word of information, that word is transferred to the B register prior to loading the operand into the A register. If the B register is also occupied by information, then the word in B is stored in a memory area defined by an address register S. Then the word in A can be transferred to B and the operand brought into the A register. The new word coming into the stack has pushed down the information previously held in the registers. As each pushdown occurs, the address in the S register is automatically increased by one. The information contained in the registers is the last information entered into the stack; the stack operates on a “last in-first out” principle. As information is operated on in the stack, operands are eliminated from the stack and results of operations are returned to the stack. As information in the stack is used up by operations being performed, it is possible to cause “pushups,” i.e., a word is brought from the memory area addressed by the S register, and the address in the S register is decreased by one.
To eliminate unnecessary pushdowns and pushups, the A and B registers both have indicators used for remembering whether the registers contain information or are empty. When an operand is to be placed in the stack and either of the registers is empty, no pushdown into memory occurs. Also, when an operation leaves one or both of the registers empty, no automatic pushup occurs.

Polish notation
The Polish logician, J. Lukasiewicz, developed a notation which allows the writing of algebraic or logical expressions which do not require grouping symbols and operator precedence conventions. For example, parentheses are necessary as grouping symbols in the expression A(B + C) to convey the desired interpretation of the expression. In the expression A + B/C, the normal interpretation is that the / operator is of higher precedence than the + operator. The right-hand Polish notation used in the B 5000 is based on placing the operators to the right of their operands: A + B becomes AB+ in Polish notation. A + B + C can be written either as AB+C+, or as ABC++. In the expression ABC++, the first + operator says to add the operands B and C. The second + operator says to add A to the sum of B and C. Returning to the first examples above, A(B + C) can be written as BC+A× or ABC+× in Polish. The second example is written as BC/A+ or ABC/+. The extension of Polish notation to handle equations is shown in the following example:
Conventional notation    Z = A(B - C)/(D + E)
Polish notation          ABC - × DE + / Z =

The stack in use
To illustrate the functioning of the stack, two simple examples are shown in Figs. 4 and 5. In the examples, the letters P, Q and R represent syllables in the program that cause the operands P, Q, and R to be picked up and placed in the stack. The symbols + and × represent syllables that cause the add and multiply operations to occur. The two examples represent different ways of writing P(Q+R) in Polish notation. The first example in Fig. 4 does not require pushdowns or pushups. The second example, shown in Fig. 5, requires a pushdown in the execution of the syllable R, and a pushup in the execution of the syllable ×. The columns in the table represent the contents of the various registers after execution of the syllable listed in the first column.

Fig. 4. Polish Notation QR + P ×

Independence of addressing
One of the goals set in the design of the B 5000 was to make the programs independent of the actual memory locations of both the program itself and the data, in order to provide really automatic
Polish Notation PQR + x from drum to core. If the core memory is expanded, less time will
be spent in such activity and the program or programs will be
Contents of
Syllable speeded up, and no reprogramming is required.
Executed Register A Register B Register S Cell 101
P P Empty 100 -
Program reference table
Q Q P 100 -
The means of achieving independence of addressing in the B 5000
Pushdown Empty Q 101 P
is called a Program Reference Table (PRT). The PRT is a 1,025
R word relocatable area in memory used primarily for storing con-
Execute R Q 101 P trol words that locate data areas or program segments. There are
also control words for describing input-output operations. These
control words, called descriptors, contain the base address and size
of data areas, program segments and input-output areas. A descrip-
X
100 tor specifying an input-output operation also contains the desig-
nation of the unit to be used and the type of operation to be
Fig. 5 performed. Operands may also be stored in the PRT, providing
direct access to single values such as indices, counts, control totals,
etc.
program segmentation. Through automatic program segmentation, In the word mode of the B 5000, every item of data is con-
it is possible to have program size practically independent of the size of core memory. The systems analyst or programmer intending to do multi-processing is then no longer faced with the difficult task of planning what jobs are to be run together in order that system storage capacities are not exceeded.

In achieving independence of addressing, a solution requiring large contiguous areas of memory was not deemed satisfactory. Each segment of the program and each data area should be completely relocatable without modification to the program. It is then possible to load all the segments of a program or programs onto the drum at load time and call in the segments to any available space in core memory as needed during run time. If some segment of a program is overlaid by a subsequent segment of a program, the segment of the program destroyed in core memory is still available on the drum to be called in again if needed.

Due to the very high program densities in the B 5000, the availability of high capacity drum storage on every system and automatic segmentation, a minimum B 5000 system has the capacity for a program or programs equivalent to approximately 40,000 to 60,000 single address instructions. Of course, if an installation normally ran such large programs, the system would very likely not be a minimum system. However, the installation having an occasional need to run very large programs is not prevented from doing so by storage capacity.

Processing speed now becomes a function of the size of core memory. If large programs are run in a system with small core memory, time will be consumed in recalling program segments

sidered to be either a single value or an element of an array of data. If it is a single value, it will be obtained directly by indexing a descriptor contained in the PRT.

Program segments are described by program descriptors. In addition to core base address, the program descriptor contains the location in drum storage of the program segment and an indication if the program segment is currently in core memory starting at the address specified in the descriptor. Entry to a program segment is made via its program descriptor contained in the PRT. If the program segment is in core memory, entry will be made to the program segment. However, when entry is attempted to a program segment whose descriptor indicates that the segment is not in core memory, automatic entry to the Master Control Program will occur and the desired segment will then be brought in from the drum. Notice that in moving from one segment to another, it is not necessary to know whether the segment to be entered is currently in core memory. Branching within a program segment is self-relative, i.e., the distance to jump either forward or backward is specified, not the address to be jumped to.

As a result of keeping all actual addresses of data and program in the PRT, the program itself does not contain any addresses, but only references to the PRT. To specify one of the 1,024 positions in the PRT requires only 10 bits, which contributes greatly to the high program density achieved in the B 5000. Since the PRT is relocatable, references to the PRT contained in the program are to relative locations, thus completely freeing the program from any dependence whatsoever on actual memory locations.
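The segment-entry rule described above (enter through the program descriptor, and let the Master Control Program fetch the segment from the drum when it is absent) can be sketched as follows. This is only an illustration under stated assumptions: the names ProgramDescriptor, in_core, drum_address, core_base, master_control_program and enter_segment are hypothetical, and dictionaries stand in for the drum and for core memory. The actual descriptor layout is not given in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProgramDescriptor:
    """Hypothetical model of a program descriptor held in the PRT."""
    drum_address: int                 # location of the segment on the drum
    in_core: bool = False             # indication that the segment is in core
    core_base: Optional[int] = None   # meaningful only when in_core is True


def master_control_program(desc, drum, core, free_base):
    """Stand-in for the MCP: bring the segment in from the drum."""
    core[free_base] = drum[desc.drum_address]
    desc.core_base = free_base
    desc.in_core = True


def enter_segment(prt, index, drum, core, free_base):
    """Enter a segment via its descriptor; fetch it first if it is absent."""
    desc = prt[index]
    if not desc.in_core:
        master_control_program(desc, drum, core, free_base)
    return core[desc.core_base]


drum = {7: "segment-A code"}
core = {}
prt = [ProgramDescriptor(drum_address=7)]
first = enter_segment(prt, 0, drum, core, free_base=4096)   # fetched from drum
second = enter_segment(prt, 0, drum, core, free_base=8192)  # already in core
```

The caller never tests whether the segment is resident; only the descriptor is consulted, which is the point the text makes about moving from one segment to another.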
272 Part 3 I The instruction-set processor level: variations in the processor Section 5 1 Processors with stack memories (zero addresses per instruction)
Word mode program

The word mode of the B 5000 processor has four types of syllables. The syllable is distinguished by the two high-order bits of each 12-bit syllable. The types of syllable and the identification bits are:

00-Operator Syllable
01-Literal Syllable
10-Operand Call Syllable
11-Descriptor Call Syllable

The first of these, the operator syllable, causes operations to be performed. The remaining ten bits of the operator syllable are the operation codes. There are approximately sixty different operations in the word mode. For those operations requiring an operand or operands, the processor checks for sufficient operands in the registers; if they are not there, pushups from the stack in memory occur automatically.

The literal syllable is used for placing constants in the stack to be used as operands. The ten bits of the literal syllable are transferred to the stack. This allows the program to contain integers less than 1,024 as constants.

The operand call syllable and the descriptor call syllable address locations in the program reference table. The purpose of the operand call syllable is to place an operand in the stack; the purpose of the descriptor call syllable is to place the address of an operand, a descriptor, in the stack. There are four situations that arise, depending on the word read from the program reference table.

1 The word is an operand.
2 The word is a descriptor containing the address of the operand.
3 The word is a descriptor containing the base address of the data area in which the operand resides.
4 The word is a program descriptor containing the base address of a subroutine.

For (1), the operand call syllable has completed its action by placing an operand in the stack. The descriptor call syllable will cause the construction of a descriptor of the operand, replacing the operand by the constructed descriptor.

For (2), the operand call syllable then reads the operand from the cell addressed. The descriptor call syllable has completed its action.

For (3), indexing of the descriptor by the item that is now the second item in the stack occurs. For an operand call syllable, the operand is obtained from the indexed address; for the descriptor call syllable, action is complete after the indexing.

In the case of (4), subroutine entry occurs to the subroutine addressed. A word of the three previous types may be left in the registers upon return from the subroutine, in which instance the actions described above will take place, depending upon the type of syllable which initiated the subroutine.

Essentially, the four types of action that occur for an operand call syllable are obtaining an operand directly, indirectly, from an array, or by computation. Sometimes in the use of the call syllables, it is not known which type of action will occur for a particular syllable when the program is created. This is particularly true for call syllables in subroutines.

Programs in the word mode consist of strings of syllables which follow the rules of Polish notation. Variable length strings of call syllables and literal syllables, which place items of information in the stack, are followed by operator syllables which perform their operations on information in the stack.

The indexing features of the B 5000 allow generalized indexing and at the same time provide complete storage protection. Data areas and program segments of different programs may be intermingled, but a program is prevented from storing outside of its data areas. The method of indexing allows any of the 1,024 words of the program reference table to be considered index registers. Multilevel indexing is provided, i.e., indices of arrays can themselves be elements of arrays.

The subroutine control provided in the B 5000 allows nesting of subroutines, even recursive nesting (a subroutine is a subroutine of itself), arbitrarily deep. Dynamic allocation of storage for parameter lists and temporary working storage simplifies the use of subroutines. Storage is automatically allocated and deallocated as required.

Character mode program

In the character mode of the B 5000 Processor, there is only one type of syllable, called the operator syllable. Program segments in the character mode are constructed of strings of these syllables. The character mode is designed to provide editing, formatting, comparison, and other forms of data manipulation. In doing so, the processor uses two areas of memory: the source and destination areas. When a program switches from word mode to character mode, two descriptors containing the base addresses of these areas are supplied. The source area or destination area may be
Chapter 22 1 Design of the B 5000 system 273
changed at any time during character mode so that the program may act on several areas.

The character mode operator is divided into two parts; the last part specifies the operation to be performed and the first part specifies the number of times the operation is to be performed. Operations are provided for the transferring, deletion, comparison, and insertion of characters or bits. Also, there are operations which allow the repetition of syllable strings. This is quite useful for complex table look-up operations and for editing information which contains repeated patterns.

Conclusion

The Burroughs B 5000 system has been designed as an integrated hardware-software package which offers such benefits as savings in the memory space required to store equivalent object programs; multi-processing and parallel processing; and identical programs on systems with different size memories and different system configurations with no loss in individual system

References

LoneW61; BartR61; BockR63; CarlC63; MaheR61
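The word-mode syllable format described in this chapter (two high-order type bits followed by a ten-bit field, executed in Polish-notation order against a stack) can be sketched as below. This is an illustrative sketch only. The opcode value HYPOTHETICAL_ADD is invented for the example and is not the real B 5000 encoding, the PRT is modeled as a plain list, and only situation (1) of the operand call (the PRT word is itself an operand) is handled.

```python
SYLLABLE_TYPES = {
    0b00: "operator",
    0b01: "literal",
    0b10: "operand call",
    0b11: "descriptor call",
}

HYPOTHETICAL_ADD = 1  # illustrative opcode, not the real B 5000 encoding


def decode_syllable(syllable):
    """Split a 12-bit word-mode syllable into (type name, 10-bit field)."""
    if not 0 <= syllable < (1 << 12):
        raise ValueError("syllable must fit in 12 bits")
    kind = SYLLABLE_TYPES[syllable >> 10]  # two high-order bits
    field = syllable & 0x3FF               # remaining ten bits
    return kind, field


def run(syllables, prt):
    """Execute a syllable string in Polish-notation order on a stack."""
    stack = []
    for syllable in syllables:
        kind, field = decode_syllable(syllable)
        if kind == "literal":
            stack.append(field)            # ten-bit constant goes on the stack
        elif kind == "operand call":
            stack.append(prt[field])       # situation (1): PRT word is an operand
        elif kind == "operator" and field == HYPOTHETICAL_ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        else:
            raise NotImplementedError(kind)
    return stack


prt = [0, 0, 37]
program = [
    (0b01 << 10) | 5,                 # literal 5
    (0b10 << 10) | 2,                 # operand call on PRT position 2
    (0b00 << 10) | HYPOTHETICAL_ADD,  # operator: add the top two items
]
result = run(program, prt)
```

Because the ten-bit field can name any of the 1,024 PRT positions, the program string carries no absolute addresses, which matches the relocation argument made earlier in the chapter.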
Section 6

The processors in this section have features which allow multiple programs to exist in the primary memory at the same time. The programs can be executed alternately by a single processor without having to wait for new programs to be input. The cost is only that of changing the processor state, which involves only a few instructions at most (and only one instruction on some systems, such as the CDC 6600). Since programs are subject to numerous unpredictable delays within a single run for interchange with the external environment (either via Ms or T), substantial increases in Pc utilization can be achieved by multiprogramming. If more than a single processor has access to Mp, the system is called a multiprocessor system.

Time-shared computers are generally multiprogrammed. Alternatively, time-shared systems can be implemented by swapping programs, one at a time, into primary memory for interpretation. The Berkeley Time-sharing System (Chap. 24) uses both multiprogramming and program swapping. The Burroughs B 5000 (Chap. 22) is an early computer to have multiprogram capability. The idea of multiprogramming is so fundamental that it should be among the first concepts to be understood by the student of computing systems. A very nice review of memory mapping and storage allocation is presented in the paper Dynamic Storage Allocation Systems [Randell and Kuehner, 1968].

Atlas

The Atlas is one of the most important machines described in this book. The prototype was originally designed and constructed at Manchester University. The Atlas 1 and Atlas 2 were produced by Ferranti Corp. (prior to becoming part of I.C.T.¹). Atlas 1 is the most interesting; it incorporates most of the features of the Atlas prototype. The Lincoln Laboratory TX-2 [Clark, 1957] influenced some Atlas features: multiple index registers and interrupt processing of input/output devices. Atlas' detailed internal structure is described in a paper [Sumner et al., 1962].

¹International Computers and Tabulators, U.K.

Two original features, one-level storage and extracodes, have been copied in many other machines. A one-level store is common to most new computers which are time-shared or multiprogrammed; the scheme for memory paging in the SDS 940 is essentially that of Atlas.

The extracodes feature allows ordinary machine operation codes to be used to call subroutines. Commonly used complex instructions (such as sin, cos, and monitor calls) can be written in a common operating system accessible to all users. Initially these subroutines were stored in a read-only memory.

The ISP is straightforward and extremely nice. The extracode idea appears in the SDS 900 series and was used in the SDS 940 system for defining common-user instructions. The IBM System/360 SVC (supervisor call) instruction is an adaptation of the extracode.

Atlas was about the earliest computer to be designed with a software operating system and the idea of user machine in mind. The operating system has been nicely described [Kilburn et al., 1961] and evaluated [Morris et al., 1967].

In a letter to the authors of this book, F. H. Sumner makes the following comments on Atlas.

The initial ideas and the preliminary research on the Atlas computer system started in the Department of Computer Science of the University of Manchester in 1956. The team, under the direction of Professor T. Kilburn, was later supplemented by several members of the I.C.T. Computer Research Department, and the prototype machine was working in the department by the Autumn of 1961. The first production model became operational in January 1963.

The significant features of the system can be summarised as:

1 The provision of a virtual address field greater than the real address space.
2 The implementation of a "one-level" store using a mixture of core store and drum store.
3 The interrupt system and the method of peripheral control.
4 The realisation at the design stage that there would be a complex operating system and the provision in the hardware of specific features to assist such an operating system.
Section 6 I Processors with multiprogramming ability 275
The method of peripheral control permitted the attachment of a large number of on-line peripherals with rapid response and entry into the operating system for a peripheral requiring attention. This, together with the multiprogramming features, makes the design ideal for the attachment of keyboards for the provision of multi-access operation. In the original design, provision for several such on-line typewriters was made, but at the production stage it was decided to remove these as an economy measure. In view of the subsequent development of on-line operation, this was rather an unfortunate decision.

The Atlas computer at the University has now been in continuous operation for four years and it is expected to provide for the major part of the University's computing needs until 1971.

During the period of its operation the provision of extensive monitoring and logging information has permitted the behaviour of the system to be studied in detail. The results of these studies have been extremely valuable in the design of a successor to the Atlas.

Design of the B 5000 System

930, together with the operating system software, were sold by Scientific Data Systems as the SDS 940. The operating system and hardware modifications for multiprogramming make the 940 one of the first commercially available combined hardware-software time-sharing computers.

The description in Chap. 24 is concerned with the machine as it appears to the user. That is, the hardware and the operating system software are both presented in the context in which they contribute to form a user machine.

The 940 uses a memory map which is almost a subset of that of Atlas but is more modest than that of the IBM 360/67 [Arden et al., 1966] and GE 645 [Dennis, 1965; Daley and Dennis, 1968]. A number of instructions are apparently built in via the programmed operator calling mechanism, based on Atlas extracodes (Chap. 23). The software-defined instructions emphasize the need for hardware features. For example, floating-point arithmetic is needed when several computer-bound programs are run. The SDS 945 is a successor to the 940, with slightly increased capability but at a lower cost.
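The extracode (or programmed operator) idea mentioned above, in which an ordinary operation code calls a shared subroutine instead of a hardware function, can be sketched as a two-level dispatch. This is a sketch under stated assumptions: the opcode numbers, the table names HARDWARE_OPS and EXTRACODES, and the single-accumulator calling convention are all hypothetical and are not taken from Atlas or the SDS 940.

```python
import math

# Operations assumed to be wired into the processor (hypothetical opcodes).
HARDWARE_OPS = {
    0: lambda acc, operand: acc + operand,   # add
    1: lambda acc, operand: acc - operand,   # subtract
}

# Operations assumed to be supplied as shared system subroutines.
EXTRACODES = {
    100: lambda acc, operand: math.sin(operand),
    101: lambda acc, operand: math.cos(operand),
}


def execute(opcode, acc, operand):
    """Run one instruction; the program cannot tell which table served it."""
    if opcode in HARDWARE_OPS:
        return HARDWARE_OPS[opcode](acc, operand)
    if opcode in EXTRACODES:
        return EXTRACODES[opcode](acc, operand)  # automatic entry and return
    raise ValueError(f"undefined operation code {opcode}")


total = execute(0, 2.0, 3.0)     # served by the hardware table
sine = execute(100, 0.0, 0.0)    # served by a subroutine, same call form
```

The design point is that the instruction stream looks the same in both cases, so a function can move between hardware and software without changing user programs.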
[PMS diagram: Mp(#0:3); M(content addressable; flip-flop) for the memory map; Pc; Pio; K-Ms(magnetic tape); K-T(paper tape); K-S-T(Teletype); K-Ms(drum); K-Ms(moving head disk).]
Summary After a brief survey of the basic Atlas machine, the paper describes an automatic system which in principle can be applied to any combination of two storage systems so that the combination can be regarded by the machine user as a single level. The actual system described relates to a fast core store-drum combination. The effect of the system on instruction times is illustrated, and the tape transfer system is also introduced since it fits basically in through the same hardware. The scheme incorporates a “learning” program, a technique which can be of greater importance in future computers.

1. Introduction

In a universal high-speed digital computer it is necessary to have a large-capacity fast-access main store. While more efficient operation of the computer can be achieved by making this store all of one type, this step is scarcely practical for the storage capacities now being considered. For example, on Atlas it is possible to address 10^6 words in the main store. In practice on the first installation at Manchester University a total of 10^5 words are provided, but though it is just technically feasible to make this in one level it is much more economical to provide a core store (16,000 words) and drum (96,000 words) combination.

Atlas is a machine which operates its peripheral equipment on a time division basis, the equipment “interrupting” the normal main program when it requires attention. Organization of the peripheral equipment is also done by program so that many programs can be contained in the store of the machine at the same time. This technique can also be extended to include several main programs as well as the smaller subroutines used for controlling peripherals. For these reasons as well as the fact that some orders take a variable time depending on the exact numbers involved, it is not really feasible to “optimum” program transfers of information between the two levels of store, i.e., core store and drum, in order to eliminate the long drum access time of 6 msec. Hence a system has been devised to make the core drum store combination appear to the programmer as a single level of storage, the requisite transfers of information taking place automatically. There are a number of additional benefits derived from the scheme adopted, which include relative addressing so that routines can operate anywhere in the store, and a “lock out” facility to prevent interference between different programs simultaneously held in the store.

2. The basic machine

The arrangement of the basic machine is shown in Fig. 1. The available storage space is split into three sections; the private store which is used solely for internal machine organization, the central store which includes both core and drum store, in which all words are addressed and is the store available to the normal user, and finally the tape store, which is the conventional backing-up large capacity store of the machine. Both the private store and the main core store are linked with the main accumulator, the B-store, and the B-arithmetic unit. However the drum and tape stores only have access to these latter sections of the machine via the main core store.

The machine order code is of the single address type, and a comprehensive range of basic functions are provided by normal engineering methods. Also available to the programmer are a number of extra functions termed “extracodes” which give automatic access to and subsequent return from a large number of built-in subroutines. These routines provide

1 A number of orders which would be expensive to provide in the machine both in terms of equipment and also time because of the extra loading on certain circuits. An example of this is the order: Shift accumulator contents +n places where n is an integer.
2 The more complex mathematical operations, e.g., sin x, log x, etc.,
3 Control orders for peripheral equipments, card readers, parallel printers, etc.,
4 Input-output conversion routines,

¹IRE Trans., EC-11, vol. 2, pp. 223-235, April, 1962.
5 Special programs concerned with storage allocation to different programs being run simultaneously, monitoring routines for fault finding and costing purposes, and the detailed organization of drum and tape transfers.

All this information is permanently required and hence is kept in part of the private store termed the “fixed store” [Kilburn and Grimsdale, 1960a] which operates on a “read only” basis. This store consists of a woven wire mesh into which a pattern of small “linear” ferrite slugs are inserted to represent digital information. The information content can only be changed manually and will tend to differ only in detail between the different versions of the Atlas computer. In Muse this store is arranged in two units each of 4096 words, a unit consisting of 16 columns of 256 words, each word being 50 bits. The access time to a word in any one column is about 0.4 µsec. If a change of column address is required, this figure increases by about 1 µsec due to switching transients in the read amplifiers. Subsequent accesses in the new column revert to 0.4 µsec. The store operates in conjunction with a subsidiary core store of 1024 words which provides working space for the fixed store programs, and has a cycle time of about 1.8 µsec. There are certain safeguards against a normal machine user gaining access to addresses in either part of the private store, though in effect he makes use of this store through the extracode facility.

The central store of the machine consists of a drum and core store combination, which has a maximum addressable capacity of about 10^6 words. In Muse the central store capacity is about 96,000 words on 4 drums. Any part of this store can be transferred in blocks of 512 words to/from the main core store, which consists of four separate stacks, each stack having a capacity of 4096 words.

The tape system provides a very large capacity backing store for the machine. The user can effect transfers of variable amounts of information between this store and the central store. In actual fact such transfers are organized by a fixed store program which initiates transfers of blocks of 512 words between the tape and the main core store. The system can ...

... system necessarily takes time to establish its priority, and so it has been arranged that it comes into effect only at each drum or tape request. Thus the machine is not slowed down in any way when no drum or tape transfers take place. The effect of drum and tape transfers on machine speed is given in Appendix 1.

To simplify the control commands given to the drum, tape, and peripheral equipment in the machine, the orders all take the form b → S or s → B and the identification of the required command register is provided by the address S. This type of storage is clearly widely scattered in the machine but is termed collectively the V-store.

In the central machine the main accumulator contains a fast adder [Kilburn et al., 1960b] and has built-in multiplication and division facilities. It can deal with fixed or floating point numbers and its operation is completely independent of the B-store and B-arithmetic unit. The B-store is a fast core store (cycle time 0.7 µsec) of 120 twenty-four bit words operating in a word selected partial flux switching mode [Edwards et al., 1960]. Eight “fast” B lines are also provided in the form of flip-flops. Of these, three are used as control lines, termed main, extracode, and interrupt controls respectively. The arrangement has the advantage that the control numbers can be manipulated by the normal B-type orders, and the existence of three controls permits the machine to switch rapidly from one to another without having to transfer control numbers to the core store. Main control is used when the
instructions per second. This is achieved by the use of fast transistor logic circuitry, rapid access to storage locations, and an
from the drum and also provides a safeguard against transferring to the wrong block.

As soon as the order asking for a read transfer from the drum has been given the machine continues with the drum transfer program. It is now concerned with determining a block to be transferred back from the core store to the drum. This is necessary to ensure an empty core store page position when the next read transfer is required. The block in the core store to be transferred has to be carefully chosen to minimize the number of transfers in the program and this optimization process is carried out by a learning program, details of which are given in Sec. 5. The operation of this program is assisted by the provision of the “use” digits which are associated with each page position of the core store.

To interchange information between the core store and drums, two transfers, a read from and a write to the drum, are necessary. These have to be done sequentially but could occur in either order. The technique of having a vacant page position in the core store permits a read transfer to occur first and thus allows the time for the learning program to be overlapped either into the waiting period for the read transfer or into the transfer time itself. In the time remaining after completion of the learning program an entry is made into the over-all supervisor program for the machine, and a decision is taken concerning what the machine is to do until the drum transfer is completed. This might involve a change to a different main program.

A program could ask for access to information in a page position while a drum or tape transfer is taking place to that page. This is prevented in Atlas by the use of a “lock out” (L.O.) digit which is provided with each Page Address Register. When a lock out digit is set at 1, access to that page is only permitted when the address has been provided either by the drum system, the tape system, or the interrupt control. The latter case permits all transfers from paper tape, punched card, and other peripheral equipments, to be handled without interference from the main program. When the transfer of a block has been completed the organizing program resets the L.O. digit to zero and access to that page position can then be made from the central machine. It is clear that the L.O. digit can also be used to prevent interference between programs when several different ones are being held in the machine at the same time.

In Sec. 3 it was stated that addresses demanding access to the core store could arise from three distinct sources, the central machine, the drum, and the tape. These accesses are complicated because of (1) the equivalence technique, and (2) the lock out digit. The various cases and the action that takes place are summarized in Table 1.

The provision of the Page Address Registers, the equivalence circuitry, and the learning program have permitted the core store and drum to be regarded by the ordinary machine user as a one-level store, and the system has the additional feature of “floating address” operation, i.e., any block of information can be stored in any absolute position in either core or drum store. The minimum access time to information in this store is obviously limited by the core store and its arrangement and this is now discussed.

B. Core store arrangement

The core store is split into four stacks, each with individual address decoding and read and write mechanisms. The stacks are then combined in such a way that common channels into the machine for the address, read and write digits are time shared between the various stacks. Sequential address positions occur in two stacks alternately and a page position which contains a block of 512 sequential addresses is thus arranged across two stacks. In this way it is possible to read a pair of instructions from consecutive addresses in parallel by increasing the size of the read channel. This permits two instructions to be completely obeyed in three store “accesses.” The choice of this particular storage arrangement is discussed in Appendix 2.

The coordination of these four stacks is done by the “core stack coordinator” and some features of this are now discussed, starting with the operation of a single stack.
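The interleaving described in this subsection (sequential addresses alternating between two stacks, with each 512-word page spread across a pair of stacks) can be sketched as an address mapping. The exact wiring is not given in the text, so the layout below is an assumption: stacks 0/1 and 2/3 are treated as the two pairs, the low-order address bit selects the stack within a pair, and the first 16 pages are placed in the first pair. The function names locate and fetch_pair_in_parallel are hypothetical.

```python
WORDS_PER_PAGE = 512
WORDS_PER_STACK = 4096
PAGES_PER_STACK_PAIR = 2 * WORDS_PER_STACK // WORDS_PER_PAGE  # 16 pages


def locate(core_address):
    """Map a core-store word address to (stack, line within the stack).

    Assumed layout: even addresses go to the even stack of a pair and odd
    addresses to the odd stack, so a page occupies 256 lines in each.
    """
    if not 0 <= core_address < 4 * WORDS_PER_STACK:
        raise ValueError("address outside the 16,384-word core store")
    page, offset = divmod(core_address, WORDS_PER_PAGE)
    pair, page_in_pair = divmod(page, PAGES_PER_STACK_PAIR)
    stack = 2 * pair + (offset % 2)
    line = page_in_pair * (WORDS_PER_PAGE // 2) + offset // 2
    return stack, line


def fetch_pair_in_parallel(even_address):
    """True when an instruction pair falls in two different stacks."""
    first = locate(even_address)
    second = locate(even_address + 1)
    return first[0] != second[0]


pair_ok = fetch_pair_in_parallel(100)
```

Under this assumed layout, any two consecutive addresses starting at an even address land in different stacks, which is what allows a pair of instructions to be read in one access.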
Table 1 Comparison of demanded block address with contents of the P.A.R.’s resultant state of equivalence and lock out circuits

Source of address     Equivalence, Lock out = 0 [E.Q.]    Not equivalence [N.E.Q.]       Equivalence, Lock out = 1 [E.Q. & L.O.]
1 Central Machine     Access to required page position    Enter drum transfer routine    Not available to this program
2 Drum System         Access to required page position    Fault condition indicated      Fault condition indicated
3 Tape System         Access to required page position    Fault condition indicated      Fault condition indicated
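The central-machine row of Table 1 can be sketched as a lookup over the Page Address Registers. This is a minimal sketch, assuming one register per core page position that holds a block number and a lock out digit; the names PageAddressRegister and central_machine_access are hypothetical, the sequential search stands in for the parallel equivalence circuitry, and the drum and tape rows are not modeled.

```python
from dataclasses import dataclass


@dataclass
class PageAddressRegister:
    block: int                 # block number held in this page position
    locked_out: bool = False   # lock out (L.O.) digit


def central_machine_access(pars, block):
    """Apply the 'Central Machine' row of Table 1 to a demanded block."""
    for page_position, par in enumerate(pars):
        if par.block == block:                       # equivalence found
            if par.locked_out:
                return ("not available to this program", None)
            return ("access", page_position)
    return ("enter drum transfer routine", None)     # not equivalence


pars = [
    PageAddressRegister(block=40),
    PageAddressRegister(block=7, locked_out=True),   # transfer in progress
    PageAddressRegister(block=13),
]
hit = central_machine_access(pars, 13)
locked = central_machine_access(pars, 7)
miss = central_machine_access(pars, 99)
```

The block number, not the core position, is what the program supplies; the returned page position is what makes the "floating address" operation described above possible.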
C. Operation of a single stack of core store

The storage system employed is a coincident current M.I.T. system arranged to give parallel read out of 50 digits. The reading operation is destructive and each read phase of the stack cycle is followed by a write phase during which the information read out may be rewritten. This is achieved by a set of digit staticizors which are loaded during the read phase and are used to control the inhibit current drivers during the write phase. When new information is to be written into the store a similar sequence is followed, except that the digit staticizors are loaded with the new information during the read phase. A diagram indicating the different types of stack cycle is shown in Fig. 3.

There is a small delay W_D (≈ 100 mµsec) between the “stack request” signal, SR, and the start of the read phase to allow for setting of the address staticizors and the decoding. The output information from the store appears in the read strobe period, which is towards the end of the read phase. In general, the write phase starts as soon as the read phase ends. However, the start of the write phase may be held up until the new information is available from the central machine. This delay is shown as W_W in Fig. 3c. The interval T_A between the stack request and the read strobe is termed the stack access time, and in practice this is approximately one third of the cycle time T_C. Both T_A and T_C are functions of the storage system and, assuming that W_W is zero, have typical values of 0.7 and 1.9 µsec respectively. A holdup gate in the request channel prevents the next stack request occurring before the end of the preceding write phase.
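The relation between the quoted access and cycle times can be checked with a short calculation. The two typical values come from the text; the function cycle_time, and the assumption that a write hold-up simply adds to the cycle time, are illustrative and not stated in the paper.

```python
ACCESS_TIME_US = 0.7   # typical stack access time from the text, in microseconds
CYCLE_TIME_US = 1.9    # typical stack cycle time with no write hold-up


def cycle_time(write_holdup_us=0.0):
    """Cycle time when the write phase is delayed (assumed to add directly)."""
    return CYCLE_TIME_US + write_holdup_us


# Access time as a fraction of the cycle time: "approximately one third".
access_fraction = ACCESS_TIME_US / CYCLE_TIME_US

# Upper bound on requests one stack can accept per microsecond, since the
# holdup gate blocks a new request until the preceding write phase ends.
max_requests_per_us = 1.0 / cycle_time()
```

With these figures the access time is a little over a third of the cycle, and a single stack can accept roughly one request every two microseconds, which is why the requests are spread over several stacks.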
arises, then the speed of the system is not store-limited. In most cases SET CSF is generated when the equivalence operation on the demanded block address is complete, and the read phase of the appropriate stack (or stacks) has started. Until this time the information held in the B.A.R. must not be allowed to change. In Fig. 5 a flow diagram is shown for the various cases which can arise in practice.

When a single address request is accepted it is necessary to obtain an “equivalence” indication and form the page location digits before the stack request can be generated. The SET CSF signal then occurs as soon as the read phase starts. If a “not equivalent” or “equivalent and locked out” indication is obtained, a stack request is not generated, and the contents of the B.A.R. are copied in to a line of the V-store before SET CSF is generated.

Fig. 3. Basic types of stack cycle. (a) Read order (s → A). (b) Write order (a → S). (c) Read-write order (b + s → S). T_A = access time; T_C = cyclic time; W_D = wait for address decoding and loading of address register; W_W = wait for release of write hold up.

When access to a pair of addresses is requested (i.e., an instruction pair) the stack requests are generated on the assumption that these instructions are located in the same page position as the last pair requested, i.e., the page position digits are taken from the page digit register. (See Fig. 4.) In this way the time required to obtain the equivalent indication and form the page location digits is not included in the over-all access time of the system. The assumption will normally be true, except when crossing block boundaries. The latter cases are detected and corrected by comparing the true position page digits obtained as a result of the

Fig. 5. Flow diagram of main core store control.

stack request is generated can arise for a number of reasons.

1 The preceding write phase of that stack has not yet finished.
2 The central machine is not yet ready either to accept information from the store, or to supply information to it.
3 It is necessary to ensure a certain minimum time between successive read strobes from the core store stacks to allow satisfactory operation of the parity circuits, which take about 0.4 µsec to check the information. This time could be reduced, but as it is only possible to get such a condition for a part of the instruction timing it was not thought to be an economical proposition.

The basic machine timing is now discussed.

4. Instruction times

In high-speed computers, one of the main factors limiting speed of operation is the store cycle time. Here a number of techniques, e.g., splitting the core store into four separate stacks and extracting two instructions in a single cycle, have been adopted despite a fast basic cycle time of 2 µsec in order to alleviate this situation. The time taken to complete an instruction is dependent upon

1 The type of instruction (which is defined by the function digits)
2 The exact location of the instruction and operand in the core or fixed store since this can affect the access time
3 Whether or not the operand address is to be modified
4 In the case of floating point accumulator orders, the actual numbers themselves
5 Whether drum and/or tape transfers are taking place

The approximate times for various instructions are given in Table 2. These figures relate to the times between completing instructions when a long sequence of the same type of instruction is obeyed. While this is not ideal, it is necessary because in practice obeying one instruction is overlapped in time with some part of three other instructions. This makes the detailed timing complicated, and so the timing sequence is developed slowly by first considering instructions obeyed one after another. It is convenient to make these instructions a sequence of floating point additions with both instruction and operand in the core store and with the operand address single B-modified.

To obey this instruction the central machine makes two requests to the core store, one for the instruction and the second for the operand. After the instruction is received in the machine the function part has to be decoded and the operand address modified by the contents of one of the B registers before the operand request can be made. Finally, after the operand has been obtained the actual accumulator addition takes place to complete the instruction. The time from beginning to end of one instruction is 6.05 µsec and an approximate timing schedule is as follows in Table 3.

If no other action is permitted in the time required to complete the instruction (steps 1 to 8 in Table 3), then the different sections of the machine are being used very inefficiently, e.g., the accumulator adder is only used for less than 1.1 µsec. However, the organization of the computer is such that the different sections such as store stacks, accumulator and B-arithmetic unit, can operate
at the same time. In this way several instructions can be started
before the first has finished, and then the effective instruction time
is considerably reduced. There have, of course, to be certain
safeguards when for example an instruction is dependent in any way
on the completion of a preceding instruction.

Table 3† Timing sequence for floating point addition (instructions
and operands in the core store)

                                             Time interval   Total
                                             between steps   time
Sequence                                     (μsec)          (μsec)
1. Add 1 to Main Control                                     0
   (Addition time)                           0.3
2. Make Instruction Request                                  0.3
   (Transfer times, equivalence time
   and stack access time)                    1.75
3. Receive Instruction in Central Machine                    2.05
   (Load register and decode)                0.2
4. Function decoding complete                                2.25
   (Single address modification)             0.85
5. Request Operand                                           3.10
   (Transfer times, equivalence time
   and stack access time)                    1.75
6. Receive Operand in Central Machine                        4.85
   (Load register)                           0.1
7. Start Addition in Accumulator                             4.95
   (Average floating point addition,
   including shift round and standardise)    1.1
8. Instruction complete                                      6.05

† In step 4, time is for single address modification. Times for no
modification and two modifications are 0.25 μsec and 1.55 μsec
respectively.

In the time sequence previously tabulated, by far the longest
time was that between a request in the central machine for the
core store and the receipt in the central machine of the information
from that store. This effective access time of 1.75 μsec is
made up as shown in Table 4. It has been reduced in practice
by the provision of two buffer registers, one in the central machine
and the other in the core stack coordinator. These allow the
equivalence and transfer times to be overlapped with the organization
of requests in the central machine.
In this way, provided the machine can arrange to make requests
fast enough, then the effective access time is reduced to 0.8 μsec.
Further, since three accesses are needed to complete two instructions
(one for an instruction pair and one for each of the two
operands) the theoretical minimum time of an instruction is 1.2
μsec (3 × 0.8/2) and it then becomes store limited. Reference to
Table 3 shows that the arithmetic operation takes 1.2 μsec to
complete so that, on the average, the capabilities of the store and
the accumulator are well matched.
Another technique for reducing store access time for instructions
has also been adopted. This permits the read cycles of the
two stacks to start assuming that the same page will be referred
to as in the previous instruction pair. This, of course, will normally
be true and there is sufficient time to take corrective procedures
should the page have been changed. The limit of 1.2 μsec per
instruction is not reduced by this technique, but the possibility
of reaching this limit under other conditions is enhanced.
A schematic diagram of the practical timing of a sequence of
floating point addition orders is shown in Fig. 6. The overlapping
is not perfect and in the time between successive instruction pairs
the computer is obeying four instructions for 25 per cent of the
time, three for 56 per cent and two for 19 per cent. It is therefore
to be expected that the practical time for the complete order is
greater than the theoretical minimum time; it is in fact
approximately 1.6 μsec.
For certain types of functions the reading of the next pair of
instructions before completing both instructions of the first pair
would be incorrect, e.g., functions causing transfer of control. Such
situations are recognized during the function decoding, and the
request for the next instruction pair is held up until a suitable
time.
In a sequence of floating point addition orders with the operand
addresses unmodified the limit is again 1.2 μsec while the time
obtained is 1.4 μsec. For accumulator orders in which the actual
accumulator operation imposes a limit in excess of 2 μsec then
the actual time is equal to this limit.
Perhaps a more realistic way of defining the speed of the computer
is to give the time for a typical inner loop of instructions.
A frequently occurring operation in matrix work is the formation
of the scalar product of two vectors; this requires a loop of five
instructions:

Table 4 Effective store access time

                                                          Total time
Sequence                                                  (μsec)
1. Request in Central Machine                             0
2. Request in Core Stack Coordinator                      0.25
3. Equivalence complete and request made to selected
   stack                                                  0.95
4. Information in Core Stack Coordinator                  1.65
5. Information in Central Machine                         1.75
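
The arithmetic behind Tables 3 and 4 can be checked with a short calculation. The sketch below is not part of the original paper; it is written in Python, and the names `TABLE3_INTERVALS`, `cumulative_times` and `store_limited_time` are chosen here purely for illustration. It accumulates the step intervals of Table 3 to recover the running totals, and evaluates the store-limited minimum of three accesses per instruction pair at the reduced effective access time of 0.8 μsec.

```python
from itertools import accumulate

# Intervals between successive steps of Table 3, in microseconds
# (single address modification case).
TABLE3_INTERVALS = [0.3, 1.75, 0.2, 0.85, 1.75, 0.1, 1.1]


def cumulative_times(intervals):
    """Return the running total at each step, starting from 0."""
    return [round(t, 2) for t in accumulate(intervals, initial=0.0)]


def store_limited_time(effective_access, accesses_per_pair=3,
                       instructions_per_pair=2):
    """Minimum time per instruction when store accesses are the bottleneck."""
    return accesses_per_pair * effective_access / instructions_per_pair


totals = cumulative_times(TABLE3_INTERVALS)
minimum = store_limited_time(0.8)

print("step totals (usec):", totals)
print("store-limited minimum (usec): %.2f" % minimum)
```

The last entry of `totals` is the 6.05 μsec quoted for one unoverlapped instruction, and `minimum` is the 1.2 μsec limit. Swapping 0.85 for 0.25 or 1.55 in the interval list gives the totals for no modification and for two modifications, under the assumption that only step 4 changes.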
1 Element of first vector into accumulator. (Operand B-modified.)
2 Multiply accumulator by element of second vector. (Operand
B-modified.)
3 Add partial product to accumulator.
4 Copy accumulator to store line containing partial product.
5 Alter count to select next elements and repeat.
The time for this loop with instructions and operands on the
core store is 12.2 μsec. The value of the overlapping technique
is shown by the fact that the time from starting the first instruction
to finishing the second is approximately 10 μsec.

Fig. 6. Timing diagram for a sequence of floating point addition
orders. (Single address modification.)

When the drum or tape systems are transferring information
to or from the core store then the rate of obeying instructions
which also use the core store will be affected. The effect is
discussed in more detail in Appendix 1. The degree of slowing down
is dependent upon the time at which a drum or tape request occurs
relative to machine requests. It also depends on the stacks used
by the drum or tape and those being used by the central machine.
The approximate slowing down is by a factor of 25 per cent during
a drum transfer and by 2 per cent for each active tape channel.
(See Appendix 1.)

5. The drum transfer learning program
The organization of drum transfers has been described in Sec. 2A.
After the transfer of the required block from the drum to the core
store has been initiated, the organizing program examines the state
of the core store, and if empty pages still exist, no further action
is taken. However, if the core store is full it is necessary to arrange
for an empty page to be made available for use at the next
nonequivalence. The selection of the page to be transferred could be
made at random; this could easily result in many additional transfers
occurring, as the page selected could be one of those in current
use or one required in the near future. The ideal selection, which
would minimize the total number of transfers, could only be made
by the programmer. To make this ideal selection the programmer
would have to know (1) precisely how his program operated, which
is not always the case, and (2) the precise amount of core store
available to his program at any instant. This latter information
is not generally available as the core store could be shared by other
central machine programs, and almost certainly by some fixed store
program organizing the input and output of information from slow
peripheral equipments. The amount of core store required by this
fixed store program is continuously varying [Kilburn et al., 1961].
The only way the ideal pattern of transfers can be approached
is for the transfer program to monitor the behavior of the main
program and in so doing attempt to select the correct pages to
be transferred to the drum. The techniques used for monitoring
are subject to the condition that they must not slow down the
operation of the program to such an extent that they offset any
reduction in the number of transfers required. The method described
occupies less than 1 per cent of the operating time, and
the reduction in the number of transfers is more than sufficient
to cover this.
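
The contrast drawn above between selecting a page at random and making the ideal selection can be made concrete with a small simulation. The following Python sketch is an illustration only and is not taken from the paper. The workload (six blocks used cyclically with four pages of core store) is hypothetical, and the "ideal" policy is modelled here, as an assumption, by writing out the block whose next use lies furthest in the future, which requires complete foreknowledge of the reference pattern.

```python
import random


def count_transfers(refs, pages, choose_victim):
    """Count drum-to-core transfers for a reference string under a policy."""
    resident = []
    transfers = 0
    for i, block in enumerate(refs):
        if block in resident:
            continue                  # equivalence: block already in core
        transfers += 1                # nonequivalence: fetch from the drum
        if len(resident) == pages:    # core full, so one page must go
            resident.remove(choose_victim(resident, refs, i))
        resident.append(block)
    return transfers


def random_victim(rng):
    """Build a policy that picks the outgoing page at random."""
    def choose(resident, refs, i):
        return rng.choice(resident)
    return choose


def ideal_victim(resident, refs, i):
    """Pick the block whose next use is furthest away (needs foreknowledge)."""
    def next_use(block):
        try:
            return refs.index(block, i + 1)
        except ValueError:
            return len(refs)          # never used again
    return max(resident, key=next_use)


# Hypothetical workload: 6 blocks in a cycle, only 4 pages of core store.
refs = list(range(6)) * 20
random_transfers = count_transfers(refs, 4, random_victim(random.Random(0)))
ideal_transfers = count_transfers(refs, 4, ideal_victim)

print("random selection:", random_transfers, "transfers")
print("ideal selection: ", ideal_transfers, "transfers")
```

Whatever seed is used, the ideal count cannot exceed the random one, since no selection made without foreknowledge can beat one made with it. The gap between the two is the margin a monitoring program has to work within.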
That part of the transfer program which organizes the selection
of the page to be transferred has been called the “learning”
program. In order for this program to have some data on which to
operate, the machine has been designed to supply information
about the use made of the different pages of the core store by
the program being monitored.
With each page of the core store there is associated a “use”
digit which is set to “1” whenever any line in that page is accessed.
The 32 “use” digits exist in two lines of the V-store and can be
read by the learning program, the reading automatically resetting
them to zero. The frequency with which these digits are read is
governed by a clock which measures not real time but the number
of instructions obeyed in the operation of the main program. This
clock causes the learning program to copy the “use” digits to a
list in the subsidiary store every 1024 instructions. The use of an
instruction counter rather than a normal clock to measure “time”
for the learning program is due to the fact that the operations
of the main program may be interrupted at random for random
lengths of time by the operation of peripheral equipments. With
an instruction counter the temporal pattern of the blocks used
will be the same on successive runs through the same part of the
program. This is essential if the learning program is to make use
of this pattern to minimize the number of transfers.
When a nonequivalence occurs and after the transfer of the
required block has been arranged, the learning program again adds
the current values of the “use” digits to the list and then uses
this list to bring up to date two sets of times also kept in the
subsidiary store. These sets consist of 32 values of t and T, one
of each for each page of the core store. The value of t is the length
of time since the block in that page has been used. The value of
T is the length of the last period of inactivity of this block. The
accuracy of the values of t and T is governed by the frequency
with which the “use” digits are inspected.
The page to be written to the drum is selected by the application
in turn of three simple tests to the values of t and T.
1 Any page for which t > T + 1, or
2 That page with t ≠ 0 and (T - t) max, or
3 That page with T max (all t = 0).
The first rule selects any page which has been currently out
of use for longer than its last period of inactivity. Such a page
has probably ceased to be used by the program and is therefore
an ideal one to be transferred to the drum. The second rule ignores
all pages with t = 0 as they are in current use, and then selects
the one which, if the pattern of use is maintained, will not be
required by the program for the longest time. If the first two rules
fail to select a page the third ensures that if the page finally
selected is wrong, in that it is immediately required again, then,
as in this case, T will become zero and the same mistake will not
be repeated.
For all the blocks on the drum a list of values of τ is kept.
The values of τ are set when the block is transferred to the drum:
τ = time of transfer - value of t for transferred page
When a block is transferred to the core store the value of τ is
used to set the value of T.
T = time of transfer - value of τ for this block
  = length of last period of inactivity
For the block transferred from the drum t is set to 0.
In order to make its decision the learning program has only
to update two short lists and apply at the most three simple rules;
this can easily be done during the 2 msec transfer time of the block
required as a result of the nonequivalence. As the learning program
uses only fixed and subsidiary store addresses it is not slowed down
during the period of the drum transfer.
The over-all efficiency of the learning program cannot be
known until the complete Atlas system is working. However, the
value of the method used has been investigated by simulating the
behavior of the one-level store and learning program on the
Mercury computer at Manchester University. This has been done
for several problems using varying amounts of store in excess of
the core store available. One of these was the problem of forming
the product A of two 80th order matrices B and C. The three
matrices were stored row by row each one extending over 14
blocks, only 14 pages of core store were assumed to be available.
The method of multiplication was
b11 × 1st row of C = partial answer to 1st row of A
b12 × 2nd row of C + partial answer = second partial answer,
etc.
Thus matrix B was scanned once, matrix C 80 times and each row
of matrix A 80 times.
Several machine users were asked to spend a short time writing
a program to organize the transfers for a general matrix
multiplication problem. In no case when the method was applied to the
above problem were fewer than 357 transfers required. A program
written specifically for this problem which paid great attention
to the distribution of the rows of the matrices relative to block
divisions required 234 transfers. The learning program required
274 transfers; the gain over the human programmer was chiefly
due to the fact that the learning program could take full advantage
of the occasions when the rows of A existed entirely within one
block.
Many other problems involving cyclic running of single or
multiple sets of data were simulated, and in no case did the
learning program require more transfers than an experienced human
programmer.

A. Prediction of drum transfers
Although the learning program tends to reduce the number of
transfers required to a minimum, the transfers which do occur still
interrupt the operation of the program for from 2 to 14 msec as
they are initiated by nonequivalence interrupts. Some or all of
this time loss could be avoided by organizing the transfers in
advance. A very experienced programmer having sole use of the
core store could arrange his own transfers in such a way that no
unnecessary ones ever occurred and no time was ever wasted
waiting for transfers to be completed. This would require a great
deal of effort and would only be worthwhile for a program that
was going to occupy the machine for a long time. By using the
data accumulated by the learning program it is possible to
recognize simple patterns in the use made by a program of the various
blocks of the one-level store. In this way a prediction program
could forecast the blocks required in the near future and organize
the transfers. By recording the success or failure of these forecasts
the program could be made self-improving. For the matrix
multiplication problem discussed above the pattern of use of the blocks
containing matrix C is repeated 80 times, and a considerable
degree of success could be obtained with a simple prediction
program.

6. Conclusions
A specific system for making a core-drum store combination appear
as a single level store has been described. While this is the actual
system being built for the Atlas machine the principles involved
are applicable to combinations of other types of store. For example,
a tunnel diode-fast core store combination for an even faster
machine. An alternative which was considered for Atlas, but which
was not as attractive economically, was a fast core-slow core store
combination. The system too can be extended to three levels of
storage, and indeed if 10^6 words of total storage had to be provided
then it would be most economical to provide it on a third level
of store such as a file drum.
The automatic system does require additional equipment and
introduces some complexity, since it is necessary to overlap the
time taken for address comparison into the store and machine
operating time if it is not to introduce any extra time delays.
Simulated tests have shown that the organization of drum transfers
is reasonably efficient and other advantages which accrue, such
as efficient allocation of core storage between different programs
and store lock out facilities are also invaluable. No matter how
intelligent a programmer may be he can never know how many
programs or peripheral equipments are in operation when his
program is running. The advantage of the automatic system is that
it takes into account the state of the machine as it exists at any
particular time. Furthermore if as in normal use there is some sort
of regular machine rhythm even through several programs, there
is the possibility of making some sort of prediction with regard
to the transfers necessary. This involves no more hardware and
will be done by program. However, this stage will probably be left
until results on the actual system are obtained.
It can be seen that the system is both useful and flexible in
that it can be modified or extended in the manner previously
indicated. Thus despite the increase in equipment, the advantages
which are derived completely justify the building of this automatic
system.

APPENDIX 1 ORGANIZATION OF THE ACCESS REQUESTS
TO THE CORE STORE
There are three sources of access requests to the core store, namely
the central machine, the drum, and the tape systems. In deciding
how the sequence of requests from all three sources are to be
serialized and placed in some sort of order, a number of facts have
to be considered. These are
1 All three sources are asynchronous in nature.
2 The drum and tape systems can make requests at a fairly
high rate compared with the store cycle time of approximately
2 μsec. For example, the drum provides a request
every 4 μsec and the tape system every 11 μsec when all
8 channels are operative.
3 The drum and tape systems can only be stopped in multiples
of a block length, i.e., 512 words. This means that any system
devised for accessing the core store must deal with both
the average rates of drum and tape requests specified in 2.
Only the central machine can tolerate requests being stopped
at any time and for any length of time. From these facts a
request priority can be stated which is
a Drum request.
b Tape request.
c Central machine request.
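
The request priority just stated amounts to a fixed-priority choice among whichever sources have a request outstanding. A minimal Python sketch of that choice follows. The `Source` enumeration and the `next_request` function are hypothetical names introduced here for illustration, and the sketch models only the ordering, not the inhibit and break-in circuitry through which the machine enforces it.

```python
from enum import IntEnum


class Source(IntEnum):
    """Sources of core store requests; a lower value is a higher priority."""
    DRUM = 0
    TAPE = 1
    CENTRAL_MACHINE = 2


def next_request(pending):
    """Return the highest-priority source with a request pending, or None."""
    return min(pending, default=None)


# Example: the tape and the central machine are both waiting.
waiting = {Source.CENTRAL_MACHINE, Source.TAPE}
print(next_request(waiting).name)
```

Only the central machine can tolerate being held up indefinitely, which is why it sits last in the ordering: a drum or tape request that arrives while a machine request is waiting is served first.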
way.
When the central machine, drum and tape are sharing the
core store then the loss of central machine speed should
be roughly proportional to the activity of the drum or tape
systems. This means that drum or tape requests must
“break” into the normal machine request channel as and
when required.
The system which accommodates all these points is now discussed.
Whenever a drum or tape request occurs inhibit signals
are applied to the request channel into the core stack coordinator
and also to the stack request channels from this coordinator. This
results in a “freezing” of the state of flip-flop F (Fig. 5) and this
state is then inspected (Fig. 7, point X). If the state is “busy” this
means that a machine order has been stopped somewhere between
the loading of the buffer address register (B.A.R.) and the stack
request. Normally this time interval can vary from about 0.5 μsec
if there are no stack request holdups, to 20 μsec in the case of
certain accumulator holdups. In either case sufficient time is
allowed after the inspection to ensure that the equivalence operation
has been completed. If an equivalence indication is obtained all
the information relevant to this machine order (i.e., the line
address, page digits, stack(s) required and type of stack order) are
stored for future reference. Use is made here of the page digit
register provided to allow the by-pass on the equivalence circuitry
for instruction accesses. The core store is then made free for access
by the drum or the tape. If the core store had been found to be
free on inspection, the above procedure is omitted.

Fig. 7. Drum and tape break in systems.

A drum or tape access (as decided by the priority circuit) to
the core store then occurs, which removes the inhibits on the stack
request channels. When the stack request for the drum or tape
cycle is initiated these inhibits are allowed to reapply. At this stage
(Fig. 7, point Y), if there is a stored machine order it is allowed
to proceed if possible. The inhibits on the machine request channels
are removed when the stack request for the stored machine
order occurs. If there is no stored machine order this is done
immediately, and the central machine is again allowed access to
the core store. However, another drum or tape request can arise
before the stack request of the stored machine order occurs, in
particular because this latter order may still be held up by the
central machine. If this is the case the drum or tape is allowed
immediate access and a further attempt is made to complete the
stored machine order when this drum or tape stack request occurs.
If the stored machine order was for an operand, the content
of the page digit register will correspond to the location of this
operand. The next machine request for an instruction pair will
then almost certainly result in a “wrong page” indication. This
is prevented by arranging that the next instruction pair access does
not by-pass the equivalence circuitry.
The effect on the machine speed when the drum or tapes are
transferring information to or from the core store is dependent
upon two factors. First, upon the proportion of time during which
the buffer register in the core coordinator is busy dealing with
machine requests, and secondly, upon the particular stacks being
used by the central machine and the drum or tape. If the computer
is obeying a program with instructions and operands on the fixed
or subsidiary store then the rate of obeying instructions is
unaffected by drum or tape transfers. A drum or tape interrupt
occurring when the B.A.R. is free prevents any machine address
being accepted onto this buffer for 1.0 μsec. However, if the B.A.R.
is busy then the next machine request to the core store is delayed
until 1.8 μsec after the interrupt if different stacks are being used,
or until 3.4 μsec after the interrupt if the stacks are the same.
When the machine is obeying a program with instructions and
operands on the core store the slowing down during drum transfers
can be by a factor of two if instructions, operands, and drum
requests use the same stacks. It is also possible for the machine
to be unaffected. The effect on a particular sequence of orders
can be seen by considering the one discussed in Sec. 4 and
illustrated in Fig. 6. In this sequence the instructions are on stacks
0 and 1 while the operands are on stacks 2 and 3. If the drum
or tape is transferring alternately to stacks 0 and 1 then the effect
of any interrupt within the 3.2 μsec of an instruction pair is to
increase this time by between 0.5 and 3.4 μsec depending upon
where the interrupt occurred. The average increase is 1.8 μsec
and for a tape transfer with interrupts every 88 μsec the computer
can obey instructions at 98 per cent of the normal rate. During
drum transfers the interrupts occur every 4 μsec which would
suggest a slowing down to 60 per cent of normal. However, for
any regular sequence of orders the requests to the core store by
the machine and by the drum rapidly become synchronized with
the result in this particular case that the machine can still operate
at 80 per cent of its normal speed.

APPENDIX 2 METHODS OF DIVISION OF THE MAIN
CORE STORE
The maximum frequency with which requests can be dealt with
by a single stack core store is governed by the cycle time of the
store. If the store is divided into several stacks which can be cycled
independently then the limit imposed on the speed of the machine
by the core store is reduced. The degree of division which is chosen
is dependent upon the ratio of core store cycle time to other
machine operations and also upon the cost of the multiple selection
mechanisms required.
Considering a sequence of orders in which both the instruction
and operand are in the core store, then for a single stack store
the limit imposed on the operating speed by the store is two cycle
times per order, i.e., 4 μsec in Atlas. This is significantly larger
than the limits imposed by other sections of the computer
(Sec. 4). If the store is divided into two stacks and instructions and
operands are separated, then the limit is reduced to 2 μsec which
is still rather high. The provision of two stacks permits the
addressing of the store to be arranged so that successive addresses
are in alternate stacks. It is therefore possible by making requests
to both stacks at the same time to read two instructions together,
so reducing the number of access times to three per instruction
pair. Unfortunately such an arrangement of the store means that
operands are always on the same stacks as instruction pairs, and
the limit imposed by the cycle time is still 2 μsec per order even
if the two operand requests in the instruction pair are to different
stacks and occur at the same time.
Division into any number of stacks with the addressing system
working through each stack in turn cannot reduce the limit below
2 μsec since successive instructions normally occur in successive
addresses and are therefore in the same stack. However, four stacks
arranged in two pairs reduces the limit to 1 μsec as the operands
can always be arranged to be on different stacks from the
instruction pairs. In order to reduce the limit to 0.5 μsec it is necessary
to have eight stacks arranged in two sets of four and to read four
instructions at once, which would increase the complexity of the
central machine.
The limit of 1 μsec is quite sufficient and further division with
the stacks arranged in pairs only enables the limit to be more easily
obtained by suitable location of the instructions and operands.
The location of instructions and operands within the core store
is under the control of the drum transfer program; thus when there
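
The store-imposed limits quoted in Appendix 2 can be reproduced with one line of arithmetic per arrangement. The Python sketch below is an illustration under a simplifying assumption that is not stated in this form in the text: the limit per order is taken to be the 2 μsec cycle time multiplied by the number of accesses per order that fall on the most heavily used stack. The per-arrangement loads in `BUSIEST_STACK_LOAD` are read off from the prose, and all names are chosen here for illustration.

```python
CYCLE_TIME = 2.0  # core store cycle time in microseconds, as quoted

# Accesses per order landing on the busiest stack (assumed model).
BUSIEST_STACK_LOAD = {
    # instruction and operand both come from the one stack
    "single stack": 2.0,
    # one stack serves instructions, the other operands
    "two stacks, instructions and operands separated": 1.0,
    # two instructions are read per cycle; operands sit on the other pair
    "four stacks in two pairs": 0.5,
    # four instructions are read per cycle
    "eight stacks in two sets of four": 0.25,
}


def store_limit(load_per_order, cycle_time=CYCLE_TIME):
    """Store-imposed lower bound on the time per order, in microseconds."""
    return load_per_order * cycle_time


limits = {name: store_limit(load) for name, load in BUSIEST_STACK_LOAD.items()}

for name, limit in limits.items():
    print("%-50s %.1f usec" % (name, limit))
```

Under this model the two-stack arrangement with successive addresses in alternate stacks also gives 2 μsec, because each stack still carries one instruction-pair access and one operand access per pair of orders.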
Summary This paper describes the design of the computer seen by a
machine-language programmer in a time-sharing system developed at the
University of California at Berkeley. Some of the instructions in this
machine are executed by the hardware, and some are implemented by
software. The user, however, thinks of them all as part of his machine,
a machine having extensive and unusual capabilities, many of which might
be part of the hardware of a (considerably more expensive) computer.
Among the important features of the machine are the arithmetic and
string manipulation instructions, the very general memory allocation and
configuration mechanism, and the multiple processes which can be created
by the program. Facilities are provided for communication among these
processes and for the control of exceptional conditions.
The input-output system is capable of handling all of the peripheral
equipment in a uniform and convenient manner through files having
symbolic names. Programs can access files belonging to a number of people,
but each person can protect his own files from unauthorized access by
others.
Some mention is made at various points of the techniques of
implementation, but the main emphasis is on the appearance of the user’s
machine.

Introduction
A characteristic of a time-sharing system is that the computer seen
by the user programming in machine language differs from that
on which the system is implemented [Bright, 1964; Comfort, 1965;
Forgie, 1965; McCullogh et al., 1965; Schwartz, 1964]. In fact,
the user machine is defined by the combination of the time-sharing
hardware running in user mode and the software which controls
input-output, deals with illegal actions which may be taken by
a user’s program, and provides various other services. If the
hardware is arranged in such a way that calls on the system have the
same form as the hardware instructions of the machine
[Lichtenberger and Pirtle, 1965], then the distinction becomes irrelevant
to the user; he simply programs a machine with an unusual and
powerful instruction set which relieves him of many of the problems
of conventional machine-language programming [Lampson,
1965; McCarthy et al., 1963].

¹ Proc. IEEE, vol. 54, no. 12, pp. 1766-1774, December, 1966.

In a time-sharing system which has been developed by and for
the use of members of Project Genie at the University of California
at Berkeley [Lichtenberger and Pirtle, 1965], the user machine
has a number of interesting characteristics. The computer in this
system is an SDS 930, a 24 bit, fixed-point machine with one index
register, multi-level indirect addressing, a 14 bit address field, and
32 thousand words of 1.75 μs memory in two independent modules.
Figure 1 shows the basic configuration of equipment. The memory
is interleaved between the two modules so that processing and
drum transfers may occur simultaneously. A detailed description
of the various hardware modifications of the computer and their
implications for the performance of the overall system has been
given in a previous paper [Lichtenberger and Pirtle, 1965].
Briefly, these modifications include the addition of monitor and
user modes in which, for user mode, the execution of a class of
instructions is prevented and replaced by a trap to a system
routine. The protection from unauthorized access to memory has been
subsumed in an address mapping scheme: both the 16 384 words
addressable by a user program (logical addresses) and the 32 768
words of actual core memory (physical addresses) have been
divided into 2048-word pages. A set of eight six-bit hardware
registers defines a map from the logical address space to the real memory
by specifying the real page which is to correspond to each of the
user’s logical pages. Implicit in this scheme is the capability of
marking each of the user’s pages as unassigned or read-only, so that
any attempt to access such a page improperly will result in a trap.
All memory references in user mode are mapped. In monitor
mode, all memory references are normally absolute. It is possible,
however, with any instruction in monitor mode, or even within
a chain of indirect addressing, to specify use of the user map.
Furthermore, in monitor mode the top 4096 words are mapped
through two additional registers called the monitor map. The
mapping process is illustrated in Fig. 2.
Another significant hardware modification is the mechanism for
going between modes. Once the machine is in user mode, it can
get to monitor mode under three circumstances:
292 Part 3 I The instruction-set processor level: variations in the processor Section 6 I Processors with multiprogramming ability
m
1 processor I
Basic features of the machine
A user in the Berkeley time-sharing system, working at what he
I
I Memory
175esec Graphic
display
and
thinks of as the hardware language level, has at his disposal a
machine with a configuration and capability which can be con-
veniently controlled by the execution of machine instruction se-
light pen
I 3 x IO6 WORDS quences. Its simplest configuration is very similar to that of a
51105 WDS/SEC
IGeneral 1 1
POQ 3
0 4
I 5
Fig. 1. Configuration of equipment. 2 6
3 7
4 8
5 9
6 10
1 If a hardware interrupt occurs 7 11
I2
l 6 K virtual core 13
2 If a trap is generated by the user program as outlined. la
u 1 5
3 If an instruction with a particular configuration of two bits 32K real core
is executed. Such an instruction is called a system pro- (0)
In case 3, the six-bit operation field is used to select one of 64 joo01001( Mapping reglster 5 118
locations in absolute core. The current address of the instruction
[go ,003 ; 0 7 m : Real effective address 4 4 6 5 4 a
is put into absolute location zero as a subroutine link, the indirect
address bit of this link word is set, and another bit is set, marking fl Read-only bit o f f
the memory location in the link word as having come from user- (b)
standard medium-sized computer. In this configuration, the machine possesses the
standard 930 complement of arithmetic and logic instructions and, in addition, a set
of software interpreted monitor and executive instructions. The latter instructions,
which will be discussed more fully in the following, do rather complex input-output
of many different kinds, perform many frequently used table lookup and string
processing functions, implement floating point operations, and provide for the
creation of more complex machine configurations. Some examples of the instructions
available are:

Load A, B, or X (index) registers from memory or store any of the registers. Indexing
and indirect addressing are available on these and almost all other instructions.
Double word load and store are also available.
The normal complement of fixed-point arithmetic and logic operations.
Skips on various arithmetic and logic conditions.
Floating point arithmetic and input-output. The latter is in free format or in the
equivalent of Fortran E or F format.
Input a character from a teletype or write a block of arbitrary length on a drum file.
Look up a string in a hash-coded table and obtain its position in the table.
Create a new process and start it running concurrently with the present one at a
specified point.
Redefine the memory of the machine to include a portion of that which is also being
used by another program.

It should be emphasized that, although many of these instructions are software
interpreted, their format is identical to the standard machine instruction format,
with the exception of the one bit which specifies a system interpreted instruction.
Since the system interpretation of these instructions is completely invisible to the
machine user, and since these instructions do have the standard machine instruction
format, the user and his program make no distinction between hardware and software
interpreted instructions.
Some of the possible 192 operation codes are not legal in the user machine. Included
in this category are those hardware instructions which would halt the machine or
interfere with the input-output if allowed to execute, and those software interpreted
instructions which attempt to do things which are forbidden to the program. Attempted
execution of one of these instructions will result in an illegal instruction
violation. The effect of an illegal instruction violation is described later.

Memory configuration
The memory size and organization of the machine is specified by an appropriate
sequence of instructions. For example, the user may specify a machine which has 6K of
memory with addresses from 0 to 13777₈; alternatively, he may specify that the 6K
should include addresses 0 to 3777₈, 14000₈ to 17777₈, and 34000₈ to 37777₈. The user
may also specify the size and configuration of the machine’s secondary storage and, to
a considerable extent, the structure of its input-output system. A full discussion of
this capability will be deferred to a later section.
The next few paragraphs discuss the mechanism by which the user’s program may specify
its memory size and organization. This mechanism, known as the process map to
distinguish it from the hardware memory address mapping, uses a (software) mapping
register consisting of eight 6-bit bytes, one byte for each of the eight 2K blocks
addressable by the 14 bit address field of an instruction. Each of these bytes either
is 0 or addresses one of the 63 words in a table called the private memory table
(PMT). Each user has his own private memory table. An entry in this table provides
information about a particular 2K block of memory. The block may be either local to
the user or it may be shared. If the block is local, the entry gives information about
whether it is currently in core or on the drum. This information is important to the
system but need not concern the user. If the block is shared, its PMT entry points to
an entry in another table called the shared memory table (SMT). Entries in this table
describe blocks of memory which are shared by several users. Such blocks may contain
invariant programs and constants, in which case they will be marked as read-only, or
they may contain arbitrary data which is being processed by programs belonging to two
different users.
A possible arrangement of logical or virtual memory for a process is shown in Fig. 3.
The nature of each page has been noted in the picture of the virtual memory; this
information can also be obtained by taking the corresponding byte of the map and
looking at the PMT entry specified by that byte. The figure shows a large amount of
shared memory, which suggests that the process might be a compilation, sharing the
code for the compiler with other processes translating programs written in the same
source language. Virtual pages one and two might hold tables and temporary storage
which are unique to each separate compilation. Note that, although the flexibility of
the map allows any block of code or data to appear anywhere in the virtual memory, it
is certainly not true that a program can run regardless of which pages
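
The process map lookup described in the memory configuration section can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `PmtEntry`, `resolve`, `block_id`, and `smt_index` are hypothetical, as is the layout of a PMT entry. The text fixes only the sizes (eight 6-bit bytes, 2K blocks, up to 63 PMT entries) and the rule that a zero byte marks an unassigned block.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BLOCK_WORDS = 2048   # 2K words per block (from the text)
MAP_BYTES = 8        # eight 6-bit bytes in the process map (from the text)


@dataclass
class PmtEntry:
    """Hypothetical PMT entry: a local block, or a pointer into the SMT."""
    shared: bool
    block_id: str = ""   # illustrative label of a local block
    smt_index: int = 0   # index into the shared memory table when shared


def resolve(address: int, process_map: List[int],
            pmt: List[Optional[PmtEntry]],
            smt: List[str]) -> Optional[Tuple[str, int]]:
    """Return (block label, offset) for a 14-bit address, or None if unassigned."""
    if not 0 <= address < MAP_BYTES * BLOCK_WORDS:
        raise ValueError("address does not fit in 14 bits")
    block, offset = divmod(address, BLOCK_WORDS)
    byte = process_map[block]
    if byte == 0:                 # a zero byte marks an unassigned block
        return None
    entry = pmt[byte]             # bytes 1..63 select a PMT entry
    label = smt[entry.smt_index] if entry.shared else entry.block_id
    return label, offset


# The 6K machine from the text: blocks at 0-3777, 14000-17777, 34000-37777 (octal).
# Making the last block shared is an assumption added for illustration.
pmt = [None, PmtEntry(False, "M1"), PmtEntry(False, "M2"), PmtEntry(True, smt_index=0)]
smt = ["S1"]
process_map = [1, 0, 0, 2, 0, 0, 0, 3]
```

With this map, an address in an unassigned block resolves to `None`, which stands in for the trap that the real system would take.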
the routine and data base do not fit into 16K, or where several common routines are
concurrently employed, it may be necessary to make frequent adjustment to the map
during execution.

Multiple processes
An important feature of the user machine allows the user program, which in the current
context will be referred to as the controlling process, to establish one or more
subsidiary processes. With a few minor exceptions, to be discussed, each subsidiary
process has the same status as the controlling process. Thus, it may in turn establish
a subsidiary process. It is therefore apparent that the user machine is in fact a
multi-processing machine. The original suggestion which gave rise to this capability
was made by Conway [Conway, 1963]; more recently the Multics system has included a
multi-process capability [Corbato and Vyssotsky, 1965; Dennis and Van Horn, 1966;
Saltzer, 1966].
A process is the logical environment for the execution of a program, as contrasted to
the physical environment, which is a hardware processor. It is defined by the
information which is required for the program to run; this information is called the
state vector. To create a new process, a given process executes an instruction which
has arguments specifying the state vector of the new process. This state vector
includes the program counter, the central registers, and the process map. The new
process may have a memory configuration which is the same as, or completely different
from, that of the originating process. The only constraint placed on this memory
specification is that the total memory available to the multi-process system is
limited to 128K by the process mapping mechanism, which is common to all processes.
Each user, of course, has his own 128K.
This facility was put into the system so that the system could control the user
processes. It is also of direct value, however, for many user processes. The most
obvious examples are input-output buffering routines, which can operate independently
of the user’s main program, communicating with it through memory and with interrupts
(see the following). Whether the operation being buffered is large volume output to a
disc or teletype requests for information about the progress of a running program, the
degree of flexibility afforded by multiple processes far exceeds anything which could
have been built into the input-output system. Fur-

of which is independent of the others and equivalent to them from the point of view of
the originating process. Figure 4 shows two simple multi-process structures, one for
each of two users. Note that each process has associated with it pointers to its
controlling process and to one of its subsidiary processes. When a process has two
immediate descendants, as in the case of processes 1.2 and 1.3, they are chained
together on a ring. Thus, three pointers, up, down, and ring, suffice to define the
process structure completely. The up pointers are, of course, redundant, but are
convenient for the implementation. The process is identified by a process number which
is returned by the system when it is created.
A complex structure such as that in Fig. 5 may result from the creation of a number of
subsidiary processes. The processes in Fig. 5 have been numbered arbitrarily to allow
a clear description of the way in which the pointers are arranged. Note that the user
need not be aware of these pointers; they are shown here to clarify the manner in
which the multiple process mechanism is implemented.
A process may destroy one of its subsidiary processes by executing the appropriate
instruction. For obvious reasons this operation is not legal if the process being
destroyed itself has subsidiary

[Figure: private memory tables PMT 1 and PMT 2 and the shared memory table (SMT).]
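
The up, down, and ring pointers described in the section on multiple processes can be sketched as a small Python data structure. The class and method names (`Process`, `ProcessTable`, `create`, `children`, `destroy`) are hypothetical, and the choice to splice a new process into the ring just after its parent's `down` process is an assumption; the text specifies only the three pointers, the process number returned on creation, and the rule that a process with subsidiaries may not be destroyed.

```python
from typing import Dict, List, Optional


class Process:
    def __init__(self, number: int):
        self.number = number
        self.up: Optional["Process"] = None     # controlling process
        self.down: Optional["Process"] = None   # one subsidiary process
        self.ring: "Process" = self             # next process on the sibling ring


class ProcessTable:
    def __init__(self) -> None:
        self.next_number = 1
        self.by_number: Dict[int, Process] = {}

    def create(self, parent: Optional[Process] = None) -> Process:
        """Create a process and link it under its controlling process."""
        p = Process(self.next_number)
        self.next_number += 1
        self.by_number[p.number] = p
        if parent is not None:
            p.up = parent
            if parent.down is None:
                parent.down = p
            else:                               # splice into the existing ring
                p.ring = parent.down.ring
                parent.down.ring = p
        return p

    def children(self, parent: Process) -> List[Process]:
        """Walk down once, then around the ring, to list all subsidiaries."""
        result: List[Process] = []
        child = parent.down
        while child is not None:
            result.append(child)
            child = child.ring
            if child is parent.down:
                break
        return result

    def destroy(self, p: Process) -> None:
        """Remove a process; not legal while it still has subsidiaries."""
        if p.down is not None:
            raise ValueError("process still has subsidiary processes")
        parent = p.up
        if parent is not None:
            pred = p
            while pred.ring is not p:           # find the ring predecessor
                pred = pred.ring
            if pred is p:                       # p was the only subsidiary
                parent.down = None
            else:
                pred.ring = p.ring
                if parent.down is p:
                    parent.down = p.ring
        del self.by_number[p.number]
```

Note that `children` never consults the up pointers, which matches the remark that they are redundant; `destroy` uses `up` only as a convenience for finding the parent.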
3 If the process attempts to obtain new memory, scan upward through the process
hierarchy until the topmost process is reached. If at any time during this scan a
process is found for which the address causing the trap is legal, propagate the memory
assigned to it down through the hierarchy to the process causing the trap.

Option 3 permits a process to be started with a subset of memory and later to
reacquire some of the memory which was not given to it initially. This feature is
important because the amount of memory assigned to a process influences the operating
efficiency of the system and thus the speed with which it will be able to respond to
teletypes or other real-time devices.

The input-output system
The user machine has a straightforward but unconventional set of input-output
instructions. The primary emphasis in the design of these instructions has been to
make all input-output devices interface identically with a program and to provide as
much flexibility in this common interface as possible. Two advantages result from this
uniformity: it becomes natural to write programs which are essentially independent of
the environment in which they operate, and the implementation of the system is greatly
simplified. To the user the former point is, of course, the important one.
It has been common, for example, for programs written to be controlled from a teletype
to be driven instead from a file on, let us say, the drum. A command exists which
permits the recognizer for the system command language and all of the subsystems to be
driven in this way. This device is particularly useful for repetitive sequences of
program assemblies and for background jobs which are run in the absence of the user.
Output which normally goes to the teletype is similarly diverted to user files.
Another application of the uniformity of the file system is demonstrated in some of
the subsystems, notably the assembler and the various compilers. The subsystem may
request the user to specify where he wishes the program listing to be placed. The user
may choose anything from paper tape to drum to his own teletype. In the absence of
file uniformity each subsystem would require a separate block of code for each
possibility. In fact, however, the same input-output instructions are used for all
cases.
The input-output instructions communicate with files. The system in turn associates
files with the various physical devices. Programs, for the most part, do not have to
account for the peculiarities of the various actual devices. Since devices differ
widely in characteristics and behavior, the flexibility of the operations available on
files is clearly critical. They must range from single-character input to the output
of thousands of words.
A file is opened by giving its name as an argument to the appropriate instruction.
Programs thus refer to all files symbolically, leaving the details of physical
location and organization to the system. If authorized, a program may refer to files
belonging to other users by supplying the name of the other user as well as the file
name. The owner of a file determines who is authorized to access it. The reader may
compare this file naming mechanism with a more sophisticated one [Daley and Neumann,
1965], bearing in mind the fact that file names can be of any length and can be
manipulated (as strings of characters) by the program.
Access to files is, in general, either sequential or random in nature. Some devices
(like a keyboard-display or a card reader) are purely sequential, while others (like a
disk) may be either sequentially or randomly accessed. There are accordingly two major
I/O interfaces to deal with these different qualities. The interface used in
conjunction with a given file depends on whether the file was declared to be a random
or a sequential file. The two major interfaces are each broken down into other
interfaces, primarily for reasons of implementation. Although the distinction between
sequential and random files is great, the subinterfaces are not especially visible to
the user.

Sequential files
The three instructions CIO (character input-output), WIO (word input-output), and BIO
(block input-output) are used to communicate with a sequential file. Each instruction
takes as an operand a file number. This number is given to the program when it opens a
file. At the time of opening a file it must be specified whether the file is to be
read from or written onto. Whether any given device associated with the file is
character-oriented or word-oriented is unimportant; the system takes care of all
necessary character-to-word assembly or word-to-character disassembly.
There are actually three separate, full-duplex physical interfaces to devices in the
sequential file mechanism. Generally, these interfaces are invisible to programs. They
exist, of course, for reasons of system efficiency and also because of the way in
which some devices are used. The interfaces are:

Character-by-character (basically for low-speed, character-oriented devices used for
man-machine interaction)
Buffered block I/O (for medium-speed I/O applications)
Block I/O directly from user core (for high-speed situations)
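
The character-to-word assembly and word-to-character disassembly that the system performs for sequential files can be illustrated with a short Python sketch. The packing format is an assumption: the text does not state it, so the sketch supposes a 24-bit word holding three 8-bit characters, packed left to right and zero-padded. The function names are hypothetical.

```python
from typing import List

CHARS_PER_WORD = 3   # assumption: a 24-bit word holds three 8-bit characters
CHAR_BITS = 8        # assumption: 8-bit characters


def assemble_words(chars: bytes) -> List[int]:
    """Pack characters into words, leftmost character in the high-order bits."""
    words = []
    for i in range(0, len(chars), CHARS_PER_WORD):
        group = chars[i:i + CHARS_PER_WORD].ljust(CHARS_PER_WORD, b"\0")
        word = 0
        for c in group:
            word = (word << CHAR_BITS) | c
        words.append(word)
    return words


def disassemble_words(words: List[int], count: int) -> bytes:
    """Unpack words back into the first `count` characters."""
    out = bytearray()
    for word in words:
        for shift in range((CHARS_PER_WORD - 1) * CHAR_BITS, -1, -CHAR_BITS):
            out.append((word >> shift) & 0xFF)
    return bytes(out[:count])
```

A program writing characters to a word-oriented device, or reading words from a character-oriented one, would never see this step; that is the sense in which the device orientation is "unimportant" to the program.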
It should be pointed out that there is no particular relation between these interfaces
and the three instructions CIO, WIO, and BIO. The interface used in a given situation
is a function of the device involved and, sometimes, of the volume of data to be
transmitted, not of the instruction. Any interface may be driven by any instruction.
Of the three subinterfaces under discussion, the last two are straightforward. The
character-by-character interface is, however, somewhat different and deserves some
elaboration. Devices associated with this interface are generally (but not
necessarily) used for man-machine interaction. Consider the case of a person
communicating with a program by means of a keyboard-display (or a teletype). He types
on the keyboard and the information is transmitted to the computer. The program may
wish to make an immediate response on the display screen. In many cases this response
will consist of an echo of the same character, so that the user has the feeling of
typing directly onto the screen (or onto the teleprinter).
So that input-output can be carried out when the program is not actually in main
memory, the character-by-character input interface permits programs a choice of a
number of echo tables; it further permits programs a choice of grade of service by
permitting them to specify whether a given character is an attention (or break)
character. Thus, for example, the program may specify that each character typed is to
be echoed immediately and that all control characters are to result in activation of
the program regardless of the number of characters in the input buffer.
Alternatively, the program may specify that no characters are echoed and every
character is a break character. By changing the specification the program can obtain
an appropriate (and varying) grade of service without putting undue load on the
system. Figure 6 shows the components of the character-by-character interface;
responsibility for its operation is split between the interrupt called when the device
signals for attention and the routine which processes the user’s I/O request.

Fig. 6. The character-oriented interface.

The advantage of the full-duplex, character-by-character mode of operation is
considerable. The character-by-character capability means that the user can interact
with his program in the smallest possible unit, the character. Furthermore, the
full-duplex capability permits, among other things (1) the program to substitute
characters or strings of characters as echoes for those received, (2) the keyboard and
display to be used simultaneously (as, for example, permitting a character typed on a
keyboard to pre-empt the operation of a process. In the case of typing information in
during the output of information, a simple algorithm prevents the random admixture of
characters which might otherwise result), and (3) the ready detection of transmission
errors.
Instructions are included to enable the state of both input and output buffers to be
sensed and perhaps cleared (discarding unwanted output or input). Of course, it is
possible for a program to use any number of authorized physical devices; in
particular, this includes those devices used as remote consoles. A mechanism is
provided to permit output which is directed to a given device to be copied on all
other devices which are output linked to it (and similarly for input). This is useful
when communication among users is desired and in numerous other situations.
The sequential file has a structure somewhat similar to that of an ordinary magtape
file. It consists of a sequence of logical records of arbitrary length and number. On
some devices, such as a card reader or the teletype, a file may have only one logical
record. The full generality is available for drum files, which are the ones most
commonly used. The logical record is to be contrasted with the variable length
physical record of magtape or the fixed length record of a card. Instructions are
provided to insert or delete logical records and increase or decrease them in length.
Other instructions permit the file to be “positioned” almost instantaneously to a
specified logical record. This gives the sequential file greater flexibility than one
which is completely unaddressable. This flexibility is only possible, of course,
because the file is on a random-access device and the sequential structure is
maintained by pointers. The implementation is discussed in the following.
When reading a sequential file, CIO and WIO return certain unusual data configurations
when they encounter an end of record or end of file, and BIO terminates transmission
on either of the conditions and returns the address of the last word transmitted. In
addition, certain flag bits are set by the unusual conditions, and an interrupt may be
caused if it has been armed.
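
The choice of echo table and break characters described for the character-by-character interface can be sketched in Python. Everything named here (`CharInterface`, `on_input`, `line_mode`, `raw_mode`) is hypothetical, and representing an echo table as a dictionary is an assumption; the sketch shows only the two grades of service the text mentions, not the actual interrupt routines.

```python
from typing import Callable, Dict, List


class CharInterface:
    """Toy model of the input side of the character-by-character interface."""

    def __init__(self, echo_table: Dict[str, str],
                 is_break: Callable[[str], bool]) -> None:
        self.echo_table = echo_table      # character typed -> echo to send back
        self.is_break = is_break          # True if the character wakes the program
        self.input_buffer: List[str] = []
        self.output: List[str] = []

    def on_input(self, ch: str) -> bool:
        """Handle one typed character; return True if the program is activated."""
        self.input_buffer.append(ch)
        echo = self.echo_table.get(ch)
        if echo is not None:              # echo without involving the program
            self.output.append(echo)
        return self.is_break(ch)


# Grade of service 1: echo every character, activate only on control characters.
line_mode = CharInterface({chr(c): chr(c) for c in range(128)},
                          lambda ch: ord(ch) < 32)

# Grade of service 2: echo nothing, treat every character as a break character.
raw_mode = CharInterface({}, lambda ch: True)
```

In the first configuration the program is activated once per line or control key; in the second it is activated on every keystroke, which costs more system load but gives the program full control over what appears on the display.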
Chapter 24 I A user machine in a time-sharing system 299
in order to access information in the file it is necessary only to know the location
of the first index block. It may be worthwhile to point out that all users share the
same drum. Since the system has complete control over the allocation of space on the
drum, there is no possibility of undesired interaction among users.
Available space for new data blocks or index blocks is kept track of by a bit table,
illustrated in Fig. 8. In the figure, each column represents one of the 72 physical
bands on the drum allocated for the storage of file information. Each row represents
one of the 64 256-word sectors around a band. Each bit in the table thus represents
one of the 4608 data blocks available. The bits are set when a block is in use and
cleared when the block becomes available. Thus, if a new data block is required, the
system has only to read the physical position of the drum, use this position to index
in the table, and search a row for the appearance of a 0. The column in which a 0 is
found indicates the physical track on which a block is available. Because of the way
the row was chosen, this block is immediately accessible. This scheme has two
advantages over its alternative, which is to chain unused blocks together:

1 It is easy to find a block in an optimum position, using the algorithm just
described.
2 No drum operations are required when a new block is needed or an old one is to be
released.

It may be preferable to assign the new block so that it becomes accessible immediately
after the block last assigned for the file. This scheme will speed up subsequent
reading of the file.

Fig. 7. Index blocks and pointers to data blocks.
Fig. 8. Bit table for allocation of space on the drum.

Random files
Auxiliary storage files can also be treated as extensions of core memory rather than
as sequential devices. Such files are called random files. A random file differs from
a sequential file in that there is no logical record structure to the file and that
information is extracted from or written into the random file by addressing a specific
word or block of words. It may be opened like a sequential file; the only difference
is that it need not be specified as an output or an input file.
Four instructions are used to input and output words and blocks of words on a random
file. To permit the random file to look even more like core memory, an instruction
enables one of the currently open random files to be specified as the secondary memory
file. Two instructions, LAS (load A from secondary memory) and SAS (store A in
secondary memory), act like ordinary load and store instructions with one level of
indirect addressing (see Fig. 9) except, of course, that the data are in a random file
instead of in core memory.
Random files are implemented like sequential files except that end of record
indicators are not meaningful. Although as many index blocks are used up as required
by the size of a random file, only those data blocks which actually contain
information will be attached to a random file. As new locations are accessed, new data
blocks are attached.

Fig. 9. Load and store from main and secondary memory. (a) Instructions.
(b) Addressing.

Subroutine files
Whereas it makes little sense to associate, say, a card reader with a random file, a
sequential file can be associated with any physical device in the system. In addition,
a sequential file may be associated with a subroutine. Such a file is called a
subroutine file, and the subroutine may thus be thought of as a “nonphysical” device.
The subroutine file is defined by the address of a subroutine together with
information indicating whether it is an input or an output file and whether it is word
or character oriented. An input operation from a subroutine file causes the subroutine
to be called. When it returns, the contents of the A register is taken to be the input
requested. Correspondingly, an output operation causes the subroutine to be called
with the word or character being output in A. The subroutine is completely
unrestricted in the kinds of processing it can do. It may do further input or output
and any amount of computation. It may even call itself if it preserves the old return
address.
Recall that for sequential files the system transforms all information supplied by the
user to the format required by the particular file; hence, the requirement that the
user, in opening a subroutine file, must specify whether the file is to be character
or word oriented. The system will thereafter do all the necessary packing and
unpacking.
Subroutine files are the logical end-product of a desire to decouple a program from
its environment. Since they can do arbitrary computations, they can provide buffers of
any desired complexity between the assumptions a program has made about its
environment and the true state of things. In fact, they make it logically unnecessary
to provide an identical interface for all the input-output devices attached to the
system; if uniformity did not exist, it could be simulated with the appropriate
subroutine files. Considerations of convenience and efficiency, of course, militate
against such an arrangement, but it suggests the power inherent in the subroutine file
machinery.

Summary
The user machine described was designed to be a flexible foundation for development
and experimentation in man-machine systems. The user has been given the capability to
establish configurations of multiple processes, and the processes have the ability to
communicate conveniently with each other, with central files, and with peripheral
devices. A given user may, of course, wish only to use a subsystem of the general
system (e.g., a compiler or a debugging routine) for his particular job. In the course
of using the subsystem, however, he may become dissatisfied with it and wish to revise
or even rewrite the subsystem. The features of the user machine not only permit this
activity but make it easier.

References
BrigH64; ComfW65; ConwM63; CorbF65; DaleR65; DennJ66; ForgJ65;
LampB65; LichW65; McCaJ63; McCuJ65; SaltJ66; SchwJ64
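
The drum bit table described in this chapter (72 bands by 64 sectors, one bit per 256-word block) lends itself to a short Python sketch. The class and method names are hypothetical. Two details are assumptions because the text does not state them: how the drum's physical position is converted to a row (here the caller supplies it and it is taken modulo 64), and what happens when the chosen row is full (here `None` is returned).

```python
from typing import List, Optional, Tuple

BANDS = 72     # columns: physical bands allocated to file storage (from the text)
SECTORS = 64   # rows: 256-word sectors around a band (from the text)


class DrumBitTable:
    def __init__(self) -> None:
        # table[sector][band] is 1 while the block is in use and 0 when free.
        self.table: List[List[int]] = [[0] * BANDS for _ in range(SECTORS)]

    def allocate(self, drum_position: int) -> Optional[Tuple[int, int]]:
        """Search the row for the given drum position for a 0 and claim it.

        Returns (band, sector), or None if every band is in use at that sector.
        """
        sector = drum_position % SECTORS
        row = self.table[sector]
        for band, used in enumerate(row):
            if not used:
                row[band] = 1
                return band, sector
        return None

    def release(self, band: int, sector: int) -> None:
        """Clear the bit; no drum operation is needed to free a block."""
        self.table[sector][band] = 0

    def free_blocks(self) -> int:
        return sum(row.count(0) for row in self.table)
```

Both `allocate` and `release` touch only the in-core table, which is the second advantage the text claims over chaining unused blocks together on the drum itself.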
The instruction-set processor level:
special-function processors
This part contains descriptions of processors that do not interpret general
programming languages; that is, they are not Pc’s. They are all P’s, however, since
they have an interpreter that determines not only the operations to be taken, given
the current instruction, but the next instruction to be obtained.
A Pio (Sec. 1) is a processor that controls T and Ms components. It manages
block or vector transmission between Ms or T and Mp.
A P.array (Sec. 2) processes both vectors and two-dimensional matrices. By
recognizing these data as fundamental units, programs (or algorithms) can be
expressed efficiently in terms of primitive operators. The chief advantage of these
P’s is their ability to take advantage of the data structure for parallel
interpretation, thereby increasing processing speed.
A microprogram processor (Sec. 3) is designed to interpret and process a data-type
which is a program. In effect, this processor is a computer within another
computer, programmed to act as an interpreter.
A language processor (Sec. 4) interprets a data-type derived from the primitives
of a programming language. In contrast, a conventional processor interprets a
language based on fundamental hardware implementation primitives. The difference
is clearly apparent as increased complexity of the language processors.
Section 1
304 Part 4 I The instruction-set processor level: special-function processors Section 1 I Processors to control terminals and secondary memories
Another approach to the design of a P.display is based on a P.microprogram which is
shared among many T.displays [Rose, 1967]. Yet another alternative, which has not yet
been tried, is to incorporate a Pio (P.display) as a special mode in a conventional
Pc. Thus the P would interpret either conventional Pc instructions or P.display
instructions.
P.display is the interpreter for the output of pictures or graphics. The 338 utilizes
data space efficiently simply because the data are long variable-length strings (word
vectors). The instruction requires almost no space to specify the data operations and
addresses; data are interpreted directly or immediately in the instruction rather than
via instruction addresses.
Another feature which allows a program to be efficiently encoded is the stack
mechanism for storing subroutine linkages. Subroutines in P.displays are actually
programs which form part of a more complete picture. Subroutines are actually
subpictures. Although the stack mechanism allows for recursive picture calls, the
stack is used principally to save space and to allow multiple T.displays to use common
picture programs.
A problem in the 338 which is common to all multi-P structures is intercommunication
among the P’s. Pc is the controlling P, as is the case with most Pc-Pio structures.
The P(’338) has no trap to itself but relies on an interrupt signal to Pc. The Pc
processes both tasks which P.display might process, given an interrupt system, and
other tasks beyond P.display’s capability.
A clock should be built into the 338. The brightness or intensity of a picture is
determined both electronically (see the mode instructions for controlling intensity)
and by the rate at which the pictures are repeated. A clock would allow the time when
pictures are started or drawn to be specified; thus the intensity would be independent
of picture length.
The 338 requires more hardware than a simpler Pc. However, a large amount of this
hardware is used to control the generation of characters and lines. The lines
(vectors) are drawn using a DDA (Digital Differential Analyzer) technique. Perhaps
one-half of the registers could be eliminated if the 338 were not a P. A simpler
alternative was constructed about a similar computer, the PDP-9, by Bell Telephone
Laboratories and DEC, using the approach of making the display only a K.
A more elaborate Pc interrupt system with reduced overhead time would enable Pc to
take on the specialized program control functions in the 338. Such a scheme might pass
the program or instruction counter parameter directly from P.display to Pc. In this
way, Pc or P.display would alternatively process part of a single instruction stream,
depending on the task.
Despite the problems of this early P.display, it has a sophistication which successors
appear to be following.
Chapter 25
The DEC 338 display computer
Introduction
The C(display; ’DEC 338) is a C(’DEC PDP-8) with a P.display which can connect to T(#1:8; CRT; display; area: 9.375 x 9.375 in.²). The PMS structure is shown in Fig. 1, Chap. 5, describing the PDP-8. The Pc ISP is given in Appendix 1 of Chap. 5.
The C(338), although designed to stand alone, is generally used as a satellite to a larger C, via an L(Dataphone). The rationale for using a C as a T is based on the bandwidth and storage requirements needed to maintain graphical picture displays. A human being manipulating pictures (rotation, scale change, and conversion of internal linked data structure to a picture structure) requires short response time; this requirement places high processing demands on larger C’s. Thus this C(display) is a preprocessor for larger, more general C’s.
The actual T(CRT) is a 16-inch CRT with a 9 3/8-inch square viewing area covered by 1,024 x 1,024 (XY) points. The diameter of the points is ~0.015 inch. The spot is magnetically deflected and focused. All eight T(CRT)’s can be driven together or used independently. A photomultiplier connected through a fiber-optic bundle link is used as a light pen (a photosensitive sensor) to detect spots on the T. The light pen allows the P.display to detect whether a user has “pointed to” a displayed spot.
Pc and P.display access the same Mp; the total data rate available from Mp is one 12-bit word/1.5 microseconds. The instruction times of P.display are a function of the point plotting times of the T(CRT): 0.3 microsecond to the next incremental unintensified point (approximately 0.010 inch away); 1.2 microseconds to an incremental intensified point; and 35 microseconds to a point plotted at a random position.
The state (registers) of C.display is given in the ISP description of Appendix 1 of this chapter. There are four parts of the state: the control registers for Program Flow State, the Picture State (or position of beam), Console and Light-pen State, and Mp State. The instruction interpreter is fairly simple and is best described by the state diagram (Fig. 1). The instructions are given in Tables 1 and 2. The remainder of the chapter discusses the P.display instructions and the Pc instructions for communicating with P.display.
Principle of operation
The actual picture is held stationary by repeatedly displaying (intensifying) a particular point, line, etc. The number of times a figure has to be displayed so that it appears stationary and does not flicker depends on the CRT phosphor, the figure, and environmental parameters. The generally accepted range is a plotting rate of 20 to 50 plots/second; thus a complete picture has to be drawn
306 Part 4 I The instruction-set processor level: special-function processors Section 1 I Processors t o control terminals and secondary memories
[Table: control-state instruction formats. Bits 0:2 hold the op code; bits 3 to 11 hold the fields. Fields named for Parameters: set scale, scale (0:1), set light pen, light pen, set intensity, intensity (0:2). Fields named for Set slaves: group number (0:1) and, for each of unit 0 and unit 1, set unit, light pen, intensity. Other field names appearing in the table: push, memory field (0:2), inh scale, data mode. One op code is spare.]
Instructions and their interpretation in P(display)
Two instruction-set types are interpreted in the P.display: Data State, in which instructions specify display information; and Control State, in which instructions specify program control information (e.g., jumps, modes, etc.). A state diagram for the interpretation process is given in Fig. 1.
Data-state instructions
There are seven instructions (which DEC calls modes) that can be executed while P.display is in data state. The instructions (modes) are really substates of data state. The instructions (actually more like data) are interpreted for the mode. When all the data-mode instructions have been interpreted, an escape instruction returns the P.display to control state. A control instruction is issued to select a mode and simultaneously place the display in data state.
Increment mode. This mode is used to draw curves and alphanumeric characters and other small symbols. Two instructions are stored per word. An instruction will cause the beam position to be moved one, two, or three times, in 0.010-inch increments, in one of eight directions. Direction 0 is to the right, direction 1 is up and to the right, etc.
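The increment-mode description can be illustrated with a minimal sketch in Python. It assumes that the eight directions continue counterclockwise in 45-degree steps from the two cases the text gives (0 = right, 1 = up and to the right); the text only says "etc.", so that ordering is an assumption. The names `DIRECTIONS`, `increment_move`, and `trace` are hypothetical, and positions are counted in beam increments of 0.010 inch.

```python
# Assumption: directions 2..7 continue counterclockwise from the two the
# text names (0 = right, 1 = up and to the right).
DIRECTIONS = [
    (1, 0), (1, 1), (0, 1), (-1, 1),
    (-1, 0), (-1, -1), (0, -1), (1, -1),
]
STEP_INCHES = 0.010  # size of one beam increment, from the text


def increment_move(x, y, direction, count):
    """Move the beam `count` (1, 2, or 3) increments in one of 8 directions."""
    if direction not in range(8):
        raise ValueError("direction must be 0..7")
    if count not in (1, 2, 3):
        raise ValueError("count must be 1, 2, or 3")
    dx, dy = DIRECTIONS[direction]
    return x + dx * count, y + dy * count


def trace(x, y, moves):
    """Apply a sequence of (direction, count) moves; return visited end points."""
    points = [(x, y)]
    for direction, count in moves:
        x, y = increment_move(x, y, direction, count)
        points.append((x, y))
    return points
```

For example, `trace(0, 0, [(0, 1), (2, 2)])` moves one increment to the right and then two increments up, under the direction numbering assumed above.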
[Table: data-state instruction (mode) formats. Row labels recoverable from the source: vector (words 1 of 2 and 2 of 2), vector continue, short vector, graph plot, spare. Field names: int, esc, inh, Delta Y, Delta X, X coordinate, Y or X coordinate. Timing entries such as 1.8 - 24 and 6 - 35 appear without units.]
Vector mode. The vector mode is used to draw straight-line segments. This two-word instruction causes the beam position to be moved along a line represented by an 11-bit delta y and an 11-bit delta x.
Vector continue mode. This mode is used to draw a straight line to the edge of the screen. It is similar to vector mode but causes the line to be extended until an “edge” is encountered.
Short vector mode. The short vector mode is used to draw figures composed of short line segments. A one-word instruction specifies a 5-bit delta y and a 5-bit delta x quantity. It is transformed within the display to the same format as vector mode and operates in the same manner.
The preceding modes move the beam by counting the X and Y position registers. The counting is done at 1.2 microseconds per step on an intensified move and at 0.30 microsecond per step on a nonintensified move.
Point mode. Point mode is used for random point plotting. A two-word instruction specifies new Y and/or X coordinates to be placed into the Y and X position registers.
Graph-plot mode. This is used to draw curves of mathematical functions. A one-word instruction has data for the Y or X position register; at the same time, X or Y, respectively, is incremented by a count of one, two, four, or eight, depending on the scale factor.
Point and graph-plot modes operate at a rate depending upon the position of the new point with respect to the previous point. If a point is only one-eighth of the screen away, the delay for beam-settling time is 6 microseconds; otherwise the settling time is 35 microseconds.
Character generation option instructions. The alphanumeric characters or special symbols which make up a character set are stored in Mp in increment mode or short vector mode. These characters can be arbitrarily defined. A 6-bit (or 7-bit) character code in the instruction is used to locate a word in a table in Mp called the dispatch table. The base address of the table is specified by the Starting Address Register/SAR(0:5).
SAR may be loaded by instructions from the Pc. The SAR represents the most significant 6 bits of a 15-bit memory address. The character code represents the least significant 6 (or 7) bits. A seventh SAR bit, corresponding to the octal position 100, is used with 6-bit characters as a case bit (i.e., uppercase or lowercase characters) and may be set or cleared with a control character.
A word in the dispatch table has the following format:
Bit 0: If bit 0 is a 1, bits 1 to 11 are used to perform a control function as specified by particular control instructions. If bit 0 is a 0, bits 2 to 11 are combined with SAR to specify the address at which the character definition program starts. (The address bit 2 is common to both the SAR and bit 2 of the dispatch word and so may be specified in either place or in both places.)
Bit 1: Determines the mode in which the character is to be displayed. If bit 1 is a 0, the increment mode is used to plot the character; if bit 1 is a 1, the short vector mode is used to plot the character.
Control-state instructions
There are six control-state instructions.
Parameter. Parameter is used to set values in scale, light-pen, and intensity registers.
Mode. Mode is used to set up the data-state mode (or data-mode instruction). Mode also is used to stop the display.
Conditional skip. The skip instruction tests the state of the P.display and the pushbuttons.
Miscellaneous. These instructions include both tests and additional parameter control.
Display jump and push-jump subroutine instructions. The display jump instruction has 15 address bits, so that a jump may be executed to any location in the display file within the 32-kw memory.
The display subroutine instructions are push-jump (an extension of the jump instruction) and pop, the return from subroutine. The push-jump works as follows: The current state of the display (Light Pen Enable, Data Mode, Scale, and Intensity) is stored, along with the return address, in two successive locations in the first 4,096 words of memory. The locations are determined by the pushdown pointer, PDP. This pointer is initially set by a Pc instruction. The normal jump is then executed.
To return from a subroutine, the pop instruction is executed. It has no address bits. Its function is to return the display to a previous state by sending the last words on the push-down stack back to the display.
The stack approach to subroutining as implemented on the 338 has certain advantages over the jump to subroutine instruction normally used in Pc’s:
1 Memory space is conserved since return address locations are not required in each subroutine in memory.
2 A subroutine can be called any number of times before return to the main routine.
3 Since the state of the display is saved on the stack and subsequently restored, subroutines are truly transparent; that is, after the return they leave the state of the display program the same as before the subroutine call.
4 The subroutines can either retain the same state or change the state of the display by using one or more of the “inhibit restore” bits available in the pop instruction. The programmer can elect independently to inhibit restoration of mode, light pen, and scale, or intensity information.
Instructions in Pc for communicating with P(display)
Instructions in Pc communicate with P.display. The physical connection is by the S(’I/O Bus). The in-out transfer instructions in Pc are used to initialize and read the state of P.display.
P.display state initialization from Pc instructions
Set Push Down Pointer from AC
Set Display Address Counter from AC
Set Push Button contents from AC
Set miscellaneous flag and status bits from AC
Set character generator SAR address
P.display status to Pc instructions
Read Push Down Pointer into AC
Read X register into AC
Read Y register into AC
Read Display Address Counter into AC
Read Status words 1, 2, 3, 4, 5 into AC (60 miscellaneous bits of flags, modes, etc.)
Picture debugging modes. These modes aid program and picture debugging. A bit can be set to override the nonintensify bit in data-mode instructions. When this bit is a 1, all points and vectors are plotted, whether they are to be intensified or not. The search enable instruction forces the display to run until a particular instruction type is found. The instruction type is specified by the search enable instruction.
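The push-jump and pop mechanism described earlier in this chapter can be sketched in Python as follows. This is a behavioral illustration only, not the 338's word format: how the state is packed into a 12-bit word, the order of the two stack locations, and the direction in which the push-down pointer moves are all assumptions, as are the names `Display`, `DisplayState`, `push_jump`, and `pop`.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DisplayState:
    # The four items the text says are saved by push-jump.
    light_pen_enable: bool
    data_mode: int
    scale: int
    intensity: int


class Display:
    def __init__(self, pdp=0):
        self.memory = {}   # sparse stand-in for the first 4,096 words of Mp
        self.pdp = pdp     # push-down pointer, initially set by a Pc instruction
        self.dac = 0       # display address counter
        self.state = DisplayState(False, 0, 0, 0)

    def push_jump(self, target, return_address):
        # Save state and return address in two successive locations
        # (slot order and upward growth are assumptions), then jump.
        self.memory[self.pdp] = self.state
        self.memory[self.pdp + 1] = return_address
        self.pdp += 2
        self.dac = target

    def pop(self, inhibit=()):
        # Restore the saved state, except for fields named in `inhibit`,
        # which model the "inhibit restore" bits of the pop instruction.
        self.pdp -= 2
        saved = self.memory[self.pdp]
        self.dac = self.memory[self.pdp + 1]
        kept = {name: getattr(self.state, name) for name in inhibit}
        self.state = replace(saved, **kept)
```

A subpicture that changes intensity and returns with `pop(inhibit=("intensity",))` leaves the new intensity in force while the other saved fields are restored, which is the choice described in advantage 4.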
Appendix 1
P.display State
Program Flow State
DAC<0:14>                   Display Address Counter; holds memory address of display instruction
PDP<0:11>                   Push Down Pointer to stack holding subroutine return addresses
Internal_Stop               denotes halt by a P.display instruction
External_Stop               denotes a request by Pc for P.display to halt
Data_State and Control_State are two mutually exclusive states. Data_State instructions are interpreted by P.display as points, lines, and characters to be displayed on T. There are 7 modes for specifying the data types. The Data_Mode register holds the data type being interpreted. Control_State instructions include jumps to subroutines using the stack, controlling P.display state registers and switching to a specific data mode.
Data_State
Control_State := ¬ Data_State
Data_Mode/DM<0:2>           specifies interpretation of Data_State instructions
SAR<0:5>                    Starting Address Register: base register of a dispatch table for calling character display subroutines
Picture State
X<0:12>                     beam position; only integers in range 0 ≤ X, Y ≤ ... are plotted
Y<0:12>
Vertical_edge_flag/Vef      denotes if beam is within a displayable area
Horizontal_edge_flag/Hef    set when beam moves outside the display area
Edge_Interrupt/EI
CHSZ                        Character Size, 0 indicates 6 bit character set; 1 indicates 7 bit character set
Scale<0:1>                  used to set increment size for Data_Mode instructions, increments are x 2^Scale
Intensity<0:2>              brightness of displayed points
X_dimension<0:1>            maximum dimension of plotting area, 9.375, 18.75, 37.5, 75.0 in
Y_dimension<0:1>
Beam                        on, to display a point or line; automatically turned off at instruction completion
Mp State
M[0:7][0:4095]<0:11>        primary memory for P.display and Pc
Instruction Format
instruction/i<0:11>         The individual instruction fields are defined below. Each instruction type has its own bit field assignments.
enter_data_state := i<11>   common bits for several instructions
pbdense := iCD push button control bits
pb,cleay := i<b
pbdomp 1ement i= i6>
pbdelectd):5> := i 4 : 1 1 >
scale_change/sc := i<3>              scale (size) control bits
scale_value/sv<0:1> := i<4:5>
light_pen_change/lpc := i<6>         light pen test control bits
light_pen_bit/lpb := i<7>
Instruction Interpretation Process
(¬Internal_Stop ∨ ¬External_Stop) → (                                         fetch
    instruction[0:1] ← M[DAC:DAC+1]; DAC ← DAC + 1; next
    (Control_State ∧ (instruction<0:2> = 2)) → (DAC ← DAC + 1);                2 w instruction
    (Data_State ∧ ((Data_Mode = 0) ∨ (Data_Mode = 2) ∨
        (Data_Mode = 3))) → (DAC ← DAC + 1); next                              2 w data
    Instruction_execution)                                                     execute
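The fetch step of the interpretation process can be read as a rule for how far the Display Address Counter advances. The sketch below is one reading of the ISP text above, which is damaged in the source: it assumes that control-state op code 2 is the only two-word control instruction and that data modes 0, 2, and 3 are the two-word data modes. The function names and the 15-bit wraparound of DAC are assumptions made for illustration.

```python
TWO_WORD_CONTROL_OPCODE = 2       # assumed reading of the ISP fetch rule
TWO_WORD_DATA_MODES = {0, 2, 3}   # assumed reading of the ISP fetch rule
DAC_BITS = 15                     # DAC<0:14>


def words_fetched(control_state, opcode=0, data_mode=0):
    """Return 1 or 2: the number of words one instruction occupies."""
    if control_state:
        return 2 if opcode == TWO_WORD_CONTROL_OPCODE else 1
    return 2 if data_mode in TWO_WORD_DATA_MODES else 1


def next_dac(dac, control_state, opcode=0, data_mode=0):
    """Advance DAC past the current instruction (wraparound is an assumption)."""
    step = words_fetched(control_state, opcode, data_mode)
    return (dac + step) % (1 << DAC_BITS)
```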
PB,l<D;lI> ;=I i<o:11> grouv 1 push button t e s t and s e t i n s t r u c t i o n .format .for
Push Buttons 0 t o 5
PR,ldpcode := ( P E , l d : 2 > = 100)
grouv 2 (not d e f i n e d ) i s f o r W s h Buttons 6 t o 11
+(
PB,l&pcode h Contro1,State
p b d e n s e &? (pb,select<0:5>
DAC + OAC + 2 ) ;
= (PB<O:5' A pb,select<0:5>)) -
PBJ i n s t r u c t i o n execution
( S k i p test
I
plot_increment_vector := (
    ic1e → (move_1_position; Control_State ← 1);                               move 1 and escape
    ic1 → (move_1_position);                                                   move 1
    ic2 → (move_1_position; next move_1_position);                             move 2
    ic3 → (move_1_position; next move_1_position; next move_1_position))       move 3
move_1_position := (                                                           sub process for moving beam
    (id = 0) → (X ← X + Scale);                                                1 of 8 positions
) end Instruction_execution
Section 2
Chapter 26
NOVA: a list-oriented computer
Joseph E. Wirsching
Since the advent of the internally-stored program computer, those of us concerned with problems involving massive amounts of computation have taken a one-operation, one-operand approach. But there is a very large class of problems involving massive amounts of computation that may be thought of as one-operation, many-operand in nature. Some familiar examples are numerical integration, matrix operations, and payroll computation.
This article proposes a computer, called NOVA, designed to take advantage of the one-operation, many-operand concept. NOVA would use rotating memory instead of high-cost random access memory, reduce the number of program steps, and reduce the number of memory accesses to program steps. In addition it is shown that NOVA could execute typical problems of the one-operation, many-operand type in times comparable to that of modern high-speed random access computers.
Rotating memories were used in early computers because of low cost, reliability, and ease of fabrication. These machines have been replaced by machines with more costly random access memories primarily to increase computing speed as the result of a decrease in access time to both operands and instructions.
The NOVA approach
Let us take two simple examples and use them to compare conventional computing techniques with those proposed for NOVA.
Example 1. Consider two lists (a’s and b’s) of which the corresponding pairs are to be added. With a conventional computer this is done with a program that adds the first a to the first b, the second a to the second b, etc., and counts the operations. The working part of such a program might consist of the following instructions:
Fetch a
Add b
Store (a + b)
Count, Branch, and Index
In general, the four or more instructions must be brought from the memory to the instruction register once for each pair in the lists. This seems to be a great waste when only one arithmetic operation is involved. Indeed it is, when one considers that the majority of computing work consists of the performance of highly repetitive operations that are merely combinations of the simple example given. Attempts have been made to alleviate this waste by incorporating “instruction stacks” and “repeat” commands into the instruction execution units of more recent computers.
Example 2. Consider three lists (a’s, b’s and c’s), where we wish to compute (a + b) x c for each trio. There are two distinct methods by which this can be accomplished: first, by forming (a + b) x c for each trio of numbers in the list, or second, by forming a new list consisting of (a + b) for each a and b, and then multiplying each c by the corresponding member of the new list. Clearly the second method is wasteful of memory space and wasteful of programming steps.
Next, let us take a look at the memory requirements for these two examples. First, the instructions are kept in a high-speed random access memory, and while the bulk of the variables need not be kept in a random access memory, they must be brought to one before the algorithm can be performed. This extra transfer may entail more instructions to perform the logistics. Thus the simplicity of the overall program is directly related to the size of the memory. The variables (a’s, b’s, etc.) are usually stored in consecutive memory locations. Except for indexing this ordering of the data is not exploited.
In NOVA, lists of variables are kept on tracks of a rotating bulk memory. When called for, the lists of variables are streamed through an arithmetic unit and the results immediately replaced on another track for future use. This process takes maximum advantage of the sequential ordering of the variables. Instructions need only be brought to the instruction execution unit once for each pair of lists rather than once for each operand; thus the instructions need not be stored in a random access memory but may also be stored on the rotating bulk memory. This departure from the requirement for random access memory significantly
¹Datamation, vol. 12, no. 12, pp. 41-43, December, 1966.
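The contrast drawn in Examples 1 and 2 can be made concrete with a small sketch in Python. It only counts instruction fetches under the figures the text gives (four instructions per pair on a conventional machine, one instruction per pair of lists on NOVA) and shows the list-at-a-time form of Example 2; the function names are hypothetical and the counts are illustrative, not measurements.

```python
def conventional_fetches(n_pairs, instructions_per_pair=4):
    """Instruction fetches when the loop body is re-fetched for every pair."""
    return n_pairs * instructions_per_pair


def list_oriented_fetches(n_list_operations=1):
    """Instruction fetches when one instruction drives a whole pair of lists."""
    return n_list_operations


def add_lists(a, b):
    """One list operation: stream two lists through the adder."""
    return [x + y for x, y in zip(a, b)]


def add_then_multiply(a, b, c):
    """Example 2, second method: form the list (a + b), then multiply by c."""
    sums = add_lists(a, b)                      # intermediate list
    return [s * z for s, z in zip(sums, c)]     # second list operation
```

With lists of 1,000 pairs, `conventional_fetches(1000)` gives 4,000 instruction fetches, against a single fetch for the list-oriented addition.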
Chapter 26 I NOVA a list-oriented computer 317
Solution of a network problem
Before going further into the structure of NOVA, let us consider a significant example, which shows that NOVA is well suited to

fit into the fast memory, three adjacent columns (or rows) are brought to the fast memory; as a new column is calculated, the next column in sequence is brought in from bulk memory and the oldest of the three is written to bulk memory. In this fashion one proceeds across the array. This process is then repeated until some significant physical occurrence happens and the problem is ended.
In NOVA, the variables are organized into separate lists rather than by mesh point. From a computational standpoint this is possible since the main memory of NOVA may be essentially unlimited in size, at least exceeding the size of the largest present network problems. One then proceeds to execute operations on lists of variables rather than single variables, performing a single operation for all mesh points in the array in sequence.
Let us look more closely at the variables and their possible combinations. Let Uj,k and Vj,k be variables associated with the array of Fig. 1. These variables are listed sequentially by column in Fig. 2, along with further lists of the V column shifted by various increments.
Fig. 1. Two-dimensional array.
U0,0   V0,0   -      -
U0,1   V0,1   V0,0   -
U0,2   V0,2   V0,1   V0,0
.      .      V0,2   V0,1
...
       VJ,K
Fig. 2. Lists of variables.
With some concentration, one discovers in Fig. 2 that an arithmetic operation between Uj,k and Vj,k is simply a matter of taking the two columns as they exist and operating on them in pairs. To combine Uj,k with a nearby neighbor, Vj,k-1, the V column is shifted down one place, at which time the proper neighboring variables are found opposite one another for the entire network.
At certain boundaries of the array some elements have no proper neighbors. In NOVA these boundary elements must be handled separately in the same way as they must be handled separately in a conventional machine. In NOVA, calculations at boundaries may be temporarily inhibited by having a third input to the arithmetic unit which allows the calculation of a result for a pair of operands to proceed or not, as appropriate. This third input is defined as “conditions,” and is brought as a bit string to the arithmetic unit concurrently with the operands. This bit string may contain any number from one to several bits for each pair of operands.
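The shifted-list operation and the "conditions" input can be sketched in Python as follows. This is a minimal model, assuming one condition bit per operand pair and using `None` to mark a position where the calculation was inhibited; the article does not say what is written for an inhibited pair, so that choice and the names `shift_down` and `list_op` are assumptions.

```python
import operator


def shift_down(column, places, fill=None):
    """Shift a list down by `places`, so element k lines up with element k + places."""
    return [fill] * places + column[:len(column) - places]


def list_op(op, xs, ys, conditions):
    """Apply `op` pairwise; a 0 condition bit inhibits that pair's calculation."""
    return [op(x, y) if c else None for x, y, c in zip(xs, ys, conditions)]


# Combine each U[k] with its neighbor V[k-1]; the first element has no
# neighbor, so its condition bit is 0.
U = [1, 2, 3, 4]
V = [10, 20, 30, 40]
V_shifted = shift_down(V, 1)
conditions = [0, 1, 1, 1]
result = list_op(operator.add, U, V_shifted, conditions)
```

Here the boundary element is simply skipped by its condition bit, and it would be handled separately, as the text describes.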
318 Pari 4 I The instruction-set processor level: special-function processors Section 2 1 Processors for array data
Structure
The most difficult problem to be solved in the proposed computer is to synchronize movement of the columns of data that require offset. Buffers of various types could be used to solve this problem; they could range all the way from rotating memory devices or delay lines to core memories. The former are simple, direct, and low in cost but are limited in their general capabilities. On the other hand, a number of small random access buffer memories could be used for offsetting lists of variables and for facilitating special functions such as boundary calculations but at a higher equipment cost.
Figure 3 shows a block diagram of the organization of NOVA. The rotating memory, which might be a disc or drum, would be composed of several hundred tracks, each storing several thousand words, with a total capacity between one and two million words. Each track would have an individual read-write head. The heads would be organized in such a way as to attain a high word-transfer rate, perhaps as high as one million words per second. With this in mind an ideal execution time for one addition would be the time required to move two operands from the disc to the arithmetic unit; i.e., 1-2 microseconds. The disc synchronizer would be capable of simultaneously reading two lists of operands, writing one list of results, and reading one list and writing one list of conditional control information. In addition, instructions would be read from another channel in small blocks.
Fig. 3. Block diagram of NOVA computer.
The bit string of conditions coming from the memory is used to control individual operations on pairs of operands in the lists, and in essence each bit (or bits) is a subordinate part of the individual operations. Conditions going to the memory are the subsidiary result of the operation of one list upon another. These bit strings may be used later as control during another list operation. They may also contain information on the occurrence of an overflow or underflow, or on the presence of an illegal operand, etc.
Figure 4 shows a suggested organization for the arithmetic unit that incorporates five sets of alternating buffers. Two sets are for lists of operands coming from the memory, one set for lists of results going to the memory, and two sets for “conditions” (conditional control information) coming from and going to the memory.
Fig. 4. Buffering in arithmetic unit.
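The "1-2 microseconds" figure for one addition follows from the quoted transfer rate of one million words per second. The sketch below works that arithmetic in Python under one possible reading: the lower bound applies if both operand lists are read simultaneously, as the disc synchronizer is said to allow, and the upper bound if the two operands arrive one after the other. The function name and the `simultaneous_reads` switch are assumptions for illustration.

```python
def list_add_time_us(n_pairs, words_per_second=1_000_000, simultaneous_reads=True):
    """Ideal time, in microseconds, to stream n_pairs operand pairs to the adder."""
    word_time_us = 1e6 / words_per_second
    # One word time per pair if both lists stream in parallel, two if serialized.
    words_in_sequence = 1 if simultaneous_reads else 2
    return n_pairs * words_in_sequence * word_time_us
```

Under these assumptions a track of several thousand words, say 5,000 pairs, would be added in about 5 milliseconds with parallel reads.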
¹IEEE Trans., C-17, vol. 8, pp. 746-757, August, 1968.
Summary  The structure of ILLIAC IV, a parallel-array computer containing 256 processing elements, is described. Special features include multiarray processing, multiprecision arithmetic, and fast data-routing interconnections. Individual processing elements execute 4 x 10^6 instructions per second to yield an effective rate of 10^9 operations per second.
Index terms  Array, computer structure, look-ahead, machine language, parallel processing, speed, thin-film memory.
Introduction
The study of a number of well-formulated but computationally massive problems is limited by the computing power of currently available or proposed computers. Some involve manipulations of very large matrices (e.g., linear programming); others, the solution of sets of partial differential equations over sizable grids (e.g., weather models); and others require extremely fast data correlation techniques (phased array signal processing). Substantive progress in these areas requires computing speeds several orders of magnitude greater than conventional computers.
At the same time, signal propagation speeds represent a serious barrier to increasing the speed of strictly sequential computers. Thus, in recent years a variety of techniques have been introduced to overlap the functions required in sequential processing, e.g., multiphased memories, program look-ahead, and pipeline arithmetic units. Incremental speed gains have been achieved but at considerable cost in hardware and complexity with accompanying problems in machine checkout and reliability.
The use of explicit parallelism of operation rather than overlapping of subfunctions offers the possibility of speeds which increase linearly with the number of gates, and consequently has been explored in several designs [Slotnick et al., 1962; Unger, 1958; Holland, 1959; Murtha, 1966]. The SOLOMON computer [Slotnick et al., 1962], which introduced a large degree of overt parallelism into its structure, had four principal features.
1 A large array of arithmetic units was controlled by a single control unit so that a single instruction stream sequenced the processing of many data streams.
2 Memory addresses and data common to all of the data processing were broadcast from the central control.
3 Some amount of local control at the individual processing element level was obtained by permitting each element to enable or disable the execution of the common instructions according to local tests.
4 Processing elements in the array had nearest-neighbor connections to provide moderate coupling for data exchange.
Studies with the original SOLOMON computer indicated that such a parallel approach was both feasible and applicable to a variety of important computational areas. The advent of LSI circuitry, or at least medium-scale versions, with gate times of the order of 2 to 5 ns, suggested that a SOLOMON-type array of potentially 10^9 word operations per second could be realized. In addition, memory technology had advanced sufficiently to indicate that 10^6 words of memory with 200 to 500-ns cycle times could be produced at acceptable cost. The ILLIAC IV Phase I design study during the latter part of 1966 resulted in the design discussed in this paper. The machine, to be fabricated by the Defense Space and Special Systems Division of Burroughs Corporation, Paoli, Pa., is scheduled for installation in early 1970.
Summary of the ILLIAC IV
The ILLIAC IV main structure consists of 256 processing elements arranged in four reconfigurable SOLOMON-type arrays of 64 processors each. The individual processors have a 240-ns ADD time and a 400-ns MULTIPLY time for 64-bit operands. Each processor requires approximately 10^4 ECL gates and is provided with 2048 words of 240-ns cycle time thin-film memory.
Instruction and addressing control
The ILLIAC IV array possesses a common control unit which decodes the instructions and generates control signals for all
Chapter 27 I The ILLIAC IV computer 321
processing elements in the array. This eliminates the cost and complexity for decoding and timing circuits in each element.
In addition, an index register and address adder are provided with each processing element, so that the final operand address a_i for element i is determined as follows:
a_i = a + (b) + (c_i)
where a is the base address specified in the instruction, (b) is the contents of a central index register in the control unit, and (c_i) is the contents of the local index register of the processing element i. This independence in operand addressing is very effective for handling rows and columns of matrices and other multidimensional data structures [Kuck, 1968].

storage of program constants, and (2) it permits overlap of common operand fetches with other operations.
Processor partitioning
Many computations do not require the full 64-bit precision of the processors. To make more efficient use of the hardware and speed up computations, each processor may be partitioned into either two 32-bit or eight 8-bit subprocessors, to yield 512 32-bit or 2048 8-bit subprocessors for the entire ILLIAC IV set.
The subprocessors are not completely independent in that they share a common index register and the 64-bit data routing paths. The 32-bit subprocessors have separate enabled/disabled modes for indexing and data routing; the 8-bit subprocessors do not.
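The operand-address rule a_i = a + (b) + (c_i) and the subprocessor counts quoted above can be checked with a short Python sketch. The function names are hypothetical, and the address arithmetic is shown without any wraparound or bounds checking, which the text does not describe.

```python
def operand_addresses(a, b, local_index):
    """Final operand address per processing element: a_i = a + (b) + (c_i).

    a: base address from the instruction
    b: contents of the central index register in the control unit
    local_index: contents c_i of each element's local index register
    """
    return [a + b + c_i for c_i in local_index]


def subprocessors(width_bits, processing_elements=256, word_bits=64):
    """Number of subprocessors when each 64-bit processor is partitioned."""
    if width_bits not in (64, 32, 8):
        raise ValueError("the text describes 64-, 32-, and 8-bit operation only")
    return processing_elements * (word_bits // width_bits)
```

Giving each element a different local index, for example `operand_addresses(100, 8, [0, 1, 2, 3])`, lets one broadcast instruction reach a different location in each element's memory, which is how a row or a column of a matrix can be addressed by the same instruction.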
Multiarray configurations
To permit more optimal matching of array size to problem structure, the four arrays may be united in three different configurations, as shown in Fig. 3. To enlarge the arrays, the end connections of the PE strings are decoupled and attached to the ends of the other arrays to form strings of 128 or 256 processors. For multiarray configurations all CUs receive the same instruction string and any data centrally accessed. The control units execute the instructions independently, however, with inter-CU synchronization occurring only on those instructions in which data or control information must cross array boundaries. This simplifies and speeds up the instruction execution in multiarray configurations. The multiplicity of array configurations introduces complexities in memory addressing which will be discussed in a later section.
[Figure labels: REAL TIME LINK; PARALLEL ACCESS DISK; GENERAL PURPOSE COMPUTER B-6500; TO PERIPHERALS AND COMPUTER NET]
Control unit
The array control unit (CU) has the following five functions.
(e.g., constant). I 1l ll 1
Full word length (64 bits) communication exists between the PEO PE 1
___ PE61
Instruction processing
. .
CONTROL SIGNALS COMMON 0414 BUS 110 REWEST MOW F I F
ADVAST constructs the necessary address or data operands and FROM PES TO PES FROMIM FROM PES
whenever information, in the form of either data or control signals,
must cross array boundaries. CU synchronization must be forced
at all fetches of new instruction blocks, upon all data routing
operations, all conditional program transfers, and all
configuration-changing instructions. With these exceptions, the CUs of
the several arrays run independently of one another. This simplifies
the control in the multiple-array operation; furthermore, it permits
I/O transactions with the separate array memories without stealing
memory cycles from the nonparticipating memories.
Memory addressing
Both data and instructions are stored in the combined memories
of the array. However, the CU has access to the entire memory,
while each PE can only directly reference its own 2,048-word PEM.
The memory appears as a two-dimensional array with CU access
sequential along rows and with PE access down its own column.
In multiarray configurations the width of the rows is increased
by multiples of 64.
The resulting variable-structure addressing problem is solved
by generating a fixed-form 20-bit address in the CU as shown in
Fig. 5. The lower 6 bits identify the PE column within a given
array. The next 2 bits indicate the array number, and the remaining
higher-order bits give the row value. The row address bits
actually transmitted to the PE memories are configuration-dependent
and are gated out as shown.
Addresses used by the PEs for local operands contain three
components: a fixed address contained in the instruction, a CU
The control unit can fetch either individual words or blocks of
8 words from the array memory to the local data buffer. In addition,
it can fetch 1 bit selected from the 8-bit mode register of
each processing element to form a 64-bit word read into the CU
accumulator. The CU program counter (PCR) and the configuration
registers are also directly addressable by the CU. Data
manipulations (+, -, Boolean) are performed on a selected CAR
and the result returned to the CAR. Data to be broadcast to the
processing elements is inserted into the FINQ along with the
accompanying instruction and transmitted to the PEs at the
appropriate time.
Configuration control
With the variety of array configurations for ILLIAC IV, it is
necessary to specify and control the subarrays which are conjoined
and to designate the instruction and data addressing. For this
purpose each CU has three configuration control registers (CFC),
each of 4-bit length, where each bit corresponds to one of the four
subarrays. The CFC registers may be set by the B 6500 or a CU
instruction.
CFC0 of each CU specifies the array configuration in which
it is participating by means of a 1 in the appropriate bits of CFC0.
CFC1 specifies the instruction addressing to be used within the
array. In a united configuration it is thus possible for the
instruction stream to be derived from any subset of the united arrays.
CFC2 specifies the CU data addressing form in a manner similar
to the CFC1 control of instruction addressing.
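The fixed-form 20-bit CU address described under Memory addressing can be illustrated with a minimal sketch. It only splits the address into the three fields named in the text (6-bit PE column, 2-bit array number, remaining row bits). The function name and the returned dictionary are assumptions made for illustration, and the configuration-dependent gating of the row bits is not modeled.

```python
def decode_cu_address(addr: int) -> dict:
    """Split a 20-bit CU address into the fields named in the text."""
    if not 0 <= addr < (1 << 20):
        raise ValueError("address must fit in 20 bits")
    return {
        "pe_column": addr & 0x3F,    # lower 6 bits: PE column within an array
        "array": (addr >> 6) & 0x3,  # next 2 bits: array number
        "row": addr >> 8,            # remaining higher-order bits: row value
    }


# Hypothetical example: row 5, array 2, PE column 17.
example = decode_cu_address((5 << 8) | (2 << 6) | 17)
```

Because the low-order bits select the PE column, consecutive CU addresses step across a row of the memory, which matches the statement that CU access is sequential along rows.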
Chapter 27 I The ILLIAC IV computer 325
The addressing indicated by both CFC1 and CFC2 must be
consistent with the actual configuration designated by CFC0, else
a configuration interrupt is triggered.
Trap processing
Because external demands on the arrays will be preprocessed
through the B 6500 system computer, the interrupt system for the
control units is relatively straightforward. Interrupts are provided
to handle B 6500 control signals and a variety of CU or array faults
(undefined instructions, instruction parity error, improper
configuration control instruction, etc.). Arithmetic overflow and
underflow in any of the processing elements is detected and produces a
trap.
The strategy of response to an interrupt is an effective FORK
to a single-array configuration. Each CU saves its own status word
automatically and independently of other CUs with which it may
previously have been configured.
Hardware implementation consists of a base interrupt address
register (BIAR) which is dedicated as a pointer to array storage
into which status information will be transferred. Upon receipt
of an interrupt, the contents of the program counter and other
status information and the contents of CAR 0 are stored in the
block pointed to by the BIAR. In addition, CAR 0 is set to contain
the block address used by BIAR so that subsequent register saving
may be programmed. Interrupt returns are accomplished through
a special instruction which reloads the previous status word and
CAR 0 and clears the interrupt.
Interrupts are enabled through a mask word in a special register.
The interrupt state is general and not unique to a specific
trigger or trap. During the interrupt processing, no subsequent
interrupts are responded to, although their presence is flagged in
the interrupt state word.
The high degree of overlap in the control unit precludes an
immediate response to an interrupt during the instruction which
generates an arithmetic fault in some processing element. To
alleviate this it is possible under program control to force
nonoverlapped instruction execution permitting access to definite fault
information.
the multiplicand and data routing register, and S as a general
storage register.
An adder/multiplier (MSG, PAT, CPA), a logic unit (LOG),
and a barrel switch (BSW) for arithmetic, Boolean, and
shifting functions, respectively.
A 16-bit index register (RGX) and adder (ADA) for memory
address modification and control.
An 8-bit mode register (RGM) to hold the results of tests
and the PE ENABLE/DISABLE state information.
As described earlier, the PEs may be partitioned into subprocessors
of word lengths of 64, 2 x 32, or 8 x 8 bits. Figure 7 shows
the data representations available. Exponents are biased and
relative to base 2. Table 1 indicates the arithmetic and logical
operations available for the three operand precisions.
PE mode control
Two bits of the mode register (RGM) control the enabling or
disabling of all instructions; one of these is active only in the 32-bit
precision mode and controls instruction execution on the second
operand. Two other bits of RGM are set whenever an arithmetic
fault (overflow, underflow) occurs in the PE. The fault bits of all
PEs are continuously monitored by the CU to detect a fault
condition and initiate a CU trap.
Data paths
Each PE has a 64-bit wide routing path to 4 of its neighbors (±1,
±8). To minimize the physical distances involved in such routing,
the PEs are grouped 8 to a cabinet (PUC) in the pattern shown
in Fig. 8. Routing by distance ±8 occurs interior to a PUC; routing
by distance ±1 requires no more than 2 intercabinet distances.
CU data and instruction fetches require blocks of 8 words,
which are accessed in parallel, 1 word per PUC, into a CU buffer
(CUB) 512-bit wide, distributed among the PUCs, 1 word per
Table 1 PE data operations
Operation time per element
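The ±1 and ±8 routing paths described under Data paths can be sketched as index arithmetic. This is an illustration under stated assumptions, not the hardware's definition: PEs are numbered 0 to 63 within one array, the connections wrap end-around modulo 64 in the single-array configuration, and the cabinet (PUC) of a PE is taken to be its index modulo 8. That last assumption is inferred from the statement that routing by ±8 stays inside a PUC; the actual pattern is the one in Fig. 8. The function names are hypothetical.

```python
N_PES = 64  # processing elements in one array


def routing_neighbors(pe: int, n: int = N_PES) -> dict:
    """Map each routing distance to the neighbor it reaches (assumed end-around)."""
    return {d: (pe + d) % n for d in (-1, +1, -8, +8)}


def cabinet(pe: int) -> int:
    """Assumed PUC index: PEs that differ by 8 share a cabinet."""
    return pe % 8


neighbors_of_0 = routing_neighbors(0)
```

Under these assumptions every ±8 neighbor of a PE lies in the same cabinet, and the eight cabinets each hold eight PEs, which agrees with the grouping of 8 PEs per PUC.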
[Figure residue: processing element block diagram; recoverable labels: control unit, drivers and receivers, mode register (RGM), R register (RGR), A register (RGA), memory address register (MAR), memory, leading detector.]
[Fig. 7 residue: data formats for 64-bit, 32-bit, and 8-bit words. S: sign; E: exponent; F: mantissa.]
Fig. 8. (a) Electrical connectivity for routing. (b) Physical layout.
1 Executive control of the execution of array programs
2 Control of the multiple-array configuration operations
[Figure residue: CU 1 to CU 4, PEM 1 to PEM 4, memory request; legend: tested, partially tested.]
in number of gates per system should be possible with comparable
reliability.
It is only by virtue of high-density integration (50- to 100-gate
package) that the design of a three-million-gate system can be
contemplated. Reliability of the major part of the system, 256
processing elements and 256 memory units, is expected to be in
the range of 10^5 hours per element and 2 × 10^3 hours per memory
unit.
The organization of the ILLIAC IV as a collection of identical
units simplifies its maintenance problems. The processing elements,
the memories, and some part of power supplies are designed
to be pluggable and replaceable to reduce system down time and
improve system availability.
The remaining problems are (1) location of the faulty subsystem,
and (2) location of the faulty package in the subsystem.
Location of the faulty subsystem assumes the B 6500 to be
fault-free, since this can be determined by using the standard
B 6500 maintenance routines. The steps to follow are shown in
Fig. 10.
The B 6500 tests the control units (CU) which in turn test all
PEs. PEMs are tested through the disk channel. This capability
for functional partitioning of the subsystems simplifies the
diagnostic procedure considerably.
References
HollJ59; KuckD68; MurtJ66; SlotD62; UngeS58
330 Part 4 I The instruction-set processor level: special-function processors Section 2 1 Processors for array data
XI Increment PE index (X register) by bits 48 through 63 of operand.
XIO Increment PE index of bits 48 through 63 of operand plus one.
A2.3 Mode setting/comparisons
EQB Test A and B for equality bytewise.
GRB Test B register greater than A register bytewise.
LSB Test B register less than A register bytewise.
CHWS Change word size.
Set I if A register is less than operand. L means test logical; A means test arithmetic; M means test mantissa.
3 Instructions: ILL, IAL, IML.
Set I if A register is equal to operand. See above for meaning of L, A, and M.
3 Instructions: ILE, IAE, IME.
Set I if A register is greater than operand. See above for meaning of L, A, and M.
3 Instructions: ILG, IAG, IMG.
Set I if A register is equal to all zeros.
3 Instructions: ILZ, IAZ, IMZ.
Set J under conditions specified in set of instructions immediately above.
15 Instructions: JLL, JAL, JML, JLE, JAE, JME, JLG, JAG, JMG, JLZ, JAZ, JMZ, JLO, JAO, JMO.
Set I on comparison of X register and operand. See Section A2.2 for meaning of L, E, G, and I.
6 Instructions: IXL, IXLI, IXE, IXEI, IXG, IXGI.
Set J on comparison of X register and operand. See Section A2.2 for meaning of L, E, G, and I.
6 Instructions: JXL, JXLI, JXE, JXEI, JXG, JXGI.
Set I on comparison of S register and operand. See Section A2.2 for meaning of L, E, and G.
3 Instructions: ISL, ISE, ISG.
Set J on comparison of S register and operand. See Section A2.2 for meaning of L, E, and G.
3 Instructions: JSL, JSE, JSG.
ISN Set I from the sign bit of A register.
JSN Set J from the sign bit of A register.
SETE Set E bit as a logical function of other bits.
SETE1 Set E1 bit similarly.
SETF Set F bit similarly.
SETF1 Set F1 bit similarly.
SETG Set G bit similarly.
SETH Set H bit similarly.
SETI Set I bit similarly.
SETJ Set J bit similarly.
SETC0 Set Pth bit of CAR 0 similarly.
SETC1 Set Pth bit of CAR 1 similarly.
SETC2 Set Pth bit of CAR 2 similarly.
SETC3 Set Pth bit of CAR 3 similarly.
IBA Set I from Nth bit of A register; bit number is found in address field.
JBA Set J from Nth bit of A register; bit number is found in address field.
operands.
SUB Subtract operand from A register as 64-bit quantities.
AD{R, N, M, S} Add operand to A register. The R, N, M, S specify all possible variants of the arithmetic instruction. The meaning of each letter, if present in the mnemonic, is
R round result
N normalize result
M mantissa only
S special treatment of signs.
16 Instructions: ADM, ADMS, ADNM, ADNMS, ADN, ADNS, ADRM, ADRMS, ADRNM, ADRNMS, ADRN, ADRNS, ADR, ADRS, AD, ADS.
ADEX Add to exponent.
DV{R, N, M, S} Divide by operand. See AD instruction for meaning of R, N, M, and S.
16 Instructions: DVM, DVMS, DVNM, DVNMS, DVN, DVNS, DVRM, DVRMS, DVRNM, DVRNMS, DVRN, DVRNS, DVR, DVRS, DV, DVS.
EAD Extend precision after floating point ADD.
ESB Extend precision after floating point SUBTRACT.
LEX Load exponent of A register.
ML{R, N, M, S} Multiply by operand. See AD instruction for meaning of R, N, M, and S.
16 Instructions: MLM, MLMS, MLNM, MLNMS, MLN, MLNS, MLRM, MLRMS, MLRNM, MLRNMS, MLRN, MLRNS, MLR, MLRS, ML, MLS.
SAN Set A register negative.
SAP Set A register positive.
SBEX Subtract exponent of operand from exponent of A register.
SB{R, N, M, S} Subtract operand from A register. See AD instruction for meaning of R, N, M, and S.
16 Instructions: SBM, SBMS, SBNM, SBNMS, SBN, SBNS, SBRM, SBRMS, SBRNM, SBRNMS, SBRN, SBRNS, SBR, SBRS, SB, SBS.
NORM Normalize A register.
MULT In 32-bit mode, perform MULTIPLY and leave outer result in A register and inner result in B register, with both results extended to 64-bit format.
A2.5 Logical
AND A register with operand. The left-hand set of letters specifies a variant on the A register, the right-hand set, on the operand. The meaning of these variants is
not present use true
N use complement
Z use all ZEROS
O use all ONES.
16 Instructions: AND, ANDN, ANDZ, ANDO, NAND, NANDN, NANDZ, NANDO, ZAND, ZANDN, ZANDZ, ZANDO, OAND, OANDN, OANDZ, OANDO.
CBA Complement bit of A register.
CHSA Change sign of A register.
Exclusive OR A register with operand.
16 Instructions: EOR, EORN, EORZ, EORO, NEOR, NEORN, NEORZ, NEORO, ZEOR, ZEORN, ZEORZ, ZEORO, OEOR, OEORN, OEORZ, OEORO.
LEX Load exponent of A register.
OR A register with operand.
16 Instructions: OR, ORN, ORZ, ORO, NOR, NORN, NORZ, NORO, ZOR, ZORN, ZORZ, ZORO, OOR, OORN, OORZ, OORO.
RBA Reset bit A register to ZERO.
RTAL Rotate A register left.
RTAML Rotate mantissa of A register left.
RTAMR Rotate mantissa of A register right.
RTAR Rotate A register right.
SAN Set A register negative.
SAP Set A register positive.
SBA Set bit of A register to ONE.
SHABL Shift A and B registers double-length left.
SHABR Shift A and B registers double-length right.
SHAL Shift A register left.
SHAML Shift A register mantissa left.
SHAR Shift A register right.
SHAMR Shift A register mantissa right.
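The way the variant letters multiply out into 16 mnemonics per instruction can be checked with a short sketch. It assumes only what the listings show: the arithmetic families append any subset of R, N, M, S in that fixed order, and the logical families take one of (none, N, Z, O) before the stem and one after it. The function names are hypothetical.

```python
from itertools import product


def arithmetic_variants(stem: str) -> list:
    """All 16 mnemonics formed by appending a subset of R, N, M, S, in that order."""
    return [
        stem + "".join(letter for letter, used in zip("RNMS", flags) if used)
        for flags in product((False, True), repeat=4)
    ]


def logical_variants(stem: str) -> list:
    """All 16 mnemonics formed from a left variant (A register) and a right variant (operand)."""
    variants = ("", "N", "Z", "O")
    return [left + stem + right for left, right in product(variants, repeat=2)]


ad_family = arithmetic_variants("AD")
and_family = logical_variants("AND")
```

Each family has 2^4 = 16 members for the arithmetic case and 4 x 4 = 16 for the logical case, which is why every listing is headed "16 Instructions".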
Section 3
Processors defined by a microprogram
Processors defined by a microprogram have only recently come
into existence, although Wilkes suggested the idea in 1951. The
discussion in Chap. 3 (page 71) suggests reasons why this
controversial idea has taken so long to be adopted.
Microprogramming and the design of the control circuits
in an electronic computer
Chapter 28 is an extension of an earlier paper by Wilkes. It
includes an example of a microprogrammed processor (page
337). In the earlier paper, The Best Way to Design an Automatic
Computing Machine [Wilkes, 1951a], the essential ideas of
microprogramming were first outlined.
The observation that an instruction set, or ISP, should be
looked at as a program to be interpreted is the basis of
microprogramming. The idea of an ISP is our acknowledgment that
we, too, view a processor as a program.
There is little to say about this chapter; it is historical, yet
timely and well written. Microprogramming, like other of Wilkes'
ideas, is present in many of our computers.
chines they have designed. This formal ruse can be used to
make the design seem difficult but well founded, certainly not
arbitrary. Kampe truthfully admits to making decisions in a
somewhat arbitrary fashion.
The SD-2 microprogram structure, unlike that of the IBM System
360 models, has a P.microprogram which is similar to the
external Pc which it defines. As such, the main question about
this design is whether it is cheaper to have a single, hard-wired
Pc rather than a computer within a computer. The
Packard Bell 440 [Boutwell and Hoskinson, 1963] is an example
of a better-known Pc whose internal P resembles the SD-2.
The authors of this book feel that, when the internal and
external P's are so similar, it may be better to have a single
P which suits both needs. To gain speed and still define powerful
functions, Mp could be made up of both the conventional Mp
and a small, fast Mp.
The Hewlett-Packard HP 9100A computing calculator
The HP 9100A (Chap. 20) is discussed in Part 3, Sec. 4, page
235.
Chapter 28
M . V. Wilkes / J. B. Stringer
336 Part 4 I The instruction-set processor level: special-function processors Section 3 I Processors defined by a microprogram
the right, while others will also involve the use of the adder. Any
particular micro-operation can be performed by applying pulses
simultaneously to the appropriate gates of the switching system.
In certain cases it may be possible for two or more micro-operations
to take place at the same time.
It will be convenient to regard the control system as consisting
of two parts. A register is needed to hold the address of the next
order due to be executed, and another to hold the current order
while it is being executed, or at any rate during part of that time.
Some means of counting the number of steps in a shifting operation
[Figure residue: micro-control unit diagram; recoverable labels: matrix B; to arithmetical unit, control registers, etc.; from conditional flip-flop.]
system will be called the control register unit. In any case the
operations which need to be performed on the numbers standing
in the control register unit during the execution of an order are,
like the operations performed in the arithmetical unit, regarded
as being made up of a sequence of micro-operations, each of which
is performed by the application of pulses to appropriate gates.
make the output from the decoding tree branch before it enters
matrix A so that the nature of the micro-operation that is performed
depends on the setting of the conditional flip-flop.
The micro-programme wired on to the matrices contains sections
for performing the operations required by each order in the
basic order code of the machine. To initiate the operation it is
only necessary that control in the micro-programme should be sent
to the correct entry point. This is done by placing the function
digits of the order in the least significant part of register II, the
other digits in this register being made zero. The micro-programme
is constructed so that when this number passes into register I,
control in the micro-programme is sent to the correct entry point.
The switching system in the arithmetical unit may either be
designed to permit a large variety of micro-operations to be
performed, or it may be restricted so as to allow only a small number
of such operations. In a machine with a comprehensive order code
there is much to be said for having the more flexible switching
system since this will enable an economy to be made in the number
of micro-orders needed in the micro-programme.
A similar remark applies in connexion with the degree of flexibility
to be provided when designing the switching system for the
control register unit. If the specification of the machine allows
the same number of registers to be used in the arithmetical and
control sections, the construction of these two sections may be
identical except as far as the number of digits is concerned. In
a new machine under construction in the Mathematical Laboratory,
Cambridge, the registers are being constructed in basic units
each containing five registers and an adder-subtractor together
with the associated switching system. It is hoped that it will be
possible to use identical units in the arithmetical unit and in the
control register unit.
3. Example
An example will now be given to show the way in which a
micro-programme can be drawn up for a machine with a single-address
order code covering the usual operations. It is supposed that the
arithmetical unit contains the following registers:
A multiplicand register
B accumulator (least significant half)
C accumulator (most significant half)
D shift register
The registers in the control register unit are as follows:
E register connected to the access circuits of the store; the
address of a storage location to which access is required
is placed here
F sequence control register; contains address of next order due
to be executed
G register used for counting
It was assumed when drawing up the micro-programme that there
was an adder-subtractor in the arithmetical unit with one input
permanently connected to register D, and a similar adder-subtractor
in the control register unit with one input permanently
connected to register G. For convenience it was assumed that the
switching systems in each case were comprehensive enough to
provide any micro-operation required. It was further supposed that
the arithmetical unit provided for 20 digits and that the numbers
0, 1 and 18 could be introduced at will into one of the registers
or the adder of the control register unit. Two conditional flip-flops
are used. All micro-operations including those involving access to
the store are supposed to take the same amount of time. Reference
will be made to this point in §4.
Table 1 gives the order code of the machine, and Table 2 the
micro-programme. Each line of Table 2 refers to one micro-order;
the first column gives the address of the micro-order, the second
column specifies the micro-operations called for in the arithmetical
unit of the machine, and the third column specifies the micro-
Table 1
Notation: Acc  = accumulator
          Acc1 = most significant half of accumulator
          Acc2 = least significant half of accumulator
          n    = storage location n
          C(X) = contents of X (X = register or storage location)
Order  Effect of order
A n    C(Acc) + C(n) to Acc
S n    C(Acc) - C(n) to Acc
H n    C(n) to Acc2
V n    C(Acc2)·C(n) to Acc, where C(n) ≥ 0
T n    C(Acc1) to n, 0 to Acc
U n    C(Acc1) to n
R n    C(Acc)·2^-(n+1) to Acc
L n    C(Acc)·2^(n+1) to Acc
G n    If C(Acc) < 0, transfer control to n; if C(Acc) ≥ 0, ignore
       (i.e., proceed serially)
I n    Read next character on input mechanism into n
O n    Send C(n) to output mechanism
Table 2
Notation: A, B, C, . . . stand for the various registers in the arithmetical and control register units (see §3 of the text). 'C to D' indicates that
the switching circuits connect the output of register C to the input of register D; '(D+A) to C' indicates that the output of register A is
connected to the one input of the adding unit (the output of D is permanently connected to the other input), and the output of the adder to register C.
A numerical symbol n in quotes (e.g., 'n') stands for the source whose output is the number n in units of the least significant digit.
                                                       Conditional   Next
       Micro-  Arithmetical       Control              flip-flop     micro-order
Order  order   unit               register unit        Set    Use    0     1
       0                          F to G and E                       1
       1                          (G+'1') to F                       2
       2                          Store to G                         3
       3                          G to E                             4
       4                          E to decoder                       -
A      5       C to D                                                16
S      6       C to D                                                17
H      7       Store to B                                            0
V      8       Store to A                                            27
T      9       C to Store                                            25
U      10      C to Store                                            0
R      11      B to D             E to G                             19
L      12      C to D             E to G                             22
G      13                         E to G                             18
I      14      Input to Store                                        0
O      15      Store to Output                                       0
       16      (D+Store) to C                                        0
       17      (D-Store) to C                                        0
       18                                                            0     1
       19      D to B (R)†        (G-'1') to E                       20
       20      C to D                                                21
       21      D to C (R)                                            11    0
       22      D to C (L)‡        (G-'1') to E                       23
       23      B to D                                                24
       24      D to B (L)                                            12    0
       25      '0' to B                                              26
       26      B to C                                                0
       27      '0' to C           '18' to E                          28
       28      B to D             E to G                             29
       29      D to B (R)         (G-'1') to E                       30
       30      C to D (R)                                            31    32
       31      D to C                                                28    33
       32      (D+A) to C                                            28    33
       33      B to D                                                34
       34      D to B (R)                                            35
       35      C to D (R)                                            36    37
       36      D to C                                                0
       37      (D-A) to C                                            0
† Right shift. The switching circuits in the arithmetic unit are arranged so that the least significant digit of register C is placed in the most significant place of register
B during right shift micro-operations, and the most significant digit of register C (sign digit) is repeated (thus making the correction for negative numbers).
‡ Left shift. The switching circuits are similarly arranged to pass the most significant digit of register B to the least significant place of register C during left shift
micro-operations.
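How the micro-orders of Table 2 chain together can be followed with a small interpreter. The sketch below is an illustration only and covers just the unconditional orders A, S, H, T, and U together with the extraction micro-orders 0 to 4. Several things are assumptions and not taken from the paper: an order is modeled as a (function letter, address) pair, the decoder step leaves the address part in register E, registers hold unbounded Python integers in place of 20 binary digits, and the names ENTRY and run are hypothetical.

```python
# Entry points copied from the order column of Table 2 (unconditional orders only).
ENTRY = {"A": 5, "S": 6, "H": 7, "T": 9, "U": 10}


def run(store: dict, n_orders: int) -> dict:
    """Execute n_orders orders starting at store address 0; return the registers."""
    r = dict.fromkeys("ABCDEFG", 0)

    def decode():
        # Assumption: the function letter selects the entry point and the
        # address part of the order is left in E for the store access.
        function, address = r["E"]
        r["E"] = address
        return ENTRY[function]

    def to_store(reg):
        store[r["E"]] = r[reg]

    # address: (action, next micro-order); None means the decoder supplies it.
    micro = {
        0: (lambda: r.update(G=r["F"], E=r["F"]), 1),          # F to G and E
        1: (lambda: r.update(F=r["G"] + 1), 2),                # (G+'1') to F
        2: (lambda: r.update(G=store[r["E"]]), 3),             # Store to G
        3: (lambda: r.update(E=r["G"]), 4),                    # G to E
        4: (decode, None),                                     # E to decoder
        5: (lambda: r.update(D=r["C"]), 16),                   # A: C to D
        6: (lambda: r.update(D=r["C"]), 17),                   # S: C to D
        7: (lambda: r.update(B=store[r["E"]]), 0),             # H: Store to B
        9: (lambda: to_store("C"), 25),                        # T: C to Store
        10: (lambda: to_store("C"), 0),                        # U: C to Store
        16: (lambda: r.update(C=r["D"] + store[r["E"]]), 0),   # (D+Store) to C
        17: (lambda: r.update(C=r["D"] - store[r["E"]]), 0),   # (D-Store) to C
        25: (lambda: r.update(B=0), 26),                       # '0' to B
        26: (lambda: r.update(C=r["B"]), 0),                   # B to C
    }
    address, done = 0, 0
    while done < n_orders:
        action, following = micro[address]
        result = action()
        address = result if following is None else following
        if address == 0:  # control has returned to the extraction sequence
            done += 1
    return r


# Hypothetical programme: add C(10), subtract C(11), transfer the result to 12.
store = {0: ("A", 10), 1: ("S", 11), 2: ("T", 12), 10: 7, 11: 3}
registers = run(store, 3)
```

After the three orders the store holds 7 - 3 = 4 in location 12 and the accumulator has been cleared by micro-orders 25 and 26, as the entry for order T in Table 1 requires. The conditional orders (R, L, G, V) are left out because they depend on the flip-flop columns of the table.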
Chapter 28 I Microprogramming and the design of the control circuits in an electronic digital computer 339
operations called for in the control register unit. The fourth column
shows which conditional flip-flop, if any, is to be set and the
digit which is to be used to set it; for example, (1)C, means that
flip-flop number 1 is set by the sign digit of the number in register
C, while (2)G, means that flip-flop number 2 is set by the least
significant digit of the number in register G. In the case of
unconditional micro-orders columns 5 and 7 are blank and column 6
contains the address of the next micro-order to be executed. In
the case of conditional micro-orders column 5 shows which flip-flop
is used to operate the conditional switch and columns 6 and 7
give the alternative addresses to which control is to be sent when
the conditional flip-flop contains a 0 or a 1 respectively.
Micro-orders 0 to 4 are concerned with the extraction of orders
from the store. They serve to bring about the transfer of the order
from the store to register E and then cause the five most significant
digits of the order to be placed in register II with the result that
control is transferred to one of the micro-orders 5 to 15, each of
which corresponds to a distinct order in the machine order code.
In this way the sequence of micro-orders needed to perform the
particular operation called for is begun.
The way in which the various operations are performed can
be followed from Table 2. In the section dealing with multiplication,
it is assumed that numbers lie in the range -1 ≤ x < 1
and that negative numbers are represented in the machine by their
complements with respect to 2. It will be noted that the process
of drawing up a micro-programme is very similar to that of drawing
up an ordinary programme for an automatic computing machine
and the problems involved are very much alike.
4. The timing of micro-operations
The assumption that all micro-operations take the same length
of time to perform is not likely to be borne out in practice. In
particular in a parallel machine it may not be possible to design
an adder in which the carry propagation time is sufficiently short
to enable an addition to be performed in substantially the same
length of time as that taken for a simple transfer. It will be
necessary, therefore, to arrange that the wave-form generator feeding
the decoding tree should, when suitably stimulated by a pulse from
one of the outputs from matrix A, supply a somewhat longer pulse
than that normally required. Other operations may take many times
as long to perform as an ordinary micro-order; for example, access
to and from the store (particularly if a delay store is used) and
operation of the input and output devices of the machine. The
sequence of operations in the micro-programme must therefore
be interrupted. One way of doing this is to prevent pulses from
the wave-form generator reaching the decoding tree during the
waiting period. This method, although quite feasible, appears to
involve just the kind of complication which the present system
is designed to avoid. A more attractive system is to make the
machine wait on a conditional micro-order which transfers control
back to itself unless the associated conditional flip-flop is set.
Setting of this flip-flop takes place when the operation is completed,
and control then goes to the next micro-order in the sequence.
The machine is thus in a condition of 'dynamic stop' while
waiting for the operation to be completed. This system has the
advantage that no complication is introduced into the units
supplying the wave-forms to the decoding tree and that the control
equipment required is similar to that already provided for other
purposes.
5. Discussion
It will be seen that the equipment needed to execute a complicated
order in the machine order code is of the same form as that
required for a simple one, namely outlets from the decoding tree
and diodes in the matrices. Quite complicated orders can, therefore,
be built into the machine without difficulty. In particular,
arithmetical operations on numbers expressed in floating binary
form and other similar operations can be micro-programmed and
it is found that they do not involve very large numbers of
micro-orders. For example, a micro-programme providing for the
floating-point operations of addition, subtraction, and multiplication
needs about 70 micro-orders. The switching system in the arithmetical
unit must, of course, be designed with these operations
in view. The decoding tree and matrices of a parallel machine
with 40 digits in the arithmetical unit and provision for 256
micro-orders would only amount to about 15% of the total equipment
in the machine, so that it appears that such a machine can
well be provided with built-in facilities of considerable complexity.
The number of micro-orders needed in a complicated
micro-programme can sometimes be reduced by making use of what
might be called micro-subroutines. For example, when two numbers
have to be added together in a floating binary machine, some
shifting of one of them is usually necessary before the addition
can take place. By making the micro-orders for this shifting operation
serve also when a multiplication is called for, considerable
saving is effected.
Four registers is the bare minimum needed in the arithmetical
unit in order to enable the basic arithmetical operations to be
performed. If any extension or refinement of the facilities provided
is required, it may be necessary to increase the number of registers.
For example, four registers are not sufficient to enable a succession
of products to be accumulated without the transfer of intermediate
results to the store, since the accumulator must be clear at the
beginning of a multiplication. The addition of one register enables
the accumulation of products to be provided for in the
micro-programme. If this register is associated with the outlet from the
store, it also enables some of the waiting time for storage access
to be eliminated. To do this the micro-programme is arranged to
call for a number from the store as soon as it is known that the
number will be required and to continue with other necessary
micro-operations before finally proceeding to use the number. The
'dynamic stop' would occur just before the number is required for
use. Another way of saving time is to arrange, in the case of those
orders which permit it, for the next order to be extracted from
the store before the operation currently being performed has been
completed.
The minimum number of registers required in the control
register unit of the machine for the simplest mode of operation
is three. If extra registers are provided facilities similar to those
provided by the B-lines in the machine at Manchester University
could be included in the micro-programme.
6. Microprogramming applied to serial machines
All the discussion so far has been with reference to parallel machines
because the technique described in this paper is most
adapted to that type of machine. It is, however, possible to design
a serial machine along the same lines. In a parallel computer with
an asynchronous arithmetical unit every gate requires only one
kind of wave-form to operate it and the timing of that wave-form
is not critical. In a serial machine, on the other hand, different
gates require different wave-forms and the same gate may require
different wave-forms at different times; further, all these wave-forms
must be critically timed. These complications may be
handled by including in the micro-control unit a third matrix, C,
for selecting the appropriate wave-form for each micro-order. The
main wave-form, routed by the decoding tree and matrix A, opens
a gate which is fed by a wave-form selected by matrix C. This
enables a wave-form of correct duration to be applied to any
selected gate in the arithmetical or control sections of the machine.
References
WilkM51a; BoutE63; FlynM67; GreeJ64, 66; MercR57; Patz67; RosiR69;
TuckS67; WilkM58b, 69; WebeH67
Chapter 29
Thomas W. Kampe
Summary This paper presents the design of a parallel digital computer
utilizing a 20-μsec core memory and a diode storage microprogram unit.
The machine is intended as an on-line controller and is organized for ease
of maintenance.
A word length of 19 bits provides 31 orders referring to memory locations.
Fourteen bits are used for addressing, 12 for base address, one for
index control, and one for indirect addressing. A 32nd order permits the
address bits to be decoded to generate special functions which require no
address.
The logic of the machine is resistor-transistor; the arithmetic unit is
a bus structure which permits many variants of order structure.
In order to make logical decisions, a "general-purpose" logic unit has
been incorporated so that the microcoder has as much freedom in this area
as in the arithmetic unit.
Functional requirements
The design of the computer (known, for a variety of reasons, as
the SD-2) was undertaken to supply a computer capable of moderately
fast arithmetic with perhaps five decimal places of accuracy
and 3000 or more words of storage. Furthermore, the computer
must reside in a hostile environment (a small house, 0° to
85°C temperature), withstand severe shocks, and be maintained
by men with only two weeks training on the system. The volume
limitation is 40 cubic feet. Within this space must reside the
control computer, memory, power supplies, complete maintenance
facilities, and sufficient input/output equipment to handle 20 shaft
position outputs, 30 such inputs, numerous switch settings, and
20 or more display or relay signals.
The final specification (or blow) was that 15 months were
available from the start of preliminary design to the delivery of
an operating instrument with debugged program.
Introduction
This paper discusses the logical design of a binary, parallel, real-
time computer. Only those aspects of packaging and circuitry Design analysis
which bear directly on this topic will be considered. The maintenance requirement was evidently the major problem.
Since the specifications for the job a computer is to perform In order to achieve the simplicity required, two design criteria
are not enough to fix the design, the logical designer is faced with were necessary.
an undetermined system. One of his main functions is to analyze First, the computer had to be readily understood. This implied
the system in its natural environment, i.e., with malfunctions, that the usual clever logical tricks such as intensive time sharing
operator errors, etc., and to supply the remainder of the side of control and arithmetic were undesirable.
conditions which do fix the design. Second, if built-in maintenance facilities were to be kept sim-
In this discussion, the exposition will be directed toward the ple, the machine must be designed with this in mind.
design philosophy which led to a machine now being built. In Since temperature and reliability were important, an extremely
order to accomplish this, we shall consider the functional require- conservative approach had to be taken with respect to component
ments, their analysis in terms of the state of the art, the basic performance.
design decisions, and, finally, a description of the computer as it With the schedule requirements, a machine which could be
stands. designed and released in pieces was needed. Since the control
system is usually the most troublesome part of a computer to
‘ I R E Trans., EC-9, vol. 2, pp. 208-213, June, 1960. design, a simple control was needed.
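The word layout stated in the Summary (5 order bits, 12 address bits, one index bit, and one indirect bit, for 19 bits in all) can be illustrated with a short Python sketch. The ordering of the fields within the word and the names `pack_word` and `unpack_word` are assumptions made for illustration only; the paper gives the field widths, not this particular arrangement.

```python
# Illustrative sketch only: field ORDER within the word is an assumption.
# Field WIDTHS come from the paper's Summary.
ORDER_BITS = 5       # 2**5 = 32 orders (31 addressed, 1 "no address")
INDIRECT_BITS = 1    # direct or indirect address
INDEX_BITS = 1       # index tag
ADDRESS_BITS = 12    # 2**12 = 4096 words of core storage
WORD_BITS = ORDER_BITS + INDIRECT_BITS + INDEX_BITS + ADDRESS_BITS

NO_ADDRESS_ORDER = 31  # the 32nd order: address bits are decoded instead


def pack_word(order, indirect, index, address):
    """Pack the four fields into one integer (assumed high-to-low layout)."""
    if not 0 <= order < 2 ** ORDER_BITS:
        raise ValueError("order does not fit in 5 bits")
    if not 0 <= address < 2 ** ADDRESS_BITS:
        raise ValueError("address does not fit in 12 bits")
    word = order
    word = (word << 1) | int(bool(indirect))
    word = (word << 1) | int(bool(index))
    word = (word << ADDRESS_BITS) | address
    return word


def unpack_word(word):
    """Inverse of pack_word: return (order, indirect, index, address)."""
    address = word & (2 ** ADDRESS_BITS - 1)
    index = (word >> ADDRESS_BITS) & 1
    indirect = (word >> (ADDRESS_BITS + 1)) & 1
    order = word >> (ADDRESS_BITS + 2)
    return order, bool(indirect), bool(index), address
```

With these widths the word comes to 19 bits, and the 12-bit address field spans exactly the 4096 words of core storage the design settles on.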
3 4 2 Part 4 I The instruction-set processor level: special-function processors Section 3 1 Processors defined by a microprogram
The volume available, together with the schedule, required a logical design with natural packaging properties in the sense that it should break, in a natural way, into logical packages of a reasonable size having a minimum of interpackage communication.
Design decisions
The need for 2000 operations per second poses a serious access problem with a serial memory, unless one resorts to several simultaneously operating control units which are neither small nor simple. Hence, a random access memory seemed advisable. Magnetic core memories at 85°C are a problem, but they can be built, provided memory cycle time is not too short. The memory was chosen as 4096 words of core storage, with a 20-µsec cycle time.
The requirement for training a man in two weeks to maintain the machine argues for a simple-structured parallel machine. Providing that much use is made of asynchronous transfer, there are a variety of simple maintenance methods, particularly if a bus structure is adopted. Also, asynchronous, or semi-asynchronous, parallel machines require only average performance of a set of components, not of any particular component; the central limit theorem of statistics can come to the aid of reliability. This approach was finally adopted.
The simplicity of both design and understanding is aided by the use of a microprogram control system. Further, maintenance is made rather simple by two provisions on the maintenance console.
The first of these is a manner of going through the microprogram on a step-by-step basis. While this tests little of the dynamics, it can often locate totally defective parts, and it helps factory checkout immeasurably.
The second is a means of taking out the microprogram unit and substituting a set of switches. This permits a maintenance man to exercise specific registers, or the memory, at will.
This is a powerful tool, and is almost free with a microprogram control. Finally, and rather pragmatically, microprogramming permits “last minute” changes in machine operation without serious hardware modifications. This approach was chosen.
Regardless of the control used, at various times in the process of executing orders, decisions must be made. Occasionally these are on a single bit, more often on two, and occasionally on more than two. If one excludes order decoding, only such functions as zero detection require the use of more than two bits. At this point, the logical designer is faced with a rather sticky decision: whether to design a specific set of decision logic, which is cheap to build but sometimes messy, or to use some microcontrolled logic-generating scheme.
In this case, the latter alternative was taken. A unit, called (for several obscure reasons) the alteration unit, was designed which amounted to a three-address, one-bit unit. It can generate any Boolean function of two binary variables and transmit this value to another variable. A special set of logic was needed for detecting zeros.
Because of the rather wild nature of the inputs, it seemed desirable to include a trapping mode. The logic for this was made an adjunct to the alteration unit.
The circuitry chosen was resistor-transistor logic, which yields either Sheffer stroke or NOR logic, as one prefers, high or low true logic, and p-n-p or n-p-n transistors. In this case, the combination was high true logic and p-n-p transistors, so that the logical operation is Sheffer stroke. Because of temperature and reliability requirements, the maximum frequency available was a 250-kc square wave. This gave a cycle time of 4 µsec available for asynchronous transfer in any sequence of logic.
An index register seemed advisable because of the amount of data processing. Thus, additions were needed for indexing, arithmetic, and counter advance. It seemed undesirable to have more than one parallel adder, so that an adder accessible to all registers was chosen. This was another argument for a bus structure.
Because of the multiplicity of problems being handled simultaneously, one index register was not really enough. Rather than add another register, indirect addressing was chosen.
At this point, one needs 12 bits for address, one for index tagging, and one to specify whether the address is direct or indirect, or 14 bits for operand selection. Thirty-two orders was a tight minimum, so the minimum word length was 19 bits. Since this was consistent with five decimal place accuracy, it was tentatively chosen. It was decided, however, to design a structure basically suited to any length word.
Shifting is necessary to multiply and divide and is required on two registers, yet shift registers for asynchronous operation are complex. Hence, it was decided to put the shift facility on the data transfer bus. By providing complementing here, subtraction could be generated.
It was decided to use two-complement arithmetic, first because of the simplicity of the multiply-divide logic, and second because it avoids the whole negative zero question.
The precise number of microsteps needed was determined by a trial microprogram. The machine was designed for up to 512 microsteps although only 384 are now used. Eight bits were in
Chapter 29 1 The design of a general-purpose microprogram-controlled computer with elementary structure 343
Fig. 1. Computer block diagram.
As a number moves from b to d, one of five operations may be performed; viz., normal, shift left one bit, shift right one bit, complement, or shift left 5 bits. The last is used for automatic fill and in connection with the microprogram unit control.
As an example, to add the number in the A and D registers, three microprogram steps would be needed. First, transfer A to G, D to F, and finally a to A; 12 µsec would be required.
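The three-step addition just described can be mimicked with a small Python model. This is a sketch under stated assumptions, not the SD-2 logic: the `Machine` class, the register dictionary, the `"adder"` source name, and the zero fill on shifts are all hypothetical; the five bus operations, the registers A, D, F and G, and the 4-µsec microstep come from the paper.

```python
# Hypothetical model of bus transfers; names and shift fill are assumptions.
WORD_BITS = 19
MASK = (1 << WORD_BITS) - 1
MICROSTEP_USEC = 4  # one asynchronous transfer per 4-usec clock cycle

# The five operations that may be applied as a number crosses the bus.
BUS_OPS = {
    "normal": lambda v: v,
    "shift_left_1": lambda v: (v << 1) & MASK,
    "shift_right_1": lambda v: v >> 1,       # vacated bit assumed zero
    "complement": lambda v: ~v & MASK,       # one's complement of the word
    "shift_left_5": lambda v: (v << 5) & MASK,
}


class Machine:
    def __init__(self):
        self.regs = {"A": 0, "D": 0, "F": 0, "G": 0}
        self.elapsed_usec = 0

    def adder(self):
        """Parallel adder output, assumed to sum F and G with no initial carry."""
        return (self.regs["F"] + self.regs["G"]) & MASK

    def step(self, source, dest, op="normal"):
        """One microstep: move source over the bus into dest, applying op."""
        value = self.adder() if source == "adder" else self.regs[source]
        self.regs[dest] = BUS_OPS[op](value)
        self.elapsed_usec += MICROSTEP_USEC


def add_a_and_d(machine):
    """A <- A + D in three microsteps, as in the paper's example."""
    machine.step("A", "G")
    machine.step("D", "F")
    machine.step("adder", "A")
```

Running `add_a_and_d` on a machine with A = 5 and D = 7 leaves 12 in A after three microsteps, which at 4 µsec each accounts for the 12 µsec quoted in the text.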
Fig. 3. Arithmetic unit detail.
Figure 4 is a diagram of the microprogram unit. The eight-bit J register, augmented by the T0 flip-flop of the alteration unit, is decoded for up to 512 steps. Students of microprogramming will recognize the Wilkes model in its pure form [Wilkes and Stringer, 1953]. The “next” value of the microprogram register may be chosen in one of three ways.
First, the value may be controlled by the microprogram itself.
Second, five bits of the bus, corresponding to the order portion of the word, may be entered; the other three bits are set to zero. In this manner, the order decoding is accomplished.
Third, all eight bits of the J register may be filled from the d bus. In practice, the order is shifted five bits to the left, presenting eight bits of the address to get the J register. In this manner, one may generate “no address” commands.
Fig. 4. Microprogram unit.
In principle, the programmer may start on any microstep which amuses him; in practice, only a limited number of these will yield no-address orders, the other steps being used for parts of add, subtract, order procure, etc. The author has no doubt, however, that someone will find a useful reason for popping into the middle of divide or some other command. There is no feature of a machine, however pathological, which cannot be exploited by a programmer.
The actual decoding of these nine bits is accomplished partly by logic, and partly by current switching of the clock pulse. A diode matrix is used to convert the microsteps into control signals. No more than 15 micro operations may be called out on a single step, including selection of the next microorder.
When stepping the microregister, a ploy is used to reduce the number of diodes. Instead of specifying the next step, the microcoder specifies the bits of J which he wishes to reverse. Instead of the minimum latency coding of earlier days, the microcoder of the SD-2 must do minimum diode coding. This is roughly analogous to asking for a fast, efficient computer program containing a minimum of 1’s. The author, as well as others, has spent endless hours trying to devise a computer program to do such microcoding, with no results.
One may note in passing that the man who wrote the microcode, Tomo Hayata, has for several years specialized in advanced programming problems. Wilkes’ views,¹ that logical design will in the future be done by programmers, seem to be verified here.
¹Private communication; Aug. 17, 1959.
Because of the limited microarithmetic available here, microcoding of the highest order is a must, since each microstep is 4 µsec of time.
For simple orders (e.g., extract), the processes of order procure, indexing (but not indirect addressing), operand procure and execution can be compressed into the time for two memory cycles, i.e., 40 µsec. Each indirect reference adds another memory cycle
to this time. Only on multiply, divide, and shift does the ultra-simple structure begin to be expensive in time.
If the temperature requirement were not imposed, the clock frequency could be doubled, materially improving the performance of the machine on multicycle orders.
Figure 5 is a block diagram of the alteration unit. It consists of gates which permit entry of conditions within the computer or the outside world, flip-flops used as working storage, flip-flops, including T0, to make its conclusions known to all and sundry, a five-bit tally register (I), a circuit to detect a zero on the d bus, and the trap logic. There are as many as 20 input gates, 9 storage flip-flops and 10 output flip-flops, exclusive of T0.
The I register can change its contents in one of two ways, viz., counting down by one, or by accepting an entry from the d bus. It may transmit intelligence in two ways, viz., to the b bus, or by notifying the input gate system that, should anyone care, it has just counted past zero.
The zero detector signals the truth of the statement that d is identically zero. In practice, it checks only the lower digits, not the sign. This is related to the existence of the number -1 in a two-complement system, which is the system’s answer to the negative zero of a one’s complement logic.
The trap logic is as follows: one of the output signals of the alteration unit signals whether or not the system is receiving trap signals; if it is not, the trap logic makes a note of callers. When the system is again accepting those signals, it transmits whether or not signals have been received, and resets its memory to zero. The timing is such that no trap signal will ever be lost.
Fig. 5. Alteration unit.
The lines going into the logic unit are actually two busses. Any logic source may read to either bus. The logic unit has four control wires from the microprogram unit, specifying which of the 16 Boolean functions of the two busses is to be put on the output bus. This value is then routed to the appropriate logic destination.
The output flip-flops have inputs from the logic unit, and their outputs go to various control points in the machine. Three major points are: (1) establishing whether a memory cycle is read/restore or erase/write; (2) setting the initial carry in the adder; and (3) determining what value shall shift into the vacant spot on a left or right shift.
The initial carry is used for more than simply adding one to a value; since the logic is two complement, but the one complement one is transmitted on the bus, the initial carry is, in general, one during subtraction and zero during addition.
Microprogram details
Figure 6 gives circuit details of the microprogram decode system. The nine flip-flops used are broken into two groups, one of four, the other of five flip-flops. These are decoded into, respectively, 16 and 32 wires. In each group, one and only one wire goes negative. When the clock signal, of 2 µsec width, is applied to the emitters of the first set of 16 gates, it is passed by the selected gating transistor. From the collector of this transistor, it is routed to the emitter of a set of 32 transistors; again, only one can pass current. Thus, the clock signal is routed to one of 16 × 32 = 512 lines. Diodes on the selected line then cause this signal to be routed to appropriate gates in the arithmetic or alteration unit.
By appropriate placement of diodes, a microstep can operate a variety of gates, the number of which is limited by the current available.
Some of the microcontrol wires return to the J register so that the microcoder may control the selection of the next microstep. This register is so designed that the actual change of state is inhibited until the clock goes negative.
While each output of the decoding trees may go to 16 bases, only one transistor of the 16 will have a signal on the emitter; thus only one must be driven.
From an engineering point of view, the control of a computer is an elaborate timing system. A microprogram unit is thus a programmable timing generator. The gating transistor/diode decoding system is but one of many ways to achieve this.
¹M. V. Wilkes, private communication; Aug. 17, 1959.
Wilkes has observed¹ that, with the diode system, one has an
acute packaging problem. He and his co-workers have been led to consider the use of switch-core decoding [Wilkes et al., 1958a]. Eachus¹ and his co-workers have evolved yet another switch-core system which does not depend on coincident current switching.
¹Dr. Joseph Eachus of Minneapolis-Honeywell, private conversation; September, 1959.
Order code
Since the order code is only a small problem in the design of a microprogrammed machine (GOTT SEI DANKE), there is little need to dwell on it. There are several comments of design interest, however.
We were unable, with this structure, to get the multiplication below five microsteps per iteration, nor the divide below six, thus costing respectively 20 and 24 µsec per bit dealt with. Moreover, division required some precalculations (overflow detect) and some postcalculation (obtaining a rounded quotient with a correct remainder) which further boosted its time.
Because of the asynchronous nature of transfer, it is not possible to read into and out of a register simultaneously. Hence, shifting one register requires two steps, or 8 µsec per bit, and double-length shifting requires 16 µsec. This is painful.
Because of the short words, four double-length orders were microprogrammed: add, subtract, clear and add, and store. These take a total of 60 µsec to execute.
A rich collection of branch orders was included. BRanch Unconditionally, BRanch Negative, and BRanch Zero are self-explanatory. BRanch on B is the tally loop order which decreases (B) by one, and branches if it does not go negative. BR1, BR2, BR3, and BR4 are sense toggle branch; if the toggle is set, it is turned off and the program branches. These sense toggles are actually storage flip-flops T1, T2, T3, and T4 of the alteration unit. These may be set by other orders. T1 is also used as an overflow mark.
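The per-bit costs quoted in this section follow directly from the microstep counts and the 4-µsec microstep. The Python sketch below reproduces that arithmetic; the table name `STEPS_PER_BIT` and the function names are hypothetical, and the loop estimate deliberately ignores order procure and the pre- and postcalculation that the text says divide requires, so it is a lower bound rather than a timing the paper reports.

```python
# Worked arithmetic for the per-bit timings; names are illustrative only.
MICROSTEP_USEC = 4

# Microsteps per bit as given in the text (double-length shift = 2 registers).
STEPS_PER_BIT = {
    "multiply": 5,
    "divide": 6,
    "shift": 2,
    "double_shift": 4,
}


def usec_per_bit(op):
    """Time spent on each bit of the named operation."""
    return STEPS_PER_BIT[op] * MICROSTEP_USEC


def loop_time_usec(op, bits):
    """Lower bound for the iterative part only; setup steps are excluded."""
    return usec_per_bit(op) * bits
```

For instance, `usec_per_bit("multiply")` gives 20 and `usec_per_bit("divide")` gives 24, matching the figures above; how many iterations a full 19-bit multiply actually takes is not stated in the text, so any whole-order total is left to the reader's assumption about `bits`.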
The machine has a “dynamic” idle. When it is halted, either externally or by order, this fact is observed by the microprogram, through the alteration unit, whereupon the microprogram goes into a tight loop, continuously asking, “Can I go? Can I go? Can I go? . . . .” Two forms of halting are provided. In “Halt and Display,” registers are presented; in the other halt, the console lights are left unaltered. A manual halt is equivalent to halt and display.
For an addressed order, bit positions one through five are sent into the microprogram unit. During order procure, the microprogram examines bits zero and six for indirect addressing and index modification.
A nonaddress order is recognized by the binary equivalent of 31 in the order bits; the microprogram unit causes the order word to shift left 5 bits, and the 8 high bits of the “address” field enter the J register.
Conclusion
This paper is not intended to be an argument in favor of the general acceptance of the SD-2 structure as an ideal. Like all computers, the SD-2 is a state-of-the-art device, intended not only to meet the needs of the problems at hand, but also, more importantly, to meet the side conditions of its use. In a vague analogy, the computer specification is like a partial differential equation. The logical designer must choose the boundary conditions and solve the problem, or at least approximate the solution.
With today’s emphasis on system speed performance, some serious mental gear-shifting on the designer’s part is required in order to design a simple machine. It goes against the grain of instinct and experience. A posteriori, the SD-2 could have been made even simpler, particularly with respect to several peripheral areas not discussed in the paper.
Several conclusions can be drawn here, however. The bus structure is easy to fabricate and maintain; this has been proven on the MILSMAC, a breadboard for the SD-2. It is a highly flexible structure, permitting wide variation in order code with no change in arithmetic unit. At the same time, the components are cascaded to a point where one has the absurd situation of fast-switching in a relatively slow computer. A designer of a bus-structured machine would do well to consider alternatives, such as multiple busses, accumulators, etc., to permit more parallelism when speed is important.
The use of a general-purpose logic unit, such as the alteration unit of the SD-2, gives a freedom of design not possible with a special-purpose logic. At the same time, it uses more parts, is slow in handling multiple variable problems, and requires a great deal of control input. It appears to be a weapon of opportunity.
The use of microprogramming is much the same as the general logic unit. Its flexibility and speed of design are unquestionable. Also, it uses more parts than a special-purpose control.
There is no real substitute for a special-purpose design. The use of generalized elements in computer design can be justified only by the side conditions, never by the basic specification. Where simplicity and speed of design are major items, their use seems indicated.
Wilkes once presented a paper on the best way to design a computer and launched the microprogramming notions. The author would like to comment that if ease and reliability of design are criteria, he was absolutely correct.
References
KampT60; WilkM53a; WilkM58a
Section 4
Chapter 30
¹Proc. WJCC, pp. 119-128, 1958.
The general-purpose digital computer, by virtue of its large capacity and general-purpose nature, has opened the possibility of research into the nature of complex mechanisms per se. The challenge is obvious: humans carry out information processing of a complexity that is truly baffling. Given the urge to understand either how humans do it, or alternatively, what kinds of mechanisms might accomplish the same tasks, the computer is turned to as a basic research tool. The varieties of complex information processing will be understood when they can be synthesized: when mechanisms can be created that perform the same processes.
The last few years have seen a number of attempts at synthesis of complex processes. These have included programs to discover proofs for theorems [Newell et al., 1956, 1957b], programs to synthesize music [Brooks et al., 1957b], programs to play chess [Bernstein et al., 1958; Kister et al., 1957], and programs to simulate the reasoning of particular humans [Newell et al., 1958]. The feasibility of synthesizing complex processes hinges on the feasibility of writing programs of the complexity needed to specify these processes for a computer. Hence, a limit is imposed by the limit of complexity that the human programmer can handle. The measure of this complexity is not absolute, for it depends on the programming language he uses. The more powerful the language, the greater will be the complexity of the programs he can write. The authors’ work has sought to increase the upper limit of complexity of the processes specified by developing a series of languages, called information processing languages (IPL’s), that reduce significantly the demands made upon the programmer in his communication with the computer. Thus, the IPL’s represent a series of attempts to construct sufficiently powerful languages to permit the programming of the kinds of complex processes previously mentioned.
The IPL’s designed so far have been realized interpretively on current computers [Newell and Shaw, 1957a]. Alternatively, of course, any such language can be viewed as a set of specifications for a general-purpose computer. An IPL can be implemented far more expeditiously in a computer designed to handle it than by interpretation in a computer designed with a quite different command structure. The mismatch between the IPL’s designed and current computers is appreciable: 150 machine cycles are needed to do what one feels should take only 2 or 3 machine cycles. (It will become apparent that the difficulty would not be removed by “compiling” instead of “interpreting,” to resurrect a set of well-worn distinctions. The operations that are mismatched to current computers must go on during execution of the program, and hence cannot be compiled out.)
The purpose of this paper is to consider an IPL computer, that is, a computer constructed so that its machine language is an information processing language. This will be called language IPL-VI, for it is the sixth in the series of IPL’s that have been designed. This version has not been realized interpretively, but has resulted from considering hardware requirements in the light of programming experience with the previous languages.
Some limitations must be placed on the investigation. This paper will be concerned only with the central computer, the command structure, the form of the machine operations, and the general arrangements of the central hardware. It will neglect completely input-output and secondary storage systems. This does not mean these are unimportant or that they present only simple problems. The problem of secondary storage is difficult enough for current computing systems; it is exceedingly difficult for IPL systems, since in such systems initial memory is not organized in neat block-like packages for ease of shipment to the secondary store.
Nor is it the case that one would place an order for the IPL computer about to be described without further experience with it. Results are not entirely predictable. IPL’s are sufficiently different from current computer languages that their utility can be evaluated only after much programming. Moreover, since IPL’s are designed to specify large complicated programs, the utility of the linguistic devices incorporated in them cannot be ascertained from simple examples.
One more caution is needed to provide a proper setting for
350 Part 4 1 The instruction-set processor level: special-function processors Section 4 I Processors based on a programming language
this paper. Most of the computing world is still concerned with essentially numerical processes, either because the problems themselves are numerical or because nonnumerical problems have been appropriately arithmetized. The kinds of problems that the authors have been concerned with are essentially nonnumerical, and they have tried to cope with them without resort to arithmetic models. Hence the IPL’s have not been designed with a view to carrying out arithmetic with great efficiency.
Fundamental goals and devices
The basic aim, then, is to construct a powerful programming language for the class of problems concerned. Given the amount and kind of output desired from the computer, a reduction in the size and complexity of the specification (the program) that has to be written in order to secure this output is desired.
The goal is to reduce programming effort. This is not the same as reducing the computing effort required to produce the desired output from the specification. Programming feasibility must take precedence over computing economics; since it is not yet known how to write a program that will enable a computer to teach itself to play chess, it is premature to ask whether it would take such a computer one hour or one hundred hours to make a move. This is not meant as an apology, but as support for the contention that, in seeking to write programs for very large and complicated tasks, the overriding initial concerns must be to attain enough flexibility, abbreviation, and automation of the underlying computing processes to make programming feasible. And these concerns have to do with the power of the programming language rather than the efficiency of the system that executes the program.
In the next section a straightforward description of an IPL computer is begun. To put the details in a proper setting, the remainder of this section will be devoted to the basic devices that IPL-VI uses to achieve a measure of power and flexibility. These devices include: organization of memory into list structure, provision for breakouts, identity of data with program, two-stage interpretation, invariance of program during execution, provision for responsibility assignments, and centralized signalling of test results.
istics of hardware and program that determine what memory cells can be regarded as “next to” a given cell, plays a fundamental role in the organization of the information processing. This is obviously true for serial memories like tape; it is equally true for random access memories. In random access memories the topological structure is derived from the possibility of performing arithmetic operations on the memory addresses that make use of the numerical relations among these addresses. Thus, the cell with address 1435 is next to cell 1436 in the specific sense that the second can be reached from the first by adding one to the number in a counter.
In standard computers use is made of the static topology based on memory addresses to facilitate programming and computation. Index registers and relative addressing schemes, for example, make use of program arithmetic and depend for their efficacy upon an orderly matching of the arrangement of information in memory with the topology of the addressing system.
When memory is organized in a list structure, the relation between information storage and topology is reversed. The topology of memory is continually modified to adapt to the changing needs of organization of memory content. No arithmetic operations on memory addresses are permitted; the topology is built on a single, asymmetric, modifiable, ordinal relation between pairs of memory cells which is called adjacency. The system contains processes that make use of the adjacency relations in searching memory, and processes that change these relations at will inexpensively in the course of processing.
A list structure can be established in computer memory by associating with each word in memory an address that determines what word is adjacent to it, as far as all the operations of the computer are concerned. Memory space of an additional address associated with each word is given up, so that the adjacency relation can be changed as quickly as a word in memory can be changed. Having paid this price, however, many of the other basic features of IPLs are obtained almost without cost: unlimited hierarchies of subroutines; recursive definition of processes; variable numbers of operands for processes; and unlimited complexity of data structure, capable of being created and modified to any extent at execution time.
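The idea of a list structure, in which each word carries the address of the word adjacent to it and programs follow those links rather than doing arithmetic on addresses, can be sketched in Python as follows. This is a minimal illustration and not IPL-VI itself: the `Cell` and `ListMemory` names, the `symbol` field, and the simple counter used to hand out free cells are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Cell:
    symbol: str            # the content of the word (hypothetical field)
    link: Optional[int]    # address of the adjacent word, None at list end


class ListMemory:
    """Memory whose topology is given by links, not by numeric adjacency."""

    def __init__(self) -> None:
        self.cells: Dict[int, Cell] = {}
        self._next_free = 0  # stand-in for an available-space mechanism

    def new_cell(self, symbol: str, link: Optional[int] = None) -> int:
        addr = self._next_free
        self._next_free += 1
        self.cells[addr] = Cell(symbol, link)
        return addr

    def insert_after(self, addr: int, symbol: str) -> int:
        """Make a new cell adjacent to addr by rewriting a single link."""
        cell = self.cells[addr]
        new_addr = self.new_cell(symbol, cell.link)
        cell.link = new_addr
        return new_addr

    def walk(self, head: int) -> List[str]:
        """Follow adjacency links from head; no address arithmetic is used."""
        symbols = []
        addr: Optional[int] = head
        while addr is not None:
            symbols.append(self.cells[addr].symbol)
            addr = self.cells[addr].link
        return symbols
```

A short usage example: creating a head cell "a", inserting "c" after it, and then inserting "b" after the head yields the list a, b, c even though "b" occupies a higher address than "c". The price is the one the text names, an extra address stored with every word, in exchange for changing adjacency as cheaply as changing a word.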
to “break out” of the format and to use more general modes of control, contains at any given moment a large number of parallel
specification than the format permits. Devices for breakouts ex- active programs, frozen in the midst of operation and waiting until
change processing time for flexibility. Several devices achieve this called upon to produce the next operation or piece of data. This
in IPL-VI. Each is associated with some part of the format. identity of data with program can be attained only if the proc-
As an illustrative example, IPL-VI has a single-address format. essing programs require for their operation no information about
Without breakout devices, this format would permit an informa- the structure of the data programs, only information about how
tion process to operate on only a single operand as input, and to receive the data from them.
would permit the operand of a process to be specified only by
giving its address. Both of these limitations are removed: the first Two-stage interpretation
by using a special communication list to store operands, the second To identify the operand of an IPL-VI instruction, a designating
by allowing the address for an operand to refer either to the operation operates on the address part of the instruction to pro-
operand itself or to any process that will determine the operand. duce the actual operand. Thus, depending on what designating
The latter device, which allows broad freedom in the method operation is specified, the address part may itself be the operand,
of specifying an operand, illustrates another important facet of may provide the address of the operand, or may stand in a less
the flexibility problem. Breakouts are of great importance in re- direct relation to the operand. The designating operation may even
ducing the burden of planning that is imposed on the programmer. delegate the actual specification of the operand to another desig-
It is certainly possible, in principle, to anticipate the need for nating operation.
particular operands at particular stages of processing, and to pro-
vide the operands in such a way that their addresses are known Invariance of program during execution
to the programmer at the appropriate times. This is the usual way In order to carry out generalized recursions, it is necessary to
in which machine coding is done. However, such plans are not provide for the storage of indefinite amounts of variable informa-
obtained without cost; they must be created by the programmer. tion necessary for the operation of such routines. In 1PL-VI all
Indeed, in writing complex programs, the creation of the plan of the variable information is stored externally to the associated
computation is the most difficult part of the job; it constitutes the routine, so that the routine remains unmodified during execution.
task of “programming” that is sometimes distinguished from the The name of a routine can appear in the definition of the routine
more routine “coding.” Thus, devices that exchange computing itself without causing difficulty at execution time.
time for a reduction in the amount of planning required of the
programmer provide significant increases in the flexibility and Responsibility assignments
power of the language. The automatic handling of such processes as erasing a list, or
searching through a list requires some scheme for keeping track
Identity of data with programs of what part of the list has been processed, and what part has
In current computers, the data are considered “inert.” They are not. For example, in erasing a program containing a local sub-
symbols to be operated upon by the program. All “structure” of routine that appears more than once within the program, care
the data is initially developed in the programmer’s head and must be taken to erase the subroutine once and only once. This
encoded implicitly into the programs that work with the data. The is accomplished by a system for assigning responsibility for the
structure is embodied in the conventions that determine what bits parts of the list. In general, the responsibility code in IPL-VI
the processes will decode, etc. handles these matters without any explicit attention from the
An alternative approach is to make the data “active.” All words programmer, except in those few situations where the issue of
in the computer will have the instruction format: there will be responsibility is the central problem.
“data” programs, and the data will be obtained by executing these
programs. Some of the advantages of this alternative are obvious: Centralized signalling of test results
the full range of methods of specification available for programs The structure of the language is simplifiedby having all conditional
is also available for data; a list of data, for example, may be speci- processes set a switch to symbolize their output instead of pro-
fied by a list of processes that determine the data. Since data are ducing an immediate conditional transfer of control. Then, a few
only desired “on command” by the processing programs, this specialized processes are defined that transfer control on the basis
approach leads to a computer that, although still serial in its of the switch setting. By symbolizing and retaining the conditional
352 Part 4 I The instruction-set processor level: special-function processors Section 4 1 Processors based on a programming language
information, the actual transfer can be postponed to the most convenient point in
the processing. The flexibility obtained by this device proves especially useful in
dealing with the transmission of conditional information from subroutines to the
routines that call upon them.

General organization of the machine

The machine that is described can profitably be viewed as a “control computer.” It
consists of a single control unit with access to a large random-access memory. This
memory should contain 10^5 words or more. If less than 10^4 words are available in
the primary memory, there will probably be too frequent occasions for transfer of
information between primary and secondary storage to make the system profitable.

The operation of the computer is entirely nonarithmetic, there being no arithmetic
unit. Since arithmetic processes are not used as the basis of control, as they are
in standard computers, such a unit is inessential, although it would be highly
desirable for the computer to have access to one if it is to be given arithmetic
tasks. The computer is perfectly capable of proving theorems in logic or playing
chess without an arithmetic adjunct.

Memory

The memory consists of cells containing words of fixed length. Each word is divided
into two parts, a symbol and a link. The entire memory is organized into a list
structure in the following way. The link is an address; if the link of a word a is
the address of word b, then b is adjacent to a. That is, the link of a word in a
simple list is the address of the next word in the list.

The symbol part of a word may also contain an address, and this may be the address
of the first word of another list. As indicated earlier, the entire topology of the
memory is determined by the links and by addresses located in the symbol parts of
words. The links permit the creation of simple lists of symbols; the links and
symbol parts together, the creation of branching list structures.

The topology of memory is modified by changing addresses in links and symbol parts,
thereby changing adjacency relations among words. The modification of link
addresses is handled directly by various list processes without the attention of
the programmer. Hence, the memory can be viewed as consisting of symbol occurrences
connected together by mechanisms or structure whose character need not be
specified.

The basic unit of organization is the list, a set of words linked together in a
particular order by means of their link parts, in the way previously explained. The
address of the first word in the sequence is the name of the list. A special
terminating symbol T, whose link is irrelevant, is in the last word on every list.
A simple list is illustrated in Fig. 1; its name is L,,,, and it contains two
symbols, S1 and S2.

The symbols in a list may themselves designate the names of other lists. (The
symbols themselves have a special format, so that they are not names of lists but
designate the names in a manner that will be described.) Thus, a list may be a list
of lists, and each of its sublists may be a list of lists.

An example of a list structure is shown in Fig. 2. The name of the list structure
is the name of the main list, L,,,. L,,, contains two sublists, L,,, and L,,,, plus
an item of information, I,, that is not a name of a list. L,,, in its turn consists
of item I, plus another sublist, L,,,, while L,,, contains just information, and is
not broken out further into sublists. Each of these lists terminates in a word that
holds the symbol T.

Available space list

A list uses a certain number of cells from memory. Which cells it uses is
unimportant as long as the right linkages are set up. In executing programs that
continually create new lists and destroy old ones, two requirements arise. When
creating a list, cells in memory must be found that are not otherwise occupied and
so are available for the new list. Conversely, when a list is destroyed (when it is
no longer needed in the system) its cells become available for other uses, but
something must be done to gain access to these available cells when they are
needed.

The device used to accomplish these two logistic functions is the available space
list. All cells that are available are linked together into the single long list.
Whenever cells are needed, they are taken from the front of this available space
list; whenever cells are made available, they are inserted on the front of the
available space list just behind the fixed register that holds the link to the
first available space. The operations of taking cells from the available space list
and returning cells to the available space list involve, in each case, only changes
of addresses in a pair of links.

Fig. 1. A simple list.
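The Memory section describes how links give simple lists and how addresses in
symbol parts give branching list structures. The following sketch reads such a
structure back as nested Python lists. It is a simplification under stated
assumptions: an integer symbol is treated directly as the name (address) of a
sublist, whereas the paper says symbols designate names through a special format;
the Cell class, the addresses, and the item names I1 to I4 are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union

T = "T"  # the terminating symbol held by the last word of every list


@dataclass
class Cell:
    symbol: Union[str, int]  # str: an item of information (or T); int: address of a sublist
    link: int = 0            # address of the next word; irrelevant in the terminating word


def read_structure(memory: Dict[int, Cell], name: int) -> list:
    """Return the list named `name` as a nested Python list.

    Assumes the structure contains no cycles.
    """
    items = []
    addr = name
    while memory[addr].symbol != T:
        symbol = memory[addr].symbol
        if isinstance(symbol, int):
            # The symbol part holds the address of another list: descend into it.
            items.append(read_structure(memory, symbol))
        else:
            items.append(symbol)
        addr = memory[addr].link
    return items


# Hypothetical main list at 100 with two sublists (200, 300) and one plain item.
memory = {
    100: Cell(200, 101), 101: Cell("I1", 102), 102: Cell(300, 103), 103: Cell(T),
    200: Cell("I2", 201), 201: Cell(T),
    300: Cell("I3", 301), 301: Cell("I4", 302), 302: Cell(T),
}

nested = read_structure(memory, 100)
```

The name of each list is simply the address of its first word, so passing 200
instead of 100 reads the first sublist on its own.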
Chapter 30 I A command structure for complex information processing 353
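The available space list described above can be sketched as a free list threaded
through the unused cells. This is a minimal illustration, not the machine's
mechanism: the Memory class, its method names, the use of None to end the list, and
the four-cell size are assumptions made for the example.

```python
class Memory:
    """A toy memory whose unused cells are linked into an available space list."""

    def __init__(self, size: int) -> None:
        # Each cell is [symbol, link]; initially every cell is available and
        # linked to the next one. None marks the end of the list (assumption).
        self.cells = [[None, i + 1] for i in range(size)]
        self.cells[-1][1] = None
        self.avail = 0  # fixed register holding the link to the first available cell

    def take_cell(self) -> int:
        """Remove the front cell of the available space list and return its address."""
        addr = self.avail
        if addr is None:
            raise MemoryError("available space list is exhausted")
        self.avail = self.cells[addr][1]
        self.cells[addr] = [None, None]
        return addr

    def return_cell(self, addr: int) -> None:
        """Put a cell back on the front of the available space list."""
        self.cells[addr] = [None, self.avail]
        self.avail = addr

    def available(self) -> int:
        """Count the cells currently on the available space list."""
        count, addr = 0, self.avail
        while addr is not None:
            count += 1
            addr = self.cells[addr][1]
        return count


mem = Memory(4)
a = mem.take_cell()
b = mem.take_cell()
free_after_take = mem.available()
mem.return_cell(a)
free_after_return = mem.available()
reused = mem.take_cell()
```

Both `take_cell` and `return_cell` touch only the fixed register and one cell's
link, which mirrors the remark that each operation changes a pair of links; the
counting helper exists only for the demonstration.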
List of current instruction addresses (CIA), L2. At any given moment in working
sequentially through a program, there will be a whole hierarchy of instructions
that are in process of interpretation, but whose interpretation has not been
completed. These will include the instruction currently being interpreted, the
routine to which this instruction belongs, the superroutine to which this routine
belongs, and so on. The CIA list is the list of addresses of this hierarchy of
routines. The first symbol on the list gives the address of the instruction
currently being interpreted; the second symbol gives the address of the current
instruction in the next higher routine, etc. In this system it proves to be
preferable to

Fig. 3. Machine information transfer paths. (Recoverable labels: CIA list, L2;
current CIA list, L3; comparator; memory.)
register, A, of Fig. 3. Then operation b is decoded and performed on s. The cycle
is then repeated using f to fetch the next instruction.

The operation codes

The simple interpretation cycle previously described provides none of the powerful
linguistic features that were outlined at the beginning of the paper: hierarchies
of subroutines, data programs, breakouts, etc. These features are obtained through
particular b and c operations that modify the sequence of control. The operation
codes will be explained under the following headings: the designation code,
sequence-controlling operations, save and delete operations, communication list
operations, signal operations, list operations, and other operations.

The designation code

The designation operation, c, operates on the address, d, to designate a symbol
occurrence, s, that will serve as input, or operand, for the operation b. The
designation operation places the address of the designated symbol, s, in the
address register.

The designation codes proposed, based on their usefulness in coding with the IPL’s,
are shown in Appendix 1. The first four, c = 0, 1, 2, or 3, allow four degrees of
directness of reference. They are usable when the programmer knows in advance where
the symbol, s, is located. To illustrate their definition, consider an instruction
a1, with parts b1, c1, d1, and e1, which can collectively be called s1. The address
part, d1, of this instruction may be the address of another instruction, d1 = a2;
the address part, d2, of a2 may be the address of a3, etc.

The code c1 = 1 means that s is the symbol whose address is d1, that is, the symbol
s2. In this case the designating operation puts d1, the address of s2, in the
address register. The code c1 = 2 means that s is s3; hence, the operation puts d2,
the address of s3, in the address register. The code c1 = 3 puts d3, the address of
s4, in the address register. Finally, c1 = 0 designates as s the actual symbol in
a1 itself; hence, this means that b is to operate on s1. Therefore, this operation
places a1 in the address register.

The remaining two designation operations, c = 4 and 5, introduce another kind of
flexibility, for they allow the programmer to delegate the designation of s to
other parts of the program. When c1 = 4, the task of designating s is delegated to
the symbol of the word d1 = a2. In this case, s is found by applying the
designation operation, c2 of word a2, to the address, d2, of word a2. An operation
of this kind permits the programmer to be unaware of the way in which the data are
arranged structurally in memory. Notice that the operation permits an indefinite
number of stages of delegation, since if c2 = 4, there will be a further delegation
of the designation operation to c3 and d3 in word a3.

The last designation operation, c = 5, provides both for delegation and a breakout.
With c1 = 5, d1 is interpreted as a process that determines s. Any program
whatsoever, having its initial instruction at d1, can then be written to specify s.
When this program has been executed, an s will have been designated, and the
interpretation will continue by reverting to the original cycle, that is, by
applying b1 to the s that was just designated. It is necessary to provide a
convention for communicating the result of process d1 to the interpreter. The
convention used is that d1 will leave the location of s in L0, the standard
communication cell.

Sequence-controlling operations

Appendix 2 lists the 35 b operations. The first 12 of these are the ones that
affect the sequence of control. They accomplish 5 quite different functions:
executing a process (b = 1, 10), executing variable instructions (b = 2),
transferring control within a routine (b = 3, 4, 5), transferring control among
parallel program structures (b = 0, 6, 7, 8, 9), and, finally, stopping the
computer (b = 11).

A routine is a list of instructions; its name is the address of the first word in
the list. To execute a routine, its name (i.e., its name becomes the s of the
previous section) is designated and to it is applied the operation b = 1, “execute
s.” The interpreter must keep track of the location of the instruction that is
being executed in the current routine and return to that location after completing
the execution of the instruction (which, in general, is a subroutine). All lists
end in a word containing b = 10, which terminates the list and returns control to
the higher routine in which the subroutine just completed occurred. (The symbol T
is really any symbol with b = 10.)

Figure 5 provides a simple illustration of the relations between routines and their
subroutines. In the course of executing the routine L10 (i.e., the instructions
that constitute list L10), an instruction, (1, 0, L20), is encountered that is
interpreted as “execute L20.” In the course of executing L20, an instruction is
encountered that is interpreted as “execute L30.” Assuming that L30 contains no
subroutines, its instructions will be executed in order until the terminate
instruction is reached. Because of the 10 in its b part, this instruction returns
control to the instruction that follows L30 in L20. When the final word in L20 is
reached, the operation code 10 in its b part returns control to L10, which then
continues with the instruction following L20. (Only the b part, b = 10, of the
terminal word in a routine is used in the interpretation; the c and
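The designation codes described earlier in this section can be sketched as a
function that returns the address placed in the address register. This is an
illustration under stated assumptions: only the c and d parts of a word are
modelled, the addresses are hypothetical, and for c = 5 a Python callable that
returns a location stands in for "a process that leaves the location of s in L0";
none of these names come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Word:
    c: int  # designation code
    d: int  # address part


def designate(memory: Dict[int, Word], a: int,
              processes: Optional[Dict[int, Callable[[], int]]] = None) -> int:
    """Return the address of the designated symbol s for the instruction at `a`."""
    word = memory[a]
    if word.c == 0:
        return a                              # s is the symbol in the instruction itself
    if word.c == 1:
        return word.d                         # d1: the address of s2
    if word.c == 2:
        return memory[word.d].d               # d2: the address of s3
    if word.c == 3:
        return memory[memory[word.d].d].d     # d3: the address of s4
    if word.c == 4:
        return designate(memory, word.d, processes)  # delegate to c2, d2 of word a2
    if word.c == 5:
        if processes is None:
            raise ValueError("c = 5 needs a process to run")
        return processes[word.d]()            # hypothetical stand-in for running process d1
    raise ValueError(f"unknown designation code {word.c}")


# Chain of words: the d part of word 1 is the address of word 2, and so on.
memory = {1: Word(1, 2), 2: Word(0, 3), 3: Word(0, 4), 4: Word(0, 0)}

direct = {}
for code in (0, 1, 2, 3):
    memory[1].c = code
    direct[code] = designate(memory, 1)

# Two stages of delegation (c = 4 twice) ending in a c = 1 word.
memory.update({10: Word(4, 11), 11: Word(4, 12), 12: Word(1, 3)})
delegated = designate(memory, 10)

# Breakout: the "process" at hypothetical address 50 decides the location.
memory[20] = Word(5, 50)
broken_out = designate(memory, 20, {50: lambda: 4})
```

With the chain above, codes 0 through 3 yield addresses 1, 2, 3, and 4 in turn,
which matches the "four degrees of directness" in the text.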
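The nesting of routines illustrated with L10, L20, and L30 can be sketched with a
small interpreter loop. This is a sketch under stated assumptions rather than the
machine's interpretation cycle: designation is ignored (the d part directly names
the routine), a Python list stands in for the list of current instruction
addresses, and the TRACE operation, the tuple layout (b, d, link), and all
addresses are hypothetical devices for observing the order of execution.

```python
EXECUTE = 1      # b = 1, "execute s"
TERMINATE = 10   # b = 10, ends a routine and returns to the higher routine
TRACE = "trace"  # hypothetical operation, not in the paper: record the d part


def run(memory, start):
    """Interpret the routine named `start`; return the recorded trace."""
    trace = []
    cia = [start]  # stack of current instruction addresses, innermost last
    while cia:
        addr = cia[-1]
        b, d, link = memory[addr]
        if b == EXECUTE:
            cia.append(d)                      # keep our place, descend into routine d
        elif b == TERMINATE:
            cia.pop()                          # back to the higher routine
            if cia:
                cia[-1] = memory[cia[-1]][2]   # continue after the "execute" instruction
        else:
            trace.append(d)
            cia[-1] = link
    return trace


memory = {
    # Routine L10 (hypothetical addresses 10..13)
    10: (TRACE, "L10 starts", 11), 11: (EXECUTE, 20, 12),
    12: (TRACE, "L10 resumes", 13), 13: (TERMINATE, None, None),
    # Routine L20
    20: (TRACE, "L20 starts", 21), 21: (EXECUTE, 30, 22),
    22: (TRACE, "L20 resumes", 23), 23: (TERMINATE, None, None),
    # Routine L30, which contains no subroutines
    30: (TRACE, "L30 runs", 31), 31: (TERMINATE, None, None),
}

order = run(memory, 10)
```

The routines themselves are never modified; all variable information about where
control currently stands lives in the separate stack.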
Location    Symbol    Link
L100        I1        L200
L200        I1        T

A new cell, which happened to be L200, was obtained during the “save” operation
from the available space list, L1, and a copy of I1 was put in it. The symbol in
L100 can now be changed without losing I1 irretrievably. Suppose a different symbol
is copied, for example, I2, into L100. Then:

Location    Symbol    Link
L100        I2        L200
L200        I1        T

Although I1 has been replaced in L100, I1 can be recovered by performing the
“delete” operation, b = 13. Before the “delete” operation is explained, it will be
instructive to show what happens when the “save” operation on L100 is iterated. If
it is executed again, it will make a copy of I2. Therefore:

Location    Symbol    Link
L100        I2        L300
L300        I2        L200
L200        I1        T

Notice that the cell L200, in which the copy of symbol I1 is retained, was not
affected at all by this second “save” operation. Only the top cell in the list and
the new cell from the available space list are involved in the transaction of
saving. The same process is performed no matter how long the list that trails out
below L100; thus, the save operation can be applied as many times as desired with
constant processing time.

The “delete” operation, b = 13, applied to the symbol I2 in L100, will now be
illustrated. This operation puts the symbol and link of the second word in the
list, L300, into the first cell, L100, and puts L300 back on the available space
list, with the following result:

The result is the exact situation obtained before the last “save” was performed.

In the description of the “delete” operation up to this point, only the changes it
makes in the “push-down” list, in this case L100, have been considered. The
operation does more than this, however; “delete s” also erases all structures for
which the symbol s (I1 and I2 in the examples) is responsible. When a copy of a
symbol is made, e.g., the operation that initially replaced I1 by I2 in L100, the
copy is not assigned responsibility for the symbol (e = 0 was set in the copy).
Thus, no additional erasing would be required in the particular “delete” operation
illustrated. If, on the other hand, the I2 that was moved into L100 had been
responsible for the structure that could be reached through it (if it were the name
of a list, for example), then a second “delete” operation, putting I1 back into
L100, would also erase that list and put all its cells back on the available space
list. Thus “delete” is also equivalent to “erase” a list structure.

Communication list operations

In describing a process as a list of subprocesses, the question of inputs and
outputs from the processes has been entirely by-passed. Since each subroutine has
an arbitrary and variable number of operands as input, and provides to the routine
that uses it an arbitrary number of outputs, some scheme of communication is
required among routines. The communication list, L0, accomplishes this function in
IPL.

That the inputs and outputs to a routine be symbols is required. This is no real
restriction since a symbol can be the name of any list structure whatever. Each
routine will take as its inputs the first symbols in the L0 list. That is, if a
routine has three inputs, then the first three symbols in L0 are its inputs. Each
routine must remove its inputs from L0 before terminating with b = 10, so as to
permit the use of the communication list by subsequent routines. Finally, each
routine leaves its outputs at the head of list L0.

The b operations 14 through 19 are used for communication in and out of L0. Their
one common feature is that, whenever they put a symbol in L0, they save the symbol
already there, that is, they push down the symbols already “stacked” in L0.
Likewise, whenever a symbol is moved from L0 to memory, the symbol below it in L0
“pops up” to become the top one. (To be precise, the
responsibility bit travels with a symbol when it is moved. Hence, for example,
b = 16 and 17 do not, unlike the “delete” operation, erase the structure for which
1L0 is responsible.)

The four operations, b = 14, 15, 16, and 17, are the main in-out operations for L0.
Two options are provided, depending on whether the programmer wishes to retain the
s in memory (b = 14 and 16) or destroy it (b = 15 and 17). (The move in operation
15 has the same significance as in 16 and 17; the responsibility bit moves with the
symbol, and the symbol previously in the location of s is recalled.)

Operation b = 18 is a special input to aid in the breakout designation operation,
c = 5. Recall that the latter operation requires d to place the location of s, the
symbol it determines, in L0. Operation 18 allows the process d to accomplish this.

Operation b = 19 provides the means for creating structures. It takes a cell, for
example, L,,,, from available space, and puts its name, as the symbol (0, 0, L,,,),
in the location of the designated symbol, s. The symbol s previously in this
location is pushed down and saved.

Signal operations

Ten b operations are primarily involved in setting and manipulating the signal bit.
Observe that the test of equality (b = 20 and 21) is identity of symbols. Since
there is nothing in the system that provides a natural ordering of symbols,
inequality tests like s > 1L0 are impossible. (1L0 means the symbol in L0.) It is
necessary to be able to detect the responsibility bit (b = 22), since there are
occasions when the explicit structure of lists is important, and not just the
information they designate. Finally, although the signal bit is just a single
switch, it is necessary to have two symbols, one corresponding to “signal on” and
the other to “signal off” (b = 26 and 27), so that the information in the signal
can be retained for later use (b = 28 and 29).

The sense of the signal is not arbitrary. In general “off” is used to mean that a
process “failed,” “did not find,” or the like. Thus, in operations b = 6 and 7, the
failure to find a “stop interpretation” operation sets the signal to “off.”
Likewise, the end of a list will be symbolized by setting the signal to “off.”

List operations

Both the “save” and “delete” operations are used to manipulate lists, but besides
these, several others are needed. The three operations, b = 30, 31, 32, allow for
search over list structures. They can be paraphrased as: “get the referent,” “turn
down the sublist,” and “get the next word of the list.” They all have in common
that they replace a known symbol with an unknown symbol. This unknown symbol need
not exist; that is, the symbol referred to may contain a b = 10 operation, which
means that the end of the list has been reached. Consequently, the signal is always
set “on” if the symbol is found, and “off” if the symbol is not found. One of the
virtues of the common signal is apparent at this point, since, if the programmer
knows that the symbol exists, he will simply ignore the signal. Instruction formats
that provide for additional addresses for conditional transfers would force the
programmer to attend to the condition even if it only meant leaving a blank space
in the program.

To illustrate how these search operations work, Fig. 6 shows a list of lists, L,,,,
and a known cell, L100. Cell L100 contains the reference to the list structure. The
programmer does not know how the list, L,,,, is referenced. He wants to find the
last symbol on the last list of the structure. His first step is (30, 1, L100),
which replaces the reference by the name of the list, L,,,. He then searches down
to the end of list L,,, by doing a series of operations: (32, 1, L100). Each of
these replaces one location on the list by the next one. In fact, a loop is
required, since the length of the list is unknown. Hence, after each “find the next
word” operation, he must transfer, on the basis of the signal, back to the same
operation if the end of the list hasn’t been reached. The net result, when the end
of the list is reached, is that the location of the last word on list L,,, rests in
L100. Since in this example he wants to go down to the end of the sublist of the
last word on the main list, he next performs (31, 1, L100). This operation replaces
the location of the last word with the name of the last list, L,,,. Now the search
down the sublist is repeated until the end is again reached; at this point the
location of the last symbol on the last list is in L100, as desired. The sequence
of code follows:

Location Symbol Link

The operations, b = 33 and 34, allow for inserting symbols in a list either before
or after the symbol designated. The lists in this system are one-way: although
there is always a way of finding the symbol that follows a designated symbol, there
is no way of finding the symbol that precedes a designated symbol. The “insert
before” operation does not violate this rule. In both operations,
33 and 34, a cell is obtained from the available space list and inserted after the
word holding the designated symbol. (This is identical with the first step of the
“save” operation.) In the “insert before” operation (b = 33) the designated symbol,
s, is copied into the new cell, and 1L0 is moved into the previous location of s.
In “insert after” (b = 34), the designated symbol is left unchanged, and 1L0 is
moved into the new cell. In both cases 1L0 is moved, that is, it no longer remains
at the head of the communication list.

Fig. 6. Example of finding last item of last sublist.

Other operations

This completes the account of the basic complement of operations for the IPL
computer. These form a sufficient set of operations to handle a wide range of
nonnumerical problems. To do arithmetic efficiently, one would either add another
set of b’s covering the standard arithmetic operations or deal with these
operations externally via a breakout operation on b (not formally defined here)
that would move a full symbol into a special register for hardware interpretation
relative to external machines: adders, printers, tapes, etc.

The set of operations has not been described for reading and writing the various
parts of the word: b, c, d, e, and f (although it may be possible to automatize
this last completely). These operations rarely occur, and it seemed best to ignore
them as well

Interpretation

This section will describe in general terms the machine interpretation required to
carry out the operation codes prescribed. There is not enough space to be
exhaustive, therefore selected examples will be discussed.

Fig. 7. Information transfers in c = 2 operation.

Execute subroutine (b = 1)

When “execute s” is to be interpreted, the address register already contains the
location of s, which was brought in during the first stage of the interpretation
cycle. L2, the current instruction address list (CIA), holds the address of the
instruction containing the “execute” order. A “save” operation is performed on L2,
and s is transferred into L2, which ends the operation. The result is to have the
interpreter interpret the first instruction on the next sublist, and to proceed
down it in the usual fashion. Upon reaching the terminate operation, b = 10, the
delete operation is performed on L2, thus bringing back the original instruction
address from which the subroutine was executed. Now, when the interpretation cycle
is resumed, it will proceed down the original list. Thus, the two operations, save
and delete, perform the basic work in keeping track of subroutine linkage.

Parallel programs

A single program structure, that is, a routine with all its subroutines, and their
subroutines etc., requires a CIA list in order to keep track of the sequence of
control. In order to have a number of independent program structures, a CIA list is
required for each. L3 is the fixed register which holds the name of the current CIA
list. The name of the CIA list for the program structure which completely oblivious to the processing and structure that were
is to be reactivated on completion or interruption of the current involved in determining what was the first symbol of data. Simi-
program structure is the second item on the L, list, etc. Therefore, larly, although it is not shown, the processing program is able to
the L, list is appropriately called the current CIA list. The “save” get the second symbol of data at any time simply by doing a
and “delete” operations are used to manipulate L, analogously “continue parallel program lL,,,” ( b = 7).
to their use with L, previously described. One virtue of the use of data programs is the solution it offers
Appendix 3 gives a more complete schematic representation for “interpolated’ lists. In working on a chess program, for example,
of the interpretation cycle. It has still been necessary to represent one has various lists of men: pawns, pieces, pieces that can move
only selected b operations. more than one square, such as rooks, queens, etc. One would like
a list of all men. There already exists a list of all pieces and a
list of all pawns. It would be desirable to compose these lists into
Data programs a single long list without losing the identity of either of the short
In the section on list operations a search of a list was described. lists, since they are still used separately. In other words form a
There the data were passive; the processing program dictated just what steps were taken in covering the list. Consider a similar situation, shown in Fig. 8, where there is a working cell, L,,,, which contains the name of a list, L,,,. L,,, is a data program. There is a program that wants to process the data of L3,,,, which is a sequence of symbols. This program knows L,,,. To obtain the first symbol of data, it does (6,1, L,,,), that is, “execute the parallel program whose name is in L,,,.” The result is to create a CIA list, L,,,, put its name in L,,,, and fire the program. Some sort of processing will occur, as indicated by the blank words of L,,,. Presumably this has something to do with determining what the data are, although it might be some bookkeeping on L,,,’s experience as a data file. Eventually L,,, is reached, which contains (0, 1, This operation stops the interpretation, and returns control to the original processing program. The first symbol of data is defined to be lL8,,. The processing program can designate this by 4L,,,, since the sequence of c = 4 prefixes in L,,, and L , pass along the interpretation until it ultimately becomes IL,,,. Now the processing program can proceed with the data. It remains

Fig. 8. Example of a data program.

list whose elements are the two lists, but such that, when this list of lists is searched it looks like a single long list. Further, and this is the necessary condition for doing this successfully, one cannot afford to make the program that uses this list of lists know the structure. The operation “execute s” (b = 1) is precisely the operation needed to accomplish this task in a data program. It says “turn aside and go down the sublist s.” Since it does not have the operation b = 0, it is not “data.” It is simply “punctuation” that describes the structure of the data list, and allows the appropriate symbols to be designated. Figure 9 shows a data list of the kind just described. The authors have taken the liberty of writing in the names of the chessmen.

The stretch of code that follows shows the use of a data program for a “table look up” operation. The table has arbitrary arguments, each of which has a symbol for its value. A,, A,, etc. have been used to represent the arguments. To find the value corresponding to argument A,, for example, A, is put in the communication cell with (14, 0, A,). Then the data program is executed with (6, 0, .)&J Control now lies with the table, which tests each argument against the symbol in the communication lists, i.e., A,, and sets the signal accordingly. The program stops interpreting (b = 8) at the word holding the value only if the arguments are the same. In this case it would stop, designating L,,,. If no entry was found, of course, control would return to the inquiring program with the signal off.
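In present-day terms, the table look-up operation described in the text behaves roughly like the Python sketch below. It is an illustration only, not IPL code: the function name table_lookup, the representation of the table as a list of (argument, value) pairs, and the returned (signal, value) pair are all assumptions made for the example.

```python
def table_lookup(table, argument):
    """Search a table of (argument, value) pairs for `argument`.

    Returns (signal, value). The signal is True and the value is the
    matching entry's value when an entry is found; otherwise the result
    is (False, None).
    """
    for entry_argument, value in table:
        # The table tests each of its arguments against the symbol placed
        # in the communication cell and sets the signal accordingly.
        signal = entry_argument == argument
        if signal:
            # Counterpart of b = 8: stop interpreting if the signal is on,
            # leaving the value designated.
            return True, value
    # No entry matched: control returns to the caller with the signal off.
    return False, None


# Hypothetical table contents, chosen only for the demonstration.
table = [("A1", "V1"), ("A2", "V2"), ("A3", "V3")]
found = table_lookup(table, "A2")
missing = table_lookup(table, "A9")
```

The point of the design is that the inquiring program never inspects the table's structure; it only supplies the argument and reads back the signal and the designated value.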
Chapter 30 I A command structure for complex information processing 361
Fig. 9. Application of a data program to chess.

Conclusions

The purpose of this paper has been to outline a command structure for complex information processing, following some of the concepts used in a series of interpretive languages, called IPL’s. The ultimate test of a command structure is the complex problems it allows one to solve that would not have been solved if the coding language were not available.

At least two different factors operate to keep problems from being solved on computers: the difficulty of specification, and the effort required to do the processing. The primary features of this command structure have been aimed at the specification problem. The authors have tried to specify the language requirements for complex coding, and then see what hardware organization allowed their mechanization. All the features of delegation, indirect referencing, and breakout imply a good deal of interpretation for each machine instruction. Similarly, the parallel program structure requires additional processing to set up CIA lists, and when a data symbol is designated, there is delegated interpreting through several words, each of which exacts its toll of machine time. If one were solely concerned with machine efficiency, one would require the programmer to so plan and arrange his program that direct and uniform processes would suffice. Considering the size of current computers and their continued rate of growth toward megaword memories and microsecond operations, it is believed that the limitation already lies with the programmer with his limited capacity to conceive and plan complicated programs. The authors certainly know this to be true of their own efforts to program theorem proving programs and chess playing programs, where the IPL languages or their equivalent in flexibility and also in power have been a necessary tool.

Considering the amount of interpretation, and the fact that interpretation uses the same operations as are available to the programmer, e.g., the save and delete operations, one can think of alternative ways to realize an IPL computer. At one extreme are interpretive routines on current computers, the method that the authors have been using. This is costless in hardware, but expensive in computing time. One could also add special operations to a standard repertoire to facilitate an interpretive version of the language. Probably much more fruitful is the addition of a small amount of very fast storage to speed up the interpreter. Finally, one could wire in the programs for the operations to get even more speed. It is not clear that there is any arrangement more direct than the wired in program because of the need of the interpreter to use the whole capability of its own operation code.

References

ShawJ58; BernA58; BrooF57b; KistJ57; NeweA56, 57a, 57b, 58

APPENDIX 1  c OPERATIONS (DESIGNATING OPERATIONS)

c    Nature of operation for (a) = b c d e.
0    (a) is the symbol s.
1    d is the address of the symbol s.
2    d is the address of the address of the symbol s.
3    d is the address of the address of the address of the symbol s.
4    d is the address of the designating instruction that determines s.
5    d is the address (name) of a process that determines s.

APPENDIX 2  b OPERATIONS

b    Nature of operation

SEQUENCE-CONTROL OPERATIONS
0    Stop interpreting; return to previous program structure.
1    Execute process named s.
2    Interpret instruction s.
3    Transfer control to location s.
4    Transfer control to location s, if signal is on.
5    Transfer control to location s, if signal is off.
6    Execute parallel program s; turn signal on if stops; off if not.
7    Continue parallel program s; turn signal on if stops; off if not.
8    Stop interpreting, if signal is on.
9    Stop interpreting, if signal is off.
10   Terminate.
11   Halt; proceed on go.

SAVE AND DELETE OPERATIONS
12   Save s.
13   Delete s (and everything for which s is responsible).
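The designating operations c = 0 through 3 of Appendix 1 amount to following zero to three levels of address indirection. The Python sketch below illustrates only that idea. The dictionary used as memory, the cell names, and the function name designate are hypothetical, and treating the operand itself as the symbol for c = 0 is an assumption based on instructions such as (14, 0, A) in the text. The operations c = 4 and c = 5, which delegate to another instruction or to a process, are not modeled.

```python
def designate(memory, c, d):
    """Return the symbol s designated by prefix c and operand d.

    Only c = 0..3 are modeled: c is the number of addresses that must be
    followed, starting from d, before the symbol itself is reached.
    """
    if c not in (0, 1, 2, 3):
        raise ValueError("only c = 0..3 are modeled in this sketch")
    s = d
    for _ in range(c):
        # Each level of c means "d is the address of ..." once more.
        s = memory[s]
    return s


# Hypothetical memory: each cell holds the name of another cell or a symbol.
memory = {"L100": "L200", "L200": "L300", "L300": "QUEEN"}

direct = designate(memory, 0, "L100")   # the operand itself
one_level = designate(memory, 1, "L100")
three_levels = designate(memory, 3, "L100")
```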
362 Part 4 I The instruction-set processor level: special-function processors Section 4 I Processors based on a programming language
Summary  A system design is given for a computer capable of direct execution of FORTRAN language source statements. The allowed types of statements are the FORTRAN DO, GO TO, computed GO TO, Arithmetic, READ, PRINT, arithmetic IF, CONTINUE, PAUSE, DIMENSION and END statements. Up to two subscripts are allowed for variables and no FORMAT statement is needed. The programmer’s source program is converted to a slightly modified form while being loaded and placed in a Program Area in lower memory. His original variable names and statement numbers are retained in a Symbol Table in upper memory, which also serves as the data storage area. During execution of the program each FORTRAN statement is read and interpreted at basic circuit speeds since the machine is a hardware interpreter for these statements. The machine corresponds therefore to a “one-pass, load-and-go” compiler except, of course, that there is no translation to a different machine language. It is estimated that the control circuitry for this machine will require on the order of 10,000 diodes and 100 flip-flops. This does not include arithmetic circuitry.

Index Terms  Digital computer system, digital machine design, direct execution of FORTRAN, FORTRAN computer system, FORTRAN language machine, hardware interpreter.

Introduction

The algebraic languages, in particular FORTRAN in this country, have had enormous impact on the utilization of computers for scientific and engineering computation. They were designed in large part to overcome the annoyance of lengthy learning time and the laborious attention to detail needed to use a basic machine language.

These annoyances are overcome by providing a language which is closer to English in form, and freer of “bookkeeping” details, than the usual machine languages, and by providing a machine language program, called a compiler or translator, to convert from the source program written by a user to an object program executable by a computer. Thus the original drawbacks are overcome but the discrepancy between the external language of the user and the internal language of the machine leads to at least two others. The compilation run of the machine, during which the language translation is accomplished, is a waste of time and money to the user since he must pay for this time though he gets no problem answers from it. Secondly, the user has specified the logical flow and arithmetic details of his solution in the source language. However, when the machine “hangs up” or when he attempts to debug his program, all he finds displayed on the machine console is the machine language. (On large machines he gets equivalently an esoteric print-out in a symbolic form of machine language.) To overcome these difficulties one could use an interpretive translator of the source language instead, but the historical deficiencies of interpreters, loss of memory space and loss of speed of execution, have caused this solution to be shunned.

Another solution is also possible: design a machine which executes an algebraic language directly as its “machine language.” This approach is based on a recognition that once the allowable syntax and associated semantics of language statements have been firmly specified it is a matter of choice whether to write a compiler, to write an interpreter or to build an interpreter out of hardware. The software choice has been almost overwhelmingly to write a compiler. Since the choice of hardware interpreter, or machine, has not been made, and in fact has hardly been explored to any great extent, a study has been made in order to see if this choice leads to a system which is competitive with the usual software system. It should be understood that such a machine has not been constructed. However, the design² is sufficiently complete that construction seems feasible.

¹IEEE Trans., EC-16, vol. 4, pp. 485-499, August, 1967.
²See final technical report for Contract AF 19(628)-2798.

Language-design philosophy

Since the machine language is to be an algebraic one it seemed reasonable to choose a simple subset of the most commonly used one, FORTRAN. This eliminates the necessity for inventing still another such language and allows attention to be focused on machine design. In fact, the subset chosen is quite close to that known as “Preliminary FORTRAN for the IBM 1620,” which is complete enough to be quite useful, but which does not include
such innovations as subroutines, etc. In addition, the usual “built in” subroutines SIN (x), COS (x), etc., are not included. Their inclusion would require additional effort for their hardware implementation which did not appear to be worth expending at this time.

The FORTRAN statement types which are accepted by the machine as machine language are in the table that follows.¹

Statement                      Comment

a = b                          The value of the arithmetic expression b is stored in the
                               memory location referenced by the variable name a, which
                               may have up to two subscripts.

GO TO n                        Program control is transferred to the statement numbered n.

GO TO (n1, n2, ..., nm), i     Program control is transferred to one of the statements
                               numbered n1, n2, ..., nm depending on the value of i at
                               the time this statement is executed.

IF (e) n1, n2, n3              Program control is transferred to the statement numbered
                               n1 if the algebraic expression e is negative, to that
                               numbered n2 if e is zero, and to that numbered n3 if e is
                               positive.

PAUSE                          Program execution is halted until restarted by console
                               switch.

DO n i = m1, m2, m3            All statements following this one in the program, including
                               the statement numbered n, are executed repeatedly. The
                               first execution is with i equal to m1; i is incremented by
                               the value of m3 before each succeeding execution. This
                               continues until i is greater than m2, at which time program
                               control is transferred either to the statement following n
                               or to that statement required by the DO sequencing rules
                               for DO nests. If m3 is not given it is understood to be 1.

CONTINUE                       This statement has the effect of the “no operation”
                               instruction in conventional machines. Program control goes
                               to the next statement in the program unless the CONTINUE
                               is the last statement in the range of a DO. In this case
                               normal DO sequencing takes place.

END                            This statement generates a control signal to start
                               execution of the program.

READ, List                     These statements cause data to be read or printed,
PRINT, List                    respectively, in accordance with the specified list of
                               variables which may be subscripted; however, the “implied
                               DO” feature has not been implemented. No FORMAT control is
                               available with this machine, therefore no statement number
                               need be given.

DIMENSION u, u, ...            This statement has the effect of reserving memory space for
                               the subscripted variables u. Each u stands for a variable
                               name followed by parentheses enclosing one or two constants.

No distinction is made in this machine between fixed (integer) and floating point (real) variables. These may have names of any length, starting with any alphabetic character.

Fixed point constants may be specified, in a program or as data, as any combination of one to four numeric characters preceded by a + or - sign. However, these are converted to an internal decimal floating point number and so there are no restrictions on “mixed mode” expressions. Statement numbers must be unsigned fixed point constants, which are not so converted since they only affect program control and not arithmetic processing.

Floating point constants are specified in the form of a mantissa of one to four numeric symbols preceded by a decimal point (and a + or - sign). These are followed by the character E and a single (positive or negative) digit representing the power of ten in the usual scientific notation.

These constraints on number size and format are made to simplify certain circuits and could easily be relaxed if desired. The restriction to a two-subscript maximum for subscripted variables is similarly motivated.

Internally, all numerical data require three 8-bit words (Fig. 1). The first two words contain the four-digit mantissa, packed two per word in a 4-bit code for each digit. A decimal point is assumed to exist to the left of the most significant digit. The most significant two bits of the third word are zero. The third bit is 0 if the mantissa is positive, or 1 if it is negative, and similarly the fourth bit is 0 or 1 if the exponent is, respectively, positive or negative. The single exponent digit occupies the least significant four bits of this word. All other characters occupy a full 8-bit word of which the two most significant are 1’s. Any numeric characters which are symbols of a variable, e.g., the “2” in AB2X, also occupy a full word of this type. Statement numbers are simply packed 2 digits per word and always occupy 2 full words.
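The three-word number format can be made concrete with a short Python sketch. It follows the layout stated in the text: two words of packed 4-bit digits, then a word holding the mantissa sign, the exponent sign, and one exponent digit. The function names pack, unpack, and parse_constant are hypothetical. The rule that a fixed point constant is given exponent 4, and that a floating point constant without an E keeps exponent 0, is taken from the description of the READ circuit later in the paper. Bits are counted from the most significant end, as in the text.

```python
from decimal import Decimal


def pack(mantissa_digits, negative, exponent):
    """Pack four mantissa digits, a sign, and a one-digit signed exponent
    into three 8-bit words."""
    d1, d2, d3, d4 = mantissa_digits
    if not all(0 <= d <= 9 for d in mantissa_digits) or not -9 <= exponent <= 9:
        raise ValueError("digits must be 0-9 and the exponent -9..9")
    word1 = (d1 << 4) | d2           # two most significant digits
    word2 = (d3 << 4) | d4           # two least significant digits
    # Top two bits zero, third bit = mantissa sign, fourth bit = exponent
    # sign, low four bits = exponent digit.
    word3 = (int(negative) << 5) | (int(exponent < 0) << 4) | abs(exponent)
    return word1, word2, word3


def unpack(words):
    """Return the numeric value 0.d1d2d3d4 x 10**exponent as a Decimal."""
    word1, word2, word3 = words
    digits = [word1 >> 4, word1 & 0xF, word2 >> 4, word2 & 0xF]
    mantissa = int("".join(str(d) for d in digits))
    exponent = word3 & 0xF
    if word3 & 0x10:
        exponent = -exponent
    # The decimal point sits to the left of the four mantissa digits.
    value = Decimal(mantissa).scaleb(exponent - 4)
    return -value if word3 & 0x20 else value


def parse_constant(text):
    """Convert a constant such as '+13' or '-.25E-1' to pack() arguments."""
    negative = text.startswith("-")
    body = text.lstrip("+-")
    if body.startswith("."):         # floating point constant
        mantissa, _, exp = body[1:].partition("E")
        digits = mantissa.ljust(4, "0")
        exponent = int(exp) if exp else 0
    else:                            # fixed point constant
        digits = body.rjust(4, "0")
        exponent = 4
    return [int(ch) for ch in digits], negative, exponent


thirteen = pack(*parse_constant("+13"))
small = pack(*parse_constant("-.25E-1"))
```

Under these assumptions the fixed point constant 13 is held as mantissa 0013 with exponent 4, which is why no separate integer representation is needed.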
¹Some familiarity with the FORTRAN language is assumed.

Chapter 31 | System design of a FORTRAN machine

Fig. 1. Data format in memory. (Words 1 and 2: mantissa. Word 3: two unused bits, number sign, exponent sign, exponent.)

Before proceeding with the description of the overall characteristics of a machine that loads and executes the language specified above, it may be well to indicate two basic design goals.

1  The card deck or tape containing the Hollerith or BCD version of the English language form of a source program should be the only deck or tape required at any time to execute the program.

2  Once this program is loaded into memory and execution started, any look “into the machine” should reveal information in the same form in which it was entered. Thus if the program is executing X = A + B, then one should find “X”, “=”, “A”, “+”, “B”, at least in their BCD form.

The second goal has been compromised somewhat as far as the internal representation of the program is concerned in the interest of execution speed. However, all such compromises have been kept to a minimum. In addition, the mechanisms by which one can take such looks “into the machine” are such as to conceal these compromises.

Memory organization

The machine is, in effect, a hardware version of a “one-pass-load-and-go” compiler and it operates in two modes. In the load mode FORTRAN statements are read. They are analyzed as required and stored in memory. When the last statement has been stored, the execution mode is entered and program execution begins at the first executable statement that was read. The input/output device for the machine design is a Flexowriter Model SPD. Programs are assumed to be punched onto a paper tape, one statement per line.

The memory around which the machine is designed is a 4096-word, 8-bit-per-word, random-access core memory.¹ It is treated by the control circuits as though it consisted of three distinct regions.

1  Input/output (I/O) buffer: One statement at a time is loaded sequentially into memory locations 0-99. The six-bit paper tape codes are first converted to internal (often different) six-bit memory codes and stored in the six least significant positions of the 8-bit words. The carriage return symbol is encoded into a special “end-of-statement” symbol represented in the paper as “$.” When this symbol is read the tape is also automatically stopped.

2  Symbol table area: Memory locations 4095 and sequentially downward in memory hold the programmer’s names for variables, statement numbers, etc., as well as “pointers” to machine addresses, plus empty (before execution) locations for data.

3  Program area: Memory locations 100 and sequentially upward hold the FORTRAN program, in a slightly modified form.

¹5-μs cycle time, EE Co Model 781.

Operating modes

The load mode circuits control the input of FORTRAN statements. They place certain information in the Symbol Table Area and the modified form of the FORTRAN statements in the Program Area. It is while in this mode that the necessary searches for variable names take place and machine addresses are assigned. These addresses replace portions of the variable names in the statement as it appears in the Program Area. Similar processing replaces programmer-assigned statement number references in the Program Area with various internal “pointers” for control of GO TO, DO, and IF statements. This modification is done so that statement execution in the execute mode can proceed at high speed. In short, the FORTRAN statement in the Program Area is modified to the extent that variable names are replaced by actual data addresses and statement number references are replaced by actual addresses of statement locations in the Program Area. This translation is done once only, when the statement is analyzed in the load mode. It might be noted here that because of the “one-pass” nature of the translation (a given statement is analyzed only once), certain
of the pointers correspond to indirect addresses. Figure 2 shows a sketch of the overall system control and Tables 2 to 7 show to what extent the original statements have been altered.

Fig. 2. FORTRAN computer system.

Loading a program

A program, which is punched in a paper tape, is loaded into memory by energizing the tape read circuit which reads a statement on the tape, including the end-of-statement symbol $, into the I/O buffer. The read circuit is then de-energized. The least significant 6 bits of each word of the buffer hold the internal BCD representation of each symbol.

A scan circuit (Fig. 3) now picks up each symbol in the statement from left to right and as each symbol is decoded it reacts as follows.

1  If the first symbol is a digit, control is turned over to a Statement Number Load circuit. This circuit shifts the statement number digit by digit into a register (SHR). The maximum allowable length of a statement number is 4 digits and all statement numbers are carried internally in this form, i.e., a programmer's statement number 13 is carried in 2 words as 0013. A search is now made of the Symbol Table area. One of three possibilities exists:

   a  The statement number is not found in the Symbol Table. It is put into the Symbol Table followed by the value of the current Program location. The statement number is also put into the Program Area starting at this location and the Program Counter incremented appropriately, i.e., by 2 since two 8-bit words are used.

   b  The statement number is found in the Symbol Table because it has been previously referred to by an IF or GO TO. The current value of the Program Counter is placed into the two memory locations following the statement number. (These were left blank when the statement number was previously processed.) The statement number is put into the Program Area and the Program Counter is incremented.

   c  The statement number is found in the Symbol Table because it has been previously referred to by a DO statement. A description will be deferred until the DO statement loading is described since the circuit's behavior is more meaningful in that context.

2  After a statement number has been processed in this fashion or if the first symbol in the statement was not a digit (no statement number was assigned) then the scan circuit continues to pick up each symbol from left to right until it is able to classify the statement as to type. It then turns over control to the appropriate loading circuit as indicated in Fig. 3.

All of these loading circuits put the statements into the Program Area after replacing variable names and statement number references in the program with addresses or pointers. They also replace reserved names such as GO TO or CONTINUE with a single 8-bit code (token). Each unique variable name in the program, however, is also stored in the Symbol Table once using an 8-bit code for each symbol. For nonsubscripted variables the three words following the name are reserved for the data that will be associated with this name when the program is executed. Subscripted variable names are found in DIMENSION statements which must precede the use of these variables in the program. In this case as many locations following the name are reserved as have been computed from the DIMENSION statement. The name in the Symbol Table is preceded by a special symbol a, to indicate that it is a subscripted variable. In addition, the first of the two subscript values in the DIMENSION statement is also stored immediately following the name. This number is needed during program execution for constructing the proper element of the array specified by a subscripted variable. (A pointer to the next available location in the Symbol Table is also stored for speed in Symbol Table searching.) The address of
[Fig. 3: scan circuit diagram. Recoverable labels: paper-tape control, program switch, and one Process block for each statement type: DIMENSION, ARITHMETIC, DO, GO TO, COMPUTED GO TO, PRINT, IF, PAUSE, CONTINUE, END.]
the data location replaces all symbols of the variable name in the Program Area except for the first. This symbol, which must be alphabetic, is retained in the Program Area as an indicator that this is indeed a variable. All special symbols such as (, ), -, +, etc. are simply stored sequentially in the Program Area in the 8-bit BCD form as they appear in the original statement.

Statement numbers in IF and GO TO statements are similarly replaced by the address in the Symbol Table which holds the address in the Program Area of the statement having that number. Note that this is an indirect address to the statement. Statement numbers in DO statements are dealt with somewhat differently as will be explained later. Because variable names and statement number references can appear many times in a program, these searches of the Symbol Table are controlled by two special circuits, the Variable Match Unit (VMU) and the Statement Match Unit (SMU). These circuits indicate either that the name or statement number is already in the Symbol Table or it is not. Thus the first appearance of a variable name, statement number, or reference to a statement number causes it to be put into the Symbol Table. Subsequent references merely utilize these previously assigned data or Program addresses. Therefore each name or statement number is stored in the Symbol Table only once with an exception noted below. In general, the programmer’s statement is altered only in the above described fashion. However, for ease of execution the computed GO TO has its index parameter name, i.e., the “i” in GO TO (n1, n2, ..., nm), i, changed from the position following the parenthesis to a position preceding the parenthesis.

The DO statement requires the most complex loading algorithm. Basically, the idea is to place the DO statement itself, essentially unchanged, into the Program Area but to extract the range statement number (which specifies the last statement in the range of the DO) and put it into the Symbol Table. It is there preceded by a special symbol A, designating it as being referenced by a DO, and followed by the Program Area address of the corresponding DO statement. The DO statement in the Program Area has its original statement number replaced by a special symbol, A, and an internal address which is determined as follows (see Table 6).

functions is the same in each case. From the English language description of the function a sequential circuit state diagram is constructed. The circuit is then synthesized from the state diagram using established methods. The state diagrams of the Arithmetic Statement Loading circuits and the Variable Match Unit, which are used during Loading, are shown in the Appendix. The hardware implementation of the state diagram of the Variable Match Unit is also described there.
I
4087 00 Representation of
4086 05 Statement 5
4085 02
0106
0107
0108
I
40 Address of the address
8 1 of Statement 10
DO n i = m,, m2, m3 (or DO n i = m,, m2)
This circuit is energized (i.e., caused to leave its initial state) either
4084 50 0109 40 Address of the address by a DO token or by the h token. Its action is different in these
4083 00 0110 77)of Statement 150
two cases and will be described separately.
4082 10
I
408 1 03 Address of
4080 50 Statement 10
0111
0112 *
)
Table 4
4079 0 1
4078 50
Symbol table Program area
4077 05 Address of 00
1
4076 53 Statement 150
0250
0251 05
Address contents Address contents
0100 IF
4095 A
4094 0101 (
4093 0102 A
0350 00 4092 0103 40
0351 10 409 1 B 0104 94
4090 0105 -
4089 0106 B
4088 0107 40
0553 01 4087 00 0108 90
0554 50 4086 10 0109 )
4085 03 Address of
4083 00
I
4084 50 Statement 10
0 1 10
0111
0112
of three signal lines depending on whether the number is zero,
positive, or negative. The IF circuit senses these lines and reacts
as follows.
4082 20
4081
4080
0 1 13
0114
0115
I
40 Address of the address
8 1 of Statement 20
Table 6
B
3484 01 14 04 3453 A 0145 40
3483 I 0115 3452 0146 91
3482 T 01 16 L 345 1 0147 (
348 1 0117 34 3450 0148 1
3480 0118 77 0149 34
3479 0119 $ 0150 81
3478 L 0120 DO 0151
3477 0121 h 0152 J
3476
3475
3474 A
0122
0123
0124 J
I
01 Address of preceding
01 DO in the nest
0153
0154
0155
34
68
)
3473 00 0125 34 0156 $
3472 05 0126 68 0157
-
-
3471
3470
3469
I
01 Address of 2nd
21 DO in nest
J
0127
0128
0129
w
34
3468 0130 64
the letter E and a single positive or negative digit indicating a power of ten. Numbers must be separated by a comma to distinguish them, since no FORMAT information is available and the read circuits “squeeze out” blanks.

Table 7

Symbol table             Program area
Address  Contents        Address  Contents
4095     A               0100     DO
4094     00              0101     h
4093     05              0102     01
4092     01              0103     22
4091     01              0104     I
4090     I               0105     40
4089                     0106     89
4088                     0107     =
4087                     0108     00
4086     00              0109     01
4085     05              0110     04
4084     01              0111     ,
4083     16              0112     01
                         0113     50
                         0114     04
                         0115     $
                         0116     00
                         0117     05
                         0118     01    Address of the
                         0119     01    DO statement
                         0120     CONTINUE
                         0121     $
                         0122

The first set of digits starting at the beginning of I/O buffer, memory address 0, is read into a 24-bit register (which is the size of the three 8-bit memory words required for data). Numerical information in the I/O buffer is in a 6-bit code. The two most significant bits are 0 if the code is for a numeric character. The placing of information into the 24-bit register is easier to understand if we consider it as a 16-bit mantissa register M, which can hold four decimal digits, and an 8-bit sign and exponent register X, which can hold 2 bits of sign information and an exponent digit. Both registers are set to zero initially. If the first character is a minus sign, the bit in the mantissa sign position of X is set to one. (The internal form of data representation was described earlier in the section on Language-Design Philosophy.) If it is a plus sign no action is required since a zero in the mantissa sign position indicates a positive mantissa. Further action depends on the next character.

1  If the next character is numeric (or if there was no sign given and the first character is numeric) this must be a fixed point constant. The four bits of numeric information are gated to the least significant four positions of register M. If the next character is numeric, M is shifted left four positions and this character is also gated to the least significant position. This continues until the comma is read. The numeric code for four is now gated to the least significant four positions of X. Since the arithmetic unit assumes a decimal point at the left of all data, this action insures that a fixed point number is properly interpreted.

2  If the next character after the sign (if there is one) is a decimal point this must be a floating point number. In this case the following digits are stored into M as indicated above, but three shifts of M are always taken, whether or not four digits are stored in M. This is required to insure proper interpretation of the number. If a comma follows the series of digits no further action is taken. If an E follows then the digit following it is placed in the least significant 4 positions of X. If a minus sign is found following the E a setting of the exponent sign position of X precedes this action. The comma is then read.

After this first piece of data has been placed in M and X, the alphabetic character following the READ token is read and discarded. The next 4 digits are used as the address in which the most significant two digits in M are stored and it is then decremented appropriately to store the remainder of the data.

The remaining data in the I/O buffer are then stored one by one in sequence at the addresses given by the remainder of the READ list. A subscripted variable on this list requires additional arithmetic operations to compute the correct address from the current index values and the original DIMENSION information stored in the Symbol Table. These operations will be given later in the Arithmetic Statement description.

When the $ token in the I/O buffer is reached, the next character in the READ list is read. If this character is also the $ token then the circuit returns to its initial state. If, however, it is not, then the Flexowriter is again energized so as to read data into the I/O buffer, and processing proceeds as before until reading of the $ of the READ statement returns the circuit to its initial state.

The PRINT statement circuit operates in almost exactly inverse fashion and will not be described in detail. The list variables are used in sequence to extract data from the proper memory locations and place it in the M and X registers. The contents of these registers are then put sequentially into the I/O buffer, together with 6-bit codes for the decimal point, plus and minus signs, commas, and the E symbol at appropriate places. All data are thus output in floating point form. When the $ token is read, the Flexowriter print circuits are energized and the circuit returns to its initial state.

Example.

READ, A, B, C(I, I) $
PRINT, B, C(I, I) $

The appearance of the Symbol Table and Program Area should be apparent from previous examples. Since this would add little to the description of circuit action they will be omitted.

a = b

The Arithmetic Statement execution unit is energized by any 8-bit alphabetic character code. This first character of the variable name represented above as “a” is discarded. Then either the following four-digit data address is saved or the data address of a subscripted variable is computed and saved in a register. After reading and discarding the = symbol, the circuit executes the expression b in accordance with the given sequence of arithmetic operator symbols, +, -, *, /, which are used to control the arithmetic unit. The partial results at any time during the execution are stored in the I/O buffer area which is, of course, otherwise unused during
Arithmetic Statement execution. These storage areas for partial
results are called d_i0, d_i1, where i specifies the “level” at which
computation is taking place; i is equal to zero until a left
parenthesis is encountered, which increases the current value of i by
1. An exception occurs if the left parenthesis immediately follows
the = symbol. In this case the level remains at zero. It is also
necessary to store control information which relates to these
partial results.
Two control values are required at every level. The count of
left parentheses at any level i is stored as a number, l_i. Before
i is incremented, the incompleted arithmetic operations still
required at the current level are indicated by giving an indicator
t_i the value 1, 2, or 3. Also needed are indicators t+_i and t*_i to
distinguish + from - and * from /. To clarify the significance of
these control values an analysis will be made of the following
expression, which contains some unneeded but legitimate sets of
parentheses:

A = ((B + (C / ((D + E * (F)))) + G))$

1 The circuit reads and saves the address of A, then reads
and discards the =, which puts the circuit at the level i = 0.
The first two left parentheses cause l_0 to be set to 2. The
value of B is stored in d_00. The plus sign followed by a left
parenthesis causes the indicator t_0 to be set to 1 to indicate
the condition “B + (”. Since we might in other cases find
“B - (”, t+_0 is set to zero to indicate the plus sign.
2 The left parenthesis also causes i to be incremented to one
and, since it is the only one at this level, l_1 is also set to
1. The value of C is stored in d_10. The division symbol
followed by a left parenthesis causes t_1 to be set to 2 to
indicate the condition “C/(”. Since we might find “C*(” in
other cases, t*_1 is set to 1 to indicate the division.
3 The left parenthesis also causes i to be incremented to 2
and the next left parenthesis increments l_2 to 2. The value
of D is stored in d_20 and the value of E put into d_21,
respectively. The multiplication symbol followed by a left
parenthesis causes t_2 to be set to 3 to indicate the condition
“D + E * (”. t+_2 and t*_2 are each set to zero to indicate
the plus and multiplication symbols, respectively.
4 The left parenthesis before the F causes i to be incremented
to 3 and l_3 to be set to 1. The value of F is placed in d_30.
The Arithmetic Statement circuit always puts the final
value computed at any level into the arithmetic unit
register, SR. It does this whenever l_i = 0 for any i. Clearly l_i
must be decremented by one for each right parenthesis.
Therefore the first right parenthesis after the F causes l_3
to equal zero. This condition causes the value stored in
d_30 to be placed in the SR. The value of i is decremented
to 2.
5 t_2 being 3 (and t+_2 = t*_2 = 0) causes the computation
d_20 + d_21 * SR to be stored in d_20. The next two
parentheses after F cause l_2 to equal zero. Therefore, this result
is placed in the SR. The value of i is decremented to 1.
6 Since t_1 is equal to 2 and t*_1 is equal to 1 the computation
d_10/SR is made and stored in d_10. The final parenthesis after
the F causes l_1 to equal zero. Therefore this result goes to
SR. i is decremented to zero.
7 Since t_0 is one and t+_0 is zero the computation d_00 + SR
is made and the result is stored in d_00.
8 The + G causes the computation d_00 + G to be made and
stored in d_00. The final two parentheses cause l_0 to be zero;
therefore the value in d_00 is placed in SR. (If another right
parenthesis were found, this would cause an error condition
to be indicated.) The $ symbol causes the contents of SR
to be stored at the previously saved memory address for A.

Any subscripted variable addresses are computed easily from
the initial DIMENSION statement information, saved in the
Symbol Table, and the current value of the subscripts. Assume the first
data location for an array A(I, J) is stored at a location Abase + 1.
If the DIMENSION statement read DIMENSION A(5, 10) then
the computation Abase + 5 * (J - 1) + I gives the correct data
address for any nonzero value of I and J. (This is true only if a
complete data word is stored per memory word; in this machine
the expression is slightly more complicated.)
In this machine the partial result locations d_i0 and d_i1 are
actually 3 words long, of course, to accommodate the data. An
additional word is used to store control information, where 4 bits
are used for t_i, t+_i, and t*_i and the remaining 4 bits for the
l_i count. The i counter therefore is actually incremented or
decremented by 7 instead of one. Thus at any level, of which there
can be 14 since the I/O buffer is 100 words long, the l_i count
can be as great as 15. This is more than adequate since it allows
for 210 left parentheses, which is much longer than the I/O buffer
length.
Since the appearance of the Symbol Table and Program Area
would add little to this discussion, an example will be omitted.

¹Basic circuit operation at any level is described in the earlier
report. See page 363, footnote 2.

Conclusion

We have illustrated in some detail that a machine for direct
translation of a simple algebraic language is possible. It would therefore
seem that further investigation be made of the economic position
of this solution vis-à-vis the software compiler solution.
Unfortunately, the present authors are not sufficiently versed in compiler
construction to make such a comparison.
The actual construction of such a machine as an independent
unit is probably not reasonable except under particular
circumstances in which only small one-shot scientific problems form the
bulk of the computing. However, as an adjunct to a larger general
purpose machine, it may well serve a need as a hardware
interpreter for widely used higher level languages.
As a result of a fairly complete design of the control circuits
of this machine, it is estimated that 10,000 diodes and 100 flip-flops
would be needed for these alone (not including arithmetic circuits).
The design techniques used are simple and straightforward but
rather expensive. These designs should probably only be
considered for use with integrated circuitry.

References

AndeJ61; BashT64; International Business Machines Corporation, General
Information Manual; FORTRAN, Form F28-807401, December, 1961; IBM
1620 FORTRAN: Preliminary Specifications, Form J2R-4200-2, April, 1960

APPENDIX¹

The variable match unit (VMU) (Fig. 4)

The Symbol Table at the end of the load mode should contain
all variable names used by the program, together with empty
locations reserved for data associated with these names. The
Program Area at the end of the load mode should have a program
in which all variable names have been modified in that only the
first letter is retained, followed by the Symbol Table address of
the data associated with this name. Since any variable name may
appear many times in a program, a search is required, during the
loading, to see if the name already exists in the Symbol Table.
The search of the Symbol Table (ST) consists of comparing each
name there with the variable name in the statement being loaded.
All statements are loaded by an appropriate circuit of Fig. 3 from
the I/O buffer and into the Program Area of the memory.
Therefore the variable name in the Statement exists physically in the
I/O buffer.
It is the function of the VMU to make this search when
energized or “called” by the loading circuits for DIMENSION, DO,
computed GO TO, READ, PRINT, IF and Arithmetic statements
in which variable names appear. The output action of the VMU
is to set either the OK, AOK or EOL flip-flops. These flip-flops
respectively indicate that the ST either:

holds the variable in question as a result of previous loading,
or

that the variable is subscripted and has been previously
loaded by the DIMENSION statement loading circuit, or

that the End-of-List (EOL) token was found, indicating the
absence of the variable in the ST.

¹Symbols used in this Appendix are described in Table 8.

[Fig. 4. State diagram of the variable match unit (VMU).]

The state diagram for this circuit is shown in Fig. 4. When
triggered by the START VMU signal in state 0, the circuit goes
to state 1; the next clock pulse sends it to state 2, from which it
starts its search of the ST. In going from 1 to 2, the I/O Counter
(CIO) contents are saved in register SCIO since the name may
have to be scanned again. The Symbol Table Counter (STC) is
initialized to 4095 since the ST is scanned sequentially downward.
If a character of a variable name in the I/O buffer is found
in the corresponding position of a name in the ST, the character
is said to be matched. The VMU proceeds from state 2 to state
3 if the first character of the name under scan matches. Otherwise
the state changes from 2 to 8, if the NO MATCH signal is given.
The MATCH or NO MATCH signals are generated as a result
of comparing the contents of the ST location undergoing the scan
(the contents reside in the Memory Buffer Register, MBR), with
the contents of the register COMP which has the character from
the I/O buffer. The first character is put into COMP by the calling
circuit; thereafter the VMU picks them up in the 3-4 transition.
The CIO and STC counters are incremented and decremented,
respectively, and the VMU oscillates between states 3 and 4 as
long as matching continues. This comparison process will
terminate when either an arithmetic operator, S_o, is read from the I/O
buffer, sending the circuit to state 6 from state 3, or the ST contents
cause a NO MATCH signal with respect to the contents of the
COMP unit, causing the transition from state 4 to 5.
In state 6, if a digit is next read from the ST, corresponding
in position to the appearance of the operator from the I/O buffer,
clearly the names are the same and the OKFF is set to 1, and
the transition from 6 to 0 is made. On the other hand, if another
alphameric character in the ST corresponds to an operator, S_o,
in the I/O buffer, the names are not the same and the transition
from 6 to 5 is made. In state 5 the circuit just reads to the end
of the nonmatching name in the ST. A digit at the end of this
name causes the transition 5-7 during which the STC is stepped
over the 3 data locations to the next ST entry and the CIO
reinitialized to the start of the name being sought. The first character
in this name is read and placed in COMP as the circuit goes to 2.
As stated earlier, when the first character from the I/O buffer
does not match the contents of ST, the state becomes 8. If the
mismatch was caused by the EOL token in the ST the EOLFF
is set to 1 and state 0 is reached. If the mismatch was due to a
A at the present ST location, the STC is decremented by 5, which
steps over the 2 four-digit numbers stored after a A, and the circuit
returns to 2 to try a match on the next ST entry. If the mismatch
is caused by a digit then this is statement number information
[Fig. 5. State diagram for the loading of the Arithmetic Statement.]
or it puts the data address into the program. The 8-bit BCD forms
of the operator symbols are simply put into the program. The
constants are put into the program after conversion to machine
form. The state diagram of this circuit is shown in Fig. 5. The
scan circuit signal ARITH STAT sends the circuit from 0 through
1 to 2. The scan circuit has saved the address of the beginning
of this statement in a register SCIO. This is used to initialize the
CIO so that this statement can be read from the beginning.
The first symbol of an arithmetic statement, which must be a
variable and not a digit, takes the circuit to state 3 after this
symbol has been put into the program (S_v → PROG) and the VMU
initialized and started. Any one of the VMU signals is possible
and valid and simply forces the circuit to state 5. During the 3-5
transition the circuit loads the appropriate address into the
program when the name has matched. If it has not matched any
existing name the circuit first goes to state 4 and puts the name
into the Symbol Table before going to state 5. State 5 is that from
which all further loading is accomplished. Variable names are
separated by operators, which are loaded into the program by the
cycle in state 5 (S_o → PROG). Note the convention that S_o
represents any operator symbol not explicitly specified on another exit
from 5. Any variable names cause a transition to state 3 with the
same output action as from state 2. Floating point constants are
loaded via states 5-9-5. A decimal point indicates a floating point
constant and takes the circuit to state 7. (Note that a minus sign
preceding a constant is simply an operator and is processed in state
5.) The SHR is cleared in preparation for the storing of the
following digits in state 7. When E is received the digits of the fraction
in the SHR are left adjusted (ADJUST SHR), if there are fewer than
four of them, and placed in the program area. The exponent sign
is found in the transition 8 to 9. The exponent digit together with
the exponent sign bit is stored in the program area during the
9 to 5 transition. Fixed point constants are handled in state 6. The
important difference is that the digits are not left adjusted in the
SHR and a 04 is put into the program as the exponent since a
decimal point is assumed to precede the first data word. See
Fig. 1.
The $ takes the circuit to its initial state. If this statement
happens to be the last in a DO nest, the Statement Number Load
circuit has set the LSFF to 1. It has also put the ST address of
the word following the A symbol of the first DO of the nest into
the SSAR register. Since the program counter (CP) now holds the
correct exit address for this DO statement it is placed at the
address given by the SSAR during the transition to state 0. During
the transition the signal START READ is also sent to the paper
tape reader in order for it to put the next Statement into the I/O
buffer.

Hardware implementation of the VMU state diagram

Each function mentioned in the paper plus some other auxiliary
ones are initially represented in a state diagram form, such as the
state diagram for the loading of the Arithmetic Statement
(Fig. 5) and the Variable Match Unit (VMU) (Fig. 4).
We will describe the method used to realize a circuit which
will perform the function defined by a given state diagram (SD).
As an example we will use the VMU. All the information needed
is present on the SD. The operations on the right-hand side of
the “/” in the SD are the output operations required to be
performed. In order to implement these operations we must specify
the actual register gating signals, memory read and write signals,
arithmetic unit signals, etc., required by them. We will call these
various signals the microsteps of an output operation. Therefore
to realize the SD of a given function we must implement the
microsteps corresponding to the output operations.
We begin by listing from the state diagram some output
operations and their corresponding microsteps. For example, in state
2 of Fig. 4, if a MATCH signal is present we are supposed to
increment the CIO counter and then read the I/O buffer.
Consequently the microsteps required are:

↑CIO          This signal causes the CIO to be incremented by one.
CIO → MAR     This signal causes the CIO to be gated to the
              memory address register.
READ          This signal initiates a memory read cycle.
CHANGE STATE  This signal causes the VMU to go from state
              2 to state 3.

Therefore the execution of the above microsteps, in that order,
would implement the 2-3 transition of Fig. 4. Some microsteps
for the VMU are listed at the end of this Appendix. The largest
number of microsteps for a transition from one state to another
is 8, which occurs in the transition from state 8 to state 2. Once
this maximum number of microsteps is determined, a control cycle
counter is constructed, which can count as high as this maximum.
Since in this case the number is 8 we need 3 flip-flops to realize
it. In addition, a “one hot line” decoder is needed such that at
each count one and only one line of the decoder has a “one” at
its output. Also needed is a state diagram counter which realizes
the “skeleton” of the state diagram. This skeletal counter tells us
which state we are in and which to change to, given the present
input signal or symbol. Thus the skeletal counter “knows” that
if the circuit is in state 2 and a MATCH signal is present, it should
change to state 3 upon receipt of a change state signal. The
realization of such a skeletal counter has been described [Bashkow,
1964]. Now we use the outputs of the skeletal counter which will
indicate to us the state we are in, the outputs of the decoder of
the control cycle counter, and the input lines (S_v, S_o, MATCH,
NO MATCH) and connect them as shown in Fig. 6. Each AND
gate in this figure has 3 inputs except those not requiring input
line information. One input comes from the input set (S_o, S_v,
MATCH, etc.). The second input comes from the state diagram
skeletal counter which indicates a unique state of the state
diagram, and finally the third comes from the control cycle counter.
The output of each AND gate is a line indicating a unique
microstep. The ANDs feed OR gates, which actually energize the given
microstep. For example the output lead of the “READ” OR gate
is connected to the “READ” terminal of the memory.
If we assume that the control cycle counts in sequence 1, 2,
etc., then the lead numbered 1 will go to the first microstep of
each sequence. The one numbered 2 will go to the second, etc.
Therefore we see that the following microsteps should be executed
in the order listed below for states 0, 1, 2, 5 of Fig. 4. The circuit
which causes the execution is shown in Fig. 6.

State 0 and START VMU
    CHANGE STATE
State 1
    CIO → SCIO
    0100 0000 1001 0101 → STC
    STC → MAR
    READ
    CHANGE STATE
State 2 and MATCH
    INCREASE CIO
    CIO → MAR
    READ
[Fig. 6. Circuit executing the VMU microsteps for states 0, 1, 2, and 5 of Fig. 4.]
    CHANGE STATE
State 2 and NO MATCH
    CHANGE STATE
State 5 and S_v
    DECREASE STC
    STC → MAR
    READ
    CHANGE STATE
State 5 and d
    DECREASE STC
    DECREASE STC
    DECREASE STC
    SCIO → CIO
    CIO → MAR
    READ
    CHANGE STATE

In state 0 of Fig. 4 a START VMU signal takes it to state 1. This
is accomplished by the top AND of Fig. 6. The only microstep
needed is CHANGE STATE. In state 1 of Fig. 4, the next clock
pulse (after reaching state 1) causes a transition to state 2. In this
case we need to save CIO contents in register SCIO (CIO → SCIO),
set the STC to 4095 (4095 → STC, shown above in BCD form) and
get the contents of the address now in the Symbol Table Counter
(READ(STC)). This latter is implemented by the two microsteps
STC → MAR followed by a READ command to the core memory.
This transition from 1 to 2 of Fig. 4 is accomplished by the next
5 AND gates shown in Fig. 6. The next AND gates shown
accomplish the transition from state 2 to 3 if there is a MATCH. The
next AND accomplishes the transition from 2 to 8 if there is NO
MATCH (in this case nothing need be done). Finally the lowest
two groups of AND gates implement the required microsteps as
the circuit changes from state 5 to 7 if a 4-bit digit code is sensed
or cause the circuit to remain in state 5 after decrementing the
STC if an 8-bit variable code is read.
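
The microstep listings above can be read as a lookup table keyed by state and input signal. The following Python sketch is only one way to illustrate that reading and is not the authors' circuit: the names MICROSTEPS, NEXT_STATE, control_cycle, flip_flops_needed, and to_bcd are hypothetical, the string "CLOCK" stands in for the unconditional clock-pulse transition out of state 1, and the flip-flop count assumes a plain binary control cycle counter.

```python
# Microstep sequences for states 0, 1, 2 and 5 of the VMU (Fig. 4),
# transcribed from the listings in the text. The dictionary layout and
# all names here are illustrative assumptions, not the original design.
MICROSTEPS = {
    (0, "START VMU"): ["CHANGE STATE"],
    (1, "CLOCK"): ["CIO -> SCIO", "4095 -> STC", "STC -> MAR", "READ",
                   "CHANGE STATE"],
    (2, "MATCH"): ["INCREASE CIO", "CIO -> MAR", "READ", "CHANGE STATE"],
    (2, "NO MATCH"): ["CHANGE STATE"],
    (5, "Sv"): ["DECREASE STC", "STC -> MAR", "READ", "CHANGE STATE"],
    (5, "d"): ["DECREASE STC", "DECREASE STC", "DECREASE STC",
               "SCIO -> CIO", "CIO -> MAR", "READ", "CHANGE STATE"],
}

# The "skeleton" of the state diagram: which state follows which,
# given the present input signal or symbol.
NEXT_STATE = {
    (0, "START VMU"): 1,
    (1, "CLOCK"): 2,
    (2, "MATCH"): 3,
    (2, "NO MATCH"): 8,
    (5, "Sv"): 5,
    (5, "d"): 7,
}


def control_cycle(state, signal):
    """Pair each microstep with its control cycle count (1, 2, ...).

    Each pair corresponds to one AND gate: input signal AND state AND
    decoder line `count`. Returns the pairs and the next state.
    """
    steps = MICROSTEPS[(state, signal)]
    sequence = [(count, step) for count, step in enumerate(steps, start=1)]
    return sequence, NEXT_STATE[(state, signal)]


def flip_flops_needed(max_count):
    """Flip-flops in a binary counter that has max_count distinct counts."""
    return max(1, (max_count - 1).bit_length())


def to_bcd(number, digits=4):
    """Render a decimal number as space-separated 4-bit BCD digits."""
    return " ".join(format(int(ch), "04b") for ch in str(number).zfill(digits))
```

For instance, control_cycle(2, "MATCH") pairs counts 1 through 4 with the four microsteps of the 2-3 transition, and to_bcd(4095) reproduces the bit pattern gated into the STC in state 1.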
Chapter 32
A microprogrammed implementation
of EULER on IBM System/360 Model 30¹
Helmut Weber

¹Comm. ACM, vol. 10, no. 9, pp. 549-558, September, 1967.

Summary An experimental processing system for the algorithmic language
EULER has been implemented in microprogramming on an IBM System/360
Model 30 using a second Read-only Storage unit. The system consists of a
microprogrammed compiler and a microprogrammed String Language
Interpreter, and of an I/O control program written in 360 machine language.
The system is described and results are given in terms of microprogram
and main storage space required and compiler and interpreter performance
obtained. The role of microprogramming is stressed, which opens a new
dimension in the processing of interpretive code. The structure and content
of a higher level language can be matched by an appropriate interpretive
language which can be executed efficiently by microprograms on existing
computer hardware.

Introduction

Programs written in a procedure-oriented language are usually
processed in two steps. They are first translated into an equivalent
form which is more efficiently interpretable; then the translated
text is interpreted (“executed”) by an interpretation mechanism.
The translation process is a data-invariant and flow-invariant
operation. It consists of two parts: an analytical part, which
analyzes the higher level language text, and a generative part,
which builds up a string of instructions that can be directly
interpreted by a machine. The analytical part of the translator depends
on the higher level language; the generative part depends on a
set of instructions interpretable by a machine. Historically there
was only one set of instructions which could be interpreted
efficiently by a machine, its “machine language.” Figure 1 outlines
this scheme.

Fig. 1. Processing programs written in higher level languages via
translation to machine language.

Some of the processors of the IBM System/360 family are
microprogrammed machines. On them the “360 machine
language” is interpreted not by wired-in logic but by an interpretive
microprogram, stored in control storage, which in turn is
interpreted by wired-in logic. Therefore, in a certain sense the 360
language is not the “machine language” of these processors but
the (efficiently interpretable) language in which the processors of
the System/360 family are compatible. The true “machine
language” of these processors is their microprogram language. This
language is on a lower level than the “360 language”; it contains
the elementary operations of the machine as operators and the
elements of the data flow and storage as operands.
Now it is conceivable to compile a program written in a higher
level language into a microprogram language string. This string
would undoubtedly contain substrings which occur over and over
in the same sequence. We could call these substrings procedures
and move them out of the main string, replacing their occurrence
by a procedure call symbol, followed by a parameter designator
pointing to the particular procedure. Our object program then
takes on the appearance of a sequence of call statements. From
here it is only a final step to eliminate the call symbols and furnish
an interpreting mechanism which interprets the remaining
sequence of “procedure designators.”
The process just described will result in the definition of a string
language and the development of a microprogrammed
interpretation system to interpret texts in this string language. The situation
is similar to the System/360 case: the string language corresponds
to the 360 language. Programs written in a higher level language
are compiled into string language text to be stored in main storage.
The string language interpreter corresponds to the microprogram
which interprets 360 language texts. It consists of a recognizing
part to read the next consecutive string element and to branch
to an appropriate action routine and of action routines to execute
the particular procedure called for by the string element.
The essential difference between our situation and the 360 case
is that the string language reflects the features of the particular
higher level language as well as the features of the particular
hardware better than the general purpose 360 language.
What is gained by defining this string language and by
providing a microprogrammed interpreter for it? From the method of
definition described, it can be seen that the elements of the string
language correspond directly to the elements of the higher level
language after all simplifying data-invariant and flow-invariant
transformations have been performed. But the elements of the
string language are also well-adapted to the microprogram
structure of the machine. Therefore, during the compiling process (see
Fig. 2) only a minimum of generation is necessary to produce the
string language text. The compiler is shorter and runs faster.
But the more important aspect is that object code execution
is also faster. The string language interpreter in case 2 will be
coded to take care of all necessary operations in a concise form,
whereas in case 1 it will be necessary to compile a whole sequence
of machine language instructions for an elementary operation in
the higher level language. Examples of this are the compilation
of 360 code for an add operation in COBOL of two numbers with
different scaling factors or the compilation of machine instructions
for table lookup or search operations, etc. In these cases the string
language interpreter of Fig. 2 will execute a function much faster
than the machine language interpreter of Fig. 1 will execute the
equivalent sequence of machine language instructions. Therefore,
object code execution will be faster in scheme 2.

Fig. 2. Processing programs written in higher level languages via
translation to interpretive language.

If object code performance is not as much in demand as object
storage space economy, the string language interpreter can also
be written such that the string language is as tightly packed as
possible so that the translated program is as compact as possible
and will take up less storage space than the equivalent machine
language program under the scheme of Fig. 1.
These ideas are applied in an experimental microprogram
system for the higher level language EULER [Wirth and Weber,
1966a and 1966b] described below. Problem areas in this approach
are indicated and some ideas for future development are offered.

Special considerations for EULER

The higher level language EULER [Wirth and Weber, 1966a and
1966b] is a dynamic language. This means that for programs
written in it many things have to be done at object code execution
time which can be done at compile time for other languages.
EULER also contains basic functions which do not have
comparable basic counterparts in the machine languages of most machines.
To compile machine code for these dynamic properties and for
those special functions would require rather lengthy sequences of
machine language instructions, which would consume considerable
object code space and require high object code execution time.
Therefore, for a language like EULER, interpretation at the string
language level by an interpreter into which the dynamic features
and special functions are included by microcode will yield much
higher object code economy and object code performance than
compilation to machine language and interpretation of this
machine language.
Three examples from EULER are given here.

1. Dynamic type handling. To a variable in EULER, constants of
varying type can be assigned dynamically. For example in

    A ← 3; . . .; A ← 4.5₁₀-5; . . .; A ← true; . . .; A ← '. . .';

the quantities assigned to the variable A have the types: integer,
real, logical, procedure. Therefore, in EULER each quantity has
to carry its type indicator along and each operator operating on
a variable has to perform a dynamic type test. The adding operator
+ for instance in A + B has to test dynamically whether both
operands are of type number (integer or real). This type testing
is done by the String Language Interpreter in minimum time,
whereas it would require extra instructions if the program were
to be compiled to 360 machine language.

2. Recursive procedures and dynamic storage allocation. In
EULER, procedures can be called recursively, e.g.,

    F ← 'formal N; if N = 0 then 1 else N * F(N - 1)';
and storage is allocated dynamically, e.g., The system is an experimental system. Not all the features of
EULER are included,-only the general principles that are to be
new N; . . .; N t 4; . . .; begin new A; A t list N; demonstrated. The restrictions are:
In order to cope with these problems the EULER execution system
uses a run time stack. Each operation is accompanied by stack 1 Real numbers are not included; only integers are recog-
pointer manipulations which by the microprogram can be accom- nized.
plished in minimum time (in general, even without extra time 2 The interpreter microprograms for the operators Divide,
because they are overlapped with the operation proper), whereas extra instructions would be
required, if the program were compiled.

3. List processing. EULER includes a list processing system, and lists are of a general tree
structure, e.g.,

    A ← (3, 4, (5, 6, 7), true, '. . .');

List operators are provided like tail and cat and subscripting:

    B ← A[3]; C ← B cat A; C ← tail C;

The string language interpreter handles list operations directly and efficiently by special
microprograms. If the program were compiled to 360 machine language, a sequence of
instructions would be required for each list operation.

EULER system on IBM System/360 Model 30

An experimental processing system for the EULER language has been written to demonstrate the
validity of these ideas. It is a system running under the IBM Basic Operating System and
consists of three parts:

1 A translator, written in Model 30 microcode.¹ This translator is a one-pass syntax-driven
  compiler which translates EULER source language programs into a reverse polish string form.
2 An interpreter, written in Model 30 microcode,¹ which interprets string language programs.
3 An I/O Control Program written in 360 machine language.² This IOCP links the translator and
  interpreter to the operating system and handles all I/O requests of the translator and
  interpreter.

¹ Stored in the second Read-only Storage (Compatibility ROS) of Model 30.
² The 360 microprograms are stored in the first Read-only Storage (360 ROS) of the Model 30.

Integer Divide, Remainder, and Exponentiation have not been coded.
3 The type 'symbol' is not included.
4 No garbage collector is provided. Therefore, the system comes to an error stop if a list
  processing program has used up all available storage space (32K bytes).

Also for reasons of simplicity, the system is written only for a 64K System/360 Model 30 and
the storage areas for tables, compiled programs, stacks and free space are assigned fixed
addresses.

The string language into which source programs are translated is defined as closely as
possible to the interpretive language used in the definition of EULER [Wirth and Weber, 1966a
and 1966b]. The question whether this is the ideal directly interpretable language
corresponding to the EULER source language given the Model 30 hardware is left open. Also no
attempt is made to define the string language so that it becomes relocatable for use in time
sharing or conversational processing mode.

The three storage areas used by the execution system are:

1 Program area
2 Stack
3 Variable area

Program area. A translated program in string language consists of a sequence of one-byte
symbols for the operators (+, -, begin, end, ←, go to, etc.). Some of the symbols have trailer
bytes associated with them; for instance, the symbol +number has three trailer bytes for a
24-bit absolute value of the integer constant.

The symbol reference (@) has two trailer bytes, one containing the block number (bn), the
second one the ordinal number (on).
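The program-area layout described above (one-byte symbols, some followed by trailer bytes) can be illustrated with a small decoder. This is only a sketch under stated assumptions: the trailer-byte counts for +number (three bytes, 24-bit absolute value) and for the reference symbol @ (two bytes, block number and ordinal number) come from the text, whereas the opcode values 0x01 to 0x03, the high-byte-first order of the 24-bit value, and the name `decode` are hypothetical and chosen purely for illustration.

```python
# Sketch of decoding a string-language program area.
# ASSUMPTION: the opcode byte values below are invented for this example;
# the text gives only the number of trailer bytes that follow each symbol.
OP_NUMBER = 0x01     # hypothetical code for "+number" (3 trailer bytes)
OP_REFERENCE = 0x02  # hypothetical code for "@" reference (2 trailer bytes)
OP_PLUS = 0x03       # hypothetical code for "+" (no trailer bytes)


def decode(program: bytes):
    """Return a list of (symbol, operands) tuples for the given byte string."""
    out = []
    pc = 0
    while pc < len(program):
        op = program[pc]
        pc += 1
        if op == OP_NUMBER:
            # Three trailer bytes hold a 24-bit absolute value
            # (ASSUMPTION: most significant byte first).
            value = int.from_bytes(program[pc:pc + 3], "big")
            out.append(("+number", (value,)))
            pc += 3
        elif op == OP_REFERENCE:
            # First trailer byte: block number; second: ordinal number.
            bn, on = program[pc], program[pc + 1]
            out.append(("@", (bn, on)))
            pc += 2
        elif op == OP_PLUS:
            out.append(("+", ()))
        else:
            raise ValueError(f"unknown symbol byte {op:#04x}")
    return out


prog = bytes([OP_NUMBER, 0x00, 0x01, 0x00, OP_REFERENCE, 2, 5, OP_PLUS])
print(decode(prog))
```

Because every symbol announces its own trailer length, the interpreter can step through the string with a single instruction counter and no separate operand table.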
Chapter 32 | A microprogrammed implementation of EULER on IBM System/360 Model 30
The operators then, else, and, or and ' have two trailer bytes containing a 16-bit absolute
program address.

Other operators with trailer bytes are label and the list-building operator.

Stack. The execution time stack consists of a sequence of 32-bit words. It contains block and
procedure marks to control the processing of blocks and procedures and temporary values of
the various types. The first 4-bit digit of a word in the stack always is a type indicator.
The format of these words is given in Fig. 3.

Variable area. The variable area is an area (32K bytes long) of 32-bit words used for the
storage of values assigned to variables and lists (and also for auxiliary words in procedure
descriptors; see type procedure in Fig. 3). The format of the entries is exactly the same as
the format of the stack entries (see Fig. 3), the only exception being that a mark can never
occur in the variable area.

[Fig. 3: word-format diagrams not reproduced. Surviving legend:]
  Type undefined
  value: magnitude in hexadecimal (< 16^6)
  Type logical; value: true = 1, false = 0
  Type label
  Type procedure
    mp: mark pointer; points to the stack location of the mark for the block (or procedure)
        in which the procedure is defined.
    link: pointer to a word in the variable area which contains additional information.
    bn: block number of the block (or procedure) in which the procedure is defined.
    pa: 16-bit program address where the string code for the procedure starts.
  length: number of elements in list (< 16^3)
  loc: 16-bit location of the first list element in the variable area (lists are stored in
       consecutive storage locations).
  Mark
Fig. 3. Format of words in stack and variable area.

Microprogramming the IBM System/360 Model 30 [Fagg et al., 1964]

Microprograms are sequences of microprogram words. A microprogram word is composed of 60
bits and contains various fields which control the basic functions in the IBM System/360
Model 30 CPU. These basic functions are storage control, control of the
386 Part 4 1 The instruction-set processor level: special-function processors Section 4 I Processors based on a programming language
data flow registers and the Arithmetic-Logic-Unit (ALU), microprogram sequencing and branching
control, and status bit-setting control. Microprogram words are stored in a Card Capacitor
Read-only Storage (CCROS). Fetching one microprogram word and executing it takes 750 nsec,
the basic machine cycle.

Figure 4 shows in simplified form the data flow of the IBM System/360 (IBM 2030 CPU). It
consists of a core storage with up to 65,536 8-bit bytes and a local storage (accessible by
the microprogrammer but not explicitly by the 360 language programmer), a 16-bit storage
address register (M, N), a set of 10 8-bit data registers (I, J, . . . , R), an
arithmetic-logic-unit (ALU), connecting 8-bit wide buses (Z, A, B, M, N-bus), temporary
registers (A, E), switches and gates.

Figure 5 shows the more important fields of a microprogram word. Only 47 bits are shown.
Other fields contain various parity bits and special control bits. The field interpretation
given in Fig. 5 is as for microprogram words in the second Read-only Storage unit
(Compatibility ROS) if the machine is equipped with the 1620 Compatibility Feature. The
meaning of the microprogram word fields is explained in connection with Fig. 6 which shows
the symbolic representation of a microprogram word together with an example as it appears on
a microprogram documentation sheet.

The fields of the microprogram word can be grouped in five categories:

1 ALU control fields: CA, CF, CB, CG, CV, CD, CC
2 Storage control fields: CM, CU
3 Microprogram sequencing and branching fields: CN, CH, CL
4 Status bit setting field: CS
5 Constant field: CK
Fig. 5. IBM System/360 Model 30 microprogram word. (Detailed explanation is provided in text.)
The field interpretation is given for microprogram words in compatibility ROS if the machine
is equipped with the 1620 compatibility feature. Fields marked "*" contain designators not
explained here in order not to confuse the basic principles.
tion with injection of a 1 (+1) (for instance, to simulate subtraction in connection with the
B-input inverter), addition with saving the carry in bit 3 of register S (+0, Save C and
+1, Save C), and addition using an old carry stored in bit 3 of register S and saving the new
carry in this same bit (+C, Save C). Other codes specify logical operations (AND, OR, XOR).

The CD-field specifies into which register the result of the ALU operation is gated. Any one
of the 10 data registers can be specified. Z means that the ALU output is gated nowhere and
will be lost.

Storage control fields. On the line designated "storage" in Figure 6, a storage statement can
appear. It will specify whether this microcycle is a read cycle, a write cycle, a store cycle
or a no-storage access cycle, and from where the storage address is supplied (CM-field) and
whether storage access is to main storage or local storage (CU-field). Note that a full
storage cycle (1.5 µsec) corresponds to two read-only storage cycles (750 nsec each).

The codes CM = 3, 4, or 5 specify read cycles. The addresses are supplied from the register
pairs IJ, UV, and LT, respectively. A read cycle reads one byte of data from core storage
into the storage data register R.

A write cycle regenerates the data from the storage data register R at the address supplied
in the last read cycle.

A store cycle acts exactly as a write cycle except that it inhibits in the read cycle
immediately preceding it the insertion of the data byte from storage into the R-register.

The CU-field specifies whether storage access should be to main storage (MS) or to a local
storage of 256 bytes not explicitly addressable by the 360 language programmer.

Microprogram sequencing and branching. Each microprogram word is stored at a unique address
in ROS. A 13-bit ROS address register (W3. . .W7, X0. . .X7) holds the address of the word
being executed. For the symbolic representation of a microprogram (Fig. 6) the ROS address is
given in hexadecimal in the upper right corner, and the last two bits of this address are
repeated in binary on the upper margin.

After execution of a microprogram step, the next sequential word will not be executed.
Instead the address of the next word to be executed is derived as follows. The high five bits
(W) remain the same, unless they are changed by a special command in the microword, not
explained here (so-called module switching). The next six bits (X0. . .X5) are supplied from
the CN-field (written in hexadecimal in the symbolic representation of Fig. 6). The low two
bits are set according to conditions specified in the CH and CL fields. X6 is set according
to the condition specified by CH. For instance, if CH = 8, then the bit R2 is transferred to
X6; if CH = 6, then X6 is set to one if in the last ALU operation a carry had occurred. It is
set to zero if no carry had occurred. X7 is controlled by CL. If, for instance, CL = 0, then
X7 is set to zero; if CL = 5, then X7 is set to one if both digits in R are valid decimal
digits (i.e., R0. . .R3 ≤ 9 and R4. . .R7 ≤ 9), and X7 is set to zero if either digit in R is
not a valid decimal digit (i.e., R0. . .R3 > 9 or R4. . .R7 > 9). This microprogram
sequencing scheme allows a four-way branch after the execution of each microprogram word.

Status bit setting. The CS-field allows the unconditional or conditional setting of certain
status bits to be specified, combined in Register S. If, for instance, CS = 3, then S4 is set
to one if the result of the ALU operation performed in this microprogram cycle shows a zero
in the high digit (i.e., Z0 = Z1 = Z2 = Z3 = 0); S4 is set to zero otherwise. At the same
time, S5 is set to one if the result of the ALU operation shows a zero in the low digit
(i.e., Z4 = Z5 = Z6 = Z7 = 0); S5 is set to zero otherwise. If CS = 9, then S2 is set to one
if the result of the ALU operation is not zero (i.e., at least one of the bits Z0. . .Z7 is
equal to 1). If the result of the ALU operation is zero, then S2 is not changed.

Constant field. The 4-bit CK-field is used for various purposes. One instance explained in
the ALU statement is to supply a constant B-source for an ALU operation. Other examples not
explained here any further are the addressing of a few specific scratchpad local storage
locations, module switching (replacement of the high part W of the ROS address), and the
control of certain special functions.

Symbolic representation of microprograms. Microprograms are symbolically represented as a
network of boxes (Fig. 6) each representing a microword, connected by nets indicating the
possible branching ways. Figure 7 gives an example of a microprogram (to be explained in the
next section). There exist programming systems to aid in the development of microprograms.
They contain symbolic translators to translate the contents of a box according to Fig. 6
into the contents of the actual fields of the microprogram word according to Fig. 5. A
drawing program generates documentation (Fig. 7 is drawn with such a program). These systems
usually also contain programs for simulation and generation of the actual ROS cards.

String language interpreter for EULER

The string language interpreter for EULER is entirely written in Model 30 microcode. It
consists of a few microprogram steps to read the next sequential symbol from the program
string and to do a function branch on the symbol and of a group of microprogram routines
which perform the necessary operations for the program byte read. These routines also take
care of dynamic type testing and stack pointer manipulations. The routines are equivalent to
the routines described in the definition of the string language for EULER [Wirth and Weber,
1966a and 1966b].

Figure 7 shows, as an example, the microprogram to interpret the program string symbols and
(internal representation X'52'¹), or X'50' and then X'53'. These operators test if the
highest entry in the stack is a value of type logical. The logical operators in EULER work in
the FORTRAN sense, not in the ALGOL sense: if after the evaluation of the first operand the
result is determined (false for and, true for or), then the second operand is not evaluated
but skipped over. If an and operator finds the value false, then a branch occurs to the
program address given in the two trailer bytes. If an and finds the value true, then it
deletes this value from the stack and proceeds to the next symbol in the program string (to
evaluate the second operand of and). Similarly if an or operator finds the value true, then a
branch occurs to the program address given in the two trailer bytes. If an or finds the value
false, then it deletes this value from the stack and proceeds to the next symbol in the
program string. The then operator is a conditional branch code: it deletes the logical value
from the stack. If this value was false, then a branch is taken to the program address given
in the two trailer bytes. If this value was true, then the next symbol in the program string
is executed.

The pointer to the symbol in the program string (the instruction counter) is located in the
functionally associated pair of registers I and J in the Model 30. The pointer to the
left-most byte of the highest entry in the stack (the stack pointer) is located in the two
registers U and V in the Model 30.

In the following the individual steps in this microprogram are explained in more detail.

¹ X'nn' represents the hexadecimal number composed of the digits n (n = 0, . . . , 9,
A, . . . , F).
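The behaviour of the and, or and then symbols described above can be summarised in a few lines of Python. This is a minimal sketch and not the microprogram itself. The opcode values X'50', X'52' and X'53' are taken from the text, as are the two trailer bytes holding a 16-bit program address; the stack representation as `(type, value)` tuples, the high-byte-first address order, and the name `step` are assumptions made for the example.

```python
# Sketch of the and / or / then semantics of the string language.
# ASSUMPTION: stack entries are modelled as (type_name, value) tuples and
# the 16-bit trailer address is stored high byte first.
OR, AND, THEN = 0x50, 0x52, 0x53


def step(program: bytes, pc: int, stack: list) -> int:
    """Execute one and/or/then symbol at program[pc]; return the new pc."""
    op = program[pc]
    target = (program[pc + 1] << 8) | program[pc + 2]  # branch address
    next_pc = pc + 3                                   # skip the trailer bytes
    kind, value = stack[-1]
    if kind != "logical":
        raise TypeError("highest stack entry is not of type logical")
    if op == AND:
        if not value:
            return target      # result already false: skip second operand
        stack.pop()            # delete true, go on to the second operand
        return next_pc
    if op == OR:
        if value:
            return target      # result already true: skip second operand
        stack.pop()            # delete false, go on to the second operand
        return next_pc
    if op == THEN:
        stack.pop()            # then always deletes the logical value
        return next_pc if value else target
    raise ValueError("not an and/or/then symbol")
```

Note that and and or leave the deciding value on the stack when they branch, so it serves as the result of the whole expression, whereas then always removes it.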
Address    Location in Figure    Description

1161: C1: The instruction counter IJ addresses main storage. The addressed byte in main
storage is read out into the storage data register R. The instruction counter is updated by
adding 1 to register J. A possible carry is saved to be added to I.

1117: C2: The operator has been read out from main storage into R. It is also transferred
(through the ALU) to register G. A four-way branch occurs on the two highest bits R0 and R1
of the operator. For the operators 52, 53, and 50 this branch goes to ROS word 1171, whereas
other operators cause a branch to 1170, 1172, or 1173, indicated by the three lines not
continued.

1171: C3: To complete the updating of the instruction counter, the carry from 1161 is added
into I. The first byte of the highest entry of the stack is addressed by UV and read out into
R. A further four-way branch on the operator is made (G2, G3). For our operators the branch
goes to 115D.

115D: C4: The high order byte of the highest stack entry has been read out of storage into R.
It contains the type of entry in the high digit and if this type was logical then it contains
the value true (1) or false (0) in the second digit. This byte is tested by adding X'D0' to
it and observing the result, ignoring the carry. S4 is set to 1 when the type was 3 (logical),
otherwise to 0. S5 is set to 1 when the low digit of this byte was 0 (value false); S5 is set
to 0 when the low digit of this byte was 1 (value true). Another four-way branch occurs on
the bits G4 and G5 of the operator. If the operator is 50 (or), 51 (cannot occur), 52 (and),
or 53 (then), then a branch to 11C4 occurs.

11C4: L4: The next byte is read from the program string; it is the high byte of the two-byte
program address trailing the operator. The instruction counter is updated again by adding a 1
to J, saving a possible carry. Another four-way branch occurs on the bit G6 of the operator
and the value of the stack entry. If the operator was and or then (G6 = 1) and the value was
false (S5 = 1), then branching to 11CB occurs; if the operator was or (G6 = 0) and the value
was true (S5 = 0), then branching to 11C8 occurs. If the operator was or (G6 = 0) and the
value was false (S5 = 1), then branching to 11C9 occurs. If the operator was and or then
(G6 = 1) and the value was true (S5 = 0), then branching to 11CA occurs.

11CB: G5: This word is executed for the operators and and then when the value was false. Here
the type test is made. If the type was not logical (S4 = 0), then a branch to 11C1 occurs. If
the type was correct, then the microprogram proceeds to fetching the trailing program address
(two bytes) to store it as the new instruction counter in IJ. This is done for the and
operator (G7 = 0) in this word and the following two words 11C3 and 111E; for the then
operator (G7 = 1) it is done in this word and the words 11C3 and 111F.

11C3, 111E: J6, J7: The two trailing bytes of the operators and or or are stored as the new
instruction counter IJ. The operation is completed. The microprogram branches back to 1161 to
read out the next operator.

11C3, 111F: J6, L7: The two trailing bytes of the operator then are stored as the new
instruction counter in IJ. The carry-saving bit S3 is forced to zero.

11CE, 1144: N8, N9: The stackpointer is decremented by four (the operator '-' means
complement add) which in effect deletes the highest entry from the stack. Observe that when
these two words are entered from 111F (then operator with value false) the microprogram will
not go through 1145 because we have forced S3 to zero in 111F. The operation is completed,
and the microprogram branches back to 1161 to read out the next operator.

11C8: J5: This word is executed for the operator or when the value was true. Similarly as in
11CB, the typetest is taken. For types not logical a branch to 11C1 occurs. If the type was
correct, then the microprogram proceeds to fetching the trailing program address (two bytes)
to store it as the new instruction counter in IJ (words 11C3, 111E).

11C9: N5: This word is executed for the operator or when the value was false. A typetest is
made. If the type was correct, then the trailing program address is skipped and IJ is updated
by 1 twice in 11C4, 11C9 (possible carries out of J handled in 11CF or 1145). The
stackpointer is decremented by four in 11CE, 1144.

11CA: 45: This word is executed for the operators and and then when the value was true. A
typetest is made. If the type was correct then the trailing
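Two steps of the walk-through above lend themselves to a small worked check in Python: the type test in word 115D (add X'D0', ignore the carry, set S4 and S5 from the two result digits) and the four-way branch taken after word 11C4. The sketch relies on the ROS address layout given in the sequencing description (five W bits, six CN bits, then X6 and X7). That X6 is driven by operator bit G6 and X7 by status bit S5 is an inference from the branch targets 11C8 to 11CB, not something the text states outright, and the function names are hypothetical.

```python
# Sketch of the type test at 115D and the four-way branch after 11C4.
# ASSUMPTION: X6 <- G6 and X7 <- S5, inferred from targets 11C8..11CB.
# Bits are numbered with bit 0 as the most significant bit of a byte.

def type_test(first_stack_byte: int):
    """Add X'D0' to the byte, drop the carry, and derive (S4, S5)."""
    z = (first_stack_byte + 0xD0) & 0xFF   # 8-bit ALU result, carry ignored
    s4 = 1 if (z >> 4) == 0 else 0         # high digit zero: type was 3 (logical)
    s5 = 1 if (z & 0x0F) == 0 else 0       # low digit zero: value false
    return s4, s5


def next_ros_address(w: int, cn: int, x6: int, x7: int) -> int:
    """Build a 13-bit ROS address from W (5 bits), CN (6 bits), X6 and X7."""
    return (w << 8) | (cn << 2) | (x6 << 1) | x7


def branch_after_11c4(operator: int, s5: int) -> int:
    """Select one of 11C8, 11C9, 11CA, 11CB from operator bit G6 and S5."""
    g6 = (operator >> 1) & 1               # G6 is the second-lowest bit
    return next_ros_address(0x11, 0x32, g6, s5)


for op, name in [(0x50, "or"), (0x52, "and"), (0x53, "then")]:
    for s5 in (0, 1):
        print(name, "false" if s5 else "true", hex(branch_after_11c4(op, s5)))
```

With a stack byte of X'30' (type logical, value false) the sum is X'100', which becomes X'00' once the carry is dropped, so both digits are zero and S4 = S5 = 1. This matches the description of word 115D.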
It can be seen from Fig. 7 that the execution times of the microprograms including the
readout of the operator (I-Cycle) are the following:

    and     6 µsec¹ (8 microprogram steps)
    or      6 µsec (8 microprogram steps)
    then    6 µsec for value true (8 microprogram steps)
            7.5 µsec for value false (10 microprogram steps)

¹ The cases where carries occur in the IJ and UV updating are disregarded for timing purposes.

In order to compare this with a hypothetical EULER system for System/360 language, let us
assume that the compiler produces in-line code (which probably will give the highest
performance although it will be very wasteful with respect to storage space). Then a
reasonable sequence for and might be:

    CLI   0(STACK),LOGFALSE
    BE    ANDFALSE
    CLI   0(STACK),LOGTRUE
    BNE   TYPEERR
    SH    STACK,='4'

Timing: true: YO µsec; false: 32 µsec.

This comparison seems to indicate that the microprogram interpreter is about an order of
magnitude faster than the equivalent program in 360 language. However, this comparison will
only yield such a high factor for functions of EULER which do not have simple System/360
language counterparts (as for instance the list-operators, begin-, end-, and
procedure-call-operator) or where the overhead for dynamic testing and stackpointer
manipulation is heavy as in the above example of the logical operations. For functions which
do have System/360 language counterparts and which are slower so that the overhead is
relatively lighter as, for instance, arithmetic operations (especially for real numbers), the
microprogrammed interpreter will still be faster than the System/360 language program, but
not by a factor of 10.

The total ROS space requirement for the String Language Interpreter is:

EULER compiler

The translator to translate EULER source language into the Reverse Polish String Language is
a one-pass, syntax-driven compiler. The syntax of the language and the precedence functions F
and G over the terminal and nonterminal symbols are stored in table form in Model 30 main
storage. There is also main storage space reserved for translation tables for character
delimiters and word delimiters and for a compile time stack, a name table, and, of course,
for the compiled code. All these areas are at fixed storage locations because of the
experimental nature of the system.

The microprogram consists of the following parts:

A routine reads the next input character from the input buffer to translate it to a 1-byte
internal format, if it is a delimiter, or to collect it into a name buffer if it is part of
an identifier, or to convert it to hexadecimal if it is part of a numeric constant and to
collect the number into a buffer. This "prescan" requires 100+ microwords.

As soon as an input unit is collected (delimiter, identifier, number) the main parsing loop
is entered which makes use of the precedence tables and the syntax table in main storage.
This syntactic analyzer loop requires 100- microwords.

When the parsing loop identifies a syntactic unit to be reduced, it calls the appropriate
generation routine which performs essentially the functions described as the semantic
interpretation rules in the EULER definition. The microprogram space required for these
programs amounts to approximately 250 ROS words.

If a syntactic error is detected, the system signals an error and does not try to continue
with the compilation process. Though this procedure is totally inadequate for a practically
useful system, it was deemed sufficient to prove the essential point. For this minimum error
analysis and for linkage to the 360 microprograms (IOCP), approximately 60 microwords are
required.
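The execution times quoted for and, or and then follow from the number of microprogram steps multiplied by the 750 nsec basic machine cycle. A short Python check of that arithmetic is given here; the step counts and the cycle time are from the text, while the function name `exec_time_us` is only illustrative.

```python
# Execution time = number of microprogram steps x basic machine cycle.
CYCLE_NS = 750  # basic machine cycle of the Model 30, as quoted in the text


def exec_time_us(steps: int) -> float:
    """Return the execution time in microseconds for a given step count."""
    return steps * CYCLE_NS / 1000.0


for name, steps in [("and", 8), ("or", 8),
                    ("then, value true", 8), ("then, value false", 10)]:
    print(f"{name}: {exec_time_us(steps)} usec")
```

As the footnote to the timing table says, steps needed only when a carry occurs in the IJ or UV update are not counted.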
The total compiler microprogram space is therefore approximately 500 ROS words. The total
main storage space required is approximately 1200 bytes.

The speed of this compiler is limited by the speed of the card-reader of the system (1000
cards/minute). This excellent performance has three main reasons: (1) EULER as a simple
precedence language is a language extremely easy to compile. (2) The functions of a compiler
are mainly of a table lookup and bit and byte-testing type. Microprogramming is extremely
well-suited for these kinds of operations. (3) Since the target language is String Code and
not, for example, 360 Machine Language, the generative part of the compiler is relatively
short.

It is very difficult to assess the individual contributions of these three main reasons to
the high compiler performance. Therefore, it is not possible at this stage to make a
statement as to whether the nature of the language EULER or the fact that the compiler is
microprogrammed is the dominant factor.

Development of the microprogram

Since there is no higher level language to express microprogram procedures and no compiler to
compile microcode, the microprograms were written in the symbolic language explained in Fig.
6. Actually the process was a hand translation of the algorithms in the EULER definition to
the symbolic microprogram language. The microprograms were translated into actual microcode
and simulated before they were put on the System/360 Model 30 by means of a general
microprogram development system.

which utilize existing computer hardware to a much higher degree than conventional
programming systems.

Among the thoughts which are raised by this scheme are the following:

There should be an investigation to determine the ideal directly interpretable languages
which correspond to higher level languages. Although several attempts have been made to
define string languages for interpretive systems (for instance in Wirth and Weber [1966a and
1966b] and Melbourne and Pugmire [1965]), to the author's knowledge no work has been
published which attacks this question in a general and theoretically founded manner.

A proliferation of interpretive languages and the development of microprogrammed interpreters
can be justified when better tools are developed to reduce the cost of microprogramming. It
is necessary that we be able to express microprogramming concepts (and also machine design
concepts) in a higher level language form and that we develop compilers which translate the
microprograms from higher level language form to actual microcode. Also, good microprogram
simulation and debugging tools are called for.

The whole relationship between programming, microprogramming, and machine design should be
viewed with a common denominator: how should the tradeoffs be made such that the ultimate
goal can be reached more effectively, . . . how to solve a user's problem? Green [1966]
offers some thinking in this direction but the state of the art has to progress further
before we will have a complete understanding of what these relationships and tradeoffs are.
Section 1
Section 2
Computers with one central processor
and multiple input /output processors
The computer structures discussed in this section are manufactured mainly by IBM. The reason
for this bias toward IBM is that only fairly elaborate or very specialized structures have
Pio's; computers of other manufacturers which have Pio's tend to have also the more general
multiprocessing capability¹ that would place them in Sec. 3.

¹ For example, the CDC-3600 [Casale, 1962], and the SDS Sigma 7 [Mendelson and England, 1966].

The DEC PDP-8

The PDP-8 is presented in Chap. 5, and its 338 P.display appears in Chap. 25. Discussions are
given in Part 2, Sec. 1 and Part 4, Sec. 1, respectively. For this section, the reader should
look at the methods for transmitting data between Ms or T and Mp. Three methods are used: Pio
or P.display is used to control T.displays (Chap. 25); Pc directly transmits a word to the
buffer of a K for low-data-rate devices, here a K may request data, using the program
interrupt; and a K transmits data directly to Mp.

The IBM 1800

Chapter 33 describes the 1Pc-9Pio IBM 1800 computer. There are five Pio types, depending on
the components they control. Although we classify them as Pio's, they are barely processors
since the instruction counter has a very restricted behavior. Unless the data channel has
"data chaining" capability (in effect a jump instruction), it is not a processor.

The IBM 7094 II

The IBM 7094 II computer is discussed in Part 6, Sec. 1, page 515; its description appears in
Chap. 41. The earlier 709 was about the first computer to use independent Pio's. UNIVAC
(Chap. 8) has a very extensive K for data transmission concurrent with processing, whereas
the 701 and 704 both required Pc to control each data word transmitted. The Pio's of the 7094
II might be looked at as an overreaction or overdesign inspired by the 701-704.

The structure of System/360, Part I-outline of the logical structure

The structure of the 360 is presented in Part 6, Sec. 3. A discussion of an alternative
implementation of the 360 by the authors of this book, using multiprocessors, is given (page
585). Chapter 43 gives an overview of the ISP, and Chap. 44 presents the implementations of
various 360 models. The implementations of physical processors to give multiple logical
processors using microprogramming are interesting. IBM is rather conservative in regard to
providing structures convenient for multiprogramming; and a multiprocessing design appears
too complex for them to attempt outside a research environment.

The engineering design of the Stretch computer

Stretch (also known as Model 7030) and the UNIVAC LARC [Eckert, et al., 1959] are perhaps the
first computers with the principal design goal of maximizing numerical computing power.
Stretch, aptly named because of its influence on the technology (and on the IBM
organization), was initiated by the Atomic Energy Commission at Los Alamos. It was designed
to interpret large-scale scientific programs for nuclear engineering. Like a number of other
high-risk major developmental efforts in the computer field, Stretch was not outstandingly
successful as a computer system. Only a few (5 to 10) were built at a cost substantially
exceeding their contract price and with performance only modestly better than the art at the
time of their production. However, again in common with other similar efforts, they had a
substantial positive effect on the state of the art. In the Stretch case, in particular, the
2.18-microsecond Mp core technology developed for Stretch was transferred to the 7090. In
fact, this was a major contribution to why Stretch was only modestly better than 7090. The
design goal was performance 100 times an IBM 704. The computer is described at a high level
in Chap. 34. Buchholz's book on Project Stretch [Buchholz, 1962] is outstanding as a text on
computer structures and as a description of Stretch. It should be read by all computer
designers.

Computers built to maximize numerical computing power also include, besides the UNIVAC LARC
for the Lawrence Radiation Laboratory at Livermore, the Control Data 6600 (Chap. 39), and the
IBM System/360, Models 91 and 85.

Stretch derives its power through:

1 Compound and complex ISP instructions
2 A PMS structure with Mp(2.18 µs/w), Pc(0.25 - 1 µs/w), Pio's, and a satisfactory switch
  between P's and Mp
3 Many data-types
4 Parallelism within the Pc, involving concurrent interpretation of the instruction stream
  using the "Instruction look-ahead" mechanism

The last of these, internal Pc parallelism, is the most novel. Stretch was possibly the
earliest computer to make use of it; each of the other "maximum" power C's listed above also
uses some version of instruction look-ahead, for each of these "maximum" systems is faced
with how to obtain computing power that goes beyond the basic logic and memory technology
available at the time the system is designed. The conclusion, reached in all these cases, is
to move toward internal parallelism.

In Stretch the instruction look-ahead mechanism fetches the next several instructions and
partially interprets each future instruction. The mechanism is elaborate compared with the
straightforward instruction stack in the CDC 6600 (Chap. 39, page 489). The Stretch
look-ahead complexity stems from partially interpreting instructions which may later have to
be undone.

Stretch uses a basic Mp(core; 16384 w; (64 + 8 parity) b/w; tc:2.18 µs). Sixteen Mp's can be
connected to the P's via the S('Memory Bus; time multiplexed). The 8 parity bits are used to
give single-error correction and double-error detection, which is a very substantial amount
of error protection compared with standard design practice. This is the memory that was
incorporated in the IBM 7090 and became operational even before Stretch was delivered. Thus,
as is often the case with large development efforts, the by-products are as important as the
main product.

There is a single well-designed physical Pio, called the Exchange, consisting of several
logical Pio's. Its ability to have the state of all the logical Pio's accessible in Mp is
useful and important. This design seems better than the data channels in the IBM 709-7094
series. It is almost a prototype for the IBM System/360 Pio's.

The Stretch word length is 64 bits. It has operations on the following data-types: binary
integers, decimal integers, address integers, variable-length integers, boolean vectors,
single and double floating point. The length of the variable integer is specified by
parameters in the instruction. Noisy-mode floating-point data provide a method of introducing
a roundoff error in the least significant bit under program control. Thus a problem can be
run in conventional and noisy modes and the results compared. An instruction is either 32 or
64 bits.

The ISP processor state has an instruction counter, a double-length accumulator, 15 index
registers, about 6 registers, and about 100 miscellaneous bits. Computing power is obtained
by having an instruction set with complex instructions. Hence, there is an instruction for
almost every possible operation, though inverse subtract and inverse divide instructions are
lacking. However, there is a "multiply and add" instruction. Stretch has the complete set of
16 operators for boolean vectors. Compound instructions, formed from a sequence of simpler
instructions, also increase power. These instructions specify the array element to be
accessed, an operation on the element, and a calculation to get the next element, in a single
instruction. Notice that several of these instructions are oriented toward operations on
arrays (i.e., matrices), which are the type of numerical-analysis tasks for which the system
was built.

Multiprogramming was done with Stretch [Codd et al., 1959] and undoubtedly had some influence
within IBM. Stretch has a pair of bounds registers to relocate and protect a single program.
The interrupt scheme for Stretch [Brooks, 1957a] was better than that of existing IBM
computers, though it is not described in Chap. 34.

The importance of Stretch lies in the by-products it inspired and its influence on IBM,
encouraging a concern with hardware project management. The elaborate ISP and the complex
implementation of Stretch may not have been worth the effort, especially when one compares
this computer with the later, larger but elegant CDC 6600. It is, however, interesting to
note that Stretch was used as a central component in an early specialized multiprocessor
system called the IBM Harvest [Herwitz and Pomerene, 1960], which provides extremely powerful
data-processing capabilities.

PILOT, the NBS multicomputer system

The National Bureau of Standards' PILOT computer (Chap. 35) was first described in 1959. At
that time it was a multiple computer; by our criteria, we classify it as a multiple-processor
computer, as shown by its PMS structure (Fig. 1). However,
398 Part 5 1 The PMS level Section 2 1 Computers with one central processor and multiple input/output processors
[Fig. 1 (PILOT PMS structure) appeared here; surviving labels: Pc('Primary Computer); T.console; T(reader).]
unlike present multiprocessors with several identical processors, each
PILOT processor is different.
PILOT is a good example of an early attempt to use multiprocessors;
successors look little like it. It has one of the best analytical
discussions of any computer [Leiner et al., 1957]. With this machine
there was an attempt to resolve the controversy between the short-word
EDSAC (17 bits) and the long-word Institute for Advanced Studies
computers (40 bits) by providing a processor and memory (i.e.,
computers) for each problem. Only the first computer had substantial
Mp, and the other computers, or processors, could be concerned only
with the first computer. The third computer was introduced to process
devices such as Ms(magnetic tape) and used a plugboard program memory.
The idea of an independent processor (IBM 7094) or computer (CDC 6600)
for input/output processing is used now, though it is doubtful that
PILOT inspired these designs.
The capacitor-diode store is novel and daring for the technology. Two-
and three-address computers are used in the primary and secondary
computers. The secondary computer, with 16-bit words, is not very
useful; its memory is very limited, and it is essentially used only for
address calculations. The bookkeeping operation for a three-address
computer could easily keep a small processor busy.
Chapter 33
Introduction
This third-generation computer is constructed with hybrid-circuit
technology (semiconductors bonded to ceramic substrates) known as SLT
(Solid Logic Technology). It has a core primary memory. The 1800 is
designed for process control and real-time applications. It is nearly
identical to the IBM 1130, which is designed for small-scale,
general-purpose, and scientific calculation applications. The two C's
perform about the same for computation bound problems. The 1130 and
1800 are not program compatible with the "universal" IBM System/360
series, though introduced at about the same time. However, the 1800
uses terminals and secondary memories similar or identical to the
System/360. These are organized about the standard IBM System/360
8-bit byte. Thus their common information media provide a link between
the two. Hence an 1800 is sometimes connected to the System/360 as a
preprocessor. The relative performance of the IBM 1130, 1800, and the
IBM System/360 can be seen on page 586. The 1800 has a better
cost/performance ratio than a System/360, Model 40 and has the
performance of a Model 30. From now on we will refer only to the IBM
1800, although much applies to the IBM 1130.
The 1800's interface facilities include a large number of T's which
can connect to different physical processes; a multiple priority
interrupt facility with fast response; multiple Pio's which can
transfer information at high data rates;¹ and a complete instruction
set for real-time, nonarithmetic processing.
We include the 1800 because it is a typical, 16-bit, real-time, process
control computer. The ISP is the most straightforward of the IBM
computers in the book (and perhaps the nicest). The several different
Pio's and their implementations are unusual and should be carefully
studied. Important aspects of the 1800 include the PMS structure as it
links to real-time processes, e.g., analog processes; the
straightforward Pc ISP (Appendix 1 of this chapter); the specialized
Pio's for real-time T's; the Pc implementation; and the Pio
implementation. The chapter is written to expose and explain these
aspects.²
By comparing the 1800 with Whirlwind, an evolutionary progression can
be seen. Their ISP's are similar but, because of better technology,
the 1800 shows an increase in capability. The 1800 Pc has a
medium-sized state (ISP has six registers) including three index
registers. The implementation is not elegant; a single register array
and adder would provide the basis for a straightforward Pc
implementation. The 1800 has features which facilitate higher
information processing rates compared with Whirlwind. The major change
between Whirlwind and the 1800 machines was brought about by the
decreasing cost of registers and primary memory. In the 1800, all K's
have independent memory (usually 1-2 words or characters) so that
concurrent operation of almost all the T and Ms via their K's is
possible. In contrast, Whirlwind has only a single, shared register in
Pc, and only one device can operate at a time.
Lower hardware costs allow multiple Pio's in the 1800. The Pio's
represent an unusual approach to information processing in this period.
The Pio's which process standard disk, magnetic tape, and card reader
are conventional, but the Pio's for analog and process signals are
novel and interesting. The latter Pio's are the most unusual part of
the 1800, and they allow independent programs in each Pio to do some
very trivial processing tasks such as alarm-condition monitoring
independent of Pc. However, the Pio's are limited; for example, it is
difficult to transmit or receive a data block between Ms and Mp (using
a Pio) without surrounding the data block with Pio control words
(thereby transmitting the control words).
The interrupt system is typical of second- and third-generation
computers and is comparable to the SDS 900 series (Chap. 42). In later
computers interrupt conditions are used to determine a fixed address to
which the processor interrupts. There are generally many conditions
(100 to 1,000), but only a few discrete levels (8 to 20). The 1800
depends on program polling within a discrete interrupt level; each
level has a unique, fixed address.
A principal ISP design problem is the addressing of the 65,536-word Mp.
Thus, a 16-bit number has to be generated within Pc for an address. In
this regard the 1800 behaves like the 12-bit machines which have to
address a 2^12 (4,096) word memory, and the modes or methods the 1800
uses for addressing are reasonable.
It should be noted that it is relatively difficult to write programs
which do not modify themselves. For example, the instruction, Store
Status, is changed by its execution.
¹Although we refer to the data channels as Pio's, they have a very
limited ISP for a Pio; in fact, they might better be called K's.
²Some of the material in the chapter has been abstracted from the IBM
1800 Functional Characteristics Manual.
A peculiar feature of the 1800 is its storage protection (see page
408). This feature should provide program relocation capability in
addition to protection, but it does not.
PMS structure
A simplified picture of the IBM 1800 structure is given in Fig. 1,
without Pio('Data Channel)'s and K('Device Adapter)'s. Each T and Ms
have a K which connects Pc's In and Out Bus, the S('Pc to K). Some K's
attach to Pio's and some directly to Pc. Information can be transferred
between Mp and K via Pio at rates up to 0.5 megaword/s or 8 megabit/s.
The IBM Configurator (Fig. 2) gives the restrictions on the possible
structures, together with minute details. It is presented as an
alternative to the PMS structure (Fig. 1). The Configurator is intended
to show the "permissible structures" but does not show the logical or
physical structure. The PMS diagram (Fig. 3) alternatively shows the
physical-logical hardware structure and performance parameters. It
should be noted that a PMS diagram with the information of the computer
component Configurator (Fig. 2) would require slightly more details
(and space).
The central processor¹-primary memory
The IBM 1800 is a fixed-word-length, binary computer with 4, 8, 16, or
32-kword memories of 16 + 1 + 1 bits, and a memory cycle time of 2 or 4
microseconds. Of the 18 bits 1 bit is used as a parity check (P bit)
and 1 bit is used for storage protection (S bit). The Pc instruction
set operates on 16-bit and 32-bit words. Indirect addressing and three
index registers are used in address modification. The Pc has a 24-level
interrupt system, three interval timers, and a console.
The Pc interrupt is a forced branch (jump) in the normal program
sequence based upon external or internal Pc conditions. The devices and
conditions that cause interrupts are hardwired in fixed priority
levels. An interrupt request is not honored while the level of the
request itself or any higher level is being serviced, or if the level
requested is masked. Examples of interrupt conditions are:
1 An external process condition that requires attention is detected.
¹IBM name: the Processor-Controller or PC.
[Figure labels: Processor-Controller; Process I/O; Data Processing I/O.]
Fig. 1. IBM 1800 data acquisition and control system. (Courtesy of International Business Machines Corporation.)
Chapter 33 1 The IBM 1800 401
[Fig. 2, the IBM Configurator referred to in the text, occupied these pages; the drawing is not recoverable. Surviving panel labels: MPX, R; Digital Inputs; Analog Inputs.]
[Fig. 3, the PMS diagram referred to in the text, appeared here. The layout is not recoverable; the legible component labels and notes are listed below.]
Mp; Pc; S; T.console; K(time)
T(#1; typewriter); T(#2:4; page; printer); T(#5; typewriter); T(#6:8; page; printer)
K; T(incremental point plot)
K; T(paper tape; reader | punch)
Pio; K; T(card; reader | punch)
Pio; K; S; Ms(magnetic tape)
Pio(#1:3); Ms(removable; disk pack)
Pio; K; T('System/360 interface)
Pio; S; K(#1:3); S; K; T(#1:8; digital; input; 1 w; contacts | logic voltage)
K(#4:6); S; K; T(digital; event pulse; input; counters; (#1:16; 8 b) | (#1:8; 16 b))
T(digital; contact inputs; to: interrupt; 16 b)
K; S; K(#1:4); S; K; T(#1:4; digital; output; contact | logic voltage | pulse; 16 b)
T(#1:4; analog; output; 10 | 13 b)
T(#1:1024; analog; input; voltage, current: (+10 | +20 | +50 | +100 | +200 | +500) mv | +5 v | +10 v | (-20) ma)
Notes to the figure:
Maximum of 9 Pio per C
Pio('Digital Input Data Channel)
Pio('Digital, Analog Output Data Channel)
Pio('Analog Input Data Channel)
Optional Pio to control analog channel (structure is greatly simplified)
K('ADC; analog; input; 9, 12, 15 b/w; i.rate: 9 ... 24 kw/s)
2 An interval timer has counted a previously set time interval.
3 A magnetic-tape drive has completed a data transfer previously
requested and is ready for another request.
4 An operator has initiated an interrupt from the Pc console.
5 A device such as a typewriter has just printed a character and is
ready to receive the next one.
Primary memory communication and data transmission with terminals and secondary memory
Two methods are used to transmit data between Mp and Ms, or Mp and T.
First, low-speed devices are controlled directly by the program. Each
character or word of data is transmitted to or from the Pc and onto T
by means of an Execute I/O (XIO) instruction. The Pc program and device
synchronization are accomplished by using the interrupt mechanism.
Devices operating under direct program control include typewriter,
printer, plotter, paper tape reader and punch, analog-to-digital
converters, contact sense, voltage-level sense, pulse counters, etc.
The second method of transferring data is via the Pio('Data Channel)'s.
The Pio program is started by the XIO instruction of the Pc. The
transfer of data words then proceeds under control of the specified
Pio, completely asynchronous to and in parallel with Pc program
operation. The Pio gains Mp access independent of Pc (Pc operation is
suspended for one Mp cycle). During the Mp cycle, the data are taken
from or placed into core storage by Pio (via internal Pc control and
registers). As soon as the Pio has been satisfied, which normally takes
one cycle, the Pc proceeds. The logical state of the Pc, or the
Instruction-set Processor, is not changed by Pio's access to Mp. This
method of access is referred to as "cycle stealing." Devices (Ms and T)
operating under Pio control include magnetic tapes, disks, line
printer, card reader-punch, and the link to the IBM System/360.
Some devices can operate under both Pc and Pio control, depending on
their characteristics and the configuration, e.g., analog input, analog
output, digital input, and digital output.
Process I/O, controls and transducers
Analog inputs. Analog-input equipment includes analog-to-digital
converters, multiplexors, amplifiers, and signal conditioning equipment
to handle various analog-input signals. The data input rates are up to
20,000 16-bit samples per second, with program selectable resolution
and external synchronization. There can be 1,024 (via relay) and 256
(via high-speed solid state) multiplexed analog-input channels
connected to a single K (analog-to-digital converter). The Configurator
(Fig. 2) shows the allowable inputs.
Digital inputs. The Digital Input provides up to 384 process
interrupts; up to 1,024 bits of contact sense, digital input, or
parallel register input; and 128 bits of event input counters as 1-,
8-, and 16-bit counting registers.
Analog outputs. Up to 128 analog outputs can be provided.
Digital outputs. Digital Outputs provide up to 2,048 bits of pulse
output, contacts, and registers.
IO processors (data channels)
Pio('Data Channels) give a T or Ms the ability to communicate directly
with Mp. For example, if an input unit requires a primary memory cycle
to store data that it has collected, the Pio communicates directly with
Mp and stores the data.
The Pio's run even if Pc is waiting. The Pio's have two registers: a
Word Count which is used to count the number of words being transferred
in a block between a device and Mp memory; and a Channel Address which
points to the next word transferred in a block. The Channel Address is
also used to select the next instruction in the program for the next
block transfer task.
Two basic types of Pio's are used, nonchaining and chaining.¹ The Pio's
provide the ability to transfer either a single block (nonchaining) or
multiple blocks (chaining) directly to Mp independent of Pc.
The central processor
Registers in the physical processor
Figure 4 shows the relationship of the registers in Pc, together with
those in the Instruction-set Processor. Those registers accessible by
the program are shown with an *. All the registers are accessible from
the console. A description of the functions of each register is given
below.
Storage address register (SAR). All Pc references to Mp are selected or
accessed by this 16-bit register. Pio references to Mp use the Channel
Address Register (CAR) of the active Pio.
Instruction register (I)*. This 16-bit counter register holds the
address of the next instruction.
Storage buffer register (B). This 16-bit register is used for buffering
all word transfers with Mp.
¹A descriptive name undoubtedly concocted by one of IBM's marketing
departments.
[Figure labels: Console; Core Storage; Timers; Operation Monitor; B; D; A*; U; In Bus (connected to input devices); Out Bus (connected to output devices); Control Registers; Overflow*, Carry*; SC (6).]
Fig. 4. IBM 1800 Pc data flow. (Courtesy of International Business Machines Corporation.)
Arithmetic factor register (D). This 16-bit register is used to hold
one operand for arithmetic and logical operations. The Accumulator
provides the other factor.
Accumulator (A)*. This 16-bit register contains the results of any
arithmetic operation. It can be loaded from or stored into core
storage, shifted right or left, and otherwise manipulated by specific
arithmetic and logical instructions.
Accumulator extension (Q)*. This register is a 16-bit low-order
extension of the Accumulator. It is used during multiply, divide,
shifting, and double-precision arithmetic.
Shift control counter (SC). This 6-bit counter is used primarily to
control shift operations.
Accumulator temporary (U). The U register is used to store A
temporarily during an instruction or an operation which requires the
A's facilities.
OP register (OP). This 5-bit register is used to hold the operation
code portion of an instruction.
Index registers*. The three 16-bit registers are used in
effective-address calculations.
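As a compact summary of the register descriptions above, the following
Python sketch models the Pc register set with the widths stated in the
text. It is an illustration only, not IBM's design: the names
PcRegisters, WIDTHS, set, get, and aq are hypothetical, and the 16-bit
width of U is an assumption (the text says only that U holds A
temporarily).

```python
from dataclasses import dataclass, field

# Register widths in bits, taken from the register descriptions.
# U's width is not stated in the text; 16 bits is assumed because
# U holds a temporary copy of the 16-bit accumulator A.
WIDTHS = {"SAR": 16, "I": 16, "B": 16, "D": 16, "A": 16, "Q": 16,
          "SC": 6, "U": 16, "OP": 5}


@dataclass
class PcRegisters:
    """Hypothetical model of the IBM 1800 Pc registers (illustration only)."""
    values: dict = field(default_factory=lambda: {name: 0 for name in WIDTHS})
    # Three 16-bit index registers used in effective-address calculations.
    xr: list = field(default_factory=lambda: [0, 0, 0])

    def set(self, name: str, value: int) -> None:
        """Store a value, truncated to the register's stated width."""
        self.values[name] = value & ((1 << WIDTHS[name]) - 1)

    def get(self, name: str) -> int:
        """Return the current contents of a register."""
        return self.values[name]

    def aq(self) -> int:
        """A and Q viewed as one 32-bit quantity; Q is the low-order half."""
        return (self.values["A"] << 16) | self.values["Q"]
```

Truncating on every store keeps each register within its width, so the
6-bit SC and the 5-bit OP register cannot silently hold wider values.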
OP Operation Code. These 5 bits define the instruction.
[Figure field labels: Op Code; F; T; Displacement.]
Fig. 5. IBM 1800 one-word-instruction format. (Courtesy of International Business Machines Corporation.)
COND Conditions. These 6 bits select the indicators that are to be
interrogated on a BSC or BSI instruction. The bit assignments for
conditions are:
Cond(10)  A = 0
Cond(11)  A < 0
Cond(12)  A > 0
Cond(13)  (A(15) = 0) that is, A is even
Cond(14)  (Carry = 0)
Cond(15)  (Overflow = 0)
ADDRESS These 16 bits usually specify a core storage address
Effective-address generation. The Effective Address (EA) is developed
as shown in Table 1. The instruction set is divided into five classes
as shown in Table 2.
[Table 2. The column headings did not survive extraction; the columns are class, instruction, a Yes/No entry, and mnemonic.]
Load and store   Load accumulator                    Yes   LD
                 Double load                         Yes   LDD
                 Store accumulator                   Yes   STO
                 Double store                        Yes   STD
                 Load index                          ‡     LDX
                 Store index                         Yes   STX
                 Load status                         No    LDS
                 Store status                        Yes   STS
Arithmetic       Add                                 Yes   A
                 Double add                          Yes   AD
                 Subtract                            Yes   S
                 Double subtract                     Yes   SD
                 Multiply                            Yes   M
                 Divide                              Yes   D
                 And                                 Yes   AND
                 Or                                  Yes   OR
                 Exclusive Or                        Yes   EOR
Shift            Shift Left instructions:
                 Shift left logical (A)†             No    SLA
                 Shift left logical (AQ)†            No    SLT
                 Shift left and count (AQ)†          No    SLC
                 Shift left and count (A)†           No    SLCA
                 Shift Right instructions:
                 Shift right logical (A)†            No    SRA
                 Shift right arithmetically (AQ)†    No    SRT
                 Rotate right (AQ)†                  No    RTE
Branch           Branch and store I                  Yes   BSI
                 Branch or skip on condition         Yes   BSC (BOSC)
                 Modify index and skip               ‡     MDX
                 Wait                                No    WAIT
                 Compare                             Yes   CMP
                 Double compare                      Yes   DCM
I/O              Execute I/O                         Yes   XIO
† Letters in parentheses indicate registers involved in shift operations.
‡ See the section for the individual instruction (MDX and LDX).
Storage protection. The storage-protection facility protects the
contents of specified individual locations of Mp from change due to the
erroneous storing of information during the execution of a program. The
status of each location is identified as "read only" or "read/write" by
the condition of the Storage Protect Bit, S.
The Store-status instruction is used to write and clear Storage Protect
Bits. The execution of this instruction is under control of the Write
Storage Protect Bits switch on the console. Any attempt by the program
to write into a read-only protected location results in a
storage-protect violation which causes the Internal Interrupt (the
highest priority interrupt).
Instruction interpretation process
The simplified Pc data-flow block diagram (Fig. 4) shows instructions
and data entering and leaving memory via the B register. Additional
bits in Pc hold the P and S bits for Mp. Input devices send data and
instructions to the B register via the 18-bit In-bus. Output devices
receive data from the B register via the 18-bit Out-bus. Eighteen bits
can be transferred between Pc and K(magnetic tape). As each
stored-program instruction is selected, its various parts (op code,
format bit, etc.) are directed to the control registers via the B
register and the Out-bus. The control registers decode and interpret
each instruction before the instruction is executed.
Except for Pio operations, all instructions and data in memory are
addressed by the Storage Address Register (SAR). SAR obtains the memory
address from the I register or the A register. The
contents of the I register are developed by one of the following means,
depending on the Pc operation:
1 The I register is incremented for each instruction.
2 The effective address of each instruction is developed in the
accumulator (A register) and then transferred to SAR. The contents of
the accumulator are saved in an auxiliary (U) register during
effective-address computation. If the instruction was a branch, the
contents of SAR is transferred to the I register.
The following examples illustrate the data flow or instruction
interpretation process for the Load Accumulator (LD) instruction.
One-word load instruction
Instruction Cycle
A register transfers to U register.
Control registers store various parts of the instruction (op code,
format, and tag).
Displacement is stored in the D register.
a If tag = 00, I register transfers to A register.
b If tag ≠ 00, the specified XR transfers to A register.
Displacement (D register) is added to A register.
Execute Cycle
9 A register transfers to SAR (effective address).
10 U register transfers to A register.
11 SAR addresses data word.
12 Data word transfers to B register.
13 B register loads into A register (via D register).
Two-word load instruction, direct addressing
Instruction Cycle 1
1 A register transfers to U register.
2 I register transfers to SAR (I register is then incremented).
3 SAR addresses the memory location containing the instruction (first
word).
4 Memory location transfers to B register and Out-bus.
5 Control registers store various parts of the instruction (op code,
format, and tag).
6 If tag ≠ 00, the specified XR transfers to A register.
Instruction Cycle 2
7 I register transfers to SAR (I register is then incremented).
8 SAR addresses second word of instruction.
9 Second word of instruction (address) is read into B register.
10 Address (from B register) is stored in D register.
11 a If tag = 00, D register transfers to A register.
b If tag ≠ 00, D register is added to A register (A register contains
contents of XR)
14 SAR addresses memory at effective address (data word).
15 Data word transfers to B register.
16 B register loads into A register (through D register).
Central-processor communication with the controls¹
Direct program control of the controls
Pc direct programmed control of I/O devices is on the basis of
single-word or character-at-a-time transfers for each XIO instruction
executed. One data word or character is transferred to or from Mp to K.
The XIO instruction specifies an I/O Control Command (IOCC) with a
function of Control, Sense, Read, or Write to a controlled device. This
command is either directly to a device or to a Pio.
It is possible for the program sequence to execute an XIO instruction
to a device that is busy responding to a previous XIO instruction. Each
device has a Busy indicator, which signals whether or not the device
can accept data or control information. (Incorrect program sequence
timing may cause undetected errors.)
¹IBM name: Adapter or Device Adapter.
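The address arithmetic in the one-word and two-word load examples of
this section can be sketched in Python as follows. This is a minimal
illustration under stated assumptions, not the machine's actual logic:
the function names are hypothetical; the displacement is taken to be an
8-bit field (what remains of a 16-bit word after the 5-bit op code, the
F bit, and a 2-bit tag) and is assumed to be signed; and the I register
is assumed to have been incremented already, as in step 2 of the
two-word example. Indirect addressing is not modeled.

```python
MASK16 = 0xFFFF  # the Pc generates 16-bit addresses


def sign_extend_8(value: int) -> int:
    """Interpret the low 8 bits as a signed displacement (an assumption)."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def ea_one_word(i_reg: int, tag: int, xr: dict, displacement: int) -> int:
    """One-word format: base is I when tag = 00, otherwise the tagged XR.

    The displacement (held in D) is then added to the base in A.
    """
    base = i_reg if tag == 0 else xr[tag]
    return (base + sign_extend_8(displacement)) & MASK16


def ea_two_word(tag: int, xr: dict, address: int) -> int:
    """Two-word format, direct addressing: address alone, or address + XR."""
    if tag == 0:
        return address & MASK16
    return (address + xr[tag]) & MASK16
```

For example, with I = 0x0101 and tag 00 a displacement of 0x05 gives
0x0106, while a two-word instruction with address 0x1000 and an index
register holding 0x0010 gives 0x1010.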
It is possible for a device operating synchronously with the program to
request a data word transfer before the program sequence is ready to
service the request. Devices with this potential have a "program check"
indicator to signal when data have been lost (that is, Pc has not kept
up with the device).
Execute I/O instruction (XIO)
This instruction is used for programmed I/O operations and to
initialize Pio; it may be either one or two words in length, as
specified by the F bit. In the two-word instruction the address is
either a direct or indirect address, as specified by the IA bit. For
proper operation the effective address must be an even address. The
effective address is used to select a two-word I/O Control Command
(IOCC) from storage.
The IOCC specifies the I/O operation, I/O device, and core storage
address. The format of the two-word IOCC follows, with an explanation
of the assigned fields:
101 Initialize Write
Initiates a Write operation on a device or unit which will subsequently
make data transfers from storage via a Pc.
110 Initialize Read
Initiates a Read operation from a device or unit which will
subsequently make data transfers to storage via a Data Channel.
111 Sense Device
Reads the selected device status word into the Accumulator. A Device
Status Word (DSW) and the Process Interrupt Status Word (PISW) are
sensed with this instruction.
If Area 00000 is specified, the Console status and Interval Timer
status may be brought into the Accumulator as specified by a unit
address code in the Modifier field.
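To make the IOCC fields concrete, here is a hedged Python sketch of
decoding a two-word IOCC from memory. The text places the address word
at the even effective address and the Area, Function, and Modifier in
the word at EA + 1; the packing assumed here (a 5-bit Area in the
high-order bits, then a 3-bit Function, then an 8-bit Modifier) is an
assumption inferred from the 5-digit area code and 3-digit function
codes quoted in the text. The names decode_iocc, IOCC, and
FUNCTION_NAMES are hypothetical, and only the function codes that
appear in the text are named.

```python
from typing import NamedTuple

# Function codes that appear in the text; other codes are not listed there.
FUNCTION_NAMES = {
    0b001: "Write",
    0b010: "Read",
    0b101: "Initialize Write",
    0b110: "Initialize Read",
    0b111: "Sense Device",
}


class IOCC(NamedTuple):
    address: int   # first word: core storage address
    area: int      # selects the device (assumed 5 bits)
    function: str  # decoded function name
    modifier: int  # assumed 8 bits


def decode_iocc(memory: dict, ea: int) -> IOCC:
    """Decode the two-word IOCC selected by an XIO effective address."""
    if ea % 2 != 0:
        raise ValueError("the XIO effective address must be even")
    address = memory[ea] & 0xFFFF
    control = memory[ea + 1] & 0xFFFF
    area = control >> 11            # assumed: high-order 5 bits
    code = (control >> 8) & 0b111   # assumed: next 3 bits
    modifier = control & 0xFF       # assumed: low-order 8 bits
    name = FUNCTION_NAMES.get(code, f"unlisted ({code:03b})")
    return IOCC(address, area, name, modifier)
```

Rejecting an odd effective address mirrors the requirement that the EA
be even; what the hardware itself does with an odd address is not
stated in the text.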
4 If Function is Write (001) or Read (010), the Address specifies the
storage location of the data word.
XIO execution interpretation process
1 The EA of the XIO is developed in the accumulator (A) and routed to
the Storage Address Register (SAR) to locate the IOCC (as for any EA).
2 Bit position 15 of SAR is forced on to select the EA + 1 where the
IOCC Area, Function, and Modifier are found.
3 The Area, Function, and Modifier are routed through the B register to
the Out-bus to the control of the device specified by the Area.
4 Bit position 15 of SAR is turned off to allow the address portion of
the IOCC word to be transferred from the Mp location specified by the
Effective Address (EA) to the B register.
5 If the Function is an Initialize Read, Initialize Write, or Control,
the address part of the IOCC is routed through the B register to the
Out-bus. The address part of the Initialize Read/Write IOCC goes to the
Channel Address Register (CAR) of Pio. If the Function is Read or
Write, the address is routed from the B register through the A register
to the SAR. SAR addresses the memory location to or from which the data
are transmitted.
Interval timers
Three timers are provided to supply real-time information to the
program. They are in core-storage locations 0004 (Timer A), 0005 (Timer
B), and 0006 (Timer C). Each timer is incremented according to its
associated or permanent time base and can be hardwired to be 0.125,
0.250, 0.5, 1, 2, 4, 8, 16, 32, 64, or 128 milliseconds.
The timers can be started or stopped under program control. When the
count reaches zero, an interrupt is requested on the level assigned to
the timers.
Interrupt
The interrupt feature provides an automatic branch from the normal
program sequence, based upon an external condition. A maximum of 24
external interrupt levels (groups) are available, arranged in order of
priority. Twelve external interrupt levels are standard. Each interrupt
level has a unique core-storage address assigned to it. Several devices
may be connected to a single interrupt level, and program polling can
be used to differentiate the possible signals causing the interrupt.
The Interrupt Level Status Word, ILSW, is used to identify the specific
condition causing its interrupt level to request service.
Internal interrupt. When any one of the following error conditions
occurs, there is an internal interrupt in Pc: an invalid op code; a Mp
parity error (an even number of bits); a storage-protect violation; and
Channel Address Register check error. The internal interrupt takes
priority over all external interrupts and cannot be masked.
A mask register exists for the masking and unmasking of interrupt
levels. An interrupt level that is masked cannot initiate a request for
service until it has been unmasked.
Device status word (DSW). DSW indicators usually fall into three
general categories:
1 Error or exception interrupt conditions
2 Normal data or service-required interrupts
3 Routine status conditions
Process interrupt status word indicators (PISW). The PISW indicators
are physically located in Pc and are turned on by events external to
the computer, e.g., contact closures or voltage shifts.
IO processors¹
The Pc initializes each Pio with an XIO instruction. The Pio has
priority to the extent that, when the I/O device is ready to send or
receive a data word, the Pc is stopped while the word transfers to or
from core storage. Pc data and conditions are undisturbed except for
the memory locations that receive data from an input device.
I/O devices that are to be operated concurrently must be on separate
Pio's.
The XIO instruction for a Pio specifies an I/O Control Command (IOCC)
with a function of Initialize Read or Initialize Write. However, even
though a device operates with a Pio, the XIO instructions in Pc are
used to sense device status and for control.
Registers
Channel address register. The Channel Address Register (CAR) is a
16-bit register used to store the Mp address of the next word that will
be addressed by the Pio. Each Pio has a CAR. Pio and its associated CAR
are selected when their assigned I/O device is selected by the Area
Code and Modifier of an IOCC word.
CAR is incremented by 1 after each transfer of its contents to CAB.
¹IBM name: Data Channel (DC).
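The cycle-stealing transfer described in this chapter can be sketched
in Python as a simple loop. This is a simplified model under stated
assumptions, not the hardware sequence: the function name
run_block_transfer is hypothetical, the word count is assumed to be
loaded already, each word is assumed to cost exactly one stolen Mp
cycle (the text says a transfer "normally takes one cycle"), and only
an input transfer into Mp is modeled.

```python
def run_block_transfer(memory: dict, car: int, word_count: int,
                       device_words: list) -> dict:
    """Simulate an input block transfer into Mp by cycle stealing."""
    if len(device_words) < word_count:
        raise ValueError("device supplied fewer words than the word count")
    stolen_cycles = 0
    incoming = iter(device_words)
    while word_count > 0:
        cab = car                  # CAR transfers to CAB for this cycle steal
        car = (car + 1) & 0xFFFF   # CAR is incremented by 1 after the transfer
        memory[cab] = next(incoming) & 0xFFFF  # word stored at the CAB address
        word_count -= 1            # one fewer word left in the block
        stolen_cycles += 1         # Pc is suspended for one Mp cycle
    return {"car": car, "word_count": word_count,
            "stolen_cycles": stolen_cycles}
```

A three-word block starting at address 0x0200 therefore fills locations
0x0200 through 0x0202, leaves CAR at 0x0203, and costs the Pc three
memory cycles in this model.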
Channel address buffer. A common Channel Address Buffer (CAB) is used by all Channel Address Registers to address Mp. When a cycle steal request occurs, the CAR for the requesting Pio is transferred into the Channel Address Buffer.

Channel-address-register check bit. Channel Address Register (CAR) checking is provided to ensure that the first word addressed by a selected CAR is the first word of the correct data table. Thus the check determines if a Pc program has set up the Pio program correctly.¹ A CAR check is made for all devices after the address from the IOCC word is transferred to the selected CAR. A bit-by-bit comparison is made between the contents of the selected CAR and the contents of the B register. If any of the corresponding bits are not equal, a CAR check error has occurred. This CAR check error terminates the Pio task and initiates an internal interrupt.
¹Not a completely arbitrary program fault to check, since processors are involved.

Word count register. A Word Count Register is provided in each Pio. The Word Count Register is loaded with the contents of the word-count portion of the data table, (2:15). This register is decremented each time a data word is transferred from (to) the data table.

Scan control register. A Scan Control Register is provided in each Pio that has chaining ability. Scan Control register bits are stored in the first word of the first data table (bit positions 0 and 1) and in the second word (bit positions 0 and 1) of the second data table and all subsequent data tables in a chain.
The Scan Control Register controls the I/O device and the Pio operation at the end of the data table as follows: single scan of data table and stop with an interrupt; single scan of data table and stop (no interrupt); continuous scan of this data table or a different data table with an interrupt at the end of this table; and continuous scan of this data table or a different data table with no interrupt.

The IO processor program operation
The sequence of steps for a Pio program is given below. The memory map or format of the program is shown in Fig. 7.

Fig. 7. IBM 1800 data-channel tables for chaining memory maps. (a) First data table; (b) second data table. (Courtesy of International Business Machines Corporation.)
[Figure: (a) the first data table begins with a word holding SC and Word Count (Word Count = 22, SC = continuous with no interrupt), with the first data word at location 1001; (b) the second data table has Word Count = 54 and SC = single scan and stop with an interrupt, with the first data word at location 2002 and the last data word at location 2055.]

1 Pc issues an XIO instruction which references the IOCC word and initializes Pio.
2 The Area Code and Modifier of the IOCC select the I/O device. Function specifies the type of operation (Initialize Read or Initialize Write, etc.).
3 a The address portion of the IOCC word is stored in CAR for the selected Data Channel and I/O device.
  b A CAR check is made between the selected CAR and the B register.
4 A cycle steal is requested by Pio; CAR transfers to CAB.
5 CAB addresses core storage for the first word of the data table while CAR is being incremented by 1.
6 The first word of the data table contains
  a Scan Control bits (bit positions 0 and 1)
  b Word Count (bit positions 2 to 15)
  These are transferred to their respective registers in the I/O device. This is the end of the first cycle steal.
7 When another cycle-steal request from Pio occurs, CAR, which was incremented in step 5, now transfers the next higher address to CAB. CAB then addresses core storage while CAR is being incremented.
8 The first data word is transferred to or from the I/O device via the B register and Data Channel. The Word Count Register in the I/O device is decremented by 1. This is the end of the second cycle-steal cycle.

Steps 7 and 8 now continue on a cycle-steal basis; that is, they occur as the I/O device requests data transfers. The CAR is incremented with each data transfer and the WCR is decremented. This sequence continues until the last data word of the data table is transferred. The last word transfer is sensed by the WCR reaching zero or through some indicator in the device. If the device does not have chaining ability, no more demands for data transfer are made until the device is reinitialized with another XIO instruction.

Chaining. These steps are for the second and all subsequent data tables. See above for steps 1 through 8.

9 The contents of the word following the last data word in the first data table are transferred to CAR. This word must contain the address of the next data table.
10 a When the next cycle is requested, CAR is transferred to CAB to address core storage. The contents of the first word of the next data table are transferred to the B register. This word must contain the address of itself.
10 b CAR check is performed and CAR is incremented by 1.
11 When the next cycle steal is requested, CAR is transferred to CAB and CAB addresses Mp. The Scan-control bits and Word-count bits are transferred from the second word of
the data table to their respective registers. CAR is incremented by 1.
12 Data are transferred to (from) the I/O device on a cycle-steal basis via the B register and the Data Channel. CAB addresses core storage to transfer a data word to the B register. Each time CAB addresses core storage, CAR is incremented by 1. When the next cycle-steal request occurs, CAR is transferred to CAB. The Word-count Register is decremented for each word transferred.
13 When the last data character is transferred (word count is decremented to zero), operation will continue as specified by the Scan Control Register. (See above section for Scan-Control Register.)

Special data channels
The four Pio types for special functions are:

1 Analog input (block data transfers, and comparisons of analog inputs for limits)
2 Digital input/output
3 Analog output
4 Digital output

Analog-input data channels. Memory maps (Fig. 8a and b) illustrate the command formats interpreted in the Analog Data Channel programs. A list of limit values is placed in a table (Fig. 8a), and each analog input is compared with the limits. The operation sequence is: Read a specific addressed analog voltage, called the multiplex¹ point (mpx); compare the input voltage with the limits stored in the table following the analog address (the limit word contains a high and low value in bits (0:7) and (8:15), respectively); and if the analog-input value lies outside the limit range, initiate an interrupt.
¹The IBM multiplexor is an S which allows multiple inputs to be read into the T (Analog to Digital Converter) sequentially.
Figure 8b describes a second use of this data channel. Pio accepts a sequence of analog inputs and packs them into a table following the address initiation instruction. The analog inputs from the T's are either fixed or selected in a cyclic fashion from a Multiplexor.
Two Pio's can be used concurrently: One Pio controls the input from a series of analog-input addresses (Fig. 8c); the second Pio packs the corresponding analog values in a second table (Fig. 8d).

Fig. 8. IBM 1800 data-channel analog-input instruction format and memory maps. (a) Multiplexor address table with limit words for comparisons. (b) Data table, chained sequential control. (c) Multiplexor address table, random addressing. (d) Analog-to-digital converter storage tables, random addressing (used with a second data channel).

Fig. 9. IBM 1800 data-channel digital or analog-output instruction formats and memory maps. (a) Digital input, sequential; (b) digital input, random addressing; (c) digital or analog output, sequential; (d) digital or analog output, random addressing. (Courtesy of International Business Machines Corporation.)

Digital-input data channels. Digital parameters or events can be read into Mp under the control of a Digital-input Data Channel. The memory map (Fig. 9a) shows the control format for selecting and inputting a block or sequence of external data. The memory map (Fig. 9b) illustrates a more general ability to address inputs at random and read them into succeeding Mp locations.

Digital- and analog-output data channels. Memory maps (Fig. 9c and d) show the program format used by the Digital- or Analog-output Data Channels. These channels output selected data points
to external analog or digital K's. This Pio is similar to the Digital-input Data Channel.

Conclusions
We have tried to show a typical, third-generation computer used for process control. Many of the facilities the 1800 possesses are general. The Pio's are rather special, designed to monitor and control a process, independent of Pc. Although the Pio's are powerful (by providing parallel data transmission), their use, like that of other multiprocessing systems, is nontrivial. The Pc ISP is fairly straightforward, and one should write a program using it to appreciate its simplicity.
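
The data-channel sequence described in this chapter (steps 1 through 13) can be traced with a small model. The following Python sketch is illustrative only: the function name, the dictionary used for Mp, and the numeric values chosen for the four Scan Control modes are assumptions, since the text does not give the bit encoding. It models an output transfer, omits the step 3b check for the first table, and keeps only the CAR, CAB, and WCR behavior and the CAR check on chained tables.

```python
# Hypothetical model of one data channel (Pio); names and encodings are assumptions.
STOP_WITH_INTERRUPT, STOP, CHAIN_WITH_INTERRUPT, CHAIN = range(4)  # assumed 2-bit codes


def run_channel(mem, iocc_address, max_tables=8):
    """Stream data words out of chained data tables held in `mem` (a dict).

    Returns (words, events); events holds "interrupt" and/or
    "car_check_error" in the order they occur.
    """
    car = iocc_address                  # step 3a: IOCC address goes to CAR
    words, events = [], []
    for table in range(max_tables):     # bounded so a self-chained table terminates
        if table > 0:
            cab, car = car, car + 1     # step 10: fetch first word of a chained table
            if mem[cab] != cab:         # CAR check: the word must hold its own address
                events.append("car_check_error")
                return words, events
        cab, car = car, car + 1         # steps 4-6 and 11: fetch the control word
        scan = mem[cab] >> 14           # Scan Control, bit positions 0 and 1
        wcr = mem[cab] & 0x3FFF         # Word Count, bit positions 2 to 15
        while wcr > 0:                  # steps 7-8 and 12: one cycle steal per word
            cab, car = car, car + 1
            words.append(mem[cab])
            wcr -= 1
        if scan in (STOP_WITH_INTERRUPT, CHAIN_WITH_INTERRUPT):
            events.append("interrupt")  # step 13
        if scan in (STOP_WITH_INTERRUPT, STOP):
            return words, events
        car = mem[car]                  # step 9: word after the data names the next table
    return words, events


# Two chained tables laid out in the manner of Fig. 7 (contents are made up).
memory = {
    1000: (CHAIN << 14) | 3, 1001: 11, 1002: 12, 1003: 13, 1004: 2000,
    2000: 2000, 2001: (STOP_WITH_INTERRUPT << 14) | 2, 2002: 21, 2003: 22,
}
print(run_channel(memory, 1000))
```

The bound `max_tables` is a convenience of the sketch; a real channel in a continuous-scan mode would keep running until reinitialized.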
Appendix 1
IBM 1800 ISP Description
Pc State
A<0:15>                   Accumulator
Q<0:15>                   Accumulator Extension, for multiplier, quotient and double length
I<0:15>                   Instruction Location Counter
XR[1:3]<0:15>             Index Registers
Ov                        Overflow Indicator
C                         Carry Indicator
Run                       denotes running computer
Mp State
M[0:FFFF₁₆]<P,S,0:15>     Mp with Parity and Protect bits
Pc Console State
Check Stop Switch         Pc stops if storage protect violation occurs
WSPB Switch               Write Storage Protect Bits; enables the writing of bits in a word
SPV Indicator             Storage Protect Violation indicator: set to 1 if a memory reference is made to a protected word
Instruction Format
instruction/i[0:1]<0:15>
op<0:4> := i[0]<0:4>                  operation code
shop<0:7> := op□i[0]<5,8,9>           shift operation code count
f := i[0]<5>                          format; specifies a 1 or 2 word instruction
t<0:1> := i[0]<6:7>                   tag: index register specification
d<8:15> := i[0]<8:15>                 displacement or short address
dsgn<0:15> := sign_extend(d<8:15>)
a<0:15> := i[1]<0:15>                 address
ia := i[0]<8>                         indirect address bit
bo := i[0]<9>                         branch out bit
cond<0:5> := i[0]<10:15>              conditions for test
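
As an illustration of the instruction format, the field extraction can be written out directly. This Python sketch is not part of the ISP description; the function name `decode` and the returned dictionary are hypothetical, and it assumes the bit numbering used throughout the chapter (bit 0 is the most significant bit of the 16-bit word) with the tag in bits 6:7, the indirect address bit in bit 8, and the branch out bit in bit 9.

```python
def decode(word0):
    """Split the first instruction word into the ISP fields (bit 0 = most significant)."""

    def bits(first, last):
        # Inclusive bit positions counted from the most significant end.
        width = last - first + 1
        return (word0 >> (15 - last)) & ((1 << width) - 1)

    d = bits(8, 15)
    return {
        "op": bits(0, 4),                        # operation code
        "f": bits(5, 5),                         # 0: one-word, 1: two-word instruction
        "t": bits(6, 7),                         # index register tag
        "d": d,                                  # displacement or short address
        "dsgn": d | 0xFF00 if d & 0x80 else d,   # displacement sign-extended to 16 bits
        "ia": bits(8, 8),                        # indirect address bit (two-word format)
        "bo": bits(9, 9),                        # branch out bit
        "cond": bits(10, 15),                    # conditions for test
    }


fields = decode(0xC005)   # op = 11000 (load accumulator), short format, displacement 5
print(fields["op"] == 0b11000, fields["f"], fields["t"], fields["d"])
```

Note that `d`, `ia`, `bo`, and `cond` overlap the same bits; which reading applies depends on the operation code and format bit, as in the ISP.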
Instruction Interpretation Process
Run → (instruction[0:1] ← M[I:I + 1]; next                  fetch
  ¬f → (I ← I + 1); f → (I ← I + 2); next                   1 or 2 word instruction
  Instruction_execution)                                    execute
Instruction Set and Instruction Execution Process
Instruction_execution := (
Load and Arithmetic
LD (:= op = 11000) → (A ← M[z]);                            load accumulator
LDD (:= op = 11001) → (A□Q ← M[z]□M[zd]);                   double load
STO (:= op = 11010) → (M[z] ← A);                           store accumulator
STD (:= op = 11011) → (M[z]□M[zd] ← A□Q);                   double store
A (:= op = 10000) → (Ov,C□A ← A + M[z]);                    add
AD (:= op = 10001) → (Ov,C□A□Q ← A□Q + M[z]□M[zd]);         double add
S (:= op = 10010) → (Ov,C□A ← A - M[z]);                    subtract
SD (:= op = 10011) → (Ov,C□A□Q ← A□Q - M[z]□M[zd]);         double subtract
M (:= op = 10100) → (A□Q ← A × M[z]);                       multiply
D (:= op = 10101) → (Ov,Q ← A□Q / M[z];                     divide
                     A ← A□Q mod M[z]);
Logical instructions
AND (:= op = 11100) → (A ← A ∧ M[z]);                       logical and
OR (:= op = 11101) → (A ← A ∨ M[z]);                        logical or
EOR (:= op = 11110) → (A ← A ⊕ M[z]);                       logical exclusive or
Compare
CMP (:= op = 10110) → ((A < M[z]) → (I ← I + 1);            compare
                       (A = M[z]) → (I ← I + 2));
Shifts
+ ((AW< M [ z l P ( [ z d l )
(AQ = M[zlM[zdl) -
+ (I
(I +
1 + I);
I + 2));
double comnare
(t = 0) + (A + A X 2': c ~A+,-I>);
( t f 0 ) + (A + n o r m a I i z e ( A ) ;
C~XR[t]<l0:19 t normal ize,exponent(A):
XRC t 1 < 8 , 9 + 0) 1:
SLC (:= shop = O O O l C n ~ l l ) + (7 ((s = 0) V A<O>) + ( s h i f t l e f t and count
(t = 0) -? (PaQ t PaQ x 2'; C cA<s-l>):
( t # 0) + (bQ
tnorrnaIize(AnQ);
COXR[t] t n o r m a l ize,exponent(A@))));
B S C (:=
(
(op = 01001)A
skip,condition A 7
i<9>)
f ) + (I
- t
(
Ov
I + I):
t i[Ol<lP);
branch or s k i p on condition
(,skip,condition A f) + ( I tz);
KID -) ov c 0 ) ;
e k i b c o n d i t i o n := (
( ~ O VA d<15>) V overflow o f f
( i C A d<l4>) V carry o f f
(A<l5> A d<13>) V Accumulator even
((A > 0) A d<l2>) V Accumulator greater than zero
(A<O> A d<l I > ) V Accumulator negative
Accumulator zero
((A=O)
(skip,condition
A d<l O>) )
f
+ ( 1 t z + I ; M[z]
+ (d<15>
-,skipjondition
- Ov +
+(I
0);
t I);
t z + I ; P[zl -1));
MDX (:= op = 01110) -f ( modify index and s k i p
( t = 0) A f -(I + I + dsgn); l o c a l branch
( t = 0) A f + ( M [ a l
(Msum=O) v (M101<0> @Msum<O>)
tM[al + dsgn;
- - (I I + I)); r e s u l t zero or s i g n change
Msurn,O:15> := (Mtal + dsgn)
( t # 0) - t ( X R [ t l t X R [ t l + x i ;
(Xsurn=o) V ( ~ ~ [ t l a . @xsurnQO')
Xsurndl:15>
,
:= ( X R [ t ] + dsgn)
- (1 + I + 1))); r e s u l t zero or s i g n change
IO Control Instruction:
XIO (:= op = 00001) → (                                     Execute I/O, not defined
  IOCC[0:1] ← M[z]□M[zd]; next
  Execute_IO_instruction)
) end Instruction_execution
IO Instruction Format:
IO Address<0:15> := IOCC[0]                                 address if IO data
IO Device or Area<0:4> := IOCC[1]<0:4>                      io device name
IO Function<5:7> := IOCC[1]<5:7>
IO Modifier<8:15> := IOCC[1]<8:15>                          device function details
Device mode off line := (IO Function = 0)
Device mode write := (IO Function = 1)
Device mode read := (IO Function = 2)
Device mode sense interrupt level := (IO Function = 3)
Device mode control := (IO Function = 4)
Device mode initialize write := (IO Function = 5)
Device mode initialize read := (IO Function = 6)
Device mode sense := (IO Function = 7)
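
The IO instruction format can be exercised with a short decoder. The Python sketch below is an illustration under stated assumptions, not IBM's definition: the function name and dictionary keys are hypothetical, bit 0 is taken as the most significant bit, and the device (area) field is assumed to occupy bits 0:4 of the second IOCC word.

```python
# Function codes as listed in the ISP description above.
IO_FUNCTIONS = {
    0: "off line",
    1: "write",
    2: "read",
    3: "sense interrupt level",
    4: "control",
    5: "initialize write",
    6: "initialize read",
    7: "sense",
}


def decode_iocc(iocc0, iocc1):
    """Split the two IOCC words into address, area, function, and modifier."""
    return {
        "address": iocc0 & 0xFFFF,                 # IOCC[0]<0:15>
        "area": (iocc1 >> 11) & 0x1F,              # IOCC[1]<0:4>
        "function": IO_FUNCTIONS[(iocc1 >> 8) & 0x7],  # IOCC[1]<5:7>
        "modifier": iocc1 & 0xFF,                  # IOCC[1]<8:15>
    }


# A made-up IOCC: table at location 0x1000, area 5, Initialize Read, modifier 0x12.
print(decode_iocc(0x1000, (5 << 11) | (6 << 8) | 0x12))
```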
Chapter 34
The engineering design of the Stretch computer¹
Erich Bloch
¹Proc. EJCC, pp. 48-59, 1959

Summary The Stretch computer is an advanced scientific computer with variable facilities for floating-point, fixed-point, and variable-field-length arithmetic and data-handling facilities.
The performance goal of 100 × 704 speed is achieved by high-speed circuits, multiplexing, and simultaneous-operation technique of instruction and data-fetching, as well as overlap within the execution units. This massive overlap and multiplexing results in complicated recovery routines between the look-ahead and instruction units. These units are described in detail, as are the arithmetic units and significant algorithms used in the floating-point arithmetic.
A flexible set of circuits using a current-switching technique with overriding-level facility is described, as well as the packaging of circuits on printed cards. The frame and gate concept is also shown. Performance figures and hardware count illustrate the size, complexity, and performance of the system.

Introduction
The Stretch computer [Dunwell, 1956] project was started in order to achieve two orders of magnitude of improvement in performance over the then existing 704. Although this computer, like the 704, is aimed at scientific problems such as reactor design, hydrodynamics problems, partial differential equations, etc., its instruction set and organization are such that it can handle with ease data-processing problems normally associated with commercial applications, such as processing of alphanumeric fields, sorting, and decimal arithmetic.
In order to achieve the stated goal of performance, all factors that go into the computer design must contribute towards the performance goal; this includes the instruction set [Buchholz, 1958], the internal system organization, the data and instruction word length, and auxiliary features such as status-monitoring devices, the circuits, packaging, and component technology. No one of them by itself can give this hundred-fold increase in speed; only by the combining and interacting of these contributing factors can this performance be obtained.
This paper reviews the engineering design of the Stretch System with primary concentration on the central computer as the main contributor to performance. In it, these new techniques, devices, and instructions have been pushed to the limit set by the present technology and, therefore, its analysis will convey best the problems encountered and the solutions employed.

The Stretch system
Early in the system design, it appeared evident that a six-fold improvement in memory performance and a ten-fold improvement in basic circuit speed over the 704 was the best one could achieve. To meet the proposed performance criteria, the system had to be organized in such a way that it took advantage of every possible overlap of systems function, multiplexing of the major portion of the system, processing of operations simultaneously, and anticipation of occurrences, wherever possible. The system had to be capable of making assumptions based on the probability that certain events might occur, and means had to be provided to retrace the steps when the assumption proved to be wrong.
This simultaneity and multiplexing of operations reflects itself in the Stretch System at all levels, from overall systems organization to the cycle of specific instructions. In the following description, this will be discussed in more detail.
If one considers the Stretch System (Fig. 1) from an overall point of view it becomes apparent that the major parts of the system can operate simultaneously:

a The 2-μsec, 16,384-word core memories are self-contained, with their own clocks, addressing circuits, data registers and checking circuits. The memories themselves are interleaved so that the first two memories have their addresses distributed modulo 2 and the other four are interleaved modulo 4. The modulo-2-interleaved memories are used primarily for instruction storage; since, for high-performance instructions, halfword formats are used, the average rate of obtaining instructions is one per ½ μsec. Similarly, a 0.5-μsec data-word rate is achieved by the use of four modulo-4 organized memories. The addressing of the memories and the transfer of information from and to the memories by a memory bus permits new addresses, information, or both to pass through the bus every 200 mμsec.
b The simultaneously-operating Input/Output units are linked with the memories and the computer through the Exchange, which, after initial instruction by the computer, coordinates the starting of the I/O equipment, the checking and error-correction of the information, the arrangement of the information into memory words, and the fetching and storing of the information from and to memory. All these functions are executed without the use of the computer, so it can in the meantime continue its data processing and computation.
c The central computer processes and executes the stored program. Here, now, the simultaneity and multiplexing of functions has reached its ultimate.

[Fig. 1. Stretch system block diagram: six 2-μsec core memories on the memory in and memory out buses, with disk control (4 × 10⁶ words), console adapter, reader adapter, and tape adapters (729 tape units). Caption not recoverable.]

Before discussing the computer organization, a few general features must be mentioned for completeness:

a Word length: 64 bits plus eight bits for parity checks and error-correction codes.
b Memory capacity and addressing: A possible 256,000 words can be randomly addressed. These storage positions are all in external memory, except for the 32 first addresses. These positions consist of the internal registers (accumulators, time clocks, index registers).
c The instructions are single-address instructions with the exception of a number of special codes that imply the second address explicitly.
  The instruction set (Fig. 2) is generalized and contains a full set for single- and double-precision floating-point arithmetic, and a full set for variable-field-length integer arithmetic (binary and decimal). It also has a generalized set for index modification and a branching set, as well as a set of I/O instructions. All told, 735 different types of instructions are used in the system.
d The instruction format (Fig. 3) makes use of both half and full words; half words accommodate indexing and floating-point instructions (for optimum performance these two sets of instructions use a rigid format), and full-word formats are used by the variable-field-length instructions. Notice that the latter specifies the operand field by the address of its left-most bit, the length of the field, and the byte¹ size, as well as the starting point (offset) of the implied operand (accumulator). Both halves of the word are independently indexable.
e A general monitoring device used for important status triggers is called the Interrupt [Brooks, 1957] System. This system monitors the flip-flops which reflect internal malfunctions, result significance (exponent range, mantissa zero, overflow, underflow), program errors (illegal instruction, protected memory area), and input/output conditions (unit not ready, etc.). The status of these flip-flops can cause a break in the normal progression of the stored program for fix-up purposes. Their status is automatically interrogated at all times.

¹Byte: a generic term to denote the number of bits to be operated on as a unit by a variable-field-length instruction.
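
The memory interleaving described under item a of the system overview can be illustrated numerically. The following Python sketch rests on an assumption the text does not state: that the two modulo-2 memories occupy the lowest 32,768 addresses and the four modulo-4 memories the next 65,536. The names `bank_of` and `WORDS_PER_BANK` are hypothetical. It also recomputes the quoted 0.5-μsec rates from the 2-μsec cycle.

```python
WORDS_PER_BANK = 16_384
CYCLE_USEC = 2.0


def bank_of(address):
    """Map a word address to (bank number, word within bank) under the assumed layout."""
    instruction_region = 2 * WORDS_PER_BANK
    if address < instruction_region:            # banks 0-1, interleaved modulo 2
        return address % 2, address // 2
    relative = address - instruction_region
    if relative < 4 * WORDS_PER_BANK:           # banks 2-5, interleaved modulo 4
        return 2 + relative % 4, relative // 4
    raise ValueError("address outside the six modeled memories")


# Sequential addresses rotate through the banks, so accesses can overlap.
print([bank_of(a)[0] for a in range(4)])                 # instruction region
print([bank_of(32_768 + a)[0] for a in range(8)])        # data region

# Two banks, two half-word instructions per word: one instruction per 0.5 usec.
instruction_interval = CYCLE_USEC / (2 * 2)
# Four banks, one data word each per cycle: one data word per 0.5 usec.
data_interval = CYCLE_USEC / 4
print(instruction_interval, data_interval)
```

The rates hold only for sequential access, where successive references fall in different banks; repeated references to one bank are limited by its 2-μsec cycle.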
[Figure residue. Fig. 2, computer vocabulary: instruction categories, classes, modifiers, examples, and number of instructions (total 735). Fig. 3, data formats (eight 8-bit bytes plus ECC and parity; floating point with exponent and mantissa; index word with value, count, and refill fields) and instruction formats. Two further diagrams: a comparison of instruction fetch, updating, and execution in the 704 and in Stretch, and the central-computer data flow (exchange, look-ahead, checker, transfer bus, two-word accumulator and operand registers, interrupt system, arithmetic check, serial arithmetic unit). Captions not recoverable.]
The Memory Bus is the communication link between the memories on one side and the exchanges and the computer on the other. It monitors the requests for storage to, or fetches from, memory, and sets up a priority scheme. Since I/O units cannot hold up their requests, the exchange will get highest priority, followed by the computer. In the computer the instruction-fetch mechanism has priority over the operand-fetch mechanism. All told, the memory bus gets requests from and assigns priority to eight different channels.
Since memory can be accessed from multiple sources, and once accessed it is on its own to complete its cycle, a busy condition can exist. Here again, the memory bus tests for busy conditions and delays the requesting unit until memory is ready to be interrogated on data fetches. The return address is remembered and the requesting unit receives the information when it becomes available. To accomplish this, from the time information is requested the receiving data register is in a reserved status.
Requests for stores and fetches can be processed at a 200 mμsec rate and the time, if no busy or priority conditions exist, to return the word to the requesting unit is 1.6 μsec, a direct function of the memory read-out time.
The Instruction Unit [Blaauw, 1959] is a computer of its own. It has its own instruction set, its own small memory for index word storage, and its own arithmetic unit. During its operation as many as six instructions can be at various stages of execution.
The Instruction Unit fetches the instruction words from memory, it steps the instruction counter, and performs the indexing of instructions and the initiation of data fetches. After a preliminary decoding of the class of instruction, it recognizes its own instructions and executes indexing instructions. On branches, conditional or unconditional, the instruction unit executes these. In the case of conditional branches, it makes the assumption that the branch will not be successful.
This assumption and the availability of two full-word buffer registers keep the flow of instruction to the computer continuous. Therefore, the rate of instructions entering the instruction unit is for all practical purposes independent of the memory cycle. Since, for high speed instructions, half-word formats are used, four of these at any one time can be in buffer storage. As soon as the instruction unit starts processing an instruction, it is removed from the buffer, thus making room for the next memory-word access (Fig. 6). Incidentally, half-word instructions and full-word instructions can be intermixed within the same word, and therefore the latter can cross a word boundary. This permits maximum packing of instructions in memory and also serves as a facility for automatic program assemblers and compilers.
The adder path, index registers, and transfer bus to look-ahead complete the instruction unit system (Fig. 6). It should be noted that the index registers are part of the instruction-unit data path, therefore permitting fast access (no long transmission lines) to an index word. There are 16 index words available to the programmer. The index registers, consisting of multi-aperture cores, are operated in a non-destructive fashion, since in a representative program, the index word is used nine out of ten times without modifying it. This permits fast operation under these conditions, and additional time is only applied where modification is involved.
After processing through the instruction unit, the updated (indexed) instruction enters a level of the Look-ahead (Fig. 5). Besides the instruction, all necessary information, its associated instruction counter value, and certain tag information are also stored in the same level. The operand, already requested by the instruction unit, will enter this level directly and will be checked and error-corrected while awaiting transfer to the arithmetic units for execution.
An interlocked counter mechanism in the look-ahead keeps its four levels in step, preventing out-of-sequence execution of instructions, even if all information for a succeeding one is available, before the previous instruction has been started.
The pre-accessing of operands by the look-ahead and of instructions by the instruction unit leads sometimes to embarrassing positions, for which a fix-up routine must be provided. Consider the program

(n)      STORE Accumulator m
(n + 1)  LOAD R
(n + 2)  ADD m

and assume instruction (n) is in look-ahead, waiting for execution. If (n + 2) now enters the look-ahead, a reference to m cannot be made, since the data stored in that position is subject to change by the STORE instruction. The look-ahead must recognize this and "forward" the result of instruction (n), when received, to the level where (n + 2) is stored.
Another example is the case where the instruction unit assumed that a conditional branch would not be executed. This instruction is stored in look-ahead and, when it is recognized that the branch was successful, all modifications of addressable registers made by the instruction unit in the meantime must be restored. Look-ahead in this case acts as a recovery memory for this information. A similar condition exists when interrupts occur due to arithmetic results. The look-ahead here again has the data stored pertaining to registers which were modified erroneously in the meantime. The restoring and recovery routines described break into the instruction unit processing, interrupting temporarily the flow of instruction and their indexing.
The arithmetic units described later are slaves to the look-ahead, receiving not only operands and instruction codes but also the start-execution signal. Conversely, the arithmetic units signal to the look-ahead the termination of an operation and, in the case of "To Memory" operations, place into the look-ahead the result word for transfer to the proper memory position.

Arithmetic units
The design of the arithmetic units was established along lines similar to the design of look-ahead and the instruction unit. Every attempt was made to speed up the execution of arithmetic operations by multiplexing techniques and overlapping of the algorithm, where mathematically permissible.
The arithmetic units, consisting of the Serial Unit and the Parallel Unit, use the same arithmetic registers, namely a double-length accumulator (A,B) consisting of 128 bits and a double-length operand register (C,D) consisting of 128 bits. The reason for the use of the same arithmetic registers is the fact that at any time, a shift from floating-point to variable-field-length operation (or vice versa) can be made by the program. Therefore, the result obtained by a floating-point operation can serve as the starting operand for a variable-field-length operation. The chief reason for the double-length registers is the definition of maximum field length to be 64 bits. The field can start with any bit position, and therefore can cross the word boundary.
The executions of floating-point mantissa operations and variable-field-length binary multiply and divide operations are performed by the parallel unit, whereas the floating-point exponent operation and the variable-field-length binary and decimal add-type operations are executed by the serial unit. The square-root operation and the binary-to-decimal conversion algorithm are executed in unison by both units. Salient features of the two units will now be described.

The serial arithmetic unit [Brooks et al., 1959] (Fig. 7). The serial arithmetic consists of a switch matrix which can extract 16 consecutive bits from A,B and C,D. These 16 bits then can be aligned in such a way that the low-order bit of a field as specified by the instruction is at the right end of the field. This wrap-around circuit then feeds into a carry-propagate adder or, in case of logical-connect instructions, into the logic unit. At the adder output, a true complement unit and a binary-to-decimal correction unit are used for subtract and decimal operations. The inverse process of extracting is used to insert the processed byte back into the register without disturbing any neighboring positions. Notice that in one clock cycle, the information is extracted, the arithmetic is performed and the result inserted back into the registers. In addition, the arithmetic information is checked by parity checks on the switch matrices and by duplication and comparison of the arithmetic procedure in a duplicate unit.
[Fig. 7. Serial arithmetic unit: inputs from look-ahead to the operand registers and accumulators, wrap-around circuits (8 of 16), 8-bit true/complement and pass-around paths, decimal correct, and write-in matrix. Caption not recoverable.]
Parallel arithmetic unit. The parallel arithmetic unit (Fig. 8) is designed
to execute floating-point operations with a maximum of efficiency. Since
both single- and double-precision arithmetic is performed, the shifter and
adder exist in a double-length format of 96 bits. This insures almost the
same performance for single- and double-precision arithmetic. The adder is
of a carry-propagation type with look-ahead over 4 bits at a time to reduce
the delay that normally results in a ripple-carry adder. This carry
look-ahead results in a delay time of 150 mµsec for 96-bit binary-number
additions. All additions and subtractions are made in one's complement form
with automatic end-around carry.
The shifter is capable of shifting up to 4 positions to the right and up to
6 positions to the left. This shifter arrangement takes care of the
majority of shifting operations encountered under normal operation. Where
higher-order shifts are required, a successive operation is set up between
the parallel unit register and the shifter.
To expedite the execution of the multiply instruction, 12 bits
430 Part 5 I The PMS level Section 2 I Computers with one central processor and multiple input/output processors
[Fig. 8. Parallel arithmetic unit. Recoverable labels: unit register;
shifter; carry-save adders (CSA 2, CSA 3); carry-propagate adder, 100 bits;
signals S2, C1, C2.]
of the multiplier are handled within one cycle. This is accomplished by
breaking the 12 bits into groups of three bits each. The action is from
right to left and consists of decoding each group of three bits. By
observing the lowest-order bit of the next higher group, a decision is made
as to what multiple of the multiplicand one must add to the partial
product. Since only even multiples of the multiplicand are available,
subtraction and addition of the multiples can result. The following example
will elaborate this point (MCD means multiplicand):

Groups                     n+4    n+3        n+2        n+1        n
Multiplier, 12-bit group   xx0    011        110        101        010
Octal value                       3          6          5          2

If two additions of multiples were permitted:
                                  4 x MCD    6 x MCD    6 x MCD    2 x MCD
                                 -1 x MCD              -1 x MCD

Instead of subtracting 1 x MCD in n+1, subtract 8 x MCD in n:
                                  4 x MCD    6 x MCD    6 x MCD    2 x MCD
                                            -8 x MCD              -8 x MCD

Resulting decoding:
                                  4 x MCD   -2 x MCD    6 x MCD   -6 x MCD

The four multiple multiplicand groups and the partial product of the
previous cycle are now fed into carry-save adders of the form,
    Sum    S  = A ⊕ B ⊕ C
    Carry  C' = AB + AC + BC

There are four of these adders, two in parallel followed by two more in
series (Fig. 8). The output of Carry-Save Adder 4 then results in a
double-rank partial product, the product sum and the product carry. For
each cycle this is fed into Carry-Save Adder 2, and, during the last cycle,
into the carry-propagate adder, for accumulation of the carries. Since no
propagation of carries is required in the four cycles, where multiple
multiplicands are added, this operation is fast and is the main contributor
to the fast multiply-time of Stretch.
The divide scheme [Robertson, 1958] has a similarity to the multiply
scheme. Multiples of the divisor are used, namely, 3/2 x divisor,
3/4 x divisor and 1 x divisor. This, plus shifting over strings of ones and
zeros, results in the generation of the required 48 quotient bits within
thirteen machine cycles. Most machines using a nonrestoring divide method
require 48 cycles for 48 quotient bits. The following example explains this
technique. This scheme depends on the use of normalized divisors:

    DIVIDEND (DD)   = 101000000000000
    DIVISOR (DR)    = 1100011
    2's COMP DR     = 0011101
    3/4 DR          = 100101001

(a) Using skip over 1/0 only:

                101000000000000   DIVIDEND
    Step 1:     0011101           ADD 2's COMP DR
                1101101

Remainder negative, 1st quotient bit = 0; shift one position. Leading 1
indicates that next quotient bit must be 1; Q1Q2 = 01

                011010000         REMAINDER
    Step 2:     1100011           ADD DR
               10010111

Overflow: Remainder positive and Q3 = 1, leading zero indicates Q4 = 0

                1011100           REMAINDER
    Step 3:     0011101           ADD 2's COMP DR
                1111001

Negative remainder; Q5 = 0; leading 1's indicate Q6Q7Q8 = 111

Number of quotient bits per cycle:
    Cycle 1: 01 = 2
    Cycle 2: 10 = 2
    Cycle 3: 0111 = 4

(b) The same problem with both skip over 1/0 and 3/4 - 3/2 complement:

                101000000000000
    Step 1:     0011101
                11011010000

Same as before, Q1Q2 = 01

    Step 2:     100101001         Add 3/4 DR
                111111001

This (by table look-up) indicates Q3Q4Q5Q6Q7Q8 = 100111

Quotient bits generated per cycle:
    Cycle 1: 01 = 2
    Cycle 2: 100111 = 6

In general, this method results in the generation of 3.7 quotient bits per
subtraction. While the mantissa operations of multiply and divide are
performed by the parallel unit, the serial arithmetic unit executes the
exponent arithmetic. Here again is a case where overlap and simultaneity of
operation is used to special advantage.

Checking. The operation of the computer is checked in its entirety and
correction codes are employed where data transfers from memory and
input-output units are involved. In particular, all information sent to
memory has a correction code associated with it, which is checked for
accuracy on its way from memory. If a single error is indicated, then
correction is made and the error is recorded via a maintenance output
device. Within the machine, all arithmetic operations are checked, either
by parity, duplication, or a "casting out three" process. These checks are
overlapped with the execution of the next instruction.

Hardware count. Figure 9 shows the percentage of transistors used in the
various sections of the machine. It becomes obvious that the parallel unit
and the instruction unit use the highest percentage of transistors. In case
of the parallel unit this is due to the extensive circuits for multiply and
to the additional hardware to achieve speed up of the divide scheme. In the
instruction unit, the controls consume the majority of the transistors,
because of the highly multiplexed operation encountered.

Performance. The performance comparisons in Fig. 10 show the increase in
speed achieved, especially in floating-point operations,
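[Fig. 9. Transistor usage in the various sections of the machine.
Recoverable entries: instruction unit data path, 17,700; instruction unit
controls, 19,500 (followed by the figures 2 and 3-1/2, whose column heading
is lost). The rows for the look-ahead and the floating-point unit are not
recoverable.]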
over the 704. It should be noted that for a large number of problems this
particular increase in all arithmetic speeds is almost proportional to the
performance increase of the problem as a whole, since the instruction
execution-times are overlapped to a great extent with the preparation and
fetching of instructions.
Simulation of Stretch programs on the 704 proved a performance of
100 x 704 speed in mesh-type calculations. Higher performance figures are
achieved where double- or triple-precision calculations are required.
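The multiply scheme described earlier in this section (three-bit multiplier groups decoded into even multiples of the multiplicand, then accumulated through carry-save adders) can be sketched as follows. This is an illustrative model only, not the Stretch logic design: the function names, the 128-bit two's-complement working width (the machine itself added in one's complement), and the final correction applied for an odd multiplier are assumptions introduced for this sketch.

```python
WIDTH = 128                      # assumed working width for the sketch
MASK = (1 << WIDTH) - 1


def decode_groups(multiplier, n_groups):
    """Decode 3-bit groups into signed even multiples, lowest group first.

    An odd group is rounded up to the next even multiple; the surplus
    1 x MCD is repaid as -8 x MCD in the group just below it.
    """
    digits = [(multiplier >> (3 * i)) & 0b111 for i in range(n_groups + 1)]
    return [d + (d & 1) - 8 * (nxt & 1) for d, nxt in zip(digits, digits[1:])]


def csa(a, b, c):
    """Carry-save adder: S = A xor B xor C, C' = AB + AC + BC (shifted left)."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s & MASK, carry & MASK


def multiply(mcd, mpr, bits=48):
    """Multiply two non-negative integers, 12 multiplier bits per cycle."""
    assert bits % 12 == 0 and 0 <= mpr < (1 << bits)
    n_groups = bits // 3
    multiples = decode_groups(mpr, n_groups)
    psum, pcarry = 0, 0                           # double-rank partial product
    for start in range(0, n_groups, 4):           # one cycle = four groups
        terms = [(m * mcd << (3 * (start + k))) & MASK
                 for k, m in enumerate(multiples[start:start + 4])]
        s1, c1 = csa(terms[0], terms[1], terms[2])   # two adders in parallel
        s2, c2 = csa(terms[3], psum, pcarry)
        s3, c3 = csa(s1, c1, s2)                     # two more in series
        psum, pcarry = csa(s3, c3, c2)
    total = (psum + pcarry) & MASK                # single carry-propagate add
    # Assumed fix-up, not described in the text: rounding an odd lowest
    # group up leaves one surplus multiplicand in the sum.
    return (total - (mpr & 1) * mcd) & MASK


print(decode_groups(0o3652, 4))   # [-6, 6, -2, 4], matching the worked example
```

With the default 48-bit multiplier the loop runs four times, so carries are propagated only once, in the final addition; that is the property the text credits for the fast multiply time.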
Circuits

Having reviewed the systems organization of Stretch, it is now of interest
to discuss briefly the components, circuits, and packaging techniques used
to implement the design.
The basic component used in Stretch is the high-speed drift transistor
which exists in both an NPN and a PNP version. This transistor has a
frequency cut-off of approximately 100 mc and for high-speed operation must
be kept out of saturation at all times. This then explains why both the PNP
and NPN version are used: mainly to avoid the problem of level translation,
which would be required due to the potential difference of the base and the
collector. This difference is 6 volts, an optimum point for this device.
Figure 11 shows the basic circuit configuration. It consists of a current
source, represented by the -30 volt supply and resistor R. The functional
operation of the circuits consists of two possible
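[Fig. 10. Performance comparison. Only two values per row survive; they are
listed here under the 704 and Stretch. The second IBM column of the figure
has no recoverable entries.]

OPERATION                        IBM 704        STRETCH
1. FLOATING POINT
   EXPONENT RANGE                2^(±128)       2^(±2048)
   MANTISSA BITS                 27             48
   FLOATING ADD                  84 µsec        1.0 µsec
   FLOATING MPY                  204 µsec       1.8 µsec
   FLOATING DIV                  216 µsec       7.0 µsec
   LOAD/STORE                    24 µsec        0.6 µsec
2. BINARY VARIABLE FIELD
   BIT RANGE                                    1 TO 64
   16-BIT FIELD, ADD/LOAD/STORE                 2.0 µsec
   16-BIT FIELD, MPY                            10.0 µsec
   16-BIT FIELD, DIVIDE                         15.0 µsec
3. DECIMAL ARITHMETIC            (entries not recoverable)
4. MISCELLANEOUS                 (entries not recoverable)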
[Figs. 11-13. Circuit diagrams, truth tables, and minimum-maximum signal
voltage charts for the current-switching circuit, the third-level circuit,
and the emitter-follower logic. Recoverable annotations: circuit delay of
about 20 mµsec; signal levels referenced to ground and to -6 V.]
paths represented by transistor A or C. Which path is chosen by the current
depends on the condition existing on base A. If point A is positive with
respect to ground by 0.4 volts, that particular transistor is cut off,
making the emitter of transistor C positive with respect to the base and,
therefore, making C conducting. The current supplied by the current source
(6 ma) will then flow through transistor C to the load of the in-phase
output. The in-phase output, then, is positive by 0.4 volts with respect to
the -6 volt reference. This indicates at the in-phase output the equivalent
function impressed on A. At the same time, the inverse output is negative
with respect to the -6 volt power supply by 0.4 volt, representing,
therefore, the inverse of the function impressed on A. Conversely if A is
negative with respect to the ground reference, transistor A is the
conducting one, keeping emitter C negative with respect to its base. The
current flows through transistor A, making the inverse output positive with
respect to -6 and the in-phase output negative with respect to -6. Again,
the in-phase output reflects the function impressed on A, whereas the
inverse output represents the inverse of the function.
If an additional transistor now is paralleled with A, it becomes obvious
that only if both bases A and B are positive will the in-phase output be
positive and the inverse output negative. If only one or none of the bases
A and B is positive, then the in-phase output will be negative and the
inverse output will be positive. In other words, an AND function is
obtained on the in-phase output.
This principle, which is reflected in all the circuits, is essentially the
principle of current switching or current steering.
Logical functions for the PNP circuits are, therefore, a +AND or -OR. Two
outputs from each circuit block are available: the AND function and the
inverse of the AND function.
A dual circuit exists for NPN transistors with input levels at -6 volts and
output levels at ground. This circuit will give the +OR or -AND function.
A thorough investigation of the systems design showed that the circuits
described so far are versatile enough to be used throughout the system.
However, there are enough special cases (resulting from the many data buses
and registers throughout the machine) that could use a distributor function
or an overriding function. This caused the design of a circuit which
permitted great savings in space and transistors by adding a third voltage
level. Figure 12 shows the PNP version of the third-level circuit.
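The "+AND or -OR" naming of the two-level PNP circuit, and the "+OR or -AND" naming of its NPN dual, is an instance of De Morgan duality: the same block computes AND when the positive level is read as true and OR when the negative level is read as true. The sketch below models only this logical behavior, not the electrical one; the function names and the use of booleans for "level is positive" are assumptions made for illustration.

```python
from itertools import product


def pnp_current_switch(a_positive, b_positive):
    """Two-input PNP block: returns (AND output, inverse) as 'is positive' flags.

    Current reaches the reference transistor only when both inputs are
    positive (both input transistors cut off).
    """
    and_out = a_positive and b_positive
    return and_out, not and_out


def npn_current_switch(a_positive, b_positive):
    """Dual NPN block: returns (OR output, inverse) as 'is positive' flags."""
    or_out = a_positive or b_positive
    return or_out, not or_out


for a, b in product([False, True], repeat=2):
    pnp_out, pnp_inv = pnp_current_switch(a, b)
    npn_out, npn_inv = npn_current_switch(a, b)
    # Read with negative = true, the PNP block is an OR of its inputs ...
    assert (not pnp_out) == ((not a) or (not b))
    # ... and the NPN block is an AND of its inputs.
    assert (not npn_out) == ((not a) and (not b))
    print(a, b, pnp_out, pnp_inv, npn_out, npn_inv)
```

Because each block also delivers the inverse of its main output, no separate inverter stage appears in this model.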
If transistor X were eliminated, then transistors A and B in conjunction
with the reference transistor C would work normally as a current switching
circuit, in this case a +AND circuit. If transistor X is added with the
stipulation that the down level of X is more negative than the lowest
possible level of A or B, it becomes apparent that when X is negative, the
current will flow through that branch of the circuit in preference to the
in-phase or inverse output branch, regardless of inputs A and B. Therefore,
both the in-phase output and the inverse output will be negative, provided
input X is negative. The third output is the inverse of input X. If,
however, X is positive, then the status of A and B will determine the
in-phase and inverse outputs implicitly. This demonstrates the overriding
function of input X.
Similarly, the NPN version (not shown) results in the OR function at the
in-phase output if input X is negative and in a positive output at both the
in-phase and inverse outputs, regardless of status A and B, if X is
positive. Again minimum and maximum signal swings are shown in Fig. 12.
The speed of the circuits described so far depends on the number of inputs
and the number of circuits driven from each load. The response of the
circuit is anywhere between 12 and 25 mµsec per logical step with 18 to
20 mµsec average. The number of inputs allowable per circuit is eight. The
number of driven circuits is three. Additional circuits are needed to drive
more than three bases and where current switching circuits communicate over
long lines, termination networks must be added to avoid reflections.
To improve the performance of the computer in certain critical places,
emitter-follower logic is used as shown in Fig. 13. These circuits, having
a gain less than one, after a number of stages require the use of current
switching circuits as level setters and gain devices. Both AND and OR
circuits are available for both a ground-level and a -6-level input. Change
from a -6-level circuit to a ground-level circuit is obtained by applying
the appropriate power supply levels. Due to the variations in inputs and
driven loads, the circuits must be designed so that the load can vary over
a wide range. This resulted in instability which had to be offset by the
feedback capacitor C shown in the circuit.
All functions needed in the computer can be implemented by the use of the
aforementioned circuits, including flip-flop operation, which is obtained
by tying a PNP current switch block and an NPN current switch block
together with proper feedback.

Packaging

The circuits described in the last paragraph are packaged in two ways:
A circuit package using the smaller of the two printed circuit boards shown
in Fig. 14, called a single card, contains AND or OR circuits. It should be
mentioned that the printed wiring is one-sided and that besides the
components and transistors, a rail is added which permits the shorting or
addition of certain loads depending on the use of the circuits. This rail
then has the effect of reducing the different types of circuit boards in
the machine. Twenty-four different boards are used and of these, two types
reflect approximately 70% of the total single card population.
Due to the large number of registers, adders, and shifters used in the
computer, it seems reasonable that functional packages could be employed
economically, because of wide usage. This results in the high-density
package also shown in Fig. 14, called

Fig. 15. The back panel.
which overlies the whole back panel, against which the intercircuit
wiring is laid. In addition, the power-supply distribution system
must be of such a low impedance that extraneous noise cannot
induce circuit malfunction. For this reason, a bus system, consist-
ing of laminated copper sheets, is used to distribute the power
to each row of card sockets. The wiring rules are such that single-
conductor wire is used up to a maximum of 24", twisted pair to
a maximum of 36", unterminated coax to a maximum of 60", and
terminated coax to a maximum of 100 feet. The whole back-panel
construction and the application of single wire, twisted pair, or
coax are calculated by a computer program to minimize the noise
on each circuit node.
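The length limits in the wiring rules above can be written as a small lookup. This sketch covers only those stated limits; it does not attempt the noise minimization that the text attributes to the wiring program. The function name and the policy of choosing the simplest conductor that the length allows are assumptions made for illustration.

```python
# (maximum length in inches, conductor type), in order of increasing cost
WIRING_RULES = [
    (24, "single-conductor wire"),
    (36, "twisted pair"),
    (60, "unterminated coax"),
    (100 * 12, "terminated coax"),   # 100 feet
]


def conductor_for(length_inches):
    """Return the simplest conductor type permitted for a run of this length."""
    if length_inches <= 0:
        raise ValueError("length must be positive")
    for max_length, conductor in WIRING_RULES:
        if length_inches <= max_length:
            return conductor
    raise ValueError("run exceeds the 100-foot limit for terminated coax")


print(conductor_for(30))   # twisted pair
```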
The two gates of a frame are a sliding pair with the power
supply mounted on the sliding portion. All connecting wires
between frames are coax and arrayed in layers which are formed
into a drape.
References
Summary PILOT, the new NBS system, possesses both powerful external control
capabilities and versatile internal processing capabilities. It contains
three independently operating computers. The primary and secondary
computers each utilize only 16 basic types of instructions, thus providing
a simple code structure; but because so many variations of the formats are
possible, a wide variety of computing, data-processing, and
information-retrieval operations can be performed with these instructions.
The secondary computer is specially adapted for performing so-called
"red-tape" operations, and both the secondary and the primary computers,
acting co-operatively, can carry out special complex sorting or search
operations. The third computer in the system, called the format controller,
is specially adapted for performing editing, inspecting, and
format-modifying operations. The system is equipped to transfer information
concurrently along several input-output trunks, though only two are planned
for the near future. Using two such trunks, it is possible to maintain two
continuous streams of data simultaneously flowing between any two external
units and the internal memory, without interrupting the data-processing
program. The system can operate with a wide variety of input-output
devices, both digital and analog, either proximate or remotely located. The
external control capabilities of the system enable the machine to supervise
this wide family of external devices and, on an unscheduled basis, to
interrupt or redirect its overall program automatically, in order to assist
or manage them.

¹Proc. EJCC, 71-75 (1958).

At the National Bureau of Standards (NBS) a new large-scale digital system
has been designed for carrying out a wide range of experimental
investigations that are of special importance to the Government. The system
can be utilized for investigating new or stringent applications of these
general types: (1) data-processing applications, in which the system can be
used for performing accounting and information-retrieval operations for
management purposes; (2) mathematical applications, in which the system can
be used for performing mathematical calculations for scientific purposes,
including scientific data-reduction; (3) control applications, in which the
system can be used for performing real-time control and simulation
operations, in conjunction with analog computer facilities or in
conjunction with other instrument installations, remotely located if
necessary; and (4) network applications, in which the system can be used in
conjunction with other digital computer facilities, forming an
interconnected communication network in which all the machines can work
together collaboratively on large-scale problems that are beyond the reach
of any single machine.
Because the system was designed for such varied uses (ranging from
automatic search and interpretation of Patent Office records to real-time
scheduling and control of commercial aircraft traffic), the system is
characterized by a variety of features not ordinarily associated with a
single installation, namely: a high computation rate, highly flexible
control facilities for communicating with the outside world, and a wide
repertoire of internal processing formats. The system contains three
independently programmed computers, each of which is specially adapted for
performing certain classes of operations that frequently occur in
large-scale data-processing applications. These computers intercommunicate
in a way that permits all three of them to work together concurrently on a
common problem. The system thus provides a working model of an integrated
multicomputer network.

System organization

Exclusive of data-storage and peripheral equipment, the central processing
and control units of the over-all system contain approximately 7,000 vacuum
tubes and 165,000 solid-state diodes. The basic component for these units
is a modified version of the one megacycle package used in the NBS DYSEAC,
which in turn was evolved from the hardware used in NBS Electronic
Automatic Computer (SEAC). As a result of a more effective logical design
and faster memory, however, the new NBS system will run more than 100 times
faster than SEAC on programs involving only fixed-point operations; for
programs involving floating-point manipulations, the advantage exceeds
1,000. The arithmetic speed of the new system derives in a large part from
connecting a novel type of parallel adder to a diode-capacitor memory
capable of providing one random access per microsecond.
The system contains seven major blocks, which are indicated in Fig. 1,
namely: (1) the primary computer, in the lower center
Chapter 35 1 PILOT, the NBS multicomputer system 441
Table 1  Arithmetic operation times
(including 4 random access times to fast memory)

                                                Total time (microseconds)
Operation                                       Average   Minimum-maximum
Fixed-point Addition, Subtraction, Comparison     7.5        6-9
Fixed-point Multiplication                        31         22-40
Fixed-point Division                              73         72-74
Floating-point Addition, Subtraction†             20         19-21
Floating-point Multiplication                     37         28-46

† For shift of 4 bits.

of the figure, (2) the primary storage, upper center; (3) the secondary
computer and the secondary storage, right; (4) the input-output control,
upper left; (5) the external storage units, upper far left; (6) the
external input-output units such as readers, printers, and displays, lower
far left; and (7) lower left, the external control containing the special
features that facilitate communication with people and devices in the world
outside the system which is remotely located if necessary. Interchanges of
information between the system and the outside world can take place at any
time, on a completely impromptu basis, at the instigation of either the
system or the external world, or both acting jointly.
The primary computer, a high-speed general-purpose computer, contains both
an arithmetic unit and a program control unit of considerable versatility.
This computer can carry out a variety of high precision arithmetic and
logical processing operations, in either binary or decimal code and in a
wide variety of word lengths and formats. Its partner computer, the
secondary computer, specializes in short-word operations, usually
manipulations on address numbers or other "red-tape" information, which it
supplies automatically as needed to the primary program. The third computer
of the system, called the format controller (see input-output control in
Fig. 1), is specially designed for carrying out editing, inspecting, and
format-modifying operations on data that are flowing in or out of the
internal memory via the peripheral external units of the system. All three
computers, and all the external units of the system, share access
privileges to the common high-speed internal memory, which is linked to the
input-output and external storage units via independent trunks for
effecting data-transfers. Transfers of data can take place between the
external units, the memory units, and the computers concurrently without
interrupting the progress of the computational program. Because of the
flexibility of the format controller, incoming data can be accepted
[Fig. 1. Block diagram of the system. Recoverable labels: input-output
controller, trunks, and transfers; addressable storage words, 32,768 total;
secondary storage, 60 storage locations; primary computer with arithmetic
and processing unit (binary and decimal, fixed and floating point, full and
half words, 16 varieties) and program control unit (three-address, 16 basic
types); secondary computer with arithmetic and processing unit (binary,
16-bit, 16 types) and program control unit (two-address, explicit next
instruction); manual controls and displays.]
from a wide variety of external devices and in a wide variety of formats.

Functions of the major units

The specific functions of the major units can be described briefly as
follows:

Primary computer

Arithmetic and processing unit. Using a 64-bit number word with algebraic
sign, this unit carries out 7 different types of arithmetical operations,
5 types of choice (branch) operations, and 2 types of logical
pattern-processing operations. See Table 2. Arithmetical operations can be
performed in any of 16 possible formats. For example, arithmetic can be
performed using either a pure binary or a binary-coded decimal number code,
and in both fixed-point and floating-point notation. Fixed-point operations
can also be carried out in a special half-word format in which two
independently addressable half-words are stored in a single full-word
storage location. These two half-words can be processed either separately,
as independent words, or concurrently in duplex format. In duplex format,
the respective lefthand and righthand halves of each double operand are
processed simultaneously in a single instruction time, and the two
independent half-word results are written back in the corresponding halves
of the full-length result location.

Program control unit. The program control unit interprets and regulates the
sequencing of instructions in the program. It operates with a 68-bit
binary-coded 3-address instruction word. See Table 3. Each instruction word
contains three 16-bit codes which specify the addresses of each of two
operands, alpha and beta, and usually the address of the result of the
operation, gamma, in the main memory. The memory location of the next
instruction word is specified by a 16-bit address number contained in one
of 16 possible base registers; a 4-bit code in the instruction word
(d-digits) specifies which one of the base registers contains the desired
word. Whenever a register is so used as a next-instruction address source,
its contents are automatically increased by unity. Choice instructions,
used for program branching, from time to time may cause a new alternative
address number to be inserted in any one of the base registers. This
register is then used as the source of the address number of the next
instruction.
Bits     Field                Contents
68-65    Tags                 000?
64-49    Address alpha        includes the a-digits
48-33    Address beta         includes the b-digits
32-17    Address gamma        includes the c-digits
16-13    Next instn.          d-digits
12-9     Code for operation   parameter
8-5      Code for operation   basic type
4-1      Mon. break point     e-digits
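The field layout of the 68-bit primary instruction word, and the rule that a base register used as the next-instruction source is increased by one, can be sketched as below. The bit numbering follows the table (bit 1 is the rightmost bit). The class and function names are hypothetical, and wrapping the incremented register at 2^16 is an assumption; the text says only that the contents are "increased by unity".

```python
from dataclasses import dataclass


def field(word, high, low):
    """Extract bits high..low of a word, with bit 1 as the least significant."""
    width = high - low + 1
    return (word >> (low - 1)) & ((1 << width) - 1)


@dataclass
class PrimaryInstruction:
    tags: int          # bits 68-65
    alpha: int         # bits 64-49, first operand address
    beta: int          # bits 48-33, second operand address
    gamma: int         # bits 32-17, result address
    d_digits: int      # bits 16-13, base register holding the next address
    parameter: int     # bits 12-9
    basic_type: int    # bits 8-5
    e_digits: int      # bits 4-1


def decode(word):
    """Split a 68-bit instruction word into its fields."""
    return PrimaryInstruction(
        tags=field(word, 68, 65),
        alpha=field(word, 64, 49),
        beta=field(word, 48, 33),
        gamma=field(word, 32, 17),
        d_digits=field(word, 16, 13),
        parameter=field(word, 12, 9),
        basic_type=field(word, 8, 5),
        e_digits=field(word, 4, 1),
    )


def next_instruction_address(base_registers, d_digits):
    """Read the next-instruction address and step the register by one."""
    address = base_registers[d_digits]
    base_registers[d_digits] = (address + 1) % (1 << 16)   # assumed wrap
    return address


word = (1 << 64) | (0x1234 << 48) | (0xABCD << 32) | (0x0F0F << 16) \
    | (3 << 12) | (5 << 8) | (9 << 4) | 2
print(decode(word))
```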
Addresses alpha, beta, and gamma written in the instruction word are
subject to automatic modification if desired by writing a 1-digit in a
specified bit position. Such addresses are called relative addresses. Each
of the three addresses (α, β, and γ) in each instruction word contains a
4-bit code group, called the a-, b-, and c-digits respectively, in which
any base register identification number (0 through 15) may be written. When
this is done, the address number to which the computer actually refers is
equal to the sum (modulo 2^16) of the address number stored in the
designated base register plus an address-modification constant, indicated
in the remaining 12 bits of the 16-bit address segment of the instruction
word.

Primary storage units

Fast access memory. Because of budget limitations, the initial installation
of the system will contain only a relatively small section of internal
memory of the diode-capacitor type. This diode-capacitor memory, originally
developed at NBS in 1953, is very fast; i.e., capable of providing one
random access per microsecond, but it has the disadvantage of relatively
high cost per word of storage. This type of memory is available in modules
of 256 words subdivided as follows:

    Numerical information        64 bits
    Algebraic signs and tags      4 bits
    Parity check digits           4 bits
    Total word length            72 bits

The over-all system is designed to accommodate up to 32,768
internally-accessible full-words, which may be held in storage units with
access times ranging from 1 microsecond (µsec) to 32 µsec. Thus the minimum
fast access memory can be backed up with a much larger and slower
magnetic-core memory.

Inter-memory transfer trunk. Provision is made for transferring blocks of
information between the various internal storage units in the system,
concurrently with computation. The size of the block transferred may range
from a single word to the entire contents of the memory, and the addresses
between which the information is transferred are specified by a single
programmed inter-memory transfer instruction. Automatic interlocks are
provided to insure that all future references which the program may make to
any memory positions involved in the inter-memory transfer operation are
automatically made after the data have been shifted to the new locations.

Secondary computer

Arithmetic and processing unit. The secondary computer is a high-speed
independently programmable general-purpose computer that operates in
conjunction with the primary computer and can perform 16 distinct types of
operations using 16-bit words. These operations include 6
arithmetic-processing operations, 4 choice operations, 1 nonnumerical
processing operation, and 5 operations that transfer digital information or
control-signals between the primary and the secondary computers. See
Table 2. Operation times for the secondary computer average about 2 µsec.
Both computers operate concurrently and can transfer information back and
forth between each other. One of the principal functions of the secondary
computer is to carry out so-called "red-tape" operations, such as:
(1) counting iterations, (2) systematically modifying the addresses of the
operands and instructions referred to by the primary program,
(3) monitoring the primary program, and (4) various special tasks. Through
the use of special subroutines for the secondary computer, both computers
acting co-operatively can be made to carry out a wide variety of complex
operations without unduly complicating the writing of the primary computer
programs. Examples of such operations are: (1) special types of sorting,
(2) logarithmic search, (3) routines involving cross-referencing, or items
selected according to an attached code, (4) error analyses, and
(5) operations involving small numerical fields.
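The relative-address rule given earlier for the primary computer (base register contents plus a 12-bit constant, taken modulo 2^16) can be illustrated with a short sketch. Placing the 4-bit register number in the leading four bits of the 16-bit segment is an assumption made here; the text says only that the constant occupies "the remaining 12 bits". The function name and the `relative` flag argument are likewise hypothetical stand-ins for the 1-digit that marks an address as relative.

```python
ADDRESS_MODULUS = 1 << 16


def effective_address(segment, base_registers, relative):
    """Resolve one 16-bit address segment of a primary instruction word.

    segment         -- the 16-bit alpha, beta, or gamma field
    base_registers  -- sequence of 16 base register values (16 bits each)
    relative        -- True if the address is marked as relative
    """
    if not relative:
        return segment
    register = segment >> 12          # assumed position of the a-, b-, c-digits
    constant = segment & 0xFFF        # 12-bit address-modification constant
    return (base_registers[register] + constant) % ADDRESS_MODULUS


registers = [0] * 16
registers[3] = 0xFFF0
print(hex(effective_address(0x3020, registers, relative=True)))   # 0x10
```

The modulo step matters in the usage example: 0xFFF0 plus 0x020 exceeds 16 bits, so the resolved address wraps around to 0x0010.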
Secondary storage unit. Associated with the secondary computer Input-output control
is the secondary storage unit which consists of 60 storage locations Concurrent input-output trunks. The concurrent input-output
containing 16-bit words. Sixteen of these locations can be used trunks have the function of controlling the transfer of information
as base registers by the primary computer and may be selected in either direction between the internal memory and the external
by the primary computer according to the a-, b-, c-, and d-digits storage units. All input-output transfers are initiated by a single
in the primary instruction word. The contents of the registers internally programmed instruction, and are carried out by the
selected by the primary computer in this way are automatically trunk units with the aid of automatic interlocks similar to those
added to the address numbers specified in the primary computer used in the inter-memory transfer trunk for preventing interfer-
instruction word. The secondary storage unit is also capable of ence with the progress of the computing program. The size of the
being addressed directly by the primary computer. The fifteen block of data that is transferred may range from a single word
4-word blocks of the secondary storage are identified by 15 special to the entire contents of the memory and may be directed to any
primary address numbers. Other addressable registers associated addresses. Using two such trunks, it is possible to maintain two
with the secondary storage hold the address numbers of current continuous streams of data simultaneously flowing between the
and next instruction words in the primary program. internal memory and any two external storage units without
interrupting the progress of the computations.
Program control unit. The secondary computer program operates
with a 2-address instruction system, the addresses referring to
Format controller. Data that are passing in and out of the internal
words in the secondary storage unit, including the base registers.
storage system via the input-output trunks are subject to further
See Table 4. From time to time the primary instruction program
concurrent processing by the format controller. The format con-
may order the insertion of a new instruction into the secondary
troller is an independent internally-programmed data-processing
instruction register or may order the transfer of data in either
unit specially designed for carrying out general-purpose editing,
direction between the primary storage units and the secondary
inspecting, and format-modifying operations on incoming or out-
storage unit. The secondary computer program may also cause data
to be transferred into the secondary storage unit from the primary going data. Programs for the format controller are stored on
instruction register and can also cause information to be trans- removable plugboards, and the primary computer program is able
ferred into the primary instruction register from a location in the to direct the format controller to select whichever particular
main memory. format program may be appropriate from among the small library
of format programs contained on the boards currently attached
Using these facilities, the secondary computer can inspect each
to the machine. Among the typical kinds of programs that the
instruction word in the primary program as it is selected from the
format controller can carry out are: (1) searching of magnetic tapes
primary store and, acting upon specifications written into the
for words bearing identifying addresses or other coded labels
secondary program, can cause the primary instruction either to
specified by the internal program, with selective input or output
be executed as written or to be replaced by a new instruction word
of data at these selected tape locations, (2) insertion of incoming
from a memory location determined by the secondary. Other types
data for the internal storage units of the system into address
of discrimination can be effected by the secondary that depend
locations specified by the incoming data itself, (3) conversion and
upon the result of a primary operation, such as an overflow, jump,
rearrangement of data that are stored on external units in formats
etc. These features facilitate the use of interpretive programming
not compatible with the formats used in the internal units; e.g.,
methods.
binary-decimal character conversion, adjustment of word-length
modules, etc.
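The way the secondary computer can inspect each primary instruction and either let it run as written or substitute a word from memory, as described above, can be sketched as follows. This is only an illustration of the control flow: `run_primary`, `substitute_rule`, the string "instructions", and the dictionary memory are all hypothetical stand-ins, not PILOT's actual instruction formats.

```python
def run_primary(program, memory, substitute_rule, execute):
    """Execute primary instructions under inspection by a secondary rule.

    substitute_rule(word) returns a memory address whose contents should
    replace the instruction, or None to execute the instruction as written.
    """
    for word in program:
        address = substitute_rule(word)
        if address is not None:
            word = memory[address]      # replaced by a word chosen by the secondary
        execute(word)


executed = []
memory = {7: "JUMP_TO_SUBROUTINE"}      # hypothetical replacement instruction
run_primary(
    program=["ADD", "SQRT", "STORE"],
    memory=memory,
    # Interpretive use: an operation without hardware support is intercepted.
    substitute_rule=lambda w: 7 if w == "SQRT" else None,
    execute=executed.append,
)
print(executed)
```

Here the intercepted `SQRT` word is replaced before execution, which is the essence of the interpretive programming use mentioned in the text.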
Input-output units

The system is designed to operate with a wide variety of input-output
devices, both digital and analog.

Input readers and printers. Flexowriter units and paper-tape readers and
punches will be available in the initial installation. Punched card input
readers and high-speed printers, along with their auxiliary controls, may
be attached to the format controller in the manner indicated in the
preceding paragraph.

Displays. Two types of displays are provided for: (1) pilot-light display
of data and control information in the various registers and flip-flops
throughout the system, in order to aid the rapid diagnosis of equipment
malfunctions or programming faults, and (2) picture-tube display of
real-time data stored in the internal memory of the system. This kinematic
diagram type of display is very important when performing dynamic
simulation operations which require visual presentation of the simulated
data in real time to the human operators.

External control

Manual-monitor control. The term “manual-monitor” was coined at NBS
several years ago to describe certain types of control operations that are
initiated either manually by the machine operator or by the machine itself
under conditions which are specified by means of external switch settings.
The former is referred to as a manual operation and the latter is called a
monitor operation because the machine must monitor its internal program to
determine precisely when the operation should be performed. The type of
operation to be performed as well as the conditions under which it is to
be performed are specified by means of external switch settings.

This feature provides for convenient communication between the
data-processor and the operator, and allows the operator to monitor the
progress of the program automatically, to insert new data and
instructions, and to withdraw intermediate results conveniently, without
need for advance preparation of special programs. This is particularly
useful in debugging programs and in checking equipment malfunctions.

Monitor operations are performed by the machine whenever the conditions
specified by the external switch settings occur in the course of the
program; e.g., every time the program refers to a new instruction, any
time the program refers to an instruction to which a special monitor
breakpoint symbol (e-digits) is attached, any time an arithmetic overflow
occurs, etc. By pairing a particular type of manual-monitor operation with
a selected set of conditions, a variety of special composite operations
can be performed.

Remote controls. Manual-monitor operations can be specified and initiated
by external devices as well as by human operators. Since all of the
external switch settings control only d-c voltages, the external devices
can even be remote from the machine itself, and from a distance, via
ordinary electrical transmission lines, they can exercise supervisory
control over the internal program of the machine. This makes it possible
to harness together two or more remotely located data-processing machines,
and have them work together co-operatively on a common task. Each member
of such an interconnected network of separate data processors is free at
any time to initiate and dispatch special control orders to any of its
partners in the system. As a consequence, the supervisory control over the
common task may be shared among the various members of the system, and may
be passed back and forth from one machine to the other as the need arises.

References

LeinA57, 59
Part 5 | The PMS level    Section 3 | Computers for multiprocessing and parallel processing

Chapter 36

¹AFIPS Proc. FJCC, vol. 22, pp. 86-96, 1962.

Introduction

The D825 Modular Data Processing System is the result of a Burroughs
study, initiated several years ago, of the data processing requirements
for command and control systems. The D825 has been developed for operation
in the military environment. The initial system, constructed for the Naval
Research Laboratory with the designation AN/GYK-3(V), has been completed
and tested. This paper reviews the design criteria analysis and design
rationale that led to the system structure of the D825. The implementation
and operation of the system are also described. Of particular interest is
the role that developed for an operating system program in coordinating
the system components.

Functional requirements of command and control data processing

By “command and control system” is meant a system having the capacity to
monitor and direct all aspects of the operation of a large man and machine
complex. Until now, the term has been applied exclusively to certain
military complexes, but could as well be applied to a fully integrated air
traffic control system or even to the operation of a large industrial
complex. Operation of command and control systems is characterized by an
enormous quantity of diverse but interrelated tasks, generally arising in
real time, which are best performed by automatic data-processing
equipment, and are most effectively controlled in a fully integrated
central data processing facility. The data processing functions alluded to
are those typical of data processing, plus special functions associated
with servicing displays, responding to manual insertion (through consoles)
of data, and dealing with communications facilities. The design
implications of these functions will be considered here.

Availability criteria. The primary requirement of the data-processing
facility, above all else, is availability. This requirement, essentially a
function of hardware reliability and maintainability, is, to the user,
simply the percentage of available, on-line, operation time during a given
time period. Every system designer must trade off the costs of designing
for reliability against those incurred by unavailability, but in no other
application are the costs of unavailability so high as those presented in
command and control. Not only is the requirement for hardware reliability
greater than that of commercial systems, but downtime for the complete
system for preventive maintenance cannot be permitted. Depending upon the
application, some greater or lesser portion of the complete system must
always be available for primary system functions, and all of the system
must be available most of the time.

The data processing facility may also be called upon, except at the most
critical times, to take part in exercising and evaluating the operation of
some parts of the system, or, in fact, in actual simulation of system
functions. During such exercises and simulations, the system must maintain
some (although perhaps partially and temporarily degraded) real-life and
real-time capability, and must be able to return quickly to full
operation. An implication here, of profound significance in system design,
is, again, the requirement that most of the system be always available;
there must be no system elements (unsupported by alternates) performing
functions so critical that failure at these points could compromise the
primary system functions.

Adaptability criteria. Another requirement, equally difficult to achieve,
is that the computer system must be able to analyze the demands being made
upon it at any given time, and determine from this analysis the attention
and emphasis that should be given to the individual tasks of the problem
mix presented. The working configuration of the system must be completely
adaptable so as to accommodate the diverse problem mixes, and, moreover,
must respond quickly to important changes, such as might be indicated by
external alarms or the results of internal computations (exceeding of
certain thresholds, for example), or to changes in the hardware
configuration resulting from the failure of a system component or from its
intentional removal from the system. The system must have the ability to
be dynamically and automatically restructured to a working configuration
that is responsive to the problem-mix environment.

Expansibility criteria. The requirement of expansibility is not unique to
command and control, but is a desirable feature in any application of data
processing equipment. However, the need for expansibility is more acute in
command and control because of the dependence of much of the efficacy of
the system upon an ability to meet the changing requirements brought on by
the very rapidly changing technology of warfare. Further, it must be
possible to incorporate new functions in such a way that little or no
transitional downtime results in any hardware area.

Expansion should be possible without incurring the costs of providing more
capability than is needed at the time. This ability of the system to grow
to meet demands should apply not only to the conventionally expansible
areas of memory and I/O but to computational devices, as well.

Programming criteria. Expansion of the data-processing facility should
require no reprogramming of old functions, and programs for new functions
should be easily incorporated into the overall system. To achieve this
capability, programs must be written in a manner which is independent of
system configuration or problem mix, and should even be interchangeable
between sites performing like tasks in different geographic locales.
Finally, because of the large volume of routines that must be written for
a command and control system, it should be possible for many different
people, in different locations and of different areas of responsibility,
to write portions of programs, and for the programs to be subsequently
linked together by a suitable operating system.

Concomitant with the latter requirement and with that of
configuration-independent programs is the desirability of orienting system
design and operation toward the use of a high-level procedure-oriented
language. The language should have the features of the usual algorithmic
languages for scientific computations, but should also include provisions
for maintaining large files of data sets which may, in fact, be
ill-structured. It is also desirable that the language reflect the special
nature of the application; this is especially true when the language is
used to direct the storage and retrieval of data.

Design rationale for the data-processing facility

The three requirements of availability, adaptability, and expansibility
were the motivating considerations in developing the D825 design. In
arriving at the final systems design, several existing and proposed
schemes for the organization of data processing systems were evaluated in
light of the requirements listed above. Many of the same conclusions
regarding these and other schemes in the use of computers in command and
control were reached independently in a more recent study conducted for
the Department of Defense by the Institute for Defense Analysis [Kroger et
al., 1961].

The single-computer system. The most obvious system scheme, and the least
acceptable for command and control, is the single-computer system. This
scheme fails to meet the availability requirement simply because the
failure of any part (computer, memory, or I/O control) disables the entire
system. Such a system was not given serious consideration.

Replicated single-computer systems. A system organization that had been
well known at the time these considerations were active involves the
duplication (or triplication, etc.) of single-computer systems to obtain
availability and greater processing rates. This approach appears initially
attractive, inasmuch as programs for the application may be split among
two or more independent single-computer systems, using as many such
systems as needed to perform all of the required computation. Even the
availability requirement seems satisfied, since a redundant system may be
kept in idle reserve as backup for the main function.

On closer examination, however, it was perceived that such a system had
many disadvantages for command and control applications. Besides requiring
considerable human effort to coordinate the operation of the systems, and
considerable waste of available machine time, the replicated single
computers were found to be ineffective because of the highly interrelated
way in which data and programs are frequently used in command and control
applications. Further, the steps necessary to have the redundant or backup
system take over the main function, should the need arise, would prove too
cumbersome, particularly in a time-critical application where constant
monitoring of events is required.

Partially shared memory schemes. It was seen that if the replicated
computer scheme were to be modified by the use of partially shared memory,
some important new capabilities would arise. A partially shared memory can
take several forms, but provides principally for some shared storage and
some storage privately allotted to individual computers. The shared
storage may be of any kind (tapes, discs, or core) but frequently is core.
Such a system, by providing a direct path of communication between
computers, goes a long way toward satisfying the requirements listed
above.
Chapter 36 | D825-a multiple-computer system for command and control

The one advantage to be found in having some memory private to each
computer is that of data protection. This advantage vanishes when it is
necessary to exchange data between computers, for if a computer failure
were to occur, the contents of the private memory of that computer would
be lost to the system. Furthermore, many tasks in the command and control
application require access to the same data. If, for example, it would be
desirable to permit some privately stored data to be made available to the
fully shared memory or to some other private memory, considerable time
would be lost in transferring the data. It is also clear that a certain
amount of utilization efficiency is lost, since some private memory may be
unused, while another computer may require more memory than is directly
available, and may be forced to transfer other blocks of data back to bulk
storage to make way for the necessary storage. It might be added in
passing that if private I/O complements are considered, the same questions
of decreased overall availability and decreased efficiency arise.

Master/slave schemes. Another aspect of the partially shared memory system
is that of control. A number of such systems employ a master/slave scheme
to achieve control, a technique wherein one computer, designated the
master computer, coordinates the work done by the others. The master
computer might be of a different character than the others, as in the
PILOT system, developed by the National Bureau of Standards [Leiner et
al., 1957], or it may be of the same basic design, differing only in its
prescribed role, as in the Thompson Ramo Wooldridge TRW400 (AN/FSQ-27)
[Porter, 1960]. Such a scheme does recognize the importance, for
multicomputer systems, of the problem of coordinating the processing
effort; the master computer is an effective means of accomplishing the
coordination. However, there are several difficulties in such a design.
The loss of the master computer would down the whole system, and the
command and control availability requirement could not, consequently, be
met. If this weakness is countered by providing the ability for the master
control function to be automatically switched to another processor, there
still remains an inherent inefficiency. If, for example, the workload of
the master computer becomes very large, the master becomes a system
bottleneck resulting in inefficient use of all other system elements; and,
on the other hand, if the workload fails to keep the master busy, a waste
of computing power results. The conclusion is then reached that a master
should be established only when needed; this is what has been done in the
design of the D825.

The totally modular scheme. As a result of these analyses, certain
implications became clear. The availability requirement dictated a
decentralization of the computing function, that is, a multiplicity of
computing units. However, the nature of the problem required that data be
freely communicable among these several computers. It was decided,
therefore, that the memory system would be completely shared by all
processors. And, from the point of view of availability and efficiency, it
was also seen to be undesirable to associate I/O with a particular
computer; the I/O control was, therefore, also decoupled from the
computers.

Furthermore, a system with several computers, totally shared memory, and
decoupled I/O seemed a perfect structure for satisfying the adaptability
requirements of command and control. Such a structure resulted in a
flexibility of control which was a fine match for the dynamic, highly
variable, processing requirements to be encountered.

The major problem remaining to realize the computational potential
represented by such a system was, of course, that of coordinating the many
system elements to behave, at any given time, like a system specifically
designed to handle the set of tasks with which it was faced at that time.
Because of the limitations of previously available equipment, an operating
system program had always been identified with the equipment running the
program. However, in the proposed design, the entire memory was to be
directly accessible to all computer modules, and the operating system
could, therefore, be decoupled from any specific computer. The operation
of the system could be coordinated by having any processor in the
complement run the operating system only as the need arose. It became
clear that the master computer had actually become a program stored in
totally shared memory, a transformation which was also seen to offer
enhanced programming flexibility.

Up to this point, the need for identical computer modules had not been
established. The equality of responsibility among computing units, which
allowed each computer to perform as the master when running the operating
system, led finally to the design specification of identical computer
modules. These were freely interconnected to a set of identical memory
modules and a set of identical I/O control modules, the latter, in turn,
freely interconnected to a highly variable and diverse I/O device
complement. It was clear that the complete modularity of system elements
was an effective solution to the problem of expansibility, inasmuch as
expansion could be accomplished simply by adding modules identical to
those in the existing complement. It was also clear that important
advantages and economies resulting from the manufacture, maintenance, and
spare parts provisioning for identical modules also accrue to such a
system. Perhaps the most important result of a totally modular
organization is that redundancy of the required complement of any module
type, for greater reliability, is easily achieved by incorporating as
little as one additional module of that type in the system. Furthermore,
the additional module of each type need not be idle; the system may be
looked upon as operating with active spares.

Thus, a design structure based upon complete modularity was set. Two items
remained to weld the various functional modules into a coordinated system:
a device to electronically interconnect the modules, and an operating
system program with the effect of a master computer, to coordinate the
activities of the modules into fully integrated system operation.

In the D825, these two tasks are carried out by the switching interlock
and the Automatic Operating and Scheduling Program (AOSP), respectively.
Figure 1 shows how the various functional modules are interconnected via
the interlock in a matrix-like fashion.

System implementation

Most important in the design implementation of the D825 were studies
toward practical realization of the switching interlock and the AOSP. The
computer, memory, and I/O control modules permitted more conventional
solutions, but were each to incorporate some unusual features, while many
of the I/O devices were selected from existing equipment. With the
exception of the latter, all of these elements are discussed here briefly.
(A summary of D825 characteristics and specifications is included at the
end of the paper.)

Switching interlock. Having determined that only a completely shared
memory system would be adequate, it was necessary to find some way to
permit access to any memory by any processor, and, in fact, to permit
sharing of a memory module by two or more processors or I/O control
modules.

A function distributed physically through all of the modules of a D825
system, but which has been designated in aggregate the switching
interlock, effects electronically each of the many brief interconnections
by which all information is transferred among computer, memory, and I/O
control modules. In addition to the electronic switching function, the
switching interlock has the ability to detect and resolve conflicts such
as occur when two or more computer modules attempt access to the same
memory module.

The switching interlock consists functionally of a crosspoint switch
matrix which effects the actual switching of bus interconnections, and a
bus allocator which resolves all time conflicts resulting from
simultaneous requests for access to the same bus or system module.
Conflicting requests are queued up according to the priority assigned to
the requestors. Priorities are pre-emptive in that the appearance of a
higher priority request will cause service of that request before service
of a lower priority request already in the queue. Analyses of queueing
probabilities have shown that queues longer than one are extremely
unlikely.

The priority scheduling function is performed by the bus allocator,
essentially a set of logical matrices. The conflict matrix detects the
presence of conflicts in requests for interconnection. The priority matrix
resolves the priority of each request. The logical product of the states
of the conflict and priority matrices determines the state of the queue
matrix, which in turn governs the setting of the crosspoint switch, unless
the requested module is busy.

The AOSP: an operating system program. The AOSP is an operating system
program stored in totally shared memory and therefore available to any
computer. The program is run only as needed to exert control over the
system. The AOSP includes its own executive routine, an operating system
for an operating system, as it were, calling out additional routines, as
required. The configuration of the AOSP thus permits variation from
application to application, both in sequence and quantity of available
routines and in disposition of AOSP storage.

The AOSP operates effectively on two levels, one for system control, the
other for task processing.

The system control function embodies all that is necessary to call system
programs and associated data from some location in the I/O complement, and
to ready the programs for execution by finding and allocating space in
memory, and initiating the processing. Most of the system control function
(as well as the task processing function) consists of elaborate
bookkeeping for: programs being run, programs that are active (that is,
occupy memory space), I/O commands being executed, other I/O commands
waiting, external data blocks to be received and decoded, and activation
of the appropriate programs to handle such external data. It would be
inappropriate here to discuss the myriad details of the AOSP; some idea of
its scope, however, can be obtained from the following list of some of its
major functions:

1 Configuration determination
2 Memory allocation
3 Scheduling
4 Program readying and end-of-job cleanup
5 Reporting and logging
6 Diagnostics and confidence checking
7 External interrupt processing

The task processing function of the AOSP is to execute all program I/O
requests in order to centralize scheduling problems and to protect the
system from the possibility of data destruction by ill-structured or
conflicting programs.

AOSP response to interrupts. The AOSP function depends heavily upon the
comprehensive set of interrupts incorporated in the D825. All interrupt
conditions are transmitted to all computer modules in the system, and each
computer module can respond to all interrupt conditions. However, to make
it possible to distribute the responsibility for various interrupt
conditions, both system and local, each computer module has an interrupt
mask register that controls the setting of individual bits of the
interrupt register. The occurrence of any interrupt causes one of the
system computer modules to leave the program it has been running and
branch to the suitable AOSP entry, entering a control mode as it branches.
The control mode differs from the normal mode of operation in that it
locks out the response to some low-priority interrupts (although recording
them) and enables the execution of some additional instructions reserved
for AOSP use (such as setting an interrupt mask register or memory
protection registers, or transmitting an I/O instruction to an I/O control
module).

In responding to an interrupt, the AOSP transfers control to the
appropriate routine handling the condition designated by the interrupt.
When the interrupt condition has been satisfied, control is returned to
the original object program. Interrupts caused by normal operating
conditions include:

1 16 different types of external requests
2 Completion of an I/O operation
3 Real-time clock overflow
4 Array data absent
5 Computer-to-computer interrupts
6 Control mode entry (normal mode halt)

Interrupts related to abnormalities of either program or equipment
include:

1 Attempt by program to write out of bounds
2 Arithmetic overflow
3 Illegal instruction
4 Inability to access memory, or an internal parity error; parity error on
  an I/O operation causes termination of that operation with suitable
  indication to the AOSP
5 Primary power failure
6 Automatic restart after primary power failure
7 I/O termination other than normal completion

While the reasons for including most of the interrupts listed above are
evident, a word of comment on some of them is in order.

The array-data-absent interrupt is initiated when a reference is made to
data that is not present in the memory. Since all array references such as
A[k] are made relative to the base (location of the first element) of the
array, it is necessary to obtain this address and to index it by the value
k. When the base of array A is fetched, hardware sensing of a presence bit
either allows the operation to continue, or initiates the
array-data-absent interrupt. In this way, keeping track of data in use by
interacting programs can be simplified, as may the storage allocation
problem.

The primary power failure interrupt is highest priority, and always
pre-emptive. This interrupt causes all computer and I/O control modules to
terminate operations, and to store all volatile information either in
memory modules or in magnetic thin-film registers. (The latter are
integral elements of computer modules.) This interrupt protects the system
from transient power failure, and is initiated when the primary power
source voltage drops below a predetermined limit.

The automatic restart after primary power failure interrupt is provided so
that the previous state of the system can be reconstructed.

A description of how an external interrupt is handled might clarify the
general interrupt procedure. Upon the presence of an external interrupt,
the computer which has been assigned responsibility to handle such
interrupts automatically stores the contents of those registers (such as
the program counter) necessary to subsequently reconstitute its state,
enters the control mode, and goes to a standard (hardware-determined)
location where a branch to the external request routine is located. This
routine has the responsibility of determining which external request line
requires servicing, and, after consulting a table of external devices
(teletype buffers, console keyboards, displays, etc.) associated with the
interrupt lines, the computer constructs and transmits an input
instruction to the requesting device for an initial message. The computer
then makes an entry in the table of the I/O complete program (the program
that handles I/O complete interrupts) to activate the appropriate
responding routine when the message is read in. A check is then made for
the occurrence of additional external requests. Finally, the computer
restores the saved register contents and returns in normal mode to the
interrupted program.

in the program code inform the AOSP that parallel processing with
two or more computers is possible at a given point. In addition,
the programmer must specify where the branches indicated in this
manner will join following the parallel processing.
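The interrupt mask register and the save, control-mode, dispatch, restore sequence described above for interrupt handling can be sketched in a few lines. Everything named here is hypothetical: the class, the bit numbering, the polarity of the mask (1 = this module accepts the condition), and the use of the program counter as the only saved register are assumptions made for illustration, not the D825's actual register layout.

```python
class ComputerModule:
    """Toy model of one computer module's interrupt logic."""

    def __init__(self, mask):
        self.mask = mask               # interrupt mask register (assumed 1 = accept)
        self.interrupt_register = 0    # latched interrupt conditions
        self.control_mode = False
        self.program_counter = 0

    def signal(self, bit):
        """Offer a condition to this module; latch it only if the mask allows."""
        if self.mask & (1 << bit):
            self.interrupt_register |= 1 << bit
            return True
        return False

    def service(self, handlers):
        """Save state, enter control mode, run handlers, then resume."""
        saved_pc = self.program_counter          # state needed to resume later
        self.control_mode = True
        serviced = []
        for bit in sorted(handlers):
            if self.interrupt_register & (1 << bit):
                handlers[bit](self)              # routine for this condition
                self.interrupt_register &= ~(1 << bit)
                serviced.append(bit)
        self.program_counter = saved_pc          # restore saved registers
        self.control_mode = False                # return in normal mode
        return serviced


EXTERNAL, CLOCK = 0, 2                           # hypothetical bit assignments
log = []
handlers = {EXTERNAL: lambda m: log.append("external request"),
            CLOCK: lambda m: log.append("clock overflow")}

a = ComputerModule(mask=1 << EXTERNAL)           # a is responsible for external requests
b = ComputerModule(mask=1 << CLOCK)              # b is responsible for the clock
a.program_counter = 42
for module in (a, b):                            # conditions go to all modules
    module.signal(EXTERNAL)
result = a.service(handlers)
```

Broadcasting the condition to every module and letting each mask decide mirrors the stated intent of distributing responsibility for interrupt conditions among identical computer modules.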
AOSP control of I/O activity. As mentioned above, control of all
I/O activity is also within the province of the AOSP. Records are Computer module. The computer modules of the D825 system are
kept on the condition and availability of each I/O device. The identical, general-purpose, arithmetic and control units. In deter-
locations of all files within the computer system, whether on mining the internal structure of the computer modules, two con-
magnetic tape, drum, disc file, card, or represented as external siderations were uppermost. First, all programs and data had to
inputs, are also recorded. A request for input by file name is be arbitrarily relocatable to simplify the storage allocation func-
evaluated, and, if the device associated with this name is readily tion of the AOSP; secondly, programs would not be modified
available, the action is initiated. If for any reason the request must during execution. The latter consideration was necessary to mini-
be deferred, it is placed in a program queue to await conditions mize the amount of work required to pre-empt a program, since
which permit its initiation. Typical conditions which would cause all that would have to be saved to reinstate the interrupted pro-
deferral of an 1/0 operation include: gram at a later time would be the data for that program and the
register contents of the computer module running the program
1 No available 1 / 0 control module or channel. at the time it was dumped.
The D825 computer modules employ a variable-length in-
2 The device in which the file is located is presently in use.
struction format made up of quarter-word syllables. Zero-, one-,
3 The file does not exist in the system. two-, or three-address syllables, as required, can be associated with
each basic command syllable. An implicitly addressed accumulator
In the latter case, typically, a message would be typed out on the stack is used in conjunction with the arithmetic unit. Indexing of
supervisory printer, asking for the missing file. all addresses in a command is provided, as well as arbitrarily deep
The 1/0 complete interrupt signals the completion of each 1/0 indirect addressing for data.
operation. Along with this interrupt, an 1/0 result descriptor is Each computer module includes a 128-position thin-film mem-
deposited in an AOSP table. The status relayed in this descriptor ory used for the stack, and also for many of the registers of the
indicates whether or not the operation was successful. If not machine, such as the program base register, data base register,
successful, what went wrong (such as a parity error, or tape break, the index registers, limit registers, and the like.
card jams, etc.) is indicated so that the AOSP may initiate the The instruction complement of the D825 includes the usual
appropriate action. If the operation was successful, any waiting fixed-point, floating-point, logical, and partial-field commands
1/0 operations which can now proceed are initiated. found in any reasonably large scientific data processor.
AOSP control of program scheduling. Scheduling in the D825 relies Memory module. The memory modules consist of independent
upon a job table maintained by the AOSP. Each entry is identified units storing 4096 words, each of 48 bits. Each unit has an individ-
with a name, priority, precedence requirements, and equipment ual power supply and all of the necessary electronics to control
requirements. Priority may be dynamic, depending upon time, the reading, writing, and transmission of data. The size of the
external requests, other programs, or a function of many variable memory modules was established as a compromise between a
conditions. Each time the AOSP is called upon to select a program module size small enough to minimize conflicts wherein two or
to be run, whether as a result of the completion of a program or more computer or 1/0 modules attempt access to the same mem-
of some other interrupt condition, the job table is evaluated. In ory module, and a size large enough to keep the cost of duplicated
a real-time system, situations occur wherein there is no system power supplies and addressing logic within bounds. It might be
program to be run, and machine time is available for other uses. noted that for a larger modular processor system, these trade-offs
This time could be used for auxiliary functions, such as confidence might indicate that memory modules of 8192 words would be more
routines. suitable. Modules larger than this-of 16,384 or 32,768 words, for
The AOSP provides the capability for program segmentation example-would make construction of relatively small equipment
at the discretion of the programmer. Control macros embedded complements meeting the requirements set forth above quite
454 Part 5 | The PMS level Section 3 | Computers for multiprocessing and parallel processing
difficult. The cost of smaller units of memory is offset by the
lessening of catastrophe in the event of failure of a module.

I/O control module. The I/O descriptor is an instruction to the
I/O control module that selects the device, determines the
direction of data flow, the address of the first word, and the number
of words to be transferred.

Interposed between the I/O control modules and the physical
external devices is another crossbar switch designated the I/O
exchange. This automatic exchange, similar in function to the
switching interlock, permits two-way data flow between any I/O
control module and any I/O device in the system. It further
enhances the flexibility of the system by providing as many possible
external data transfer paths as there are I/O control modules.

Table 1 Specifications, D825 modular data processing system
Computer module:                 4, maximum complement
Floating-point add:              7.0 μsec (average)
Floating-point multiply:         34.0 μsec (average)
Logical AND:                     0.33 μsec
Memory type:                     Homogeneous, modular, random-access,
                                 linear-select, ferrite-core
Memory capacity:                 65,536 words (16 modules maximum, 4096
                                 words each)
I/O exchanges per system:        1 or 2
I/O control modules:             10 per exchange, maximum
I/O devices:                     64 per exchange, maximum
Access to I/O devices:           All I/O devices available to every I/O
                                 control module in exchange
Transfer rate per I/O exchange:  2,000,000 characters per second
I/O device complement:           All standard I/O types, including 67 kc
                                 magnetic tapes, magnetic drums and discs,
                                 card and paper tape punches and readers,
                                 character and line printers,
                                 communications and display equipment

Equipment complements. A D825 system can be assembled (or
expanded) by selection of appropriate modules in any combination
of: one to four computer modules, one to 16 memory modules,
one to ten I/O control modules, one or two I/O exchanges, and
one to 64 I/O devices per I/O exchange in any combination
selected from: operating (or system status) consoles, magnetic tape
transports, magnetic drums, magnetic disc files, card punches and
readers, paper tape perforators and readers, supervisory printers,
high-speed line printers, selected data converters, special real-time
clocks, and intersystem data links.
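
The limits quoted in the equipment complements paragraph and in Table 1 can be restated as a small consistency check. The following Python sketch is only an illustration: the record type, the field names, and the helper functions are hypothetical, and the limit of ten I/O control modules is taken as a system total from the paragraph above (Table 1 states it per exchange).

```python
from dataclasses import dataclass

WORDS_PER_MEMORY_MODULE = 4096   # from the "Memory module" paragraph
BITS_PER_WORD = 48

# (minimum, maximum) for each class of module, as listed under
# "Equipment complements". Treating the I/O control module limit as a
# system total is an assumption.
LIMITS = {
    "computer_modules": (1, 4),
    "memory_modules": (1, 16),
    "io_control_modules": (1, 10),
    "io_exchanges": (1, 2),
    "io_devices_per_exchange": (1, 64),
}


@dataclass(frozen=True)
class Complement:
    """Hypothetical record of one equipment complement (names are ours)."""
    computer_modules: int
    memory_modules: int
    io_control_modules: int
    io_exchanges: int
    io_devices_per_exchange: int


def violations(c: Complement) -> list:
    """Return one message for every limit the complement breaks."""
    return [
        f"{name}={getattr(c, name)} is outside {lo}..{hi}"
        for name, (lo, hi) in LIMITS.items()
        if not lo <= getattr(c, name) <= hi
    ]


def memory_words(c: Complement) -> int:
    """Total core storage of the complement, in 48-bit words."""
    return c.memory_modules * WORDS_PER_MEMORY_MODULE


# The system described for Fig. 2. The single exchange and the count of
# nine devices are read off the equipment list and are assumptions.
fig2 = Complement(2, 4, 2, 1, 9)
largest = Complement(4, 16, 10, 2, 64)

print(violations(fig2))        # []
print(memory_words(largest))   # 65536
```

The largest complement gives 16 x 4096 = 65,536 words, which agrees with the memory capacity entry of Table 1.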
Figure 2 is a photograph of some of the hardware of a completed
D825 system. The equipment complement of this system
includes two computer modules, four memory modules (two per
cabinet), two I/O control modules (two per cabinet), one status
display console, two magnetic tape units, two magnetic drums,
a card reader, a card punch, a supervisory printer, and an
electrostatic line printer.

Fig. 2. Typical D825 equipment array.

D825 characteristics are summarized in Table 1.

Summary and conclusion

It is the belief of the authors that modular systems (in the sense
discussed above) are a natural solution to the problem of obtaining
greater computational capacity, more natural than simply to
build larger and faster machines. More specifically, the
organizational structure of the D825 has been shown to be a suitable basis
for the data processing facility for command and control. Although
the investigation leading toward this structure proceeded as an
attack upon a number of diverse problems, it has become evident
that the requirements peculiar to this area of application are, in
effect, aspects of a single characteristic, which might be called
structural freedom. Furthermore, it is now clear that the most
unique characteristic of the structure realized (integrated
operation of freely intercommunicating, totally modular elements)
provides the means for achieving structural freedom.

For example, one requirement is that some specified minimum
of data processing capability be always available, or that, under
any conditions of system degradation due to failure or
maintenance, the equipment remaining on line be sufficient to perform
primary system functions. In the D825, module failure results in
a reduction of the on-line equipment configuration but permits
normal operation to continue, perhaps at a reduced rate. The
individual modules are designed to be highly reliable and
maintainable, but system availability is not derived solely from this
source, as is necessarily the case with more conventional systems.
The modular configuration permits operation, in effect, with active
spares, eliminating the need for total redundancy.

A second requirement is that the working configuration of the
system at a given moment be instantly reconstructable to new
forms more suited to a dynamically and unpredictably changing
work load. In the D825, all communication routes are public, all
modules are functionally decoupled, all assignments are scheduled
dynamically, and assignment patterns are totally fluid. The system
of interrupts and priorities controlled by the AOSP and the
switching interlock permits instant adaptation to any work load,
without destruction of interrupted programs.

The requirement for expansibility calls simply for adaptation
on a greater time scale. Since all D825 modules are functionally
decoupled, modules of any types may be added to the system
simply by plugging into the switching interlock or the I/O
exchange. Expansion in all functional areas may be pursued far
beyond that possible with conventional systems.

It is clear, however, that the D825 system would have fallen
far short of the goals set for it if only the hardware had been
considered. The AOSP is as much a part of the D825 system
structure as is the actual hardware. The concept of a “floating”
AOSP as the force that molds the constituent modules of an
equipment complement into a system is an important notion
having an effect beyond the implementation of the D825. One
interesting by-product of the design effort for the D825 has, in
fact, been a change of perspective; it has become abundantly clear
that computers do not run programs, but that programs control
computers.

References
AndeJ62; KrogM61; LeinA57; PortR60; ThomR63
Chapter 37
A survey of problems and preliminary results concerning
parallel processing and parallel processors¹
M. Lehman
¹Proc. IEEE, vol. 54, no. 12, pp. 1889-1901, December, 1966.

Summary After an introduction which discusses the significance of a
trend to the design of parallel processing systems, the paper
describes some of the results obtained to date in a project which aims
to develop and evaluate a unified hardware-software parallel
processing computing system and the techniques for its use.

1. Multiprogramming, multiprocessing, and parallel processing

A brief review of the literature, of which a partial listing is given
in the bibliography, reveals an active and growing interest in
multiprogramming, multiprocessing, and parallel processing.
These three terms distinguish three modes of usage and also serve
to indicate a certain historical development. We cannot here
attempt to trace this history in detail and so must rely on the
bibliography to credit the contributions from industrial, university,
and other research and development organizations.

The emergence of autonomous input-output devices first
suggested [Gill, 1958] the time-sharing of the processing and
peripheral units of a computing system among several jobs. Thus surplus
capability that could not be applied to the processing of the
leading job in a batch processing load, at any stage of the
computation, could be usefully applied to successor jobs in the work load.
In particular, while any computation was held up for some I/O
activity, the single main processor could be used for other
computation. The necessary decision-taking, scheduling, and allocation
procedures were vested in a supervisor program, within which the
user-jobs were embedded, and the resultant mode of operation was
termed Multiprogramming.

The use of computers in on-line control situations and for other
applications giving rise to ever-more stringent reliability and
availability specifications, resulted in the construction of systems
including two or more central processing units [Leiner et al., 1959;
Bright, 1964; Desmonde, 1964; McCullough et al., 1965]. Under
normal circumstances, with all units operational, each could be
assigned a specific activity within an overall control program. As
a result of the multiplicity of units in such Multiprocessing Systems,
failure of any one would degrade, but not immobilize, the system,
since a supervisor program could re-assign activities and configure
the failed unit out of the system. Subsequently, it was recognized
that such systems had advantages over a single processor system
in a more general environment, with each processor in the system
having a multiprogramming capability as well.

Finally, following from ideas first exploited in the Gamma 60
Computer [Dreyfus, 1958], there has come the realization that
multi-instruction counter systems can speed up computation,
particularly of large problems, when these may be partitioned into
sections which are substantially independent of one another, and
which may therefore be executed concurrently, that is, in parallel.

When the several units of a multiprocessing system are utilized
to process, in parallel, independent sections of a job, we exploit
the macro-parallelism [Lehman, 1965] of the job, which is to be
distinguished from micro-parallelism [Lehman, 1965], the relative
independence of individual machine instructions, exploited in
look-ahead machines. This mode of operation is termed Parallel
Processing and, as in PL/I [IBM OS/360, PL/I Language
Specification, Form C28-6571, p. 74], the execution of any program string
is termed a Task. We note that parallel processing may, and
normally will, include multiprocessing activity.

2. The approach to parallel processing system design

In the previous section we indicated that the prime impetus for
the development of parallel processing systems arose from their
potential for high performance and reliability. These systems may
operate as pools of resources organized in symmetrical classes and
it is this property that promises High Availability. They also
possess a great reserve of power which, when applied to a single
problem with the appropriate degree of parallelism, can yield high
performance and fast turn around time. Surplus resources can be
applied to other jobs, so that the system is potentially efficient,
displaying a peak-load averaging effect and hence high utilization
of hardware [Corbato and Vyssotsky, 1965]. The concept of sharing
in parallel processing systems and its related cost reduction is not,
however, limited to hardware. Perhaps even more significant is
the common use of data-sets maintained in a system library or
file, and even concurrent access during execution from a
high-speed store. This may represent considerable economy in storage
space and in processing time for I/O and internal memory
hierarchy transfers. But above all [Corbato and Vyssotsky, 1965]
it facilitates the sharing of ideas, experience, and results and a
cross fertilization among users, a prospect which from a long term
point of view represents perhaps the most significant potential of
large, library-oriented, multiprocessing systems. Finally, in this
brief summary of the basic advantages of parallel processing
systems, we refer to their intrinsic modularity, which may yield
an expandable system in which the only effect of expansion on
the user is improved performance.

Adequate performance of parallel processing systems is,
however, predicated on an appropriately low level of overhead.
Allocation, scheduling, and supervisory¹ strategies, in particular, must
be simplified and the related procedures minimized to comprise
a small proportion of the total activity in the system. The system
design must be based on performance objectives that permit a user
to specify a time period and a tolerance within which he requires
and expects to receive results, and the cost for which these will
be obtained. In general the entire system must yield minimum
throughput time for the large job, adequate response time to the
terminal requests in conversational mode, guaranteed throughput
time for real-time tasks, and minimum cost processing for the
batch-processed small job. These needs require the development
of an executive and supervisory system integrated with the
hardware into a single, unified computing system. Finally, the
techniques and algorithms of classical computation, of problem
analysis, and of programming, must be modified and new, intrinsically
parallel procedures developed if full advantage is to be gained
from exploitation of these parallel systems.

¹We differentiate intuitively between executive and supervisory
activities. The former are those whose costs should be chargeable to
the individual user directly, whereas the latter are absorbed in the
system running costs.

Our studies to date represent but a small fraction of the ground
that will have to be covered if effective parallel processing systems
are to come into their own. It is, however, abundantly clear that
such systems will yield their potential only if the design is
approached on a broad but unified front ranging from problem
analysis and usage techniques, through executive strategies and
operating systems, to logic design and technology. We therefore
present concepts and results from each of these areas, as obtained
during our preliminary investigation into the design and use of
parallel processing systems.

3. Language

3.1 Parallelism in high level languages

The analysis of high level language requirements for parallel
processing has received considerable attention in the literature.
We may refer in particular to the paper by Conway [1963] which
discussed the concepts of Fork, Join, and Quit, and the recent
review by Dennis and Van Horn [1966].

Recognizing that programming languages should possess
capabilities that express the structure of the computational algorithm,
Schlaeppi [19??] has proposed augmentations to PL/I-like
languages that portray the macro-parallelism in numerical algorithms.
These in turn have been reflected in proposals for
machine-language implementation. As examples we discuss Split, Terminate,
Assemble, Test and Set or Wait (interlock), Resume, Store-Test and
Branch, and External Execute instructions. We describe here only
the basic functional elements, from which machine instructions
for actual realization will be composed as suggested by practical
programming experience.

3.2 Machine level instructions for tasking

Split provides the basic task-generating capability. It indicates that
in addition to continuing the execution of the present instruction
string in normal fashion a new task, or set of tasks, may be
initiated, execution starting at a specified address or set of addresses.
Such potential tasks will be queued to await pick-up by an
appropriate processing unit.

Terminate causes cessation of activity on a task. The
terminating unit will, of its own volition, access an appropriate queue to
obtain its next task. Alternatively, it may execute an executive
allocation-task to determine which of a number of task-queues is
to be accessed next according to the current urgency status of work
in the system.

Assemble permits the merging of several tasks. The first (n - 1)
tasks in an n-way parallel set belonging to a single job, reaching
the assemble instruction terminate. The nth task, however, will
proceed to execute the program string which constitutes the
continuation of all n tasks.
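
The interplay of Split, Terminate, and Assemble can be illustrated with a small single-threaded simulation. The Python sketch below is not taken from the paper: the class Job, the functions run and parallel_sum, and the round-by-round scheduling are all hypothetical, chosen only to show n tasks being generated by Split, the first n - 1 terminating at the Assemble point, and the nth continuing.

```python
from collections import deque


class Job:
    """Illustrative bookkeeping for one job; not the paper's design."""

    def __init__(self):
        self.queue = deque()   # potential tasks awaiting pick-up
        self.arrived = {}      # assemble point -> tasks that have reached it

    def split(self, *tasks):
        """Split: queue new tasks; the caller carries on with its own string."""
        self.queue.extend(tasks)

    def assemble(self, point, n):
        """Assemble: True only for the n-th task to reach `point`."""
        self.arrived[point] = self.arrived.get(point, 0) + 1
        return self.arrived[point] == n


def run(job, first_task, processors=2):
    """Simulate `processors` units picking tasks off the queue in rounds.

    A task is a plain function; returning from it plays the part of
    Terminate, after which the unit fetches its next task.
    """
    job.queue.append(first_task)
    rounds = 0
    while job.queue:
        for _ in range(min(processors, len(job.queue))):
            job.queue.popleft()(job)
        rounds += 1
    return rounds


def parallel_sum(data, n, processors=2):
    """Sum `data` as n tasks that merge at one Assemble point."""
    partial = [0] * n
    result = {}

    def section(i):
        def task(job):
            partial[i] = sum(data[i::n])
            if job.assemble("merge", n):      # the n-th arrival continues
                result["total"] = sum(partial)
            # the first n - 1 arrivals simply return, i.e. Terminate
        return task

    def main(job):
        job.split(*(section(i) for i in range(1, n)))  # n - 1 new tasks
        section(0)(job)                                # own string continues

    job = Job()
    rounds = run(job, main, processors)
    return result["total"], rounds


print(parallel_sum(list(range(100)), n=4))   # (4950, 3)
```

Because the simulation runs tasks one after another, the arrival count needs no protection here; with genuinely concurrent units the count would itself have to be guarded by an interlock of the kind described next.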
Test and Set or Wait provides an interlock facility. Thus a
number of tasks all operating on a common data set may be
required to filter through certain sections of program or data, one
at a time. This may be achieved by an instruction related to the
S/360 test and set instruction [Falkoff et al., 1964], but causing
the task finding the specified location to be already set to go into
a wait state. System efficiency requires that processors do not idle,
so that the waiting task will generally be returned to queue and
the processor released for other work.

Resume directs a processor or processors waiting as a result
of a test on a specified location, to proceed, or more generally,
that specified waiting tasks that have been returned to queue be
re-activated to await the spontaneous availability of an
appropriate processor.

Test and Branch Storage Location permits communication
between parallel tasks based on tests analogous to the register tests
of uniprocessors, but associated with the contents of storage
locations. This is desirable since processor registers are private to the
processor and inaccessible from outside.

External Execute is a special case of the general interaction
facility discussed in Section 4 that permits related tasks to
influence one another. This can be achieved through the application
of instructions already discussed. It is, however, more efficient to
provide a new facility akin to the Interrupt concept. By applying
this Interaction function, a task may cause other specified tasks
to execute an instruction at a specified location, each on
completion of its present instruction. Thus, for example, a number of
processors searching for a particular item in a partitioned list can
be caused to abandon the search when the item has been located
by one, while processors searching for other items, or otherwise
busy, will not be redirected.

4. Interaction

4.1 The interaction concept

An extension of the task interaction concept introduced in the
preceding section is fundamental to efficient parallel processing.
In the particular example cited, the interaction, in the form of
an external execute instruction, forms part of the computational
procedure. In fact, many other situations arise in which processing
for inter-task communication may be detached from problem
processing and be carried through concurrently in autonomous
units, thereby increasing system utilization.

We therefore propose to associate with each active unit in the
system an autonomous Interaction Controller. Groups of controllers
are linked by a special bus. This provides facilities whereby any
one unit may, at a given time, act as a command or signal source
with all other units potential recipients. By thus systemizing
inter-unit communication and making it a concurrent activity, we
both increase system utilization and remove a maze of
interconnecting cables. Succeeding subsections describe some of the
functions that the controllers fulfill and, briefly, one hardware proposal
for their realization.

4.2 Interaction activities

In present-day systems there already exist activities of the type
to be classified as interaction. Thus, for example, in System/360
we find a CPU to Channel Halt I/O facility, channel interruptions
of processors, and timer interruptions. In extending the concept
we differentiate among three classes of interaction.

PROBLEM INTERACTION. These relate to logical dependencies
between tasks, and will generally require waits, forced branches,
or terminations. Search termination, previously discussed, is an
example of this type interaction, as are data and
instruction-sequence interlocks.

EXECUTIVE INTERACTION. This activity is concerned primarily
with the allocation of system resources. Consider, for example, the
problem of processing interrupts in a parallel processing system.
These will usually not need to interrupt a computing activity, but
may await the spontaneous availability of a unit at a Terminate,
a natural breakpoint.¹ If an interrupt does become critical it should
not be applied to a specific physical unit. Instead the interruption
should be steered to that unit which, by virtue of the work it is
processing, may be classed as Most Interruptable. Selection of the
latter may be obtained ahead of time and is maintained by the
interaction system, on the basis of the relative urgency of tasks.

Another example of executive interaction concerns the constant
provision of queue status information to all active units. Besides
simplifying scheduling activity this may prevent units from
accessing empty queues, reducing both storage and executive
interference. Similarly, units can be caused to access a previously empty
queue when an entry is made, obviating continuous testing of
queue status.

¹This is possible in a parallel processing system since tasks are
smaller than jobs and since there are many processors. Furthermore,
units operate anonymously. That is, on picking up a task, a unit
records the task identity in an internal register and its own identity
in a table associated with the work queue. Other processors do not,
therefore, know how tasks and processors are matched at any time,
since this is a matter of chance, and determination would require an
extensive and wasteful table search.
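
Looking back at the tasking functions of Sec. 3.2, the Test and Set or Wait interlock and its Resume counterpart can be sketched as follows. This Python simulation is a minimal illustration under stated assumptions: the class Interlock, the function simulate, and the two-step model of a guarded section are hypothetical, and it is assumed that Resume both clears the tested location and returns every waiting task to the ready queue, where it repeats the test.

```python
from collections import deque


class Interlock:
    """One storage location guarded by Test and Set or Wait (illustrative)."""

    def __init__(self):
        self.is_set = False
        self.waiting = deque()   # tasks put into the wait state

    def test_and_set_or_wait(self, task):
        """Return True if `task` may proceed; otherwise park it."""
        if self.is_set:
            self.waiting.append(task)   # the processor is released
            return False
        self.is_set = True
        return True

    def resume(self):
        """Assumed behavior: clear the location, re-activate all waiters."""
        self.is_set = False
        woken, self.waiting = list(self.waiting), deque()
        return woken


def simulate(names):
    """Pass each named task through one guarded section, one at a time."""
    lock = Interlock()
    ready = deque((name, "enter") for name in names)
    trace = []
    while ready:
        name, phase = ready.popleft()
        if phase == "enter":
            if lock.test_and_set_or_wait(name):
                trace.append(f"{name} enters")
                ready.append((name, "leave"))   # still inside the section
            else:
                trace.append(f"{name} waits")
        else:
            trace.append(f"{name} leaves")
            ready.extend((woken, "enter") for woken in lock.resume())
    return trace


for event in simulate(["A", "B", "C"]):
    print(event)
```

In the trace, B and C find the location already set and wait while A is inside; when A leaves, both are re-activated, B enters, and C waits once more. No processor idles in the sketch, since a waiting task is simply absent from the ready queue.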
The interaction system also supports other activities associated
with accounting, recording, and general system supervision.

SYSTEM INTERACTION. System interaction provides controls and
interlocks for operation and maintenance of the physical system.
It includes, for example, interchange of information between
active units about the validity of storage map entries, storage
protection control, queue interlocks, checks and counts of unit
availability, the initiation of routine and emergency diagnostic and
maintenance activity, and the isolation of malfunctioning units.

SUMMARY. The preceding paragraphs have indicated some of the
many applications of an interaction controller. The common
property which, for practicality, has been used to identify
potential interaction activities is that they should be autonomous
relative to the main computational stream and that their execution
should not require access to storage.

a processor instruction) and the number of interaction functions
it is required to implement.

Controller connection to the ten-bit wide interaction bus is by
means of OR gates. When an interaction is occurring, one and
only one controller will be in command of the bus. Figure 3
illustrates the sequence of events required to implement an
interaction.

The controller required by its associated processor to initiate
an activity will await availability of the bus, indicated by an ALL
ZERO state, and will then attempt to seize control by transmitting
a unique identifying four-out-of-eight code. Should more than one
controller attempt to seize the bus at the same time, a conflict
resolution procedure is initiated. This is based on the simultaneous
transmission by all requesting controllers of a second, two byte,
identifying code. Each byte consists of one or more ones followed
by all zeros. A simple comparison by each controller of its trans-
Fig. 3. The interaction sequence. [Flowchart not reproduced. Legible labels: processor or channel interface; status bits; registers; job ident; task ident; seizure code; interaction required; bus free?; seize bus; conflict?; emit order or question.]
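
The bus seizure step lends itself to a short illustration. The Python sketch below models the OR-gate bus as a bitwise OR of everything transmitted. Two points are inferences rather than statements of the text: that a conflict shows up as more than four ones on the bus, and that in the resolution step the controller whose transmitted byte equals the OR-ed bus state is the one that proceeds (the description of the comparison is cut off in this copy). The eight-bit byte width and all function names are likewise assumptions.

```python
from itertools import combinations


def four_of_eight_codes():
    """All eight-bit words containing exactly four ones."""
    return [sum(1 << b for b in bits) for bits in combinations(range(8), 4)]


def bus_state(transmissions):
    """OR-gate bus: the state is the OR of every word being driven."""
    state = 0
    for word in transmissions:
        state |= word
    return state


def conflict(state):
    """Inferred test: two distinct four-of-eight codes OR to five or more ones."""
    return bin(state).count("1") > 4


def ones_then_zeros(k, width=8):
    """A byte of k ones followed by zeros, e.g. k=3 gives 11100000."""
    return ((1 << k) - 1) << (width - k)


def resolve(requests):
    """ASSUMED rule: a controller proceeds if its own byte equals the bus."""
    state = bus_state(ones_then_zeros(k) for k in requests.values())
    return [name for name, k in requests.items() if ones_then_zeros(k) == state]


codes = four_of_eight_codes()
print(len(codes))                                  # 70
print(conflict(bus_state([codes[0]])))             # False
print(conflict(bus_state([codes[0], codes[1]])))   # True
print(resolve({"a": 2, "b": 5, "c": 3}))           # ['b']
```

A constant-weight code suits an OR-gate bus because superimposing two different codes always raises the number of ones, so every transmitting controller can notice the clash by inspecting the bus alone.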
explicit directive. Status bits that may be set or reset by
appropriate directives, provide data on the status of various systems queues,
on the interruptability of given processors, on Wait status, and
so on.

5. Storage communication

The fact that interest in large parallel processing systems is
increasing rapidly as technology enters into the integrated or
monolithic era is no coincidence. Such systems will not, in fact, be
practical for general purpose application until miniaturization
reaches the stage where the large amount of hardware required
can be assembled in compact fashion. This need is most apparent
when one considers communication between the high-speed store
and the various classes of processors, which may collectively be
termed Requestors. Already in presently available systems, the
transmission delay between storage and requestors is of the same
order of magnitude as the storage cycle time; and cycle times are
still decreasing.

Formulation of a hardware model as in Fig. 1 led to the
immediate conclusion that feasibility of the interconnection of large
numbers of units had first to be established. Many possible systems
were considered, and preliminary studies concluded that the
crossbar switch was the most appropriate system for early study
in view of its regular structure, simplicity, and basic modularity.
More particularly, monolithic crossbar modules are visualized
which it will be possible to interconnect to provide networks of
any required dimensions. Alternatively, or additionally, other
interconnections of these modules can provide highly available,
multi-level trunking systems.

In addition to the switch proper, the crossbar network requires
a selection and control mechanism. It is moreover appropriate to
locate the queues, which store all but one of a group of conflicting
requests, within the switching area. A switch complex, as in Fig.
4, has been designed for a system configuration including
twenty-four requestors, thirty-two memory modules, thirty-two data plus
four parity bit words, and sixteen plus two parity bit addresses.

The result of this design study shows that the size and
complexity of such a switch is not excessive for a large scale system.
In its simplest form and using standard high-performance logical
devices, with a fan-in of four, a fan-out of ten and a four-way OR
capability, its use leads to a worst case delay of some seven logical
levels in the control and queue decision circuits and two levels
in each direction of the switch. The switch uses between two and
three times as many circuits as a central processor such as the
model 75 of System/360. While this, in itself, represents a
considerable amount of hardware, it is still an order of magnitude less
than the hardware found in the units that the switch is
interconnecting. Moreover, its regular structure and simple, repetitive
logic suggest ultimate economical realization using monolithic
circuit techniques.

Fig. 4. The centralized crossbar switch. [Diagram not reproduced. Legible labels: end of storage cycle; storage select; scanner; accept signal; request signal; decoder; from other decoders; to other scanners; crossbar switch; decision section.]

6. Usage

6.1 The executive system

The basic properties outlined in Sec. 2 give parallel processing
systems the potential to overcome many of the ills and
shortcomings that presently beset computer systems. For maximum
effectiveness, the system must be library- or file-oriented. It can,
however, be exploited efficiently only if the overhead resulting from
executive control and supervisory activity does not strangle the
system. More particularly, the gains from the sharing of resources
and any peak averaging effect must exceed any additional
overhead due to resource allocation procedures, conflict resolutions,
and other processing activity arising from the concurrent operation
of many units. Thus a unified and integrated design approach is
required in which software and hardware, operating system and
processing units, lose their separate identities and merge into one
overall complex, for which allocation and scheduling procedures,
for example, are as basic and as critical as arithmetic operations.
Equally significant to the successful exploitation of parallel
processing potential are the problems of data management, man-
machine interactions, and, most generally, problem preparation
and usage of the system. We restrict the present discussion to brief
comments on programming techniques for task generation and on
the development of algorithms possessing macro-parallelism. In
particular we indicate that multi-instruction-counter systems can
be profitably applied to the solution of the large problems whose
computing requirements tax the speed capability and storage of
the largest computer and the patience of their users. In the
following section we evaluate these proposals by quoting some
performance measurements obtained from an executing simulator.

6.2 Programmed task generation

Study of the usage of parallel processing systems for the rapid
solution of large real-time problems involves two aspects. On the
one hand we must consider the development of algorithms
displaying an appropriate form of macro-parallelism. On the other
hand programming techniques must be developed for efficient
exploitation in terms of both problem- and machine-oriented
instructions, such as those discussed in Sec. 4.
It is appropriate to discuss programmed task generation first.
For simplicity we consider a job segment that requires n executions
of a procedure I. The procedure will itself include modification
of index registers or other changes that distinguish the individual
tasks. We assume that on completion of all n tasks, a new
procedure J should be initiated. Moreover, should processing power be
available at a time when n executions of I have been initiated but
not all n completed, we assume that an independent procedure
K, belonging to the same job, may be initiated. In the simplest
case K will be a terminate instruction which releases the processor,
and makes it available to process other work as determined from
the work-queue complex.

         A = 0
         B = 0
         C = 0
    ST   IF N - B ≤ 1 THEN GO TO IN     Suppress split if nth task
                                        being initiated
         A = A + 1
         IF A ≥ P THEN GO TO IN         Split if less than p processors
                                        allocated
         SPLIT TO ST
    IN   B = B + 1
         IF B > N THEN GO TO FIN        If all n I-tasks started,
                                        proceed with K
         CALL I PROCEDURE
         C = C + 1
         IF C < N THEN GO TO IN         If all n I-tasks completed,
                                        proceed with J
         CALL J PROCEDURE
    FIN  CALL K PROCEDURE

Execution of split and terminate instructions involves executive
overheads, so that these instructions should not be used
indiscriminately. Within a system in which a maximum of p processors are
available to a job, it is pointless to partition a job, at any one time,
into more than p tasks. It is, however, undesirable to guarantee
a user that p processors, or even more than one processor, will
execute his program. A simple task generation scheme that makes
as many entries in the task queue as there are potentially
concurrent parts of the algorithm (for example, from a loop containing
a split instruction) is inefficient when that number is much larger
than the number of processors that happen to be available. The
technique also leads to very large queues. An alternative, termed
Onion Peeling by us, puts the instruction sequence containing the
split at the head of procedure I and ends each execution of the
procedure with a terminate. This restricts the queue length for
this job segment to one but it otherwise is as inefficient as the
previous method.

A Modified Onion Peeling scheme (MOP) restricts the split and
terminate overhead to at most one more¹ than the number of
processors actually applied to the segment. It also ensures that
processing is completed as quickly and as efficiently as possible
with the number of processors that become available to the job
segment. Thus if during execution no further processors are freed,
the n tasks are executed sequentially with only one split and no
terminate. If, on the other hand, some other number of processors
is used for execution, the procedure is speeded up accordingly.
The maximum number p of processors that may be applied to the
job may be limited by the number of processors in the system and
available, or by executive edict.
The basic scheme was illustrated by the above program, in
which the first expression following the zeroing of the counters
ensures that no unnecessary splits are queued.

¹This is not quite accurate. The simple MOP algorithm presented here
does not explicitly interlock the split sequence. There is therefore a
possibility that unnecessary task-calls may be queued during the execution of
the split which is to generate the nth task. The probability of this is,
however, small, while the degradation arising from an interlock could be
significant, and the algorithm in the form given appears more economical.
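
The behavior claimed for the MOP program can be explored with a small step-by-step simulation. This is a sketch only, under assumptions that are not in the text: the function name `simulate_mop`, the idea that the split sequence and each execution of procedure I take one time step, and the fact that counter updates never interleave (so the missing interlock discussed in the footnote cannot show up here). The counters `a`, `b`, `c` and the labels in the comments follow the program as printed.

```python
from collections import deque


def simulate_mop(n, p, processors):
    """Run the MOP program for n I-tasks, with at most p processors allowed
    and `processors` actually available. Returns counts of events."""
    if processors < 1:
        raise ValueError("at least one processor is needed")
    a = b = c = 0                       # counters A, B, C of the program
    queue = deque(["ST"])               # task-calls waiting for a processor
    counts = {"splits": 0, "I": 0, "J": 0, "K": 0}

    def task():
        nonlocal a, b, c
        # ST: queue one more task-call unless the nth task is being
        # initiated or p processors have already been claimed.
        if n - b > 1:
            a += 1
            if a < p:
                queue.append("ST")      # SPLIT TO ST
                counts["splits"] += 1
        yield                           # assumed: the split sequence takes one step
        while True:
            b += 1                      # IN: claim the next I-task
            if b > n:
                break                   # all n started: go to FIN
            counts["I"] += 1            # CALL I PROCEDURE
            yield                       # assumed: one step per execution of I
            c += 1
            if c >= n:                  # all n completed
                counts["J"] += 1        # CALL J PROCEDURE
                break
        counts["K"] += 1                # FIN: CALL K PROCEDURE

    running = []
    while queue or running:
        # Free processors pick up queued task-calls.
        while queue and len(running) < processors:
            queue.popleft()
            running.append(task())
        for proc in list(running):
            try:
                next(proc)
            except StopIteration:
                running.remove(proc)
    return counts
```

With a single processor available, `simulate_mop(5, 4, 1)` records one split and five sequential executions of I; with four processors, `simulate_mop(10, 4, 4)` records three splits, so the number of splits stays within the number of processors applied.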
Chapter 37 | A survey of problems and preliminary results concerning parallel processing and parallel processors
structure, though it is clear that in any realization interleaving
will be partial, both to sustain high availability and to decrease
storage interference between independent jobs. The individual
processors have a System/360-like structure [Blaauw and Brooks,
1964] and execute an augmented subset of S/360 machine
language. The nonstandard instructions added to the repertoire
include the functions discussed in Section 4. The local store LSi,
to be used also as an instruction buffer, is however not included
in the model for which the interference results are quoted in the
next section. The simulator configuration is parameterized so that,
for example, the numbers of storage modules and processors,
instruction execution times (in storage cycles), and the nature of
statistics gathered and printed may be selected for each run. The
program itself is modular, and both system features and
measurement facilities may be expanded or modified as required.

7.3 Simulator experiments

7.3.1 Kernels. Simulation experiments first concentrated on an
investigation of storage interference arising in the execution of
typical kernels from numerical analysis. The results indicated that
under the limited condition of the experiments and for a storage
module-to-processor ratio of two, interference would degrade
performance by less than twenty percent, dropping to some five
percent for storage module-to-processor ratio of eight. Addition
of a local processor store and its use as an instruction buffer
effectively eliminated interference, as expected, indicating that
it had been substantially due to instruction-fetch interference.
These results were considered to have been generated under
conditions too restrictive to permit generalization. In particular
each set referred only to concurrent executions of a single loop.
Thus more recent experiments have included many runs of a
matrix-multiply subroutine and the solution of an electrical
network problem using an appropriately modified version of the
Jacobi variant of the Gauss-Seidel solution of a set of linear
algebraic equations.

sizes of matrices were used to isolate the effect of commensurate
periodicities of array mapping with the address structure of the
store, which demonstratively had significant influence on the
results.
Instruction execution times for the most frequently executed
instructions used in the experiment are given in Table 2.
These times exclude the instruction fetch time (one instruction
for each fetch), since these are overlapped unless storage conflict
occurs, when a request must be queued. The arithmetic operations
may also include a data fetch (RX instructions) in which case a
further store access time is required.
In the absence of an internal instruction buffer, processors
executing the same program string interfere with each other
continuously during instruction fetches. To minimize this effect
for loops that are short relative to the width of the interleaving,
it is profitable to unwind such loops by repetition so that the
resultant string stretches as far as possible across the interleaved
store. The program was unwound in this way. We note, however,
that it is in fact better [Rosenfeld, 1965] to repeat the loop,
appropriately modified, several times across the interleaved store,
directing successive processors to successive, but unconnected,
loops. This can decrease interference by as much as twenty percent
over the previous case.
Some results of the simulation are given in Table 3 and plotted
in Figs. 5 and 6.
We note that running time (col. 4) is defined as the interval
between the start of the first processor on its first task and the
completion, by the last processor to finish, of its final task. Since
an onion peel technique has been used for the splitting, there is
an interval (of order 70 storage cycles) between the start of
successive tasks. There is also an initial interval (87 memory cycles)
in which the first processor initializes the program. Finally, the
finish of processors is staggered and, in particular, for the
sixteen-processor case, eight processors are assigned two tasks (rows) in
succession, and eight, three tasks. The former processors will, of
lar mode of partitioning is not optimum if the shortest execution
time is to be obtained. From a system efficiency point of view,
however, and in actual operation with other jobs and tasks in the
system, it is of no consequence since processor idling does not
actually occur. New tasks, perhaps arising from quite different jobs,
are initiated, according to some scheduling strategy, whenever a
processor becomes spontaneously available.

Fig. 5. Execution time for matrix multiply.

Fig. 6. Total processor time and interference in matrix multiply modules.

In addition to run time, we define a total processor time (col.
5). This represents the sum total of time that individual processors
were active in the program and is therefore a reflection of total
processor running cost. Storage interference (cols. 6, 7) measures
the total time that processors were inactive due to attempts to
initiate simultaneous accesses to the same storage module. It
occurs also when only a single processor is applied, when it
represents a conflict between a data fetch and an attempt by the overlap
circuit to initiate an instruction fetch from the same module.
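
The staggered finish described for the sixteen-processor matrix multiply can be checked with a short calculation. This is a minimal sketch, assuming one task per row and forty rows in all (the total implied by eight processors taking two tasks and eight taking three); the function names are illustrative and tasks are treated as equal in length.

```python
def tasks_per_processor(n_tasks, n_processors):
    """Spread n_tasks as evenly as possible over n_processors."""
    base, extra = divmod(n_tasks, n_processors)
    # `extra` processors take one task more than the rest.
    return [base + 1] * extra + [base] * (n_processors - extra)


def idle_task_slots(n_tasks, n_processors):
    """Task-lengths during which some processor waits for the slowest one."""
    loads = tasks_per_processor(n_tasks, n_processors)
    longest = max(loads)
    return sum(longest - load for load in loads)
```

For forty tasks on sixteen processors this gives eight processors with three tasks and eight with two, so the run lasts three task-lengths rather than the ideal 2.5, and eight task-lengths of processor time are free for other work within this job segment.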
to get program and data into the high-speed store and to output
results. We include utilization figures for these executions in Table
3, to aid in analysis of the system behavior but not for evaluation
purposes.

... represents the solution of a set of simultaneous linear equations,
described by a sparse coefficient matrix. The technique used for
its solution on the executing simulator essentially comprises a
relaxation procedure. Extensive runs have been made using a
specific thirty-six node network, yielding twenty-six equations with
up to four terms in each equation.
From the wealth of results obtained we present representative
sets that indicate some general trends related to the characteristics
and performance of the parallel processing system. Available space
will not permit, however, detailed analysis in the present paper,
nor does it permit a discussion of the equally interesting results
obtained concerning speed of convergence, in particular, and other
effects which must be understood within the framework of a
numerical analysis of the relaxation solutions.
Figures 7, 8, and 9 present the basic performance data,
throughput time, and total processor time, for a total of one hundred and
forty-four cases. The variables are the number of processors in the
system (12 cases), the size of the inner loop as represented by the
number of currents (from 2 to 5) evaluated in the loop, and the
number of interleaved storage modules (16, 32, 64).

Fig. 9. Total processor and throughput times in electrical network
analysis, 64 storage modules. (Axes: storage kilocycles against number
of processors; one curve per inner loop size of 2, 3, 4, or 5 equations.)

These curves clearly indicate the reduction in throughput time
to be obtained from the use of parallel processing, the consequent ...

... figures for the case of a five-equation inner loop. Table 4 lists these
same results as a percentage of the time using one processor and
compares them with the reciprocal of the number of processors.
Figure 11 indicates storage interference and parallel processing
overheads as a function of the number of processors, with storage
modularity again a parameter and an inner loop again comprising
Table 4 Run time for resistor network system relative to the run time
using one processor, with a five equation inner loop

                              Relative time
Number of     16 Storage   32 Storage   64 Storage   100/No. of
processors    modules      modules      modules      processors

     1        100%         100%         100%         100%
     2         52.8         51.2         51.2         50.0
     4         29.5         27.9         27.1         25.0
     6         22.4         20.3         19.5         16.7
     7         20.9         17.9         17.1         14.3
     8         19.2         16.8         15.8         12.5
     9         17.8         15.2         14.2         11.1
    10         17.6         14.5         13.7         10.0
    11         16.8         13.9         12.9          9.1
    12         17.5         13.9         13.0          8.3
    14         17.3         13.2         11.7          7.2
    16         17.7         13.7         11.7          6.3
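
The comparison that Table 4 invites, measured relative time against the reciprocal of the number of processors, can be restated as speedup and efficiency. The sketch below only re-expresses the tabulated figures; the names `TABLE_4`, `speedup`, and `efficiency` and the definition of efficiency as speedup divided by the number of processors are conventional choices made here, not terms used in the text.

```python
# Relative run times from Table 4, in percent of the one-processor time.
# Key: number of processors. Value: (16, 32, 64 storage modules).
TABLE_4 = {
    1: (100.0, 100.0, 100.0),
    2: (52.8, 51.2, 51.2),
    4: (29.5, 27.9, 27.1),
    6: (22.4, 20.3, 19.5),
    7: (20.9, 17.9, 17.1),
    8: (19.2, 16.8, 15.8),
    9: (17.8, 15.2, 14.2),
    10: (17.6, 14.5, 13.7),
    11: (16.8, 13.9, 12.9),
    12: (17.5, 13.9, 13.0),
    14: (17.3, 13.2, 11.7),
    16: (17.7, 13.7, 11.7),
}
MODULES = (16, 32, 64)


def speedup(processors, modules):
    """One-processor time divided by the time with `processors` processors."""
    relative = TABLE_4[processors][MODULES.index(modules)]
    return 100.0 / relative


def efficiency(processors, modules):
    """Fraction of the ideal speedup actually obtained."""
    return speedup(processors, modules) / processors
```

Read this way, the table shows the 16-module system levelling off near a speedup of 5.7 from about nine processors onward, while the 64-module system still reaches roughly 8.5 with sixteen processors.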
the evaluation of five currents. Storage interference has previously
been defined. The parallel processing overhead represents as a
percentage the excess of total number of storage cycles required
for execution, excluding storage interference cycles, when more
than one processor is used, relative to the number of cycles
required by a one-processor execution.

Fig. 12. Storage utilization and cost/performance factors.

Actual counts during execution show that in general some
sixty-seven percent of store accesses are instruction fetches in this
program and some thirty-three percent are data fetches. Thus
incorporation of a substantial instruction buffer in each processor
clearly reduces all interference by an order of magnitude, since
of the four ways in which a storage interference can occur, only
one, a data fetch conflicting with a data fetch, remains in the
inner loop. Moreover, these measurements refer to a processor in
which arithmetic speeds, as in Table 2, are of the order of
magnitude of a memory cycle time, which implies a somewhat powerful
processor. Thus in every sense the interference figures are worst
case results which, with the performance curves to which they
relate, support the view that storage interference is not a serious
obstacle to parallel processing.
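
The "order of magnitude" estimate can be made plausible with a rough calculation from the quoted access mix. This sketch rests on a strong simplifying assumption that is not in the text: the two accesses involved in a conflict are taken to be drawn independently from the 67/33 mix. The function name `conflict_shares` is illustrative.

```python
P_INSTRUCTION = 0.67   # share of store accesses that are instruction fetches
P_DATA = 0.33          # share that are data fetches


def conflict_shares(p_instr=P_INSTRUCTION, p_data=P_DATA):
    """Share of conflicts of each kind, assuming independent accesses."""
    return {
        ("instruction", "instruction"): p_instr * p_instr,
        ("instruction", "data"): p_instr * p_data,
        ("data", "instruction"): p_data * p_instr,
        ("data", "data"): p_data * p_data,
    }


def remaining_share():
    """Share of conflicts left once instruction fetches no longer reach the store."""
    return conflict_shares()[("data", "data")]
```

Under this assumption only about eleven percent of the conflicts are of the data-against-data kind, which is consistent with the roughly tenfold reduction claimed for an instruction buffer.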
The four contours drawn on these curves represent lines of
constant storage module-to-processor ratio. They slope slightly
upward due to the statistical Marbles and Boxes [Rosenfeld, 1965]
effect previously referred to.

Fig. 11. Storage and executive interference.

Figure 12 presents two sets of data, based on the five-equation
inner loop. The upper family of curves relates to storage utilization.
The reservations made at the end of Sec. 7.3.2, with reference to
the significance of utilization figures, also apply. The second family
of curves represents a first attempt at estimating the relative
quality of processing, that is, some function of a cost/performance
factor. Such a factor is intuitive and environment-sensitive,
depending on the relative concern for speed and for costs of various
sorts. For the present data we have chosen to display a function:

    K / (throughput time × total processor time)

where K is a constant, throughput time a measure of the speed
of computation, and total processor time a measure of the cost.

8. Conclusion

In this paper we have presented some thoughts on parallel
processing. In particular we have chosen to survey the topic by including
an extensive bibliography and some of the results of our work in
this area. The discussion has had to be brief, but our intention
has been to convey the picture of the potential that parallel
processing systems offer for the future development of computing.
The key to successful exploitation lies in a new, unified, and
scientific approach to the entire problem of the design and usage
of computing systems. The development of large, integrated
systems raises many problems, but there can be no doubt that
economic solutions to these will be found. Their development should
comprise a significant part of the computer system architectural
design effort of the next few years.
Any ultimate evaluation of a parallel processing system within
a working environment depends on actual operating experience.
This in turn requires the existence of a system and the interest
of users. Only when usable systems become available will the
concept of parallel processing in integrated systems be accurately
evaluated.

References

BlaaG64; BrigH64; ConwM63; CorbF65; DennJ66; DesmW64; DreyP58;
FalkA64; GillS58; GregJ63; KatzJ66; LehmM65; LeinA59; McCuJ65;
MiraW67; NievJ64; RoseJ65; SchlH??; ShedG66a, b; SlotD62; SmitR64;
PL/I Language Specification, Form C28-6571

Bibliography

AlleM63; AmdaC62; AndeJ63, 65; ArdeB66; BaldF62; BlaaG64; BrigH64;
BuchW62; BussB63; CoddE62; ComfW65; ConwM63; CorbF62, 65;
CritA6:); DaleR65; DennJ65, 66; DesmW64; DijkE65; DreyP58; ErnsH63;
EstrGW, 63; EwinR64; FalkA64; ForgJ65; FranJ57; GillS58; GlasE65;
GregJ63; HellH61, 66; KatzJ66; KinsH64; KnutD66; LehmM6Xa, 6311, 65;
LeinA59; LourN59; MarcM63; McCaJ62; McCuJ65; MeadR63; MillW63;
MiraW67; NievJ64; OssaJ65; PennJ62; RoseJ65; SchlH??; SeehR63; SenzD65;
ShedG66a, 66b; SlotD62; SmitR64; SquiJ63; StraC59; VyssV65; WirtN66;
IBM OS/360 PL/I Language Specification, Form C28-6571; Proc. IFIP 1962.
"Symposium on Multi-Programming" 1963.
Section 4
Network computers and computer networks
than any other computer. Ten smaller C's control the main
Pc and allow it to spend time on useful (billable) work rather
than its own administration. The independent multiple data
operators in the 6600 increase the speed by at least 2 1/2 times
over a 6400 which has a shared D. Finally, it realizes the 10 C's
in a unique, interesting, and efficient manner. Not many
computer systems can claim half as many innovations.

Fig. 2. CDC 6600 PMS diagram (simplified).
why we consider the 6600 to be fundamentally a network. Each
Cio (actually a general-purpose, 12-bit C) can easily serve the
specialized Pio function for Cc. The Mp of Cc is an Ms for a Cio,
of course. By having a powerful Cio, more complex input-output
tasks can be handled without Cc intervention. These tasks can
include data-type conversion, error recovery, etc. The K's which
are connected to a Cio can also be less complex. Figure 2 has
about the same information as Thornton's Fig. 1 block diagram
(Chap. 39).
A detailed PMS diagram for the C('6400, '6416, '6500, and
'6600) is given in Fig. 3. The interesting structural aspects can
be seen from this diagram. The four configurations, 6400-6600,
are included just by considering the pertinent parts of
the structure. That is, a 6416 has no large Pc; a 6400 has a
single straightforward Pc; a 6500 has two Pc's; and the 6600 has
a single powerful Pc. The 6600 Pc has 10 D's, so that several
parts of a single instruction stream can be interpreted in
parallel. A 6600 Pc also has considerable M.buffer to hold
instructions so that Pc need not wait for Mp fetches.
The implementation of the 10 Cio's can be seen from the
PMS diagram (Fig. 3). Here, only one physical processor is used
on a time-shared basis. Each 0.1 μs a new logical P is processed
by the physical P. The 10 Mp's are phased so that a new access
occurs each 0.1 μs. The 10 Mp's are always busy. Thus the i.rate
is 10 × 12 b/μs or 120 megabits/s. This process of shifting
a new Pc state into position each 0.1 μs has been likened to
a barrel by CDC. A diagram of the process is shown in Fig. 4.
The T's, K's, and M's are not given, although it should be
mentioned that the following units are rather unique: a K for
the management of 64 telegraph lines to be connected to a
Cio; an Ms(disk) with four simultaneous access ports, each at
1.68 megachar/s data transfer rate, and a capacity of 168
megachar; an Ms(magnetic tape) with a K(#1:4) and S to allow
simultaneous transfers to 4 Ms; the T(display) for monitoring
the system's operation; K's to other C's and Ms's; and
conventional T(card reader, punch, line printer, etc.).

ISP

The ISP description of the Pc is given in Appendix 1, Chap. 39.
The Pc has a very clean, straightforward
scientific-calculation-oriented ISP. We can consider it a variation on the
general-register structure because the Pc state has three sets of general
registers. Their use is explained both in Chap. 39 and its
Appendix 1. This structure assumes that a program consists of
several read accesses to a large array(s), a large number of
operations on these accessed elements, followed by occasional
write accesses to store results. We would agree that this is a
valid assumption for scientific programs (e.g., look at a
FORTRAN arithmetic statement), and it is probably valid for most
other programs as well.
Cc has provisions for multiprogramming in the form of a
protection and relocation address. The mapping is given in the
ISP description for both Mp and Ms('Extended Core
Storage/ECS).
Appendix 2, Chap. 39, has an ISP description of the PCP.
Appendix 2 includes a figure which shows the instruction
decoding and execution as well. The 6600 PCP is about the same
as the early CDC 160. The PCP has an 18-bit A register because
it has to process addresses for the large Cc.
One interesting aspect of the 6600 which we question is the
lack of communication among all components at the ISP
(programming) level. When Pc stops, it has no way of explicitly
informing any other components. There are no interprocessor
interrupts. An io device cannot interrupt a Pio, nor can Pio's
communicate with one another except by polling. The state
switching for Pc is, however, elegant, since a Pio can request
Pc to stop a job, store Mps, and resume a new task in one
instruction. (The t.save + t.restore ≈ 2 μs.)

The operating system

The Cio's functions are data transmission between a peripheral
device and the large Cc via the Cio's Mp with some data
transformation or conversions; complete task management,
including initiation, termination, and error handling; and
management of Pc. The Cio's perform in about the same manner as
the C('Attached Support Processor) in the N('360 ASP) (Chap.
40, page 506). The operating-system software is managed by
a single fixed Cio. The remaining nine Cio's are free, and as
io tasks arise in the system, the Cio's assign themselves to
particular tasks, carry out the tasks, and then free themselves
to take on other tasks. The operating-system software resides
in Mp(Pc) (that is, Cc) accessible to all Cio's and includes:

1 The variables which determine the state of a particular
  job, e.g., data pointers to Ms(disk, 'ECS), running time,
  a list of jobs to do, etc.
2 Programs for the Cio's
  a Parts of the operating system used by the Cio
    responsible for the system management
  b IO management programs (or programs to get the
    task management program from Ms) which the Cio's
    use
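
The time-shared "barrel" arrangement of the ten Cio's, and the 120 megabits/s figure derived from it, can be restated as a short calculation. This is an illustrative sketch only: the constant and function names are invented here, and the strict round-robin order of the logical processors is an assumption drawn from the barrel description, not a detailed account of the hardware.

```python
SLOT_NS = 100        # a new logical processor is served every 0.1 microsecond
N_CIO = 10           # logical processors sharing one physical processor
WORD_BITS = 12       # width of a Cio memory word


def barrel_schedule(n_slots):
    """Logical processor served in each successive 0.1-microsecond slot."""
    return [slot % N_CIO for slot in range(n_slots)]


def revisit_interval_ns():
    """Time between successive turns of the same logical processor."""
    return N_CIO * SLOT_NS


def aggregate_rate_mbit_per_s():
    """One 12-bit word per slot, expressed in megabits per second."""
    slots_per_microsecond = 1000 // SLOT_NS
    return slots_per_microsecond * WORD_BITS   # bits per microsecond = Mbit/s
```

Each logical processor thus gets a turn once per microsecond, matching one access per Mp in every ten slots, and the ten phased memories together deliver 10 × 12 bits per microsecond.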
Fig. 4. CDC 6600 peripheral and control processors. (Courtesy of Control Data Corporation.)
In a typical system, one might expect to find the following
assignment of PCP's:

 1 Operating-system execution, including scheduling and
   management of Cc and all Cio's
 2 Display of job status data on T(display)
 3 Ms(disk) transfer management
 4 T(printers, card reader, card punch)
 5 L(#1:3; to: C.satellite)
 6 Ms(magnetic tape)
 7 T(64 Teletypes)
 8 Free to be used with Ms(disk) and Ms(magnetic tape)
 9 Free
10 Free

CDC 7600

The CDC 7600 system is an upward compatible member of the
CDC 6000 series. Although the main Pc in the 7600 is
compatible with the main Pc of the 6600, instructions have been added
for controlling the io section and for communicating between
Large Core Memories/LCM and Small Core Memory/SCM. It is
expected to compute at an average rate of four to six times
a C('6600).
The PMS structure (Fig. 5) is substantially different from that
of the 6600. The C('7600 Peripheral Processing Unit/PPU),
unlike the C('6600 Peripheral and Control Processor)'s, has a
loose coupling with the main C. The PPU's are under control
of the main C when transferring words into SCM via
K('Input-Output Section). The 15 C('PPU)'s have 8 input/output
channels. These channels, which can run concurrently, provide the
link between C('PPU) and peripheral Ms's and T's. Some of the
PPU's are located in the same physical space as the Pc.
[Fig. 5: PMS diagram of the CDC 7600. Legible labels: M.working
(16 w; 60 b/w); M('Instruction Stack; flip-flop; 27.5 ns/w; 12 w;
60 b/w); data operators D('Long Add), D('Increment),
D('Population Count), D('Boolean), D('Shift), D('Normalize),
D('Floating Add), D('Floating Multiply), D('Floating Divide).]
The 7600 Pc can be interrupted by a clock, the PPU's, and
trap condition within the Pc. A breakpoint address, BPA, can
be set up within Pc such that, on the program reaching BPA,
a trap is initiated. This interruption scheme is in contrast to
that of the 6600, which could not be interrupted or trapped.
The 7600 interrupt may be a reaction to the lack of
intercommunication in the 6600.

Conclusions

Although the 6600 was somewhat behind its announced delivery
schedule and represented a significant drain on the financial
resources of CDC, it is now clear that it is a successful product.
There have been instances of very large computers not being
carried to completion either for financial or technical reasons.
The 6600 seems to be the first large computer to achieve these
marks of success. Here we are interested in the 6600 because
it has held the "world's largest computer" title for so long.

Computer-network examples

In Chap. 40, we present examples of seven computer networks.
There is a dearth of both computer networks and of papers on
computer networks.
This chapter takes examples from papers and from
knowledge of several existing or proposed networks.
Chapter 38
Summary The RW-400 Data System, based upon modularly constructed, to another model, due to growth in applications, often resulted
independently operating and flexibly connected components,is the logically in large expenditures of time and money. During maintenance or
evolved snccessor to conventional computer designs. It provides the means malfunction of a conventional computer its entire processing
by which information processing requirements can be met with equipment capacity is shut down. Real time processing reliability cannot be
capable of producing timely results at a cost commensurate with problem maintained on an around-the-clock basis. The conventional ma-
economic value. System obsolescence is minimized by the expandahility in
chine must process its problems serially. This serious limitation
numbers and types of processing modules. Real time reliability is assured
by component duplication at minimum cost and by the advanced design is only partially alleviated by time-sharing or computing-ele-
techniques employed in the system’s manufacture. Man-machine commu- ment-doubling designs. The high cost-per-hour of conventional
nication facilities are program controlled for maximum flexibility. Parallel computer operation rules out direct man-machine intercommuni-
processing and parallel information handling modules increase the system’s cation during other than emergency situations.
speed and adaptability when handling complex computing workloads. This The radically-new polymorphic design concept of the RW-400
polymorphic design truly represents an extension of man’s intellect through Data System was evolved by Ramo-Wooldridge engineers to pro-
electronics. vide a practical solution to those information processing problems
now inadequately handled by conventional computer designs. The
The RW-400 Data System is a new design concept. It was devel- RW-400 is a powerful new tool in the field of intellectronics-the
oped to meet the increasing demand for information processing extension of man’s intellect by electronics.
equipment with adaptability, real-time reliability and power to
cope with continuously-changing information handling require-
ments. It is a polymorphic system including a variety of functionally-independent modules. These are interconnectable through a program-controlled electronic switching center. Many pairs of modules may be independently connected, disconnected, and reconnected, in microseconds if need be, to meet continuously-varying processing requirements. The system can assume whatever configuration is needed to handle problems of the moment. Hence it is best characterized by the term “polymorphic”, having many shapes.
Rapid, program-controlled switching of many pairs of functionally-independent modules permits nondisruptive system expandability, operating reliability, simultaneous multi-problem processing capability, and man-machine intercommunication feasibility. These are only partially found in computers of conventional design.
Computer users have been forced heretofore to match problems to computer limitations. Problem changes posed serious reorientation and reprogramming difficulties. Changes from one computer
¹Datamation, vol. 6, no. 1, pp. 8-14, January/February, 1960.
System description
The RW-400 Data System contains an optional number and variety of functionally-independent modules. These communicate via a central electronic switching exchange. Each module is designed, within practical economic and functional limits, to maximize system adaptability over a wide range of problem types and sizes. This new design embodies the latest proven electronic design techniques, assuring high processing speeds and high equipment reliability. The RW-400’s modularity assures reliable, round-the-clock processing of information with controllable computing capacity degradation during module maintenance or malfunction. Practical man-machine intercommunication is achieved in the RW-400 system by use of program-controlled information display and interrogation consoles.
Figure 1 shows the over-all system design. Modules of various types communicate through a central exchange switching center. Computing and buffering modules provide control for the system. These modules are self-controlled and make possible completely independent processing of two or more problems. One of the computer modules may be designated the master computer and
[Fig. 1. Over-all system design of the RW-400. Labels: controlling (computing, buffering), display, auxiliary storage, input, and output modules, all connected through the switching center.]
in this role initiates and monitors actions of the entire system. An alert-interrupt network is provided to allow coordinated system action. Therefore, the system as applied to given information processing problems may change on a short range (microsecond) basis, thus providing, through programming, a self-organizing aspect to the system. In addition, the system may change through the years as the applications change. The most efficient and economical complement of equipment is applied to the problem at all times.
An RW-400 system is built around an expandable Central Exchange (CX) to which a number of primary modules may be attached. These are: Computer Modules (CM); self-instructed Buffer Modules (BM); Magnetic Tape Modules (TM); Magnetic Drum Modules (DM); Peripheral Buffer Modules (PB); and console communication Display Buffer Modules (DB). How many modules are put together in a system is entirely a function of system application. In addition to primary system modules, punched card, punched tape, high speed printing and control console devices are available. These handle nominal system input/output requirements. Additional man-machine communication devices such as interrogation, display and control consoles, may be included in the system as problem requirements dictate. A Tape Adapter (TA) module is available to provide compatibility with magnetic tape of other computers. Information generated at Flexowriter inquiry and recording stations may be directly received by the system via the Peripheral Buffer Module. This latter module also buffers the receipt of TWX and punched tape information.
The way in which a particular RW-400 Data System functions depends on the number and type of each module included. It may initially be composed of the minimum number and variety of modules needed to do a small problem or the initial part of some large but yet-to-be-defined problem. Such a system would work much like a conventional computer. It would probably include a buffer module and thus have a parallel data handling capability not found in the conventional design at a comparable price. The initial system installation may then be augmented by the timely addition of modules.
A buffer module (BM) has the capability to control its acquisition and dissemination of information independently. The buffer provides a computer module with parallel data handling capability without complicating the problem processing program with the conventional intermixture of arithmetic and housekeeping instructions. Information previously generated by the processing program may be appropriately disposed of within the system while processing continues. Data needed at a subsequent time in the processing may be retrieved from system storage in advance of need while processing progresses. The simultaneity of these operations not only materially increases over-all processing speed but also increases the practical utility of the less costly types of internal system storage such as a magnetic tape.
The computer (CM) or buffer (BM) modules, when acting in a controlling capacity, may initiate connection to an information storage or handling module during that part of the processing program when the two can work profitably in unison. The pair of modules thus interconnected neither affect nor are affected by other modules. Logical interlocks prevent unwanted cross talk among modules. An intermodule communication system lets controlling modules signal status or alert other such modules of their need to communicate. The decision by a module receiving an alert signal to permit interruption or to proceed is optional with that module. The optional interrupt feature is that needed to make the often-discussed but seldom-used program interrupt capability both useful and practical. Programs may thus permit interruptions only at convenient points in the processing sequence.
Modules may be assigned, under program control, to work together on a problem in proportion to its needs. As soon as a module’s function is complete for a given problem, that module may be released for reassignment to some other task. The system is thus self-controlled to match processing capacity to each problem for the time necessary to do the job. Full system capacity may be brought to bear upon a very large problem when needed. This capacity may be apportioned among a number of smaller problems for simultaneous processing, program compilation, program checkout, module maintenance etc., when it is not needed for maximum system effort.
From the preceding system description, it is apparent that such equipment can be expanded from a modest initial installation into a very powerful and comprehensive information processing center as requirements warrant. More specific descriptions of principal system modules follow to give the reader a better feel for how this system might perform his information processing work.
The functional modules
The key to appreciative understanding of the power of the RW-400 lies in knowledge of intermodule connection. It is appropriate to describe the Central Exchange (CX) unit first, then follow with descriptions of the various modules.
The central exchange
The Central Exchange performs the vital function of interconnecting a pair of modules whenever requested to do so by either a computer or a buffer module. Since internal programmed control is only possible within a computer or a buffer module, one of the interconnected pair of modules must be either a computer or a buffer. The time in which any connection may be made or broken is about 65 microseconds. An exchange has basic capacity to connect any of 16 computer or buffer modules to any of 64 auxiliary function modules. There is nothing sacred about the number 16 since it is possible to extend the CX module’s interconnection matrix through design modification when need arises. The CX is an expandable, program-controlled, electronic switching center capable of connecting or disconnecting any available pair of modules in roughly the time of one computer instruction execution. Figure 2 illustrates the permissible module interconnections within the Central Exchange.
Every intersection on the illustration represents a possible connection between modules. The “x-ed” intersections indicate typical connections in force at any point in time. The control logic of the CX module’s connection table prevents more than one interconnection on any horizontal (controlling) or vertical (controlled) data path representation on the diagram. When connection is requested of the Central Exchange while one of the required modules is already carrying out a previous assignment, the requesting module can be programmed to sense this condition and wait until connection can be made without interference. Should waiting be undesirable, the requesting module can go on about its business and check back later to see when the desired connection can be made. There is an implication here, of course, that knowing the kind of a system he is dealing with, a programmer requests connections in advance of need whenever possible.
Provision for master-slave control is included via an Assignment Matrix established within the CX module by a computer module previously assigned to master status. Such a provision is necessary to preclude inadvertent connection requests from unchecked programs or malfunctioning control modules from affecting sets of modules simultaneously processing another problem. Connection requests are therefore essentially filtered through both an assignment and an interconnection validity matrix prior to being acted
upon by the Central Exchange. The computer module manually assigned to master status is the only one permitted to cause the interconnection of a pair of modules which does not include itself.
The computer module (See Fig. 3)
The Computer Module (CM) is a self-sufficient, general purpose, two-address, parallel word, fixed point, random access computer. Its internal magnetic core memory has a capacity of 1024 words. A computer word consists of 26 information bits and 2 parity bits. Each parity bit is associated with the 13-bit half word transferred in parallel via the Central Exchange to other system modules. The instruction repertoire of the CM consists of 39 primary instructions whose various modes effectively result in over 300 different operations. Of the 39 available CM-400 instructions, 24 may be classified as “arithmetic” and 10 as “program control” or “sequence determining” instructions. Five additional instructions may be classified as “external” or “input/output” instructions. All but three of the 24 arithmetic instructions fit into a symmetric scheme of classification wherein there are seven basic operations, each having three distinct modes. The seven basic operations are: add, subtract, absolute subtract, multiply, divide, square root and insert. The three modes are: Replace, Hold and Store. If we let the capital letter “G” identify the first operand, “H” identify the second operand, an “*” signify an arbitrary operation, the symbol “→” indicate replace, and “A” the word in the accumulator, then the three modes may be characterized as:
Replace: H * G → H, A
Hold: H * G → A
Store: A * G → H, A
The three remaining arithmetic operations are Add Accumulate wherein the contents of H and G are added to the Accumulator;
Multiply Accumulate wherein the contents of H are multiplied by G and added to A; and Transmit where the contents of G are stored in H.
The ten program control instructions are Store, Store Double Length Accumulator, Load Accumulator, Insert Mask in the S Register, Stop, Link Jump, Compare Jump, Tally Jump, Test Jump and a Multi-purpose Shift.
The five external instructions are those which cause data to be transmitted to or received from a device external to the computer. Each command is multi-purpose in nature and hence equivalent to several conventional external instructions. The commands are: Command Output, Data Input, Conditional Data Input, Data Output and Character Transfer. A comprehensive discussion of the variation of each of these commands is not pertinent to this article. Suffice it to say that commands are available for carrying out a wide variety of intermodule data communication.
The interrupt capability of a Computer Module is a logical generalization of the “trapping” feature found on several conventional computers. It permits the automatic interruption of a program, at the option of the program, when the computer module receives an “alert” that a condition requiring attention has arisen. It can be used to warn the program when an error of some type has occurred, minimize unproductive computer waiting time while another module completes its task, eliminate many programmed status test instructions and provide a convenient means of subjecting one computer module to the control of another. Program control of interruptions within a CM-400 is accomplished through the sense register S. This register may be filled with an interrupt
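As a minimal sketch of the three arithmetic modes described earlier in this section, the following Python fragment applies one basic operation in Replace, Hold or Store mode. It assumes the reading Replace: H * G to H and A; Hold: H * G to A; Store: A * G to H and A. The function name `apply_mode`, the string mode labels, and the use of unbounded Python integers (no 26-bit word length, overflow or parity handling) are illustrative assumptions, not part of the CM-400 definition.

```python
import operator


def apply_mode(mode, op, g, h, a):
    """Apply a two-operand operation in one of the three modes.

    g, h: first and second operands; a: accumulator contents.
    Returns the new (h, a) pair. Word length and overflow are ignored.
    """
    if mode == "replace":        # H * G -> H, A
        result = op(h, g)
        return result, result
    if mode == "hold":           # H * G -> A (H is left unchanged)
        return h, op(h, g)
    if mode == "store":          # A * G -> H, A
        result = op(a, g)
        return result, result
    raise ValueError(f"unknown mode: {mode}")


# Seven basic operations in three modes, plus the three special operations.
ARITHMETIC_COUNT = 7 * 3 + 3

print(apply_mode("replace", operator.sub, g=3, h=10, a=99))  # (7, 7)
print(apply_mode("hold", operator.sub, g=3, h=10, a=99))     # (10, 7)
print(apply_mode("store", operator.sub, g=3, h=10, a=99))    # (96, 96)
```

The count at the end reproduces the 24 arithmetic instructions quoted in the text: 21 from the symmetric scheme and the three remaining operations.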
[Fig. 3. Computer Module block diagram. Labels: control logic, instruction register (op, address, address), magnetic core storage, exchange register, accumulator, accumulator extension, interrupt sensing register, control panel, alert conditions, and input and output lines to the Central Exchange.]
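The Central Exchange connection rules described earlier in this chapter (at most one connection per controlling module and per controlled module, a busy condition the requester can sense, and filtering through an assignment matrix) can be sketched roughly as follows. This is a toy model under stated assumptions: the class name `CentralExchange`, its method names, the integer module numbers, and the use of exceptions for rejected requests are all hypothetical and do not describe the actual CX logic.

```python
class CentralExchange:
    """Toy model of the CX connection rules; not the actual hardware logic."""

    def __init__(self, n_controlling=16, n_auxiliary=64):
        self.n_controlling = n_controlling
        self.n_auxiliary = n_auxiliary
        # Assignment matrix: which auxiliaries each controller may request.
        self.assigned = {c: set() for c in range(n_controlling)}
        self.by_controller = {}   # controlling module -> auxiliary module
        self.by_auxiliary = {}    # auxiliary module -> controlling module

    def assign(self, controller, auxiliaries):
        """Master action: record what a controlling module may connect to."""
        self.assigned[controller] = set(auxiliaries)

    def connect(self, controller, auxiliary):
        """Return True if connected, False if busy (requester may retry)."""
        if not (0 <= controller < self.n_controlling
                and 0 <= auxiliary < self.n_auxiliary):
            raise ValueError("no such intersection in the matrix")
        if auxiliary not in self.assigned[controller]:
            raise PermissionError("request rejected by assignment matrix")
        # Only one connection per horizontal or vertical data path.
        if controller in self.by_controller or auxiliary in self.by_auxiliary:
            return False
        self.by_controller[controller] = auxiliary
        self.by_auxiliary[auxiliary] = controller
        return True

    def disconnect(self, controller):
        auxiliary = self.by_controller.pop(controller, None)
        if auxiliary is not None:
            del self.by_auxiliary[auxiliary]


cx = CentralExchange()
cx.assign(0, {5})
cx.assign(1, {5, 6})
print(cx.connect(0, 5))   # True
print(cx.connect(1, 5))   # False: module 5 is already connected
cx.disconnect(0)
print(cx.connect(1, 5))   # True
```

Returning `False` for a busy path, rather than blocking, mirrors the choice the text gives the requesting module: wait, or go on about its business and check back later.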
module’s storage and may not be in buffer cells addressed by the computer at execution time. The extended addressing and buffer register indexing may be used to materially simplify repetitive data acquisition operations.
The primary function of a Buffer Module is not, however, that of an auxiliary computer storage unit. The drum and tape modules more aptly serve this function in the RW-400 system. A Buffer Module is capable of operating autonomously and of controlling other modules such as Tape Modules, Drum Modules, Peripheral Buffers, Display Buffers, Printers or Plotters. This capability enables the Buffer Modules in a system to perform routine tape searching and data transferral tasks thereby freeing the Computer Modules to do more computing. In its “self-instruction” mode, the buffer executes its own internally stored program in much the same fashion as a computer. The memory of a Buffer Module will therefore be occupied by its own control programs as well as blocks of data which it is holding for transmission to other units. The buffer is used to acquire information from the relatively slower auxiliary storage and communication modules while the computer proceeds at high speed. Blocks of information retrieved in advance of computer need by the buffer may then be rapidly transferred to the computer’s own storage or operated upon as they stand in the buffer via the indirect addressing capability of the computer.
Another feature of the buffer is its switching capability. Each Buffer Module is composed of two buffer units tied together. A unit function switching feature permits the employment of the two units together in an alternating mode of operation. Continuous information transfer from tape to computer, for example, may be accomplished without stopping the tape unit. A switching instruction executed simultaneously by both units of a Buffer Module causes whatever devices were connected to the first unit to be connected to the second and vice versa.
Now that the functional controlling modules and the module interconnection concept have been discussed, the more conventional auxiliary storage modules available with the system may be described to round out the processing capability of the system.
(the size of the storage available to hold the data in a sending or receiving module). Each block is preceded by a block identification which permits selective tape information searching by a Buffer Module. Single blocks imbedded in a tape file of other blocks can be overwritten. A two-stack head permits automatic verification of each block as it is written. Readback parity errors are automatically detected during the writing process. Thus dropout areas may be determined while the data is still available in a computer or buffer for recording elsewhere.
A description of the RW-400’s tape handling capability would not be complete without mentioning the Tape Adapter (TA) module. This is a self-contained unit capable of performing the reading and writing of magnetic tapes in a format acceptable to the IBM 704 and 709 systems. The TA consists of an Ampex FR-300 half-inch digital tape transport, including dual gap head and servo control system; reading, writing and control circuits; and a module housing with its own blower and power supply.
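The alternating use of a Buffer Module's two units, described above, resembles what is now usually called double buffering. The sketch below is one possible illustration, assuming a sequential simulation in which a tape fills one unit, the switching instruction swaps the device connections, and the computer then reads the unit just filled. The names `BufferModule`, `unit_connected_to`, `switch` and `stream` are hypothetical, and real operation would overlap the filling and reading in time.

```python
class BufferModule:
    """Two buffer units whose device connections can be swapped."""

    def __init__(self, device_on_first, device_on_second):
        self.devices = [device_on_first, device_on_second]
        self.storage = [None, None]

    def unit_connected_to(self, device):
        return self.devices.index(device)

    def switch(self):
        # Whatever was connected to the first unit is now on the second,
        # and vice versa.
        self.devices.reverse()


def stream(blocks):
    """Pass blocks from tape to computer, alternating between the units."""
    bm = BufferModule("tape", "computer")
    delivered, units_read = [], []
    for block in blocks:
        filling = bm.unit_connected_to("tape")
        bm.storage[filling] = block          # tape fills one unit
        bm.switch()                          # swap the connections
        reading = bm.unit_connected_to("computer")
        delivered.append(bm.storage[reading])
        units_read.append(reading)
    return delivered, units_read


print(stream(["b0", "b1", "b2"]))  # (['b0', 'b1', 'b2'], [0, 1, 0])
```

The second list shows the computer reading from unit 0, then unit 1, then unit 0 again, while the tape is always free to fill the other unit.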
The drum module
The Drum Module (DM) contains a magnetic drum with storage capacity of 8192 words. It may be connected to either a Computer or a Buffer Module through the Central Exchange. Average access time to the first word position on the drum is 8½ milliseconds. Successive words are transmitted at the rate of 60,000 computer words per second. The Drum Module is conventionally used as an intermediate item storage device to minimize tape handling time.
Special system communication modules
The external data and man-machine communication of the RW-400 Data System are handled via drum buffer modules. A wide variety of asynchronously operated equipment is speed matched and program controlled through the features designed into these special system communication modules.
The Peripheral Buffer (PB) provides input/output buffers for communication between Computer or Buffer Modules and relatively slow speed external devices such as Flexowriters, Plotters, Punched Tape Handlers, Teletype Lines and Keyboard Operated Equipment. The Peripheral Buffer stores its information in four pairs of bands which operate alternately as circulating registers. Each band contains eight input and eight output buffers for a total of 32 input buffers and 32 output buffers in each Peripheral Buffer Module. Each buffer is a drum band sector 64 computer words long. Conventionally one input and one output buffer sector are connected to each external device (such as a Flexowriter) to permit two-way communication between the external device and the RW-400 system.
The display buffer
A Display Buffer (DB) acts as a recirculating storage for the cathode ray tube display units in a Display Console. Information to be displayed is sent to the DB band associated with a particular display tube via the Central Exchange. The Display Buffer sends only status information back to other system modules upon request. The information displayed on any tube is controlled by the bit pattern sent to the Display Buffer. The display pattern is regenerated 30 times per second to minimize image fading and flicker. The preceding explanation of the Display Buffer has little meaning to a reader unfamiliar with the features of the Display Console itself. This console is therefore described in more detail in the following paragraphs.
Display consoles
Display Consoles can give a problem “analyst” or “monitor” a visual picture of the status or results of any information being handled by the RW-400 system. In addition to the actual Cathode Ray Tube, numerical indicator, signal lamp and typewriter information outputs, several types of keyboard activated system control and parameter entry facilities are provided on the console. The total man-machine communication facility represented by each console is designed to be primarily a function of the computer control programs initiated by the analyst via his console.
A set of Display Control Keys generate messages which are recorded on a Peripheral Buffer sector for later interpretation and display generation by a computer program. A set of Process Step Keys are provided the analyst so that he can initiate preprogrammed system processing variations. Associated with the Process Step Keys is an overlay or “program card” which permits the assignment of a variety of meanings to the set of Process Step Keys. Insertion of the overlay by the analyst gives him a unique label for each Process Step Key and automatically cues the controlling computer to assign the corresponding set of programs to each key message. A Data Entry Keyboard is provided on the console so that the analyst can enter control parameters when asked to do so via the display devices.
A Joystick Lever affords the console operator a means of controlling the position of cross hair markers on the cathode ray display tubes. Associated with the joystick are control keys which may be used to send a message to the controlling computer specifying the coordinates of the cross hairs. Control programs may be written, for example, to act upon this information to reorient the display with respect to the area selected by the cross hair position.
A Light Gun is also provided as a means of selecting any point on the cathode ray tube displays. The gun emits a small beam of light. With the beam centered on a given point on the cathode ray display tube, pressing the trigger results in the automatic generation of a message to the Peripheral Buffer specifying the address in the Display Buffer containing the coordinates of the selected point.
A set of Status and Error lights are contained on the Display Console to provide the console operator with over-all knowledge of the system and thus minimize conflicting control requests and intermodule interference. For example, a Peripheral Buffer may not be ready to accept a console key message until after certain previously requested control actions have been completed. The Status Lights indicate this condition to the console operator so that he may act accordingly.
The printer module
The Printer Module (PR) is basically a 160 column, 900 line per minute Anelex type printer. It receives information from either a Computer or a Buffer module via the Central Exchange. Individual characters to be printed are represented by a 6-bit code and are transmitted four to a computer word. Zero suppression, line completion and information block end codes are included for format control. A plugboard is provided for flexibility in columnar data arrangement. Paper feed is controlled by means of a loop of 7-channel punched paper tape. Control of the printing operation has been arranged so that the connected control module may send line headings from one set of memory locations, stop sending information while going to a different part of the memory, and then proceed to send data from this new set of memory locations to complete a line of print.
The punched card modules
The RW-400 Data System may be equipped with a high speed punched card reading module (CR) and an IBM card punch. The CR communicates with Computer or Buffer modules via the Central Exchange. It is capable of reading 80 column punched cards at the rate of 2,500 cards per minute. The card punch is connected to the system through the Peripheral Buffer Module (PB) since it is a relatively low speed device. Emphasis has not been placed on directly connected punched card equipment since the sources of large volumes of punched cards usually convert this data into magnetic tape form which may be more rapidly handled using the Tape Adapter Module (TA).
References
RothS59; WestC60
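The Printer Module's character format described earlier in this chapter, four 6-bit character codes to one 26-bit computer word, can be illustrated with a small packing routine. This is only a sketch: the text does not say where the four codes sit within the word, so placing them in the low-order 24 bits with the first character most significant, and leaving the top two bits zero, are assumptions, as are the function names.

```python
def pack_characters(codes):
    """Pack four 6-bit character codes into one word (layout assumed)."""
    if len(codes) != 4 or any(not 0 <= c < 64 for c in codes):
        raise ValueError("expected four 6-bit character codes")
    word = 0
    for code in codes:
        word = (word << 6) | code   # first code ends up most significant
    return word


def unpack_characters(word):
    """Recover the four 6-bit codes from a packed word."""
    return [(word >> shift) & 0o77 for shift in (18, 12, 6, 0)]


word = pack_characters([1, 2, 3, 4])
print(oct(word))                 # 0o1020304
print(unpack_characters(word))   # [1, 2, 3, 4]
```

Four codes occupy 24 bits, so a packed word always fits within the 26 information bits of a computer word with two bits to spare.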
Appendix I
RW-400 ISP Description
Mp State
M[1:1022]<26:1>            Mp; registers 0 and 1023 are inaccessible
Console State
CJS<8:1>                   conditional jump switches
Control_panel_test         communication indicator
Instruction Format
instruction/i<26:1>
f/op<6:1> := i<26:21>      function or op code bits
g<10:1> := i<20:11>        first address
j<6:1> := g<6:1>           test selection parameter
h<10:1> := i<10:1>         second address
Operand Calculation Process
G<26:1> := (G'; next       first operand
    (g = 1777₈) → External_Address ← External_Address + 1)
G'<26:1> := ((g = 0) → 0;
    (0 < g < 1777₈) → M[g]<26:1>;
    (g = 1777₈) → M[External_Address]<26:1>)
H<26:1> := (H'; next       second operand
    (h = 1777₈) → External_Address ← External_Address + 1)
H'<26:1> := ((h = 0) → 0;
    (0 < h < 1777₈) → M[h]<26:1>;
    (h = 1777₈) → M[External_Address]<26:1>)
Hold D i v i d e ( : = op = 15) -)
A,E <-H/G: n e x t H ' + A ) ) :
( ( H Z G ) '0"
(H < G ) --i
- I;
(A,B t H / G ) ) :
Store D i v i d e (:= = 16) + ( ( A c ) -jnv + 1;
(A<G) + (
A,E ( - A / G : n e x t H ' ,-A)):
Replace Square Root (:= op = 17) → (A ← sqrt(H+G); next H' ← A);
Hold Square Root (:= op = 20) → (A ← sqrt(H+G));
Store Square Root (:= op = 21) → (A ← sqrt(A+G); next H' ← A);
Accumulate Add (:= op = 25) → (A ← Ov□A + H + G);
Accumulate Multiply (:= op = 26) → (A ← Ov□A + H × G);
-
A -
(g<10:7> # 0)
((g<IO> +
(
Isond; 7 g<IO> - 17??777?7) h
(96’
(g&
+
+
SR; 7 g8>
10WelectolOdata;
+
-
177777777) A
7 -
s<b I????????)
A
(gQ>
(gq’
@Test) - -
+ CJS; i g c / >
(P h));
177777777)); n e x t
The Test condition is a selected bit of A, or other Pc or IO bits.
Test := ((j = 0) → 0;
    (1 ≤ j ≤ 32) → A<j>;
    (j = 33) → (Ov; Ov ← 0);
    (j = 34) → (Parity_error; Parity_error ← 0);
    (j = 35) → (Control_panel_test; Control_panel_test ← 0);
    (j = 36) → (Tape_read; Tape_read ← 0));
-
+
(g = 0) (P + h ) ) ;
T a l l y Jump (:= op = 33) - ((G = 7 0 )
(G = 0) -- ;
(P c h ) ;
-
-
(G < 0 ) - -
(G > O ) - ( G I
(G
- G
G + 1));
1; P - h ) ;
Compare Jump
Load A
Insert S
( : = op
(:= op = 34)
( := op = 35)
= 37)
-- --(A < G )
(A
(S
+
Oogoh);
P
( A A (OOgOh))
+ h;
v (S A 7 (bgoh)));
Store AB (:= op = 36) (G * 8; H - A ;
(g = 0 ) A (h = 0 ) + (A + 8; B +A))
) end Instruction,executior
Chapter 39
Parallel operation in the Control Data 6600¹
James E. Thornton
[Fig. 1 (caption not recovered). Block diagram of the 6600: peripheral and control processors, each with a 4096-word core memory, arranged around the 6600 central processor and 6600 central memory.]
[Fig. 2 drawing labels: processor registers, time-shared instruction control, processor memories, write pyramid, central memory (60), channels 0-7 and 10-14, external equipment.]
Fig. 2. 6600 peripheral and control processors.
Input-output channels are bi-directional, 12-bit paths. One 12-bit word may move in one direction every major cycle, or 1000 nanoseconds, on each channel. Therefore, a maximum burst rate of 120 million bits per second is possible using all ten peripheral processors. A sustained rate of about 50 million bits per second can be maintained in a practical operating system. Each channel may service several peripheral devices and may interface to other systems, such as satellite computers.
Peripheral and control processors access central memory through an assembly network and a dis-assembly network. Since five peripheral memory references are required to make up one central memory word, a natural assembly network of five levels is used. This allows five references to be “nested” in each network during any major cycle. The central memory is organized in independent banks with the ability to transfer central words every minor cycle. The peripheral processors, therefore, introduce at most about 2% interference at the central memory address control.
A single real time clock, continuously running, is available to all peripheral processors.
Central processor
The 6600 central processor may be considered the high-speed arithmetic unit of the system (Fig. 3). Its program, operands, and results are held in the central memory. It has no connection to the peripheral processors except through memory and except for two single controls. These are the exchange jump, which starts or interrupts the central processor from a peripheral processor, and the central program address which can be monitored by a peripheral processor.
A key description of the 6600 central processor, as you will see in later discussion, is “parallel by function.” This means that a number of arithmetic functions may be performed concurrently. To this end, there are ten functional units within the central
[Fig. 3 (caption not recovered). Drawing labels: peripheral and control processors, central processor, upper boundary, lower boundary, 24 operating, 12 input-output channels.]
processor. These are the two increment units, floating add unit, fixed add unit, shift unit, two multiply units, divide unit, boolean unit, and branch unit. In a general way, each of these units is a three address unit. As an example, the floating add unit obtains two 60-bit operands from the central registers and produces a 60-bit result which is returned to a register. Information to and from these units is held in the central registers, of which there are twenty-four. Eight of these are considered index registers, are of 18 bits length, and one of which always contains zero. Eight are considered address registers, are of 18 bits length, and serve to address the five read central memory trunks and the two store central memory trunks. Eight are considered floating point registers, are of 60 bits length, and are the only central registers to access central memory during a central program.
In a sense, just as the whole central processor is hidden behind central memory from the peripheral processors, so, too, the ten functional units are hidden behind the central registers from central memory. As a consequence, a considerable instruction efficiency is obtained and an interesting form of concurrency is feasible and practical. The fact that a small number of bits can give meaningful definition to any function makes it possible to develop forms of operand and unit reservations needed for a general scheme of concurrent arithmetic.
Instructions are organized in two formats, a 15-bit format and a 30-bit format, and may be mixed in an instruction word (Fig. 4). As an example, a 15-bit instruction may call for an ADD, designated by the f and m octal digits, from registers designated by the j and k octal digits, the result going to the register designated by the i octal digit. In this example, the addresses of the three-address, floating add unit are only three bits in length, each address referring to one of the eight floating point registers. The 30-bit format follows this same form but substitutes for the k octal digit an 18-bit constant K which serves as one of the input operands. These two formats provide a highly efficient control of concurrent operations.
Fig. 4. Fifteen-bit instruction format. [Drawing labels: operation code; result register (1 of 8); operand register (1 of 8); second operand register (1 of 8).]
As a background, consider the essential difference between a general purpose device and a special device in which high speeds are required. The designer of the special device can generally improve on the traditional general purpose device by introducing some form of concurrency. For example, some activities of a housekeeping nature may be performed separate from the main sequence of operations in separate hardware. The total time to complete a job is then optimized to the main sequence and excludes the housekeeping. The two categories operate concurrently.
It would be, of course, most attractive to provide in a general purpose device some generalized scheme to do the same kind of thing. The organization of the 6600 central processor provides just this kind of scheme. With a multiplicity of functional units, and of operand registers and with a simple and highly efficient addressing system, a generalized queue and reservation scheme is practical. This is called the scoreboard.
The scoreboard maintains a running file of each central register, of each functional unit, and of each of the three operand trunks to and from each unit. Typically, the scoreboard file is made up of two-, three-, and four-bit quantities identifying the nature of register and unit usage. As each new instruction is brought up, the conditions at the instant of issuance are set into the scoreboard. A snapshot is taken, so to speak, of the pertinent conditions. If no waiting is required, the execution of the instruction is begun immediately under control of the unit itself. If waiting is required (for example, an input operand may not yet be available in the central registers), the scoreboard controls the delay, and when released, allows the unit to begin its execution. Most important, this activity is accomplished in the scoreboard and the functional unit, and does not necessarily limit later instructions from being brought up and issued.
In this manner, it is possible to issue a series of instructions, some related, some not, until no functional units are left free or until a specific register is to be assigned more than one result. With just those two restrictions on issuing (unit free and no double result), several independent chains of instructions may proceed concurrently. Instructions may issue every minor cycle in the
Chapter 39 I Parallel operation in the Control Data 6600 493
absence of the two restraints. The instruction executions, in
comparison, range from three minor cycles for fixed add, 10 minor
cycles for floating multiply, to 29 minor cycles for floating divide.

To provide a relatively continuous source of instructions, one
buffer register of 60 bits is located at the bottom of an instruction
stack capable of holding 32 instructions (Fig. 5). Instruction words
from memory enter the bottom register of the stack pushing up
the old instruction words. In straight line programs, only the
bottom two registers are in use, the bottom being refilled as quickly
as memory conflicts allow. In programs which branch back to an
instruction in the upper stack registers, no refills are allowed after
the branch, thereby holding the program loop completely in the
stack. As a result, memory access or memory conflicts are no longer
involved, and a considerable speed increase can be had.

Five memory trunks are provided from memory into the central
processor to five of the floating point registers (Fig. 6). One address
register is assigned to each trunk (and therefore to the floating
point register). Any instruction calling for address register result
implicitly initiates a memory reference on that trunk. These
instructions are handled through the scoreboard and therefore tend
to overlap memory access with arithmetic. For example, a new
memory word to be loaded in a floating point register can be
brought in from memory but may not enter the register until all
previous uses of that register are completed. The central registers,
therefore, provide all of the data to the ten functional units, and
receive all of the unit results. No storage is maintained in any unit.

Central memory is organized in 32 banks of 4096 words.
Consecutive addresses call for a different bank; therefore, adjacent
addresses in one bank are in reality separated by 32. Addresses
may be issued every 100 nanoseconds. A typical central memory
information transfer rate is about 250 million bits per second.

As mentioned before, the functional units are hidden behind
the registers. Although the units might appear to increase
hardware duplication, a pleasant fact emerges from this design. Each
unit may be trimmed to perform its function without regard to
others. Speed increases are had from this simplified design.

As an example of special functional unit design, the floating
multiply accomplishes the coefficient multiplication in nine minor
cycles plus one minor cycle to put away the result for a total of
10 minor cycles, or 1000 nanoseconds. The multiply uses layers
of carry save adders grouped in two halves. Each half concurrently
forms a partial product, and the two partial products finally merge
while the long carries propagate. Although this is a fairly large
complex of circuits, the resulting device was sufficiently smaller
than originally planned to allow two multiply units to be included
in the final design.
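
The two issue restrictions named in the scoreboard discussion (the functional unit must be free, and no register may be assigned more than one pending result) can be illustrated with a small sketch. This is a simplified model under stated assumptions, not the 6600's actual scoreboard logic: the class `Scoreboard`, its methods, and the unit names are hypothetical, and the post-issue waiting for input operands is not modeled.

```python
class Scoreboard:
    """Toy model of the two issue restrictions: unit free, no double result."""

    def __init__(self, units):
        self.units = set(units)        # all functional units (hypothetical names)
        self.busy_units = set()        # units currently reserved
        self.pending_results = set()   # registers still awaiting a result

    def can_issue(self, unit, result_reg):
        # Restriction 1: the requested functional unit must be free.
        # Restriction 2: the result register must not already await a result.
        return (unit in self.units
                and unit not in self.busy_units
                and result_reg not in self.pending_results)

    def issue(self, unit, result_reg):
        if not self.can_issue(unit, result_reg):
            return False               # instruction issue stalls
        self.busy_units.add(unit)
        self.pending_results.add(result_reg)
        return True

    def complete(self, unit, result_reg):
        # The unit delivers its result and releases both reservations.
        self.busy_units.discard(unit)
        self.pending_results.discard(result_reg)


sb = Scoreboard(units=["add", "multiply1", "multiply2", "divide"])
first = sb.issue("add", "X1")          # unit free, X1 not pending
second = sb.issue("add", "X2")         # add unit is still busy
third = sb.issue("multiply1", "X1")    # X1 already awaits a result
fourth = sb.issue("multiply2", "X3")   # an independent chain proceeds
```

In this sketch only the second and third instructions stall; the fourth issues even though an earlier instruction is still executing, which is the concurrency the text describes.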
[Figure residue. Legible labels: instruction stack (up to 8 words,
60-bit); operands (60-bit).]
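
The bank arrangement described in the text (32 banks of 4096 words, consecutive addresses in different banks) can be sketched as follows. The sketch assumes simple low-order interleaving, where the bank number is the address modulo 32; the text implies this but does not spell out the bit assignment, and the function names are illustrative only.

```python
BANKS = 32             # from the text: 32 banks
WORDS_PER_BANK = 4096  # from the text: 4096 words per bank


def bank_of(address):
    """Bank selected by an address, assuming low-order interleaving."""
    return address % BANKS


def word_within_bank(address):
    """Position of the word inside its bank under the same assumption."""
    return address // BANKS


total_words = BANKS * WORDS_PER_BANK

# Four consecutive addresses fall in four different banks.
banks_for_run = [bank_of(a) for a in range(100, 104)]

# The next address that lands in the same bank as address 100.
same_bank_gap = next(
    a for a in range(101, total_words) if bank_of(a) == bank_of(100)
) - 100
```

Under this assumption a straight-line sequence of references touches each bank only once in every 32 addresses, which is why consecutive references need not wait on one another.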
the ten peripheral processors to a condition which allows
information to enter from any chosen peripheral device. Such loads
normally bring in an operating system which provides a highly
sophisticated capability for multiple users, maintenance, and so
on.

The 6600 Computer has taken advantage of certain technology
advances, but more particularly, logic organization advances
which now appear to be quite successful. Control Data is exploring
advances in technology upward within the same compatible
structure, and identical technology downward, also within the
same compatible structure.

References
AllaRM; ClayB64
Appendix 1

Pc State
  P<17:0>                  Program counter
  X[0:7]<59:0>             Main arithmetic registers. X[1:5] are implicitly loaded from
                           Mp when A[1:5] are loaded. X[6:7] are implicitly stored in
                           Mp when A[6:7] are loaded.
  A[0:7]<17:0>
  B[0]<17:0> := 0          B registers are general arithmetic registers, and can be used
  B[1:7]<17:0>             as index registers.
  Run                      1 if interpreting instructions, not under program control.
  EM<17:0>                 Exit mode bits
  Address_out_of_range_mode := EM<12>
  Operand_out_of_range_mode := EM<13>
  Indefinite_operand_mode := EM<14>

The above description is incomplete in that the above modes allow alarm conditions to trap Pc at Mp[RA]. Trapping occurs if
an alarm condition occurs "and" the mode is a one.
Mp State
  Mp[0:777777₈]<59:0>      main core memory of 2^18 w (256 kw)
  Ms[0:2015232]<59:0>      ECS/Extended Core Storage. Program can only transfer data between
                           Mp and Ms. Program cannot be executed in Ms.
  RA<17:0>                 reference (or relocation) address register to map a logical Mp'
                           into physical Mp
  FL<17:0>                 field length - the bounds register which limits a program's
                           access to a range of Mp'
  RAECS<59:36>             reference or relocation register for Ms (Extended Core Storage)
  FLECS<59:36>             field length for ECS
  Address_out_of_range     a bit denoting a state when memory mapping is invalid

The following Mp" array is reserved when Pc state is stored, and switched to another job. The exchange instruction in
a Peripheral and Control Processor enacts the operation: (Mp" ← Mp; Mp ← Mp").
  Mp"[n]<53:0> := P□A[0]□000000₈
  Mp"[n+1]<53:0> := RA□A[1]□B[1]
  Mp"[n+2]<53:0> := FL□A[2]□B[2]
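
The roles of RA and FL listed above can be illustrated with a short sketch of relocation and bounds checking. The exact comparison and the handling of an out-of-range reference are assumptions made for illustration (the appendix only says that RA maps logical Mp' into physical Mp and that FL limits the accessible range); `map_address` is a hypothetical helper, not part of the ISP description.

```python
ADDRESS_MASK = 0o777777  # 18-bit addresses, as in P<17:0>, RA<17:0>, FL<17:0>


def map_address(logical, ra, fl):
    """Map a logical Mp' address to a physical Mp address.

    Returns (physical_address, out_of_range). Assumes a reference is
    valid when logical < fl; this bound is an assumption of the sketch.
    """
    if logical >= fl:
        return None, True            # Address_out_of_range would be set
    return (ra + logical) & ADDRESS_MASK, False


inside = map_address(0o100, ra=0o4000, fl=0o2000)
outside = map_address(0o2000, ra=0o4000, fl=0o2000)
```

Because every program sees addresses starting at zero, a job can be moved in Mp by changing RA alone, which is what makes the exchange of Pc state between jobs practical.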
Instruction Format
  instruction<29:0>                  although 30 bits, most instructions are 15 bits; see
                                     Instruction Interpretation Process
  fm<5:0> := instruction<29:24>      operation code or function
  fmi<8:0> := fm□i                   extended op code
  i<2:0> := instruction<23:21>       specifies a register or an extension to op code
  j<2:0> := instruction<20:18>       specifies a register
  k<2:0> := instruction<17:15>       specifies a register
  jk<5:0> := j□k                     a shift constant (6 bits)
  K<17:0> := instruction<17:0>       an 18 bit address size constant
  long_instruction := ((fm < 10₈) ∨           30 bit instruction
                       (50 ≤ fm < 53) ∨
                       (60 ≤ fm < 63) ∨
                       (70 ≤ fm < 73))
  short_instruction := ¬long_instruction      15 bit instruction
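
The field definitions above translate directly into shifts and masks. The following sketch decodes a 30-bit instruction held in an integer and evaluates the long_instruction predicate; it assumes that all op-code bounds in the predicate are octal, and the function names `decode` and `is_long` are illustrative.

```python
def decode(instruction):
    """Split a 30-bit instruction into the fields named in the format."""
    fm = (instruction >> 24) & 0o77   # instruction<29:24>
    i = (instruction >> 21) & 0o7     # instruction<23:21>
    j = (instruction >> 18) & 0o7     # instruction<20:18>
    k = (instruction >> 15) & 0o7     # instruction<17:15>
    K = instruction & 0o777777        # instruction<17:0>
    return {"fm": fm, "i": i, "j": j, "k": k,
            "jk": (j << 3) | k,       # j concatenated with k
            "fmi": (fm << 3) | i,     # fm concatenated with i
            "K": K}


def is_long(fm):
    """True for op codes that occupy 30 bits (bounds assumed octal)."""
    return (fm < 0o10
            or 0o50 <= fm < 0o53
            or 0o60 <= fm < 0o63
            or 0o70 <= fm < 0o73)


# Example word: fm = 71 (octal), i = 3, j = 2, K = 123 (octal).
example = (0o71 << 24) | (3 << 21) | (2 << 18) | 0o123
fields = decode(example)
```

Note that k and K overlap: in a long instruction the three k bits are simply the top of the 18-bit constant K.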
Instruction Interpretation Process
A 15 bit (short) or 30 bit (long) instruction is fetched from Mp'[P]<(p × 15 + 15 - 1):(p × 15)> where p = 3, 2, 1, or 0. A 30
bit instruction cannot be stored across word boundaries (or in 2 Mp' locations).
  p<1:0>                             a pointer to 15 bit quarter word which has instruction
  Run → (instruction<29:15> ← Mp'[P]<(p × 15 + 14):(p × 15)>; next      fetch
     p ← p - 1; next
     (p = 0) ∧ long_instruction → Run ← 0;
     (p ≠ 0) ∧ long_instruction → (
        instruction<14:0> ← Mp'[P]<(p × 15 + 14):(p × 15)>;
        p ← p - 1); next
     Instruction_execution; next                                        execute
     (p = 0) → (p ← 3; P ← P + 1))
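
The quarter-word selection used in the fetch step, Mp'[P]<(p × 15 + 14):(p × 15)>, amounts to a shift and a 15-bit mask. The sketch below shows only that extraction for p = 3, 2, 1, 0; the helper name `parcel` and the sample word are illustrative, and the rest of the interpretation cycle is not modeled.

```python
PARCEL_MASK = 0o77777  # 15 bits


def parcel(word, p):
    """Return bits <(p*15 + 14):(p*15)> of a 60-bit word."""
    return (word >> (p * 15)) & PARCEL_MASK


# A 60-bit word written as four groups of five octal digits (15 bits each).
word = 0o11111_22222_33333_44444

# p counts down from 3, so instructions are taken from the high end first.
parcels = [parcel(word, p) for p in (3, 2, 1, 0)]
```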
Set X[i]/SXi
  "SXi Aj + K"     (fm = 70) → (X[i] ← sign_extend(A[j
  "SXi Bj + K"     (fm = 71) → (X[i] ← sign_extend(B[j
  "SXi Xj + K"     (fm = 72) → (X[i] ← sign_extend(X[j]
  "SXi Xj + Bk"    (fm = 73) → (X[i] ← sign_extend(X[j
  "SXi Aj + Bk"    (fm = 74) → (X[i] ← sign_extend(A[j
  "SXi Aj - Bk"    (fm = 75) → (X[i] ← sign_extend(A[j
  "SXi Bj + Bk"    (fm = 76) → (X[i] ← sign_extend(B[j
  "SXi Bj - Bk"    (fm = 77) → (X[i] ← sign_extend(B[j
Miscellaneous program control
  "PS" (:= fm = 0) → (Run ← 0);                                   program stop
  "NO" (:= fm = 46) → ;                                           no operation; pass
Jump unconditional
  "JP Bi + K" (:= fm = 02) → (P ← B[i] + K; p ← 3);               jump
Jump on X[j] conditions
  "ZR Xj K" (:= fmi = 030) → ((X[j] = 0) → (P ← K; p ← 3));       zero
  "NZ Xj K" (:= fmi = 031) → ((X[j] ≠ 0) → (P ← K; p ← 3));       non zero
  "PL Xj K" (:= fmi = 032) → ((X[j] ≥ 0) → (P ← K; p ← 3));       plus or positive
  "NG Xj K" (:= fmi = 033) → ((X[j] < 0) → (P ← K; p ← 3));       negative
  "IR Xj K" (:= fmi = 034) → (                                    out of range constant tests
Appendix 2

Pc State
  A<17:0>                            accumulator
  P<11:0>                            Program Address Counter
Mp State
  M[0:4095]<11:0>
  M_index[0:63]<11:0> := M[0:63]<11:0>    special array in Mp reserved for index register
C('Central) State
  CP_P<17:0>                         the main Pc instruction address counter
  CPM[0:777777₈]<59:0>               the Mp of main C
IO Registers for C('PCP)
  C_DATA[0:63]<11:0>                 data buffers at peripheral K's
  C_ACT[0:63]                        a bit to denote if 1 of the 64 K's is active
  C_FLG[0:63]                        denotes a full (or empty) buffer at the K
  C_FCN[0:63]<11:0>                  function or instruction register at a specific K
Instruction Format
  Ins[0:1]<11:0>                     instruction
  long_instruction                   2 w instruction; defined in terms of op codes, see Table, page 50;
  short_instruction := ¬long_instruction    1 w instruction
  f<5:0> := Ins[0]<11:6>             function or op code
  d<5:0> := Ins[0]<5:0>
  m<11:0> := Ins[1]                  address part
  dm<17:0> := d□m
  d_sign<11:0> := (
     ¬d<5> → 00□d;
     d<5> → 77□d)
  md<11:0> := (
     (d = 0) → m;
     (d ≠ 0) → m + M[d])
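
The definition of md above is a conventional indexed-address calculation: with d = 0 the address part m is used directly, otherwise the word M[d] acts as an index. The sketch below evaluates it for a small memory image. It assumes the sum is truncated to 12 bits and ignores any end-around carry, which the appendix does not specify; `effective_address` is an illustrative name.

```python
WORD_MASK = 0o7777  # 12-bit words, as in M[0:4095]<11:0>


def effective_address(d, m, M):
    """md := m when d = 0, else m + M[d] (12-bit truncation assumed)."""
    if d == 0:
        return m & WORD_MASK
    return (m + M[d]) & WORD_MASK


M = [0] * 4096
M[5] = 0o100                                   # index value held in location 5

direct = effective_address(0, 0o200, M)        # no indexing
indexed = effective_address(5, 0o200, M)       # m + M[5]
```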
Instruction Interpretation Process
  Run → (Ins[0] ← M[P]; P ← P + 1; next                           fetch
     long_instruction → (Ins[1] ← M[P]; P ← P + 1); next
     Instruction_execution)                                       execute
Implementation
The 10 × 52 bits in the barrel for the 10 Pc's include:
  A[0:9]<17:0>                       accumulators
  P[0:9]<11:0>                       instruction address counters
Instruction_execution := (
  [Instruction-execution table, indexed by the octal op-code digits
  F = X₈Y₈. Mnemonics shown: PSN, LJM, RJM, PJN, MJN, SHN, LMN, LPN,
  SCN, LDN, LCN, ADN, SBN, ADM, SBI, SBM, RAI, CRM, CWD, CWM, AJM, IJM,
  FJM, EJM. Legible entries:]
  PSN → ;
  LJM → (P ← md);
  RJM → (M[md] ← P; P ← md + 1);
  LMN → (A ← A ⊕ d);
  LPN → (A ← A ∧ d);
  SCN → (A ← A ∧ ¬d);
  LDN → (A ← d);
  LCN → (A ← ¬d);
  ADN → (A ← A + d);
  SBN → (A ← A - d);
  ) end Instruction_execution
* 1 word or short_instruction
Chapter 40
Computer-network examples
We are just entering the era in which general-purpose networks
of computers make technical and economic sense. The requisite
hardware and software development of operating systems and
multiprogramming capability is still maturing. Thus, unlike the
other PMS structures discussed in this book, there is no supply
of operational systems with published descriptions upon which we
can draw. Consequently, we have assembled several brief examples
of networks to provide at least some illustrations of what is sure
to be an important aspect of computer systems in the near future.
The more interesting of these examples are still in the planning
stages; those that exist currently are still highly specialized.

Spatially distributed intercommunicating networks of digital
devices have existed for a long time. But many of the ones that
come most easily to mind are not computer networks. For example,
the various airline reservation systems like American Airlines'
SABRE [Plugge and Perry, 1961] have spatially distributed
terminals (T's) with a single Pc, possibly mediated by Pio's or Cio's.
When there are several Pc's, they are functionally integrated so
as to provide the total capacity and reliability needed. Some
military networks, such as the SAGE Air Defense System [Everett
et al., 1957] have multiple computers (SAGE actually has a very
large number). But they transmit to each other highly specialized
data streams (for example, aircraft positional information for
control). The National Physical Laboratory of England has made a very
comprehensive proposal for a general-purpose network [Davies et
al., 1967], although we do not include it as a chapter. Again, it
is just in the proposal stage. The Lawrence Radiation Laboratory
(at Livermore) is no doubt the earliest and most impressive
network.

In terms of our PMS descriptions, a computer network (N)
requires at least two C's not connected through primary memory.
Thus each C has a Pc and an Mp of its own and has to
communicate with other C's through messages. Duplex computers are thus
defined as networks, provided they do not share Mp. For networks,
links (L's) are usually shown explicitly. In spatially distributed
systems, both the time delays and the flow rates of the links are
significant. The latter is so partly because the networks must make
use of the telephone communication system, which exists
independently of the networks, thus having parameters that do not
correspond with any of the internal parameters of the individual
computers. There may also be limitations of reliability, cost,
accessing characteristics, and the size of the information unit that
derive wholly from the links. For instance, many computer
networks would like to buy their transmissions from the telephone
system for very short intervals (milliseconds), at very high data
rates, and with short switching time (milliseconds), i.e., bursts.
Switching time and pricing policies within the telephone system
conspire to make this a difficult thing to do. Thus, with networks,
links become important independent components.

One classification of networks (N's) is by fixed or variable
interconnection structure. Fixed structure may mean that the links
are fixed permanently over the life of the network. However, fixed
structure may mean only that connections once made must be
held for long periods of time relative to the message flows. An
example is the telephone switching system mentioned above,
which looks like a variable switching structure at the level of
human conversations, but like a fixed switching structure at the
level of computer conversations. Figures 1a and 1c show
variable-structure systems; Fig. 1b shows a fixed-structure system. In the
former, any C can talk directly to any other C. In the latter, each
C talks directly to only a few C's; thus, to communicate with the
other C's, it must transmit through them as links; that is, it must
use another C as an L.

A second classification of N's is by the nature of the delays
suffered by the messages as they travel from an initiating C to
a target C. Communication can be direct, in which case the only
delays are those through the switches (S) and links (L) between
the two C's (Figs. 1a and 1b). Alternatively, communication can
involve storing messages at intermediate nodes (called
store-and-forward communication), thus introducing additional memory
delays into the communication but decreasing the demands for
coordination between the two C's. Although store-and-forward
systems can be built with the intermediate nodes being K's with
buffer memories, in the present context the natural form for such
a system uses the other C's in the system as the intermediate nodes,
as in Fig. 1c.

Several kinds of reasons can justify the existence of a particular
network. The following list is adapted from Roberts [1967]:

Load sharing. A problem (program and data) initiated at one C
that is temporarily overloaded is sent to another for processing.
The cost of transshipment must clearly be less than the costs of
Data sharing. A program is run at a node that has access to a large,
specialized data base, such as a specialized automated library. It is
less costly to bring the program to the data than to bring the data
to the program.

Program sharing. Data are sent to a C that has a specialized
program. This might happen because of the size of the program
(hence, fundamentally the same reason as data sharing), but it
might also happen because the knowledge (i.e., initialization and
error rituals) to run the program is available at one C but not
at another.

Specialized facilities. Within the network there need exist only
one of various rarely used facilities, such as large random-access
memories, or special display devices, or special-purpose array
processors.

Reliability. If some components fail, others can be used in their
place, thus permitting the total system to degrade gracefully. (At
the present state of the art, peripheral computers are needed to
isolate the periphery from the unreliability of the network, and
vice versa.)

Peak computing power. Large parts of the total system can be
devoted for short periods to a single task, if there are important
real-time constraints to be met. This depends on being able to
fractionate the task into independent subtasks.

Communication multiplexing. Efficient use of communication
facilities is obtained by multiplexing a number of low data-rate
users, for example, T(typewriter; 150 b/s)'s. This may not be a
reason for a network per se but may justify a larger network,
provided that there is some reason for having one in the first
place.
IBM ASP (Attached Support Processor)

This first example (Fig. 2) is the simplest of all computer networks,
consisting of two computers tied together, with each functionally
specialized (and in addition required to be physically close). The
function of C.support is job setup and breakdown, that is,
preprocessing and postprocessing. All T's for the network are handled
by it (except for T.console on C.main). The function of C.main
is to process data. Thus this is an escalated version of the Pc-n
Pio organization, where the Pio's have been made into a C.support
and thus can take on additional functions. It should be compared
with the CDC 6600 organization, which is C.main-10 Cio, but
where the Cio's are rather small Cio(4096 w; 12 b/w) compared
with the C.support. The ASP organization is the 360 analog of
a system consisting of an IBM 7090-IBM 7040 which emerged
spontaneously in the early sixties at several IBM installations in
order to deal with 7090 I/O bottlenecks. Thus this kind of simple
computer network has been with us for some time.

In more detail, the advantages that are claimed for ASP are
in reducing resource interference:¹

The addition of smaller modules of Mp in the form of a
second processor. The processing of the application is
divided between the main processor and the support
processor, with each performing those functions for which it is
best suited. The core requirements for the support processor
are small in comparison with those for the main processor.
With this division of responsibilities, the system can expand
its capabilities with a minimum addition of storage.

The elimination of concurrent use of Pc time on the main
processor for processing support functions (such as printing).
Because the clerical functions are assigned to the support
processor, the main processor no longer shares Pc time
between the support functions and the application
programs. Therefore, the application has the opportunity to
use all the resources of the main processor to full capacity.

The addition of selector channels. The channel capacity of
the system has been increased by one or more additional
selector channels attached to the support processor.

An algorithm for efficient management of the direct-access
storage devices for system input/output data sets. The
algorithm was designed specifically to accommodate the
data demands, the data set characteristics, and the available
private devices. The input/output routines always know the
position of the access mechanism, thereby ensuring
minimum seek time when data are transferred to the devices.

¹Adapted from IBM System/360 Attached Support Processor (ASP) System
Description, H20-0223-0.
C ( ‘Ma i n) :=
IBM cites the above reasons for using the ASP system. These
views differ from ours on its usefulness. Ideally, a multipro-
grammed single-processor or multiprocessor structure would easily
provide all the above advantages without the overhead of having
large Mp’s on two computers (both of which hold nearly the same
operating system). Also, as we note in the introduction to the
System/S60 (page 584), the support-computer functions can be
handled in the main computer with very little loss of large Pc
--
power (3 to 10 percent). A multiprocessor structure should also
cause less overhead, by not passing data sets between two C’s.
?
Ms(disk) ... l s ( m a g n e t i c tape) ... (Alternatively, in ASP this could be done by an S to common Ms
I from both C’s.)
r“‘-
Mp((.l 5)megabyte)
University of Texas network
The structure shown in Fig. 3 is similar to ASP in that a C.main
Pie. .. Pc(’IBM System/360 Model 40. 50) is used, with some job setup and breakdown being done in several
other C’s. However, there are several of these C’s, and they provide
T(card) ... T ( l i ne; p r i n t e r ) . .. T (typewri t e r )
independent power for small tasks where the setup time for the
large system is greater than the computation time. They are also
Fig. 2. IBM S y s t e m / 3 6 0 Attached Support Processor system/ASP physically remote from C.main and thus serve to make the power
PMS diagram. of the central facility available at local sites. The Teletypes are
used to enter jobs directly to the C.main, where they are run in
a batch mode.

[Fig. 3. The Computation Center, University of Texas, (Austin) Network
PMS diagram. Legible labels: C('CDC 6600; Computation Center);
C('CDC 1700; Linguistic Research Laboratory); C('CDC 3100; College of
Business Administration); C('8231 Computer Terminal); T(card);
T(line; printer); T(Teletype); S('Telephone Exchange);
L(to: other C's off campus).]

The network of Fig. 3 is that at the University of Texas, as
derived from its internal planning memoranda. Similar systems are
in existence or under construction at other universities.

Figure 4 shows a network that is proposed for the M.I.T. campus
[Bhushan, Stotz, and Ward, 1967]. It moves to a more complex
switching system, partly because there are two C.main's. Here
an S(direct) is used in a non-store-and-forward mode as each C
communicates directly with another. The communication rate
between C's is 40 - 230 kb/s. (Note that at higher data rates a
fairly large computer is necessary just to handle the
store-and-forward message switching information rates.) The purpose of the
network is to allow users of the small or terminal C's to get
access to C('IBM 360/67) and C('GE-645). These two C's can,
of course, communicate with one another. A large number of
users are connected to T(typewriters) via the S('Telephone
Exchange).

[Fig. 4 diagram. Legible labels: T(storage CRT; display; keyboard);
T('Dataphone); C('Satellite); T(CRT; display); S('Telephone exchange;
(10 - 15) char/s, (1.2 - 4.8) kb/s); S('Wideband Communications
Center; (40.8 - 230.4) kb/s).]

The Lawrence Radiation Laboratory (at Livermore) network

The LRL network, started in 1964, appears to be the earliest
general-purpose-computer network. It serves a user population
of approximately 1,000, with several hundred simultaneous on

PDP-6 with two Pc's and a 262 kword Mp and a 10"bit
fixed-head disk for fast-access files), three terminal control
computers (DEC PDP-8's), and a large central file (a 10^12-bit IBM
Photostore controlled by an IBM 1800 computer). Hardwired 4
megabit per second links connect the large computers to the
switching computer. The terminal computers and the large file
are also connected to the switching computer.

The main purpose of the network is to gain access to the
central filing, printing, and terminal facilities. Load sharing is not
an important consideration because each of the large computers
operates nearly autonomously. Thus little change was required in
each system to be integrated to the network. Jobs enter the
network in any of three ways: by the batch input terminals of a
large computer; by the typewriter inputs of a large computer;
or by the typewriter inputs of the terminal control computer
which in turn connects to the central switch. Unlike most
university computation centers, which provide service for many
users with small jobs, the LRL network is oriented to users with
(multiple) large jobs.
[Fig. 5 diagram. Legible labels: duplexed file C's sharing a common
secondary memory for long term filing; main processors with secondary
memory (Ms); high speed message concentrators, special systems, store
and forward/sf; message concentrators, special systems, store and
forward/sf; network periphery; T(CRT; console); T(card, line, analog,
plot); Teletype. Switches: S(50 - 180 b/sec); S(600 - 4800 b/sec);
S(40 - 50 kb/sec); S(200 - 2000 kb/sec; fixed). Links: L(200 - 2000
kb/s); L(40 - 50 kb/s); L(600 - 4800 b/s); L(50 - 180 b/s).]
Typical local network

We summarize in Fig. 5 the direction in which the last three
networks are moving by presenting a hypothetical, local network,
as it may mature on many large university campuses (and large
industrial establishments). The network is conceived as a single
computing facility, to serve a clientele with many heterogeneous
but partially overlapping computing needs. An essential feature
of the environment of the network is that the collection of
computing resources it connects are not planned all at once but keep
growing and changing in imperfectly controlled ways. This arises
from the quasi-independent nature of the subparts of large
universities and engineering establishments. In any event, the network
is a mixture of functionally independent and functionally
specialized C's. One probable feature is the duplexed C.files which handle
all the Ms functions for all C's, except the C(library). A library's
computer, though strongly coupled to the network, would have
its own files and specialized terminals, including hard copy devices
oriented to library needs. The C.file increases the requirements for
the S.central but provides much more economic Ms, as well as
easing the ability to connect new C's into the system, since they
immediately have access to an organized Ms.

The reader should note that the four switches (S's) can be either
fixed links, variable switches (e.g., Telephone Exchange), or a
computer used as a direct switch or as a store-and-forward switch.

The most interesting aspect of this network is that it has a
general hierarchical structure and is like other hierarchical
organizations. Here, the levels of the organization are based on data
rates. For example, there is a very low-level computer which deals
with the basic communication to typewriters at ~150 b/s. This
[Fig. 6a. Combat Logistics Network/ComLogNet PMS diagram. Legible
labels: N('ComLogNet); N('Switching Center/SC), see Figure 6d;
T('Subscriber Station/SS) := (T(Teletype | 'Compound | 'Magnetic Tape
Terminal)); T('Compound) with M.buffer, T(card; reader), T(card;
punch), T('Teletype); T('Magnetic Tape Terminal) with L(1200, 2400,
4800 b/s), K, Ms(magnetic tape), Ms.buffer.]

[Fig. 6c. S('ComLogNet) PMS diagram. Legible labels: N('Switching
Center/SC); T('Subscriber Station/SS), see Figure 6a.]
510 Part 5 I The PMS level Section 4 I Network computers and computer networks
(10 char/s) and medium (1,200 - 4,800 b/s) speed, as shown in
Fig. 6a. In this regard the network is simply a message switch for
the three terminal types. It employs C's for the switching elements
and is fundamentally a store-and-forward system. Had it not been
for security, reliability, response time, and other considerations,
it would have been possible to construct an equivalent system
using standard lease wire switches (or telephone exchanges). In Fig.
6b a tree is used to present the relationship of constituent members
of ComLogNet. From it we see that at the first level ComLogNet
has just a switch, links, and terminals (as shown in Fig. 6a). The
networks switch employs five specialized N('Automatic Electronic
Switching Centers/SC)'s which communicate among each other
(Fig. 6c). Terminals connect to the individual N('SC)'s and
messages are routed between two T's, either by a store-and-forward
process within N('SC) or among two N('SC)'s.

The individual N('SC)'s are located at five specific locations and
consist of fixed computer configurations of five to seven C's. The
structure of N('SC) (Fig. 6d) is formed basically by a duplex C
structure which handles most processing. Attached to the two
C('Communications Data Processor/CDP) are two to four
C('Accumulation and Distribution Unit/ADU) which handle
communication-link processing. A C('Tape Search Unit) is used off line to
process data from Ms(magnetic tape). The structures of C('CDP),
C('Tape Search Unit), and C('ADU) are defined within Fig. 6d.

[Fig. 6d. ComLogNet N('Switching Center/SC) PMS diagram. Legible
labels: C('Communications Data Processor/CDP) with T.console,
T(paper tape; reader), Ms(#1:3; drum); C('Accumulation and
Distribution/ADU) with Mp('Data store), Mp('Procedure), Pc, K(low
speed), K(high speed); L: communications lines.]

independent management, and that have no agreed-upon
functional specialization vis-à-vis each other. Furthermore, the uses
that each node will make of other nodes will be the fairly general
ones cited at the beginning of this chapter, as generated by a
general scientific community. Since many of the institutions that
will be tied in are major academic institutions, diversity will be
guaranteed. The motivation behind the experiment is to reveal
and begin to solve the technical problems of such general
networks, while also discovering which of the several advantages of
using networks listed earlier (or others unmentioned) emerge as
important.

¹The specific links, sites, etc., change with time; thus the actual structures
we present are, by the nature of the experiment, almost guaranteed to be
in error.
[Fig. 7a. Advanced Research Projects Agency (ARPA) network PMS
diagram (tentative). Legible labels: C('Dartmouth College);
N('U Illinois); Santa Barbara.]

[Fig. 7d. Advanced Research Projects Agency (ARPA) fixed switching
centers PMS diagrams (tentative). Legible labels: S('Mojave,
California).]
for a local computer and local network cases, respectively. The
C('IMP) is a C('Honeywell 516; 16 b/w; 12 - 16 kw; 1 µs/w) with
capability to connect to four to six links at a 50-kb/s data rate.

The ARPA network leases a set of fixed links, L(50 kb/s).
These emanate from four S.fixed, as shown in Fig. 7d. Thus the
fixed links between the various sites, as shown in Fig. 7a, are
composed of the links in Fig. 7d. For example, the
L(Carnegie-Mellon University; Bolt Beranek and Newman) goes from
Carnegie-Mellon University in Pittsburgh, Pa., to Williamstown, Ky., to
Littleton, Mass. (on one of the two links) to Bolt Beranek and
Newman in Boston, Mass. The other L(Littleton; Williamstown) is
part of L(University of Michigan; Lincoln Laboratory). With such
a fixed-link system the network must operate in a store-and-forward
fashion, with C('IMP)'s at each site carrying out this function. Thus
the C('IMP) is required at each site, since there is no uniformity
in the other C's that are at a site and no control over their
operation.

Conclusions

We feel the network is the most important computer structure
in the book. Through understanding it, we will be able to organize
more computing power than with any other structure and to
achieve more reliability. The issues of switches and links are so
vital that through understanding of them all computer structures
will improve.

References
BhusA67; DaviD67; EverR57; PlugW61; RobeL67; SegaR61; IBM
System/360 Attached Support Processor (ASP) System Description,
H20-0223-0
Part 6
Computer families
The three groups or families of computers described in this part are each built around
a single ISP and PMS structure. The IBM 701-7094 II sequence (Sec. 1) shows the
evolution of a series. The reader can trace a number of incremental changes, or
features, such as the addition of index registers, indirect addressing, I/O processors,
and larger random-access memories. The SDS 900-9000 series and the IBM
System/360 are both families in which successor models are within a planned
framework; evolution occurs mainly in the implementations, not in the ISP.
Section 1
The IBM 701-7094 II sequence,
a family by evolution
The IBM 701, 704, 709, 7090, 7040, 7044, 7094 I, and 7094
II sequence relationship is shown in Fig. 1. The group is not
a compatible series. The IBM 701 [Astrahan and Rochester,
1952; Buchholz, 1953] is a forerunner of the series; all except
the 701 are painfully compatible. The sequence is included
because the 7090 is a reference or benchmark of scientific-
computer power. All machines use 36-bit words. The 701 stores
two instructions/word in the same manner as the IAS computer
(Chap. 4), whereas all others in the sequence store only one
instruction/word. The 701, 704, and 709 are first-generation,
vacuum-tube technology; the rest are second-generation.
The IBM 7094 II description given in Chap. 41 is based directly on information in the Programming Reference Manual, but the Appendices of that chapter give the ISP of the Pc, a Pio, and a K as inferred by the authors of this book. The description of the Pc gives the instructions in the 704 and 7044
¹Mp(electrostatic; random; 24 µs/w; 2048 w; 36 b/w)
²Pc(2 instructions/w; M.processor state(~3 w); 1 address/instruction; 36 b/w; technology: vacuum tubes; descendants: IBM 704, IBM 709; 1953~1956)
516 Part 6 | Computer families Section 1 | The IBM 701-7094 II sequence, a family by evolution
to achieve compatibility is inexpensive when the system price is considered. Also, the incremental changes in the ISP do little to increase the Pc performance. Compared with the 704, the extensive order code of the 7094 shows an evolution in which for marketing, emotional, or analytic reasons new instructions were added. The index registers and their instructions are a good example of this trend. The 7094 has a very general set of index-register transmission instructions; if implemented properly, they are probably easier to provide than the original 704 instructions.
In the implementation of the double-precision floating-point hardware, the sense-indicator register is needed for temporary storage. Thus a user has to preserve this register when double-precision floating-point instructions are given. The reason for this undoubtedly relates to field modifications and cost. In an original design this would be inexcusable; in this case double-precision floating point is undoubtedly worth the loss of sense indicators.
All in all, the designers of the 704-7094 II provided increased generality through evolution. They gradually ran out of patching time, technology, instruction encoding space, and memory addressing bits, while exceeding compatibility constraints. It was indeed time to create the IBM System/360.
Chapter 41
[PMS diagram components: K-Sfx-Ms(#0:9; '729 I-VI; magnetic tape; 75/112 in/s; 2400 ft; 200, 556, 800 by/in; 6 b/by); T('716; line printer; 72/120 char/line; 64 symbol/char; 150 line/min; 6 b/char); T.console; Sfx-Ms(#0:9; '7340 Hypertape); L(to: Pio(#4:8)); K(#1:6)-Sfx-K-T(#0:9).]
Fig. 2. IBM 7094 data-processing system configuration. (Courtesy of International Business Machines Corporation.) [Diagram labels: Console; Core Storage; Instruction Processing Unit; Arithmetic Sequence Unit; (Central Processing Unit); 7909 Data Channel; channel switches; File Control; Tape Units; Synchronizer; I/O Units.]
Processor registers and mode bits
Figure 3 gives the Pc registers and the data transfer paths. Both the ISP registers (denoted by *) and the temporary registers are given. The ISP registers and modes are controlled by the program.

Instruction counter (IC)*. The Instruction Counter, IC, is 15 bits. It is used by the processor to locate the next instruction in Mp. Once the program is started, the IC can be set to an address specified by a transfer instruction. For most instructions, the IC is stepped sequentially by 1 with each new instruction. The IC is normally advanced at the end of each instruction (I cycle).

Instruction backup register (IBR). The Instruction Backup Register, IBR, is a 36-bit register, (S, 1:35), and is used to buffer the next instruction. Pc attempts to have the next instruction available in IBR, since the Mp permits 72-bit transfers, thus avoiding an unnecessary reference to Mp. When the instruction reference is to an even location, the IBR is loaded with the contents of the next higher odd address after the contents of the even address have been placed in the Storage Register. The IBR is also used for fetching operands in double-precision operations.

Address register (AR). The Address Register, AR, is 15 bits and receives information from the Storage Register, Instruction Backup Register (at the beginning of a storage reference I or E cycle), Index Register, and Index Adder. The contents of the AR are sent to the Multiplexor Address Switch to select the core memory location.

Instruction register (IR). The 18-bit Instruction Register, IR, is divided into two parts: bits (S, 1:9) always contain the operation part of the instruction, and bits (10:17) form the Shift-counter Register. The Shift Counter is used during shifting, multiplication, division, and floating-point instructions. Bits (10:17) may also contain a sense instruction address, operation codes for those instructions which require an address part, and the class and unit codes for input/output instructions.

Storage register (SR). The 36-bit Storage Register, SR, stores information that comes from or goes to core storage.

Adders (not a register). The Adders furnish a 36-bit path for data going from the storage register to other registers in the processor.

Accumulator register (AC)*. The Accumulator Register, AC, is 38 bits (a 35-bit word with a 1-bit sign, and 2 bits for overflow conditions, P and Q). The AC is used to hold one factor during arithmetic or logical operations and to receive results from the adders. Information may be shifted into the accumulator from the MQ, 1 bit at a time.

Multiplier-quotient register (MQ)*. The MQ Register is 36 bits. During a multiply instruction, MQ contains the multiplier; during a divide instruction, MQ receives the quotient. It can be shifted right or left, independently, or combined with AC into a 72-bit register.

Sense indicator register (SI)*. The Sense Indicator Register, SI, is 36 bits. SI is normally used as a set of binary program switches which can be set and tested. However, it is also used as a temporary register in double-precision arithmetic operations.

Index registers (XR)*. Seven 15-bit Index Registers, XRs, in the 7094 system are used for address modification. They are specified by the tag bits of an instruction (bits (18:20)) and modify an address by adding the two's complement of their contents to the address. In the earlier 7090 (and 7044) only XR[1, 2, 4] are available.

Multiple tag mode*. In Multiple Tag Mode only Index Registers 1, 2, and 4 can be specified. The indexing function specified is determined by the "logical-or" of each index register specified. When not in Multiple Tag Mode, each 3-bit number selects one of seven index registers. The 1-bit Multiple-Tag-Mode Register maintains the state of the mode. The requirement for the two modes comes entirely from the need to maintain compatibility between the 704, 709, 7090, 7040, and 7044 (which have three index registers addressed as in Multiple Tag Mode) and the 7094 I and 7094 II which have seven index registers.

Tag register (TR). This temporary register holds the tag field of the instruction being executed and is used to select the Index Register being addressed.

Index adders (XAD) (not a register). A separate 15-position Index Adder is used for the Index-register operations. All storing, loading, changing, and modifying of Index Registers is via the Index Adders.

Accumulator overflow*. The Accumulator Overflow Indicator is turned on whenever a 1 passes into or through position P from position 1 of the AC as a result of the execution of a fixed-point arithmetic or a shifting instruction.
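
The index-register address modification described above can be sketched in Python. This is only an illustration under stated assumptions, not the machine's logic design: the names `effective_address`, `xr`, and `multiple_tag_mode` are introduced here, a tag of 0 is taken to mean no indexing (as the ISP description in Appendix 1 states), and indirect addressing is ignored.

```python
ADDR_MASK = 0o77777  # addresses and index registers are 15 bits wide


def effective_address(y, tag, xr, multiple_tag_mode=False):
    """Return the 15-bit address after index modification.

    y    -- 15-bit address field of the instruction
    tag  -- 3-bit tag field (instruction bits 18:20)
    xr   -- dict mapping index-register numbers 1..7 to 15-bit contents
    """
    if tag == 0:
        return y & ADDR_MASK  # no indexing requested
    if multiple_tag_mode:
        # Each tag bit selects XR1, XR2, or XR4; the selected
        # register contents are combined with a logical OR.
        index = 0
        for bit in (1, 2, 4):
            if tag & bit:
                index |= xr[bit]
    else:
        index = xr[tag]  # the tag names one of seven registers
    # Adding the two's complement of the index is the same as
    # subtracting it modulo 2**15.
    twos_complement = (-index) & ADDR_MASK
    return (y + twos_complement) & ADDR_MASK
```

With the same tag value 3, the two modes differ: in Multiple Tag Mode the sketch ORs XR1 and XR2, while in seven-register mode it reads XR3.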
Chapter 41 | The IBM 7094 I, II 521
Fig. 3. IBM 7094 central-processing unit information flow. (Courtesy of International Business Machines Corporation.) [Diagram labels: IC; Multiplexor; Operation Decode; Sense Indicators; Adders; Multiplexor Address Switch; Odd Core Addresses; Even Core Addresses; Core Storage.]
Divide-check*. The Divide-Check Indicator is turned on, in fixed-point or floating-point division, if the magnitude of the number in the AC (dividend) is greater than or equal to the magnitude of the number in memory (divisor).

Input-output check*. The Input-Output Check Indicator (I-O check) is turned on by the attempted execution of an input/output instruction without first selecting an input/output unit.

Transfer trap mode*. The computer can be operated in a special Transfer Trap Mode. Operation in the Trap Mode permits the program to run at normal speed with interruptions of normal operation only at transfer points. At such points the location of the last sequential instruction is saved, and a transfer of control is made to a fixed location.

Sense switches*. Six Sense Switches are located on the console. They may be turned on or off manually, and there are instructions which sense them.

Sense lights*. Four Sense Lights are also on the console. Any one of these lights may be turned on, off, or the status tested by instructions.

Panel in-out switches*. These 36 switches on the console may be read by an instruction.

Instruction-set interpretation
The basic computer clock cycle is 2.0 µs in 7094 I and 1.4 µs in 7094 II, as dictated by Mp. Within the single 2- (or 1.4-) microsecond cycle, up to 10 sequential register transfers and/or data operations can take place, each of which transfers information among the Pc's registers; several operations may occur simultaneously. In Pc four different cycles are used: instruction/I, execute/E, logic/L, and buffer/B. The cyclic sequence of an instruction is fixed, always beginning with an I cycle and progressing to E, L, or B cycles, depending on the instruction. The number of cycles required for an instruction may vary from 1 (e.g., transfer) to 19 (e.g., double-precision floating-point divide).

Instruction cycle (I). The I cycle begins when IC furnishes the instruction location to Mp, via S('Multiplexor). The addressed instruction word taken from Mp goes to the Multiplexor Storage Bus (Fig. 3). From the Multiplexor Storage Bus the instruction is read into the Storage Register where it is separated into the operation portion and the address portion of the instruction word. The operation portion of the Storage Register goes into the Instruction Register, where the operation code is decoded and the execute control circuitry is set up to perform the operation specified by the instruction. The address portion of the instruction word, now located in the Storage Register, may be used directly. Normally, however, it goes to the Address Register and then to the Multiplexor Address Switch to locate the appropriate data word in Mp. If the address is to be modified, it is routed from the Storage Register to the Index Adders for Index-register modification. The modified address is then brought to the Address Register and on to the Multiplexor Address Switch to locate the data word in core storage.
Concurrently, during the same instruction cycle, a second instruction, located at the immediately higher odd-numbered Mp address location, is brought to the Instruction Backup Register/IBR. While in the IBR, the odd-numbered instruction is partially decoded to determine if it meets certain criteria for concurrent execution, thus saving a second Mp reference. If the instruction in the IBR cannot be executed with the current instruction, it is ignored in the current I cycle and is brought into the Storage Register on the next I cycle.

Execution cycle (E). The execution (E) cycle is used when a reference to core storage is needed. All instructions requiring an operand have an E cycle following the I cycle.
Indirect addressing of an instruction requires an extra E cycle. In other words, an instruction that normally goes from I to E to be executed will go to I, E, and again to E if it is indirectly addressed.

Logic cycle (L). The L cycle is an execute cycle that does not require a reference to Mp. Many instructions use both E and L cycles when information is required from storage and the instruction cannot be completed during an E cycle. Other instructions require no reference to storage and, therefore, use only I and L cycles for their completion.

Buffer cycle (B). A buffer (B) cycle is a null Pc cycle; it is used when the data channels get information from or put information into core storage. This information can be either data or data-channel commands. All demands for B cycles come from the channels themselves. Because of the nature of Ms's and T's, the demand for a B cycle takes precedence over an instruction being performed by Pc. If Pc is in its logic cycle, then both an L and B cycle occur simultaneously.
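
The cycle accounting described above lends itself to a small worked example. The Python sketch below is a rough estimate only: it assumes every cycle in a sequence costs one full memory cycle and ignores the overlap provided by the IBR and the simultaneous L and B cycles. The names `CYCLE_US`, `instruction_time_us`, and `with_indirect` are illustrative and do not come from the text.

```python
# Memory cycle time in microseconds, as given in the text.
CYCLE_US = {"7094 I": 2.0, "7094 II": 1.4}
VALID_CYCLES = {"I", "E", "L", "B"}


def instruction_time_us(cycles, model):
    """Estimate execution time for a sequence of cycle names."""
    if not cycles or cycles[0] != "I":
        raise ValueError("an instruction always begins with an I cycle")
    if any(c not in VALID_CYCLES for c in cycles):
        raise ValueError("cycles must be drawn from I, E, L, B")
    return len(cycles) * CYCLE_US[model]


def with_indirect(cycles):
    """Insert the extra E cycle of indirect addressing: I, E becomes I, E, E."""
    return [cycles[0], "E"] + list(cycles[1:])
```

For instance, `instruction_time_us(with_indirect(["I", "E"]), "7094 II")` counts three cycles of 1.4 µs each, and a 19-cycle instruction on the same model comes to about 26.6 µs under these assumptions.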
Instruction interpretation. Instruction flow diagrams for the CLA, CAL, and CLS instructions are given in Fig. 4. These diagrams show the sequential process of instruction execution. Although the flow diagrams for these instructions are trivial, the general process is still apparent. The more complex instructions, for example, double-precision floating-point divide, are carried out in a similar fashion, but with many more operations. The registers, transfer paths, and interregister data operations are the register-transfer-level primitives from which the ISP is implemented. The data flow diagram (Fig. 3) explicitly defines the main registers and register operations within Pc.

Pc ISP
The Pc Instruction-set Processor is given in Appendix 1 of this chapter. The instructions are arranged in groups according to the location of operands. These groups are:

Operations on Mp
    Mp ← u Mp (unary operation/u on Mp)
    Mp ← u Mps (unary operation on Mprocessor state/Mps)
    Mp ← Mp b Mps (binary operation/b)
Operations on AC and MQ
    Mps ← u Mps
    Mps ← u Mp
    Mps ← Mps b Mp
Operations on the index registers
Operations on the sense indicators
Instruction for program control

Memory mapping for multiprogramming and Mp(65536 w)
A special option provides multiprogramming by allowing a program to run in a protected area of Mp. Two registers are used: The base register establishes the lower bound of the program, and the length register establishes the upper bound. Pc checks that all program references are within the protected area.
Two Mp(32768 w)'s can be used on the computer. Mp is then considered as A core and B core for addresses 0:32767 and 32768:65535. A 1-bit register is used to select whether A or B core is to be used for data; and one 1-bit register is used to select whether A or B core is to be used for the instruction. These modifications were used at M.I.T. in their Compatible Time Sharing System/CTSS [Corbato et al., 1962] which used a 7094 II.
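
The two memory-mapping options above can be illustrated with a short Python sketch. It is a simplification under stated assumptions: both bounds are treated as inclusive, and the function names `in_protected_area` and `physical_address` are introduced here for illustration. The text does not say how the hardware encodes or compares these values.

```python
CORE_WORDS = 32768   # size of one core bank (A core or B core)
ADDR_MASK = 0o77777  # 15-bit address within a bank


def in_protected_area(address, lower_bound, upper_bound):
    """True if a program reference falls inside the protected area.

    lower_bound and upper_bound stand for the limits established by
    the base and length registers (both assumed inclusive here).
    """
    return lower_bound <= address <= upper_bound


def physical_address(address15, use_b_core):
    """Map a 15-bit address onto the 65536-word space.

    use_b_core models one of the 1-bit select registers: one such
    bit applies to data references, another to instruction fetches.
    """
    bank_offset = CORE_WORDS if use_b_core else 0
    return bank_offset + (address15 & ADDR_MASK)
```

A data reference to address 0 with B core selected therefore lands on word 32768, the first word of the second bank.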
Pio('IBM 7909 Data Channel)
Ms('1301 Disk Storage, '7340 Hypertape Drives) and the T('Tele-Processing equipment) communicate with Mp via the Pio('7909 Data Channel). Four 7909 Data Channels may be attached to a 7094 I or II system.
K('7631 File Control) is required for M(disks). Several K('7631) can be used with the 7094 system alone or shared with an IBM 1410 system or shared with another IBM 7000 series (not 7072 system).
When Ms('7340 Hypertape Drives) are attached to the 7094 system, K('7640 Hypertape Control) is used between the 7909 data channel and the drives. One K('7640) may be attached to a 7094 system; it has two paths, each of which can be used for data transmission.
The K('1416-6 Input-Output Synchronizer) is used with T('Tele-processing Equipment)'s. The structure for these T's is rather elaborate, yet only six T's can be active at a time.
Transferring data from Mp to a T or an Ms via the 7909 takes place as follows:
1 Pc sets up the data-transfer management program in Mp for a Pio.
2 Pc starts Pio by setting Pio's command (instruction) location counter at the origin of the task program in Mp. (Faults in the connection may cause Pio interrupts to Pc.)
6 At the termination of the task, the completion signal from Pio causes Pc to interrupt and Pio may also halt.
[Fig. 5 diagram labels: Mp(core); S('7606 Multiplexor); (36 data); (15 address); storage address switches; channel switches; Operation; Word; Address.]
in ISP descriptions (Appendices 2, 3 and 4 of this chapter). The main registers of Pio are shown in Fig. 5. These registers are declared and their function is explained in the first section of the ISP description of Pio (Appendix 2). The remainder of the ISP description is concerned with defining the interpreter and the ISP instruction set.
There are about 50 bits in the K's (see Appendix 3). A knowledge of K's state and the K process is required for understanding the Pio. A description of the K and Pio data-transmission processes is given in Appendix 2.
The Pc instructions controlling Pio are presented in Appendix 4.
The level of detail in the appendices is slightly greater than that in normal ISP description. It is, however, not completely precise, as the behavior is extremely time- and Ms- or T-dependent. The sequence check conditions are incomplete; that is, the conditions for illegal instruction sequences are not given. Both ISP and text descriptions are given for parts which are particularly complex.
The ISP description should be observed in the following sequence: Pio State; K State (Appendix 3); Pio Instruction Format; Pio Interpreter; Pio Instruction-Control (or Initialization) instructions, Block Transfer (or Copy) instructions, Conventional Move and Transfer instructions, and Interrupt Control instructions; Instructions in Pc (Appendix 4); Interrupt Operation; and Processes defining data movements between K and Pio (Appendix 2). The Pio, K, and Ms or T processes are, in several ways, more complex than those of a Pc. First, Ms or T activity is not categorized as nicely as a Pc instruction set. The T or Ms events occur at times peculiar to the device, not a simple synchronous clock. Finally, the peripheral components have a large number of error states.

Conclusions
The series ending with the IBM 7094 II is a significant member of the computer population. It provides a good example of the evolution in computer systems that occurred from 1954 to 1965.

References
CorbF62; FrizC53; GreeJ57; CrumM58; RossH53; SaxoJ63; StevL52; A22-6703 IBM 7094 Principles of Operation
Appendix 1
IBM 7094 Pc ISP Description
Pc State
The description does not include the two protection and relocation schemes used for the 7040 and 7094. The Trap_Mode flip-flop is declared; its action is not described. Trap_Mode allows any change of the Instruction Counter to cause a trap. The Instruction Backup Register is not described, although it is used to save time in program execution. The description of the arithmetic functions is highly simplified.
AC<Q,P,S,1:35>                        * Accumulator, 38 bits
ACs<S,1:35> := AC<S,1:35>             * signed AC word
ACl<P,1:35> := AC<P,1:35>             * logical AC word
P := AC<P>                            * carry for AC<1:35>; AC overflow is also set
Q := AC<Q>                            * carry for bits <P,1:35>
S := AC<S>                            * sign bit of AC
MQ<S,1:35>                            * Multiplier-Quotient
ACMQ<S,Q,P,1:71> := AC□MQ<1:35>       * double word accumulator
SI<0:35>                              Sense Indicators or program flags; must be preserved if double precision floating point instructions are given
XR'[1:7]<3:17>                        Index Registers in 7094
XR"[A,B,C]<3:17> := XR[1,2,4]<3:17>   * Index Registers for 704, 7090
Multiple_Tag_Mode                     program switch to force compatibility with 704, 7090; only 3 index registers XR[A,B,C] are in 704, 7090
IC<3:17>                              * Instruction Location Counter
Run                                   * indicates whether machine is executing instructions
Divide_check                          *
AC_overflow                           *
MQ_overflow                           *
Input_Output_check                    *
Trap_request<A:H>                     Request to trap Pc from Pio #A, ..., #H
Trap_Mode                             Allows trapping or not of transfer instructions (not described)
Pc Console State
Keys<S,1:35>
Sense_Switches<1:6>
Sense_Lights<1:4>
Mp State
M[0:32768-1]<S,1:35>
Instruction Format
instruction<S,1:35>                   corresponds to the physical Storage Register
Y<21:35> := instruction<21:35>        generally the address part; used to calculate the effective address; corresponds to the physical Address Register
T<18:20> := instruction<18:20>        the XR to use: 1, ..., 7; 0 means no indexing; corresponds to a physical register
F<12:13> := instruction<12:13>        indirect address specification
indirect := (F<12:13> = 11)
op<S,1:11> := instruction<S,1:11>     op code; corresponds to a physical register
hi_op<S,1:2> := instruction<S,1:2>    special op codes
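
The field definitions above translate directly into bit extraction on a 36-bit word. The Python sketch below is an illustration, not part of the ISP description: it assumes bit S is numbered 0 and is the most significant bit, and the helper names `field` and `decode` are introduced here. The signed `op` value follows the ISP convention in which a negative op code means the S bit is set.

```python
def field(word, first, last):
    """Extract bits first..last of a 36-bit word (S counts as bit 0)."""
    width = last - first + 1
    return (word >> (35 - last)) & ((1 << width) - 1)


def decode(word):
    """Split an instruction word into the fields named in the ISP."""
    sign = field(word, 0, 0)
    magnitude = field(word, 1, 11)
    return {
        "op": -magnitude if sign else magnitude,  # op<S,1:11> as a signed value
        "hi_op": field(word, 0, 2),               # hi_op<S,1:2>
        "indirect": field(word, 12, 13) == 0b11,  # F<12:13> = 11
        "T": field(word, 18, 20),                 # tag: which XR to use
        "Y": field(word, 21, 35),                 # address part
    }
```

As a usage note, a word built as `(0o0020 << 24) | (4 << 15) | 0o100` decodes to op 20 (octal), tag 4, and address 100 (octal), which is the layout of a TRA with index register 4. One limitation of the signed representation is that it cannot distinguish +0 from -0.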
Data Formats
l<S,1:35>                             logical data; unsigned integer/boolean vector
sx<S,1:35>                            single precision fixed point (integer) data
sx_sign := sx<S>
sx_magnitude<1:35> := sx<1:35>
sf<S,1:35>                            single precision floating point value of: sf_sign × sf_mantissa × 2^sf_exponent
sf_sign := sf<S>
sf_exponent<1:8> := 200₈ - sf<1:8>
sf_mantissa<0:26> := sf<9:35>
df[0:1]<S,1:35>                       double precision floating point value of: df_sign × df_mantissa × 2^df_exponent
df_sign := df[0]<S>
df_exponent<1:8> := 200₈ - df[0]<1:8>
df_mantissa<0:53> := df[0:1]<9:35>
Instruction Interpretation Process
Run → (instruction ← M[IC]; IC ← IC + 1; next       fetch
       Instruction_execution)                        execute
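
The interpretation process above is a conventional fetch-execute loop, and a few of the instructions defined later in this appendix are enough to show it running. The Python sketch below is a toy under stated assumptions: it handles only HTR, TRA, HPR, STZ, and NOP, it ignores indexing and indirect addressing (so the effective address e is just the address field), and the names `run`, `word`, and `max_steps` are introduced here.

```python
ADDR_MASK = 0o77777

# Positive op codes (octal) taken from the ISP description.
HTR, TRA, HPR, STZ, NOP = 0o0000, 0o0020, 0o0420, 0o0600, 0o0761


def word(op, address=0):
    """Assemble a 36-bit word from a positive op code and an address."""
    return (op << 24) | (address & ADDR_MASK)


def run(memory, ic=0, max_steps=1000):
    """Interpret instructions until Run is cleared; return (IC, steps)."""
    running = True
    steps = 0
    while running and steps < max_steps:
        instruction = memory[ic]       # fetch: instruction <- M[IC]
        ic = (ic + 1) & ADDR_MASK      # IC <- IC + 1
        steps += 1
        op = (instruction >> 24) & 0o7777
        e = instruction & ADDR_MASK    # no indexing in this sketch
        if op == HTR:                  # halt and transfer
            running = False
            ic = e
        elif op == TRA:                # transfer
            ic = e
        elif op == HPR:                # halt and proceed
            running = False
        elif op == STZ:                # store zero
            memory[e] = 0
        elif op == NOP:
            pass
        else:
            raise NotImplementedError(f"op code {op:o} is not modeled")
    return ic, steps
```

A short program such as `[word(STZ, 5), word(TRA, 3), word(HTR, 2), word(NOP), word(HTR, 7), 0o123]` exercises each branch except HPR: it clears word 5, jumps over the first halt, and stops at the second with IC set to 7.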
Instruction Set and Instruction Execution Process
Instruction_execution := (
Operations on M: M[e] ← f; or M[e] ← f(M[e])
STZ (:= op = 600) → (M[e] ← 0);                                        * store zero
MSP (:= (op = -1623) ∧ (c = 7)) → (M[e]<S> ← 0);                       make sign positive; 704 series only
MSM (:= (op = -1623) ∧ (c = 6)) → (M[e]<S> ← 1);                       make sign minus; 704 series only
Block transfer of data, M ← M (704 series only)
TMT (:= op = -1704) → (M[AC<21:35>:(AC<21:35> + e'<28:35>)] ←
... × 6 + 5)> ← AC<30:35>);
Operations on AC, MQ, or ACMQ with AC, MQ, ACMQ, Keys and M operands.
CLM (:= (op = 760) ∧ (e' = 0)) → (AC<Q,P,1:35> ← 0);                   * clear magnitude
SSP (:= (op = 760) ∧ (e' = 3)) → (AC<S> ← 0);                          * set sign plus
SSM (:= (op = -760) ∧ (e' = 3)) → (AC<S> ← 1);                         * set sign minus
CLA (:= op = 500) → (AC ← 0; next ACs ← AC + M[e]);
CAL (:= op = -500) → (AC ← 0; next ACl ← ACl + M[e]);
CLS (:= op = 502) →
... Divide_check → Run ← 0);
FDP (:= op = 241) → (AC,MQ ← AC/M[e] {sf});                            * divide or proceed
Unnormalized single precision floating point
UFA (:= op = -300) → (AC,MQ ← AC + M[e] {suf});                        * add
UAM (:= op = -304) → (AC,MQ ← AC + abs(M[e]) {suf});                   * add magnitude
UFS (:= op = -302) → (AC,MQ ← AC - M[e] {suf});                        * subtract
USM (:= op = -306) → (AC,MQ ← AC - abs(M[e]) {suf});                   * subtract magnitude
UFM (:= op = -260) → (AC,MQ ← MQ × M[e] {suf});                        * multiply
Double precision floating point
In DF operations, the SI are used as temporary registers and will be changed.
DFAD (:= op = 301) → (ACMQ ← ACMQ + M[e]□M[e+1] {df}; SI ← ?);         * add
DFAM (:= op = 305) → (ACMQ ← ACMQ + abs(M[e]□M[e+1]) {df}; SI ← ?);    * add magnitude
DFSB (:= op = 303) → (ACMQ ← ACMQ - M[e]□M[e+1] {df}; SI ← ?);         * subtract
DFSM (:= op = 307) → (ACMQ ← ACMQ - abs(M[e]□M[e+1]) {df}; SI ← ?);    * subtract magnitude
Unnormalized double precision floating point
DUFA (:= op = -301) → (ACMQ ← ACMQ + M[e]□M[e+1] {duf}; SI ← ?);       * add
DUAM (:= op = -305) → (ACMQ ← ACMQ + abs(M[e]□M[e+1]) {duf}; SI ← ?);  * add magnitude
DUFS (:= op = -303) → (ACMQ ← ACMQ - M[e]□M[e+1] {duf}; SI ← ?);       * subtract
DUSM (:= op = -307) → (ACMQ ← ACMQ - abs(M[e]□M[e+1]) {duf}; SI ← ?);  * subtract magnitude
Logical
ORA (:= op = -501) → (ACl ← ACl ∨ M[e]);                               * or to accumulator
ANA (:= op = -320) → (ACl ← ACl ∧ M[e]);                               * and to accumulator
ERA (:= op = 322) → (ACl ← ACl ⊕ M[e]);                                * exclusive or to accumulator
Transmission to Sense Indicators
PAI (:= op = 44) → (SI ← ACl);                                         place accumulator in indicators
LDI (:= op = 441) → (SI ← M[e]);                                       load indicators
OAI (:= op = 43) → (SI ← SI ∨ ACl);                                    or accumulator to indicators
RIA (:= op = -42) → (SI ← SI ∧ ¬ACl);                                  reset indicators from accumulator
IIA (:= op = 41) → (SI ← SI ⊕ ACl);                                    invert indicators from accumulator
OSI (:= op = 442) → (SI ← SI ∨ M[e]);                                  or storage to indicators
RIS (:= op = 445) → (SI ← SI ∧ ¬M[e]);                                 reset indicators from storage
IIS (:= op = 440) → (SI ← SI ⊕ M[e]);                                  invert indicators from storage
SIL (:= op = -55) → (SI<0:17> ← SI<0:17> ∨ R);                         set indicators of left half
RIL (:= op = -57) → (SI<0:17> ← SI<0:17> ∧ ¬R);                        reset indicators of left half
IIL (:= op = -51) → (SI<0:17> ← SI<0:17> ⊕ R);                         invert indicators of left half
SIR (:= op = 55) → (SI<18:35> ← SI<18:35> ∨ R);                        set indicators of right half
RIR (:= op = 57) → (SI<18:35> ← SI<18:35> ∧ ¬R);                       reset indicators of right half
IIR (:= op = 51) → (SI<18:35> ← SI<18:35> ⊕ R);                        invert indicators of right half
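
The sense-indicator instructions above are plain bitwise operations on a 36-bit register, which a few Python functions can make concrete. This is a sketch under stated assumptions: SI and the logical AC word are held as 36-bit integers with bit 0 as the most significant bit, and R is taken to be an 18-bit mask supplied by the instruction (R is not defined in this excerpt). The lowercase function names mirror the mnemonics but are otherwise introduced here.

```python
WORD_MASK = (1 << 36) - 1
HALF_MASK = (1 << 18) - 1


def oai(si, acl):
    """OR accumulator to indicators: SI <- SI or ACl."""
    return (si | acl) & WORD_MASK


def ria(si, acl):
    """Reset indicators from accumulator: SI <- SI and not ACl."""
    return si & ~acl & WORD_MASK


def iia(si, acl):
    """Invert indicators from accumulator: SI <- SI xor ACl."""
    return (si ^ acl) & WORD_MASK


def sil(si, r):
    """Set indicators of left half: SI<0:17> <- SI<0:17> or R."""
    return si | ((r & HALF_MASK) << 18)


def rir(si, r):
    """Reset indicators of right half: SI<18:35> <- SI<18:35> and not R."""
    return si & ~(r & HALF_MASK) & WORD_MASK
```

The storage forms (OSI, RIS, IIS) follow the same pattern with M[e] in place of ACl, and the remaining half-word forms differ only in the operator and the half selected.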
Program flow control instructions
NOP (:= op = 761) → ;                                                  no operation
HPR (:= op = 420) → (Run ← 0);                                         * halt and proceed
HTR (:= op = 0) → (Run ← 0; IC ← e);                                   * halt and transfer
TRA (:= op = 20) → (IC ← e);                                           * transfer
XEC (:= op = 522) → (instruction ← M[e]; next Instruction_execution);  execute
Conditional transfers
TZE (:= op = 100) → ((AC<Q,P,1:35> = 0) → IC ← e);                     * transfer on zero
TNZ (:= op = -100) → (¬(AC<Q,P,1:35> = 0) → IC ← e);                   * transfer on no zero
TPL (:= op = 120) → (¬AC<S> → IC ← e);                                 * transfer on plus
TMI (:= op = -120) → (AC<S> → IC ← e);                                 * transfer on minus
TOV (:= op = 140) → (AC_overflow → IC ← e; AC_overflow ← 0);           * transfer on overflow
TNO (:= op = -140) → (¬AC_overflow → IC ← e; AC_overflow ← 0);         * transfer on no overflow
TQP (:= op = 162) → (¬MQ<S> → IC ← e);                                 * transfer on MQ plus
TQO (:= op = 161) → (MQ_overflow → IC ← e; MQ_overflow ← 0);           * transfer on MQ overflow
TLQ (:= op = 40) → ((AC > MQ) → IC ← e);                               * transfer on low MQ
TIO (:= op = 42) → ((ACl = (ACl ∧ SI)) → IC ← e);                      * transfer when indicators on
TIF (:= op = 46) → ((0 = (ACl ∧ SI)) → IC ← e);                        * transfer when indicators off
Index manipulation and control and subroutine calling
TSX (:= op = 74) → (XR[T] ← 2^15 - IC; IC ← Y);                        * transfer and set index
TSL (:= op = -1627) → (M[e]<21:35> ← IC; IC ← e + 1);                  * 704
Loop control
TXI (:= hi_op = 1) → (XR[T] ← XR[T] + D; IC ← Y);                      * transfer with index incremented
TXH (:= hi_op = 3) → ((D < XR[T]) → IC ← Y);                           * transfer on index high
... < 6)) → (M[e]<S> → IC ← IC + 1);                                   storage minus test; 704 series only
... → IC ← IC + 1);                                                    storage plus test; 704 series only
...                                                                    compare character with storage; 704 series only
(AC<30:35> = M[e]<(c × 6):(c × 6 + 5)>) → IC ← IC + 1;
(AC<30:35> < M[e]<(c × 6):(c × 6 + 5)>) → IC ← IC + 2));
PBT (:= (op = -760) ∧ (e' = 1)) → (AC<P> → IC ← IC + 1);               * P bit test
DCT (:= (op = +760) ∧ (e' = 12)) → (Divide_check → IC ← IC + 1);       * Divide_check test
... → IC ← IC + 1);                                                    * storage zero test
... → IC ← IC + 1);                                                    * storage non zero test
...                                                                    * compare AC with storage
(ACs = M[e]) → IC ← IC + 1;
(ACs < M[e]) → IC ← IC + 2);
LAS (:= op = -340) → (                                                 * logical compare AC with storage
(AC<Q,P,1:35> = M[e]<S,1:35>) → (IC ← IC + 1);
(AC<Q,P,1:35> < M[e]<S,1:35>) → (IC ← IC + 2));
SWT (:= (op = 760) ∧ (e'<9:14> = 16)) → (                              Sense_Switches test
Sense_Switches<e'<15:17>> → IC ← IC + 1);
SLF (:= (op = 760) ∧ (e' = 140)) → (Sense_Lights<1:4> ← 0);            Sense_Lights off
SLN (:= (op = 760) ∧ (e'<9:14> = 14) ∧ (e'<15:17> ≠ 0)) → (            Sense_Lights on
Sense_Lights<e'<15:17>> ← 1);
LTM (:= (op = -760) ∧ (e' = 7)) → (Trap_Mode ← 0);                     leave Trap_Mode
EMTM (:= (op = -760) ∧ (e' = 16)) → (Multiple_Tag_Mode ← 1);           enter Multiple_Tag_Mode
LMTM (:= (op = 760) ∧ (e' = 16)) → (Multiple_Tag_Mode ← 0);            leave Multiple_Tag_Mode
) end Instruction_execution
Appendix 2
Although the following description is of a Pio, signals generated in Pc, M, and K are necessary. Appendices 1, 3, and 4 are also necessary for a complete description. The Ms attached to K controls the precise time information flows.
Pio State
CC<21:35>                             Command Counter: 15 bit command (or instruction) counter containing the location of the next command
AC<21:35>                             Address Counter: during vector data transfers AC contains the address of the next data word to transfer. During a transfer command AC is set to the address of the next command
AR<S,1:35>                            Assembly Register: a buffer for data flow between the data register and the device control registers
ARc[0:5]<0:5> := AR<S,1:35>           character array defined by AR; a character is normally selected ARc[PSR]
CTC<O:P Control Counter: a E b i t r e g i s t e r u h i c h can be loaded and
s t o r e d bu t h e ISP
wc<3 : 1 I> Vord Counter: a counter c o n t r o l l i n a t h e number of words l e f t
t o t r a n s f e r during a command
Sequence_Check
A Sequence_Check indicates an invalid sequence of channel commands. If a Sequence_Check occurs during data transmission, the adapter is logically disconnected and the interrupt occurs when the K_End signal is received.
The following instructions cause a Sequence_Check and a channel interrupt. (The checks are not described in the ISP description.)
1. If a CTLW, CTLR, or SNS is followed by CTL, CTLW, WTR, TWT, or SNS.
2. If an SNS or CPYP is followed by any command other than a CPYP, CPYD, TCH, or TDC.
3. If a TCH or TDC following an SNS or CPYP transfers control to any command other than a CPYP, CPYD, TCH, or TDC.
4. If a CPYP or CPYD has not been properly preceded by a CTLW, CTLR, or SNS.
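Rules 1 and 2 above compare each command only with its immediate predecessor, so they can be sketched as a simple pairwise check. The following Python fragment is an illustration only: the function names are hypothetical, commands are represented by their mnemonics as strings, and rules 3 and 4 (which depend on transfers of control and on what "properly preceded" means) are deliberately left out.

```python
# Illustrative sketch of Sequence_Check rules 1 and 2 only.
# Function and constant names are assumptions made for this example.

RULE1_FIRST = {"CTLW", "CTLR", "SNS"}
RULE1_SECOND = {"CTL", "CTLW", "WTR", "TWT", "SNS"}
RULE2_FIRST = {"SNS", "CPYP"}
RULE2_ALLOWED = {"CPYP", "CPYD", "TCH", "TDC"}


def sequence_check(prev, cur):
    """Return True if `cur` directly following `prev` violates rule 1 or 2."""
    if prev in RULE1_FIRST and cur in RULE1_SECOND:
        return True  # rule 1
    if prev in RULE2_FIRST and cur not in RULE2_ALLOWED:
        return True  # rule 2
    return False


def first_violation(commands):
    """Return the index of the first offending command, or None."""
    for i in range(1, len(commands)):
        if sequence_check(commands[i - 1], commands[i]):
            return i
    return None
```

For example, `first_violation(["CTL", "SNS", "CTL"])` flags the third command, because an SNS may not be followed by a CTL under either rule.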
K_Unusual_End
This signal indicates an error condition recognized by K. It causes an immediate interrupt to Pio. The signal may be determined by sensing the K error indication.
Attention Conditions
This is a signal indicating a change in status of the attached input output device. For example, during disk operations, an attention signal is generated when an access mechanism has completed a seek operation. The particular access mechanism that generated this indication may be determined from sense data.
K_Check
Adapter check (K_Check) indicates an error and is recognized by the 7909, but does not necessarily indicate a K malfunction. The conditions which cause an adapter check are:
1. Circuit failure occurs in the ASR or CR.
2. The character rate of the attached IO device exceeds the capability of the channel.
3. The adapter (K) is not operational. This indication occurs if power is off on the adapter and an attempt is made to read, write, control or sense.
Hardware Switches
These gates route information among the registers on a selected basis. They are not under control of the program and are not registers.
Storage_Bus_Switches<S,1:35>  These 36 switches (and/or gates) provide the data path to and from the 7606 Multiplexor for data or command entry into the Pio.
Channel_Address_Switches<1:35>  These 35 switches provide the M with address information. Address information is selected from the Address Counter or the Command Counter.
Character_Switches<0:5>  These 6-bit switches enable the character to be read from or written into the Assembly Register.
Pio State (not in ISP)
Hardware registers not in ISP but used in the description and the Pio
OR<0:4>  Operation Register. The register containing the operation part of the instruction. OR is made up from i<S,1:3,19>.
DR<S,1:35>  Data Register. A buffer for data flow between M and the AR.
CR  Character Ring. A register to control the timing of transmission into AR.
ASR  Assembly Ring. The counter to control the gates to/from AR from/to K. Data are sent to or received from the control, K, one 6-bit character at a time via the Character Switches under control of ASR.
Instruction Format
i<S,1:35>  instruction: normally IBM calls these commands because a Pio executes them
f := i<18>  indirect
op<0:4> := i<S,1:3,19>  operation code
y<0:14> := i<21:35>  address
c<0:14> := i<3:17>
c'<0:2> := i<3:5>  count part
m<0:5> := i<12:17>  mask
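The field definitions above can be illustrated by a small decoder. This Python sketch assumes the field positions read as f = i<18>, op = i<S,1:3,19>, y = i<21:35>, c = i<3:17>, and m = i<12:17>, with S treated as bit 0 (the most significant bit) of a 36-bit word. The function names and the dictionary layout are hypothetical, chosen only for the example.

```python
# Illustrative decoder for a 36-bit Pio command word.
# Bit 0 is the sign bit S; bit 35 is the least significant bit.
# Field positions are assumptions read from the instruction format above.

WORD_BITS = 36


def bit_field(word, first, last):
    """Extract bits first..last (inclusive, numbered from the left)."""
    width = last - first + 1
    shift = WORD_BITS - 1 - last
    return (word >> shift) & ((1 << width) - 1)


def decode_command(word):
    """Split a command word into the fields named in the ISP description."""
    # op is the concatenation of bits S,1:3 with bit 19.
    op = (bit_field(word, 0, 3) << 1) | bit_field(word, 19, 19)
    return {
        "op": op,
        "f": bit_field(word, 18, 18),   # indirect bit
        "y": bit_field(word, 21, 35),   # address
        "c": bit_field(word, 3, 17),    # count
        "m": bit_field(word, 12, 17),   # mask
    }
```

Note that, as read here, the op and c fields share bit 3 and the m field lies inside c; which interpretation applies presumably depends on the particular command.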
Mp State
M[0:32768-1]<S,1:35>  Computer's primary memory
Instruction Interpretation Process
¬Interrupt_request ∧ ¬Wait → (Instruction ← M[CC];  fetch, no interrupt
  CC ← CC + 1; next
  Instruction_execution);  execute, no interrupt
Interrupt_request ∧ ¬Interrupt_mode → (  interrupt process
  M[IL]<21:35> ← CC; M[IL]<3:17> ← AC;
  Interrupt_mode ← 1; next CC ← IL + 1);
Pio Interrupts and Pc Traps
The Pio is capable of having its stored program interrupted independently of other P's. This operation is separate and distinct from a data channel trap in which Pio interrupts the Pc. On recognition of an interrupt condition the Pio stores the contents of the command and address counters in a fixed memory location, IL, and then executes the command located in the next location.
If the 7909 channel is to be diverted from normal command execution sequence, the command in the fixed location must be one that will change the contents of the command counter (TCH, LIPT, or successful TDC or TCM). If this command is other than a successful transfer, the channel executes it and resumes operation at the location immediately following the location where the interrupt occurred. If the command at the fixed location is a WTR or TWT, the channel suspends operation as described in the channel command section, but the command counter contains the location plus one of the command responsible for the interrupt.
Interrupt conditions are stored in a six-position register in the data channel and may be examined with the TCM command. Any combination of interrupt conditions causes an interrupt; however, once interrupted the channel is placed in interrupt mode and further attempts to set the interrupt condition or to interrupt are inhibited. The channel remains in interrupt mode until an LIP or LIPT command is executed by the channel or an RIC instruction is executed by the CPU. If a channel is in interrupt mode and an RSC instruction is executed by the CPU before the channel executes a LIP or LIPT command, the interrupt condition register is reset but the channel remains in interrupt mode. An LIP or LIPT command or a RIC instruction is the only program means available to cause the channel to exit from interrupt mode and become receptive to further interrupt conditions.
Interrupts are also inhibited if channel trap is in process on that channel. This inhibiting persists until either an RSC or STC instruction (depending on whether the channel was enabled) is executed by the Pc.
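The interrupt-mode rules in the preceding paragraphs amount to a small state machine, sketched below in Python. The class and method names are hypothetical. The sketch models only the interrupt-mode flag and the six-position condition register; it omits the channel-trap inhibit and every other effect of RIC and RSC, which are assumptions of scope rather than statements about the 7909.

```python
# Illustrative model of Pio interrupt mode; all names are assumptions.

class ChannelInterruptState:
    def __init__(self):
        self.interrupt_mode = False
        self.conditions = 0  # six-position interrupt condition register

    def signal(self, position):
        """Try to set condition `position` (0..5).

        Returns True if an interrupt is taken, False if inhibited.
        """
        if not 0 <= position < 6:
            raise ValueError("condition register has six positions")
        if self.interrupt_mode:
            return False  # setting conditions and interrupting are inhibited
        self.conditions |= 1 << position
        self.interrupt_mode = True
        return True

    def lip(self):
        """LIP or LIPT executed by the channel: leave interrupt mode."""
        self.interrupt_mode = False

    def ric(self):
        """RIC executed by the CPU: leave interrupt mode (other effects omitted)."""
        self.interrupt_mode = False

    def rsc(self):
        """RSC while in interrupt mode: conditions reset, mode unchanged."""
        self.conditions = 0
```

A second `signal` call made before `lip` or `ric` returns False, which mirrors the statement that only LIP, LIPT, or RIC makes the channel receptive to further interrupt conditions.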
This command, when decoded by a channel not prepared to read or write, causes a sequence check and, thus, a channel interrupt. If the channel is prepared to read or write, this command causes c words to be transmitted between the channel and Mp, starting with M[e]. Data transmission continues until c is reduced to zero or a K_End signal is received by the channel. In either case, the channel read or write indicator is reset. If, while a CPYD is being executed, a K_End signal is received before the count is reduced to zero, the channel read or write indicator is reset, and the channel obtains a new command from the next sequential location.
If the next command is other than a copy, the channel executes that command. If the next command is a copy, the channel interrupts on a program sequence check. The last word transmitted to storage under CPYD control remains in the assembly register if a K_End signal is received before the word count reaches zero.
If the count for the CPYD goes to zero before the K_End signal is received, the channel initiates a disconnect but does not get the next sequential command until a K_End or K_Unusual_End signal is obtained. In general, when operating under CPYD control, the channel does not obtain the next sequential command until either a K_End or a K_Unusual_End signal causes an interrupt.
) end Instruction_execution
1 (WC = 0)
(WC # 0)
-
i
:
CC t C C + 1 ; n e x t
(DR t M [ C C ] :
""[i'C: f[i'(' + W) I t P ' [ A C : (Ar + VC)1
I
M[AC] t D R ; AC c A C + I ; WC t W C - I : n e x t
M,block,move))
"rocess to ?Iqle u n t i l R transmits an end signal
-.
K_End_wait := (
¬(K_End ∨ K_Unusual_End) → K_End_wait;
(K_End ∨ K_Unusual_End) → ;)
Appendix 3
K('Hypertape) and K('disk) ISP Descriptions
The following sense data bits for tape originate in Ms and K. These registers can be read by Pio using the Pio SNS instructions. Some of the bits are set using the CTL, CTLR, or CTLW instructions from Pio as control words.
SDT[O:l 1 6 , 1 : 3 5 > sense data .for K('H,upertape)
SDT[Ol<l>/Operator R e q u i r e d := (
SDT[Ol<lj>/Selected D r i v e Not Ready V
SDT[0]<15>/Selected D r i v e Not Loaded V
SDT[01<16>/Selected D r i v e F i l e P r o t e c t e d V
SDT[O ]<I7>/0perat i o n Not S t a r t e d )
SDT[O]q>/Program Check := (
SDT[OI<l9>/lnval i d Order Code v
SDT[OIQl>/Selected D r i v e Busy v
SDT[O]42>/Selected D r i v e a t B e g i n n i n g of Tape v
SDT[O143>/Selected D r i v e a t End o f Tape)
SDT[O]<O/Data Check := (
SDT[OlQ5>/Correct i o n Occurred v
SDT[Ok27>/Channel P a r i t y Check V
SDT[OlQ8>/Code Check V
SDT[OIQ9>/Envelope Check v
SOT[O1Ql>/Overrun o r C h a r a c t e r L o s t Check v
S D T [ O ] ~ 3 > / E x c e s s i v e Skew Check v
S D T [ O l ( 3 b / T r a c k S t a r t Check o r C l o c k L o s t Check)
SDT[D k 5 > / Except i o n C o n d i t i o n s := (
SDTll l < l > / S e l e c t e d D r i v e Read a Tape Mark V
S D T [ I l Q > / S e l e c t e d D r i v e i n End o f Tape Warning Area)
SDT[01<7,9:11>/Selected Tape U n i t Address O : 3
SDT[I 1<7>/Read S e c t i o n Busy
SDT[I ]->/Write S e c t i o n Busy
SDT[I ]<I I N B a c k w a r d Mode
SDTCl 1<13,15: 17,19,21 :23,25,27>/Drive Attention[O:91
SDF[O:I I6,i :35> Sense data d'or t h e K ( 'Disk)
SDF[OlO>/Program Check := (
S D F [ O l q > / I n v a l i d Sequence V
SDF[Ol<S>/lnval i d Code V
SDF[O]<IO>/Forrnat Check v
SDF[Ol<I I>/No Record Found
SDF[0]<13>/lnval i d Address)
SDF[O]<b/Data Check := (
SDF[OI<lp/Response Check V
SOF[O]<lb/Data Compare Check V
S D F [ O l < I P / P a r i t y o r C y c l i c Code)
S D F [ O l < p / E x c e p t i o n C o n d i t i o n := (
SDF[O1<1F'/Access Inoperative V
SDF[01<21>/Access Not Ready V
SDF[01<2D/Disk C i r c u i t Check V
SDF[O]<2j>/Fi l e C i r c u i t Check
SDF[O1<7>/six B i t Mode/Status B i t
SDF[ Ol< 31,33: 3poSDF[ I]< 1,3 :5,7,9>/Access 0, Module[O : 9 ]
Control Orders, i.e.
Instruction Names and Numbers for K('disk)
These instructions are set in the K_op register by the CTL instructions from Pio. The instructions are then executed by the K's. They will only be given as names, mnemonics, and operation codes.
DNOP (:= K_op = 00) →  no operation
DREL (:= K_op = 04) →  release
DEBM (:= K_op = 08) →  eight bit mode
DSBM (:= K_op = 09) →  six bit mode
DSEK (:= K_op = 80) →  seek
DVSR (:= K_op = 82) →  prepare to verify (single record)
DWRF (:= K_op = 83) →  prepare to write format
DVTN (:= K_op = 84) →  prepare to verify (track with no addresses)
DVCY (:= K_op = 85) →  prepare to verify (cylinder operation)
DWRC (:= K_op = 86) →  prepare to write check
DSAI (:= K_op = 87) →  set access inoperative
DVTA (:= K_op = 88) →  prepare to verify (track with addresses)
DVHA (:= K_op = 89) →  prepare to verify (home address)
Control Orders, i.e.
Instruction Names and Numbers for K('Hypertape)
HNOP (:= K_op = 00) →  no operation
HEOS (:= K_op = 01) →  end of sequence
HRLF (:= K_op = 02) →  reserved light off
HRLN (:= K_op = 03) →  reserved light on
HCLN (:= K_op = 05) →  check light on
HSEL (:= K_op = 06) →  select
HSBR (:= K_op = 07) →  select for backward reading
HCCR (:= K_op = 28) →  change cartridge and rewind
HRWD (:= K_op = 30) →  rewind
HRUN (:= K_op = 31) →  rewind and unload cartridge
HERG (:= K_op = 32) →  erase long gap
HWTM (:= K_op = 33) →  write tape mark
HBSR (:= K_op = 34) →  backspace
HBSF (:= K_op = 35) →  backspace file
HSKR (:= K_op = 36) →  space
HSKF (:= K_op = 37) →  space file
HCHC (:= K_op = 38) →  change cartridge
HUNL (:= K_op = 39) →  unload cartridge
HFPN (:= K_op = 42) →  file protect on
Appendix 4
IBM 7094 Pc Instructions to Pio ('7909)
Pc State
Pc_trap_enabled<A,B,C,D,E,F,G,H>  An 8-bit register in Pc which is used to mask or allow trap requests from Pio ('A, 'B, ..., 'H)
Instruction Set
The following instructions in Pc are used to operate on each Pio state: thus, each instruction is actually 8 instructions
RSC → (Wait → (CC ← e; Wait ← 0);  reset and start channel
  ¬Wait → RSC);  initializes a Pio
STC → (Wait → (CC ← AC; Wait ← 0);  start the Pio program
  ¬Wait → STC);
SCH → (M[e]<21:35> ← CC; M[e]<3:17> ← AC);  store channel. Checks status of a Pio.
ENB → (Pc_trap_enabled ← M[e]<28:35>);  enable from effective address
RIC → (CTC□AC□AR□CC□WC□Wait ← 0);  reset channel
TCO → (¬Wait → IC ← e);  transfer on channel in operation
TCN → (Wait → IC ← e);  transfer on channel not in operation
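As an informal illustration of the RSC, STC, TCO, and TCN descriptions above, the following Python sketch models only the Wait flag and the command and address counters of one Pio. The class name `Pio`, the function names, and the convention of returning False where the ISP would repeat the instruction are assumptions made for this sketch; they are not part of the ISP description.

```python
# Illustrative model of four Pc instructions that act on a Pio.
# All names here are assumptions made for the example.
from dataclasses import dataclass


@dataclass
class Pio:
    cc: int = 0        # command counter
    ac: int = 0        # address counter
    wait: bool = True  # True when the channel is not in operation


def rsc(pio, e):
    """Reset and start channel: CC <- e and Wait <- 0 if the Pio is waiting."""
    if pio.wait:
        pio.cc = e
        pio.wait = False
        return True
    return False  # the ISP repeats RSC until the Pio is waiting


def stc(pio):
    """Start channel: CC <- AC and Wait <- 0 if the Pio is waiting."""
    if pio.wait:
        pio.cc = pio.ac
        pio.wait = False
        return True
    return False  # the ISP repeats STC until the Pio is waiting


def tco(pio, ic, e):
    """Transfer on channel in operation; returns the next value of IC."""
    return e if not pio.wait else ic


def tcn(pio, ic, e):
    """Transfer on channel not in operation; returns the next value of IC."""
    return e if pio.wait else ic
```

Here `ic` stands for the already-incremented instruction counter, so an untaken transfer simply returns it unchanged.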
Section 2
The SDS 910-9300 series,
a planned family
The Scientific Data Systems 900-9000 series consists of the SDS 910, 920, 925, 930, 940, 945, and 9300 computers. The series includes capabilities and features found in most 24-bit machines. The design implementation is among the best for 24-bit machines, as measured by equipment utilization, the processor state, implementation technology, and ease of use.
The first delivery dates for the members of the series are 910 (August, 1962), 920 (September, 1962), 925 (February, 1965), 930 (June, 1964), 940 (April, 1966), 945 (~1968), and 9300 (December, 1964).
The 910 and 920 were designed at the same time as a planned series of compatible computers which spanned a range of performance. The 910 has instructions which facilitate defining 920 instructions by software. For example, these include the multiply and divide step¹ (see page 544) instructions in the 910 for programming the multiply and divide instruction in the 920.
The I/O facility evolved to a clean structure, with the potential for having a high degree of T and Ms data-transfer concurrency at a comparatively low cost. The IBM 7094 should be studied for a contrasting (more expensive) approach.
The instructions which help manipulate floating-point data are interesting and useful. The machine's ability to execute closed floating-point arithmetic subroutines is fairly good considering that the instructions are not hardwired.
The Programmed Operator (POP) instructions provide the ability to define an instruction set for efficient encoding. The idea appeared earlier in Atlas. However, the POP instruction calls subprograms in primary memory, instead of in fixed memory like Atlas.
A nice scheme¹ is described for increasing the memory address space from 16,384 to 32,768 words. Other schemes which switch memory banks, like those in the PDP-8 (Chap. 5) and in the 65,384-word 7094 II (Chap. 41), tend to be less desirable and flexible.
The SDS 930 was used at the University of California (Berkeley) as the base machine for the design of the Berkeley Time Sharing System (Chap. 24). SDS later marketed the system as the SDS 940.
The 9300 was not a member of the original 910-930 series. There is almost symbolic language program compatibility. Several registers and extra memory transfer paths were added to form the 9300 from the 930. The power of the 9300 is only a factor of 2 times the 930 for simple instructions. However, the hardwired floating-point instructions in the 9300 increase the power over the 930 by a factor of almost 10 for arithmetic problems. It is hard to believe that the incompatible 9300 was a wise choice. (We suggest a more reasonable alternative could have been a two-processor 930'. The 930' processor would be a 930 but with hardwired floating-point arithmetic instructions.) The 9300 has interesting twin-mode instructions for simultaneously operating on 12-bit data pairs. The 24-bit fixed-point word is sufficient for the real-time applications for which the computer was designed.
A flaw in the series is the sharing of K's among peripheral T's and Ms's. This problem can be seen by looking at the PMS structure (Chap. 42, Fig. 2, page 546). The connection to the peripheral K from K('Channel) requires a continuous connection during the data-transfer dialogue to Mp. This structure is especially bad in the case of a slow T, for example, a typewriter. A single character transmission requires that K('W, 'Y) be assigned to the typewriter during the complete message transmission (at a connected time of 100 milliseconds/character). The problem can be avoided by placing a character memory in each slow KT. Multiple devices could then run concurrently without requiring the elaborate K('W, 'Y) to be attached to them. The structure does not preclude such an improvement.
A complete description of the input/output and interrupt system is given and should be read carefully.
¹We believe this appeared originally in the DEC PDP 1 introduced in November, 1960.
Chapter 42
' M - M e m o r y or Memory Address: N - n u m b e r of shifts: T-tag field: +-also in the 910, 920 and 930; x.910 only; m o t in the 9300; $-not in the 910.
546 Part 6 1 Computer families Section 2 I The SDS 910-9300 series, a planned family
¹Pc(1 address/instruction; 1 instruction/w; 24 b/w; technology: transistor; 1962 - 1968)
Fig. 1. SDS 910 and 920 PMS diagram.
at task completion (the block of data transferred). Task alarms may cause Kio to interrupt Pc. Each Kio('Channel) can assemble data on a 6-, 12-, or 24-bit basis for Mp accesses. A K('Channel) recognizes two types of information: data being transmitted between Mp and the peripheral K, and initialization or controlling information from Pc.
In the 930 or 9300 K's the principal distinction is that the actual data-path switching routes differ. From a program operation and control viewpoint the Time Multiplexed and the Direct Access Communication Channels (TMCC and DACC) and the Data Subchannels (DSC) behave almost identically. The TMCC and DSC differ from DACC in that the block control information (number of words and location in memory) for the channel may be either in primary memory or in local hardware memory associated with the channel hardware.
[Figure residue: diagram labels only. Recoverable labels: T.console; Pc; L(I/O bus: under Pc programmed control); L('Memory Interface Connection/MIC), control, data, data only; TMCC; Parallel Input/Output; POT; Main Frame; Additional Optional Memory; Multiple Access to Memory Feature; Second Path; Data Multiplexing System; Priority Control; Optional Priority Interrupts; EIN.]
The 9300 structure, though not given in the PMS diagram, is essentially that of the 930 (Figs. 3 and 4). In the 9300, Mp has three access ports or a S('Memory-Processor; 8 Mp; 3 P). The Pc('9300) requires two of the access ports for independent access of instructions and data, leaving one for K transfer to Ms and T.
Instruction-set processor
The interesting parts of the ISP are discussed informally below. The formal ISP description given in Appendix 1 of this chapter should be read. The descriptions are partially taken from the SDS Programming Reference Manuals.
[Figure residue: diagram labels only. Recoverable labels: Instruction/operand access is overlapped when separate memory modules are accessed; Core Memory, expandable to 32,768 words; Multiple Access to Memory; Data Multiplex System; To/from Special Devices; Memory Interface Connections; Up to 128 Data Subchannels.]
instruction format is 14 bits long, allowing direct access of only up to 16,384 words. Memory extension in the 930 contains two 3-bit memory extension registers, EM2 and EM3, and allows addressing of memories of 32,768 words. The program loads either or both of the registers and activates them as desired. Each register can become the most significant digit (fifth octal) of any operand address.
The program uses the first extension register, EM3, by calling for an address with an 11₂ in the most and next most significant address bits, respectively (a 3 for the most significant octal digit). The program calls for EM2, the second extension register, by setting the same two address bits to 10₂ (a 2 for the most significant octal digit). In this way, normal addressing compatible with the 910 and 920 occurs by setting a 3 in EM3, and a 2 in EM2.
910-930 instructions
Programmed Operators (POP's) enable subroutines to be called with a single instruction. This provides definable instructions of the same form as built-in machine instructions. The computer decodes the operation codes 100₈-177₈ as special instructions and transfers to a subroutine whose address is uniquely determined by the code. The computer records the address of the POP instruction at location 0 together with an indirect address bit so that the program continuity may be maintained. By indirect addressing which refers to location 0, which in turn refers to the POP instruction, the subroutine can gain access to the effective address of the operand associated with the POP instruction.
The instruction set for the computers in this series is listed in Table 1. The table should be used to compare the machines. There are two instructions in the 910 which are not in the 920 or 930: Multiply Step and Divide Step. These instructions facilitate writing subroutines for multiplication and division. The Multiply Step (MUS) instruction is defined:
W(0) contains the Indirect Address bit I.
W(1:2) contains the Index Register bits X(0:1).
W(0:2) is called the Tag field.
W(3:8) contains the Instruction code; the contents of this field determine the operation to be performed.
W(9:23) contains the Address; for most instructions, the contents of this field represent the memory location of the operand called for by the instruction code.
Address modification. Each index register contains an unsigned base address of 15 magnitude bits and a signed increment of 9 bits. The increment contains 8 magnitude bits and a sign bit and is held in two's complement form.
Index registers are modified by adding the signed-increment value to the base address using two's complement arithmetic. Since the increment and base address fields are of unequal lengths, the sign bit (bit 0) of the increment field is extended six positions to the left prior to the addition. This 15-bit sum is then stored in the base address field of the index register. The index register may be incremented by any value from -256₁₀ to 255₁₀ using a single instruction. Incrementing and testing for a "terminal condition" is done by the instruction Increase Index And Branch (BRX), as follows:
If the index register has been negatively incremented, a terminal condition exists when the base address has been reduced below the zero value.
If the index register has been positively incremented, a terminal condition exists when the resultant base address has been increased beyond the maximum address value (077777₈).
If the terminal condition exists, the next instruction is taken in sequence. If the terminal condition does not exist, program control is transferred to the location specified.
The instruction set for the 9300 is given in Table 1.
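The index-register modification and BRX terminal test described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the base address and the 9-bit increment are passed as separate integers rather than unpacked from a 24-bit register, and the function names are hypothetical.

```python
# Illustrative sketch of index-register increment and the BRX terminal test.
# Names are assumptions; base and increment are passed as separate values.

BASE_MASK = 0o77777  # 15-bit base address field


def sign_extend_9(incr_field):
    """Interpret a 9-bit two's complement field as an integer in -256..255."""
    incr_field &= 0o777
    return incr_field - 0o1000 if incr_field & 0o400 else incr_field


def brx_step(base, incr_field):
    """Add the signed increment to the base address.

    Returns (new_base, terminal), where terminal is True when the base was
    reduced below zero or increased beyond 0o77777.
    """
    total = (base & BASE_MASK) + sign_extend_9(incr_field)
    terminal = total < 0 or total > BASE_MASK
    return total & BASE_MASK, terminal


def brx_next(pc, target, terminal):
    """Next instruction address: fall through on terminal, else branch."""
    return pc + 1 if terminal else target
```

Counting a base address of 3 down with an increment field of 777₈ (that is, -1) branches while the base reaches 2, 1, and 0, and falls through on the step that would take it below zero.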
Fig. 7. SDS 910, 920, and 930 registers diagram. [Diagram labels: Memory Buffer; Memory address; Accumulator; To peripheral T and Ms. Notes: All registers 24 bits except S<10:23>; O<3:8>; EM2<0:2>; and EM3<0:2>. * Registers accessible to program (ISP). † Only in 930; 930 core memory is 32768 w.]
B register contains the less significant portion of double-length numbers. Overflow and carry bits are used with A and B operations.
The index register X, used in address modification, is a full-word register. Index-register operations use the least significant 14 bits.
The P register is a 14-bit register that contains the memory address of the current instruction. Unless modified by the program, the contents of P increase by 1 at the completion of each instruction.
The memory extension registers, EM3 and EM2, are 3-bit registers that specify the portion of extended memory being used. They exist only in the 930.
Hardware registers not in the ISP. The S register is a 14-bit register that contains the address of the memory location to be accessed for instructions or data. The 15-bit address is formed by S and one of the memory extension registers.
use the C register.
The O register is a 6-bit register that contains the instruction or operation code of the instruction being executed.
The MI register is a 24-bit register that holds each word as it comes from memory. Recopying of a word into memory takes place from the MI register.
9300 registers (Fig. 8)
ISP registers (*). The A and B registers of the 9300 are the same as in the 900 series computers; however, the P register is P(9:23).
There are three 24-bit index registers, X[1:3]. Each index register is composed of a base address of 15 bits and a signed increment of 9 bits.
The Flag register, F, is a @bit register that may be set and/or sensed by the program. The first bit position of this register is the overflow indicator.
Hardware registers not in the ISP. The C register holds the 24-bit operand word as it is transmitted to, or received from, memory.
The D register holds the next 24-bit instruction word as it is received from memory.
The 15-bit S register contains the address of the memory location to be accessed for either instruction or operand.
The 6-bit O register contains the instruction code of the instruction being executed.
The A' register is an optional 15-bit register used for the floating-point option. It temporarily extends the A register during the execution of floating-point instructions.
The B' register is an optional 15-bit register which temporarily extends the B register during the execution of floating-point instructions.
Instruction interpretation in the 900 series
The instruction-interpretation process can be explained in terms of the processor's registers (Fig. 7). The ADD instruction execution (not including memory mapping) defined in ISP as A ← A + M[e] is interpreted as
S ← P; P ← P + 1; next    fetch the instruction
MI ← Memory[S]; next
C ← MI; next
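The formation of a 15-bit address from the 14-bit address and one of the memory extension registers, as described for the 930 earlier in this chapter, can be sketched as follows. The function name and argument order are hypothetical. The sketch assumes that the selected extension register simply replaces the most significant octal digit of the address, which is how the text describes EM3 and EM2 becoming the fifth octal digit.

```python
# Illustrative sketch of 930 memory extension; names are assumptions.

def extended_address(addr14, em2, em3):
    """Map a 14-bit address to a 15-bit address using EM2 or EM3.

    Top two address bits 11 select EM3; 10 selects EM2; otherwise the
    address is used unchanged.
    """
    addr14 &= 0o37777
    top = addr14 >> 12        # the two most significant address bits
    low = addr14 & 0o7777     # the remaining four octal digits
    if top == 0b11:
        return ((em3 & 0o7) << 12) | low
    if top == 0b10:
        return ((em2 & 0o7) << 12) | low
    return addr14
```

With EM3 = 3 and EM2 = 2 every address maps to itself, which corresponds to the 910- and 920-compatible setting mentioned in the text.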
[Figure residue: register diagram labels only. Recoverable labels: M Registers; C (Operands); D (Instructions); Memory; Direct Parallel I/O; X1, X2, X3 (Index), each with Incr. and Base fields; Overflow; Misc. Bits. Note: only * registers accessible to program.]
6a Data Subchannel/DSC (Internal Interlace).
6b Data Subchannel (External Interlace).
7 Memory Interface Connection/MIC link. A component has a link to Mp.
Methods 1 to 3 above are completely under control of a program and are simple time-independent instructions (or methods) of transferring data to K's (and onto KT or KMs). The ISP description (Appendix 1 of this chapter) has a detailed description of the I/O devices and these I/O instructions.
Single-bit control and sense
Two instructions provide for single-bit ON/OFF control signals. The first, EOM, transmits a control signal and a 14-bit address to an external device or a function within the computer. The second, SKS, selects an external device or computer function and skips in response to a false (0) signal. Up to 16,384 control signals can be sent and 16,384 input signals tested theoretically. (A more reasonable number of physical destinations would be 50.) Execution of an EOM causes a signal of approximately 1.4 microseconds duration to be transmitted.
EOM instruction format. EOM is used to select a specific I/O device by placing a 1 in its select register. EOM requires one cycle.
W(2) = 0.
W(0:1) is reserved for special system address bits.
W(3:8) contains the EOM instruction code, 02.
W(10:11) contains the system mode specifier.
W(12:23) contains the 12-bit address field that specifies the special system destinations.
SKS format. The SKS instruction format has each corresponding bit field identical to the system EOM format. Execution of an SKS causes a 14-bit address to be presented to all K's; the K being addressed responds and is tested. If the addressed external K supplies a "set" signal to the central processor, the computer executes the next instruction in sequence from the SKS. If no signal is set, the computer skips the next instruction in sequence and executes the following instruction. No registers are affected except the P register. SKS requires two or three Mp cycles if no skip or skip, respectively, is executed.
be stored in Mp. The execution of a POT or PIN instruction sends a signal to the external device involved in the input/output operation, which notifies the device to send its data word as soon as it is operational. When the device becomes operational during a Read or PIN operation, it transmits a Ready signal to the central processor while at the same time presenting a data word to Pc.
During the execution of a POT instruction, the central processor transmits a signal to the external device, alerting it to receive a data word. When the device becomes operational, it transmits a Ready signal to the central processor, which releases the data word to the external device.
Selective input/output with these devices is accomplished by preceding POT or PIN with an EOM to alert (select) the desired device by a specific address. By preceding the POT or PIN with an SKS, the Ready signal of the special device can be tested after the execution of the EOM but prior to execution of the parallel transfer instruction; a possible Pc "hangup" can thus be avoided. The Ready signal can also set one of the priority interrupts.
PIN stores the contents of 24 input lines in parallel in the effective-memory location. PIN or POT requires four cycles plus any waiting time for Ready.
Interrupt
The interrupt provides program control of input/output operations, aids in programming simultaneous input/output and compute operations, and allows immediate recognition of special external conditions by causing Pc to execute an instruction in a selected Mp location at the end of the execution cycle of the current instruction. Without disturbing the program register, the processor executes an instruction in one of a selected set of memory locations. A Mark Place and Branch (BRM) instruction in this location saves the contents of the program register, EM3, EM2, and overflow indicator and transfers to the particular interrupt servicing routine required. To exit from the interrupt service routine, a Branch Unconditionally (BRU) instruction using indirect addressing returns control to the next instruction in proper sequence in the main program; it also clears the interrupt. Processor state (that is, A, B, Overflow, and X) must be preserved and restored by the program if the registers are used by the program.
The priority interrupt system has up to 1,024 interrupts arranged in levels. The levels have priority according to a priority number; the higher priority levels have a smaller number. Interrupt channels are installed in Pc in groups of 16. The assignment
Word parallel instructions of physical memory locations to interrupt levels is shown in Ap-
Two instructions, Parallel Output ( P O T ) and Parallel Input (PIN), pendix 1 of this chapter; the assignment is in order of decreasing
permit any word in M p to be presented in parallel on a physical priority from location 200, (highest) to 1477, (lowest). Interrupt
connector to a K or, inversely, permit signals sent from a K to requests can also be programmed. The power fail-safe (for power
554 Part 6 I Computer families Section 2 1 The SDS 910-9300 series, a planned family
supply off) interrupts and out-of-order interrupts have the highest priority.

Besides the interrupt mechanism just discussed, there is also a single instruction
interrupt. This permits the execution of only one instruction before automatically being
cleared and returning to the program that was interrupted. For example, if an external
clock source is connected to the computer so that it pulses an interrupt line at set
intervals, the program can maintain a programmed real-time clock. Each time the external
pulse causes an interrupt, the program executes the single instruction, Memory Increment
(MIN), to add 1 to the memory word selected for use as a programmed real-time clock. (The
main program can examine this memory location whenever necessary to determine how many
time increments have elapsed since the clock was started.)

Interrupts can be single or normal-instruction interrupts in any combination desired.

An interrupt has three operational states: inactive, waiting, and active.

In the inactive state, no interrupt signal has been received into the level and none is
currently being processed by its interrupt servicing subroutine.

In the waiting state, an interrupt has been received but is not being processed. This
situation may arise when an interrupt of higher priority is being processed. When all
higher waiting interrupts have been processed, this level goes to the active state.

In the active state, the interrupt has caused the main program to recognize its presence
and has transferred to its assigned interrupt location where it is being processed.

Two program control features are Arm/Disarm and Enable/Disable. Arm/Disarm controls whether
an interrupt can proceed from the inactive state to the waiting state. When armed, an
interrupt signal sets the interrupt to the waiting state. Enable/
Fig. 9. SDS 930 direct-access communication-channel register diagram. (Courtesy of Scientific Data Systems.)
[Diagram labels that survive: Channel E; other communication channels (F, G, H); character output; address; device control logic; request and control lines; to Mp via S or via Pc (for channels W, Y, C, D); to KMs or KT.]
Chapter 42 1 The SDS 910-9300 series 555
Disable operates on the entire interrupt system. (When the interrupt system is enabled,
interrupts can occur.)

Communications channels-Kio('Channel)'s

Kio('Communication Channels) provide buffering, input/output control, and data transmission
simultaneously with computation. There can be up to eight independent communication
channels and a large number of subchannels in a single system. Figure 9 shows the registers
in a K('Channel).

Each channel can control up to 30 KT's or KMs's. The channel handles character, word
assembly and disassembly, input/output parity detection and generation, data transmission
to and from memory, and end-of-transmission detection.

All channels are bidirectional and can communicate with 6-bit character devices or word
devices in 6, 12, and 24 bits. The main program that initializes a K specifies the number
of characters to be contained in each word during the transmission.

The channel interlace controls the transfer of the data words going through the associated
channel buffer, supplies the memory address of data coming from or going to memory, and
maintains the word count determining the number of words transferred. This interlace
information can be either in K hardware (external interlace) or in Mp (internal interlace).
The terminal interrupts, End of Record and Zero Word Count, come from the interlace and are
under its control.

The time-multiplexed channels use the memory-access logic of Pc to transmit input and
output of data words and require two memory cycles (see Fig. 2). Each direct-access channel
has independent memory-access logic and requires one memory cycle (see Fig. 2).

Communication-channel description. Up to 30 peripheral devices (K's for T or Ms) may be
connected to one K('Channel) (Fig. 9). Each device has a unique, 2-digit, octal address by
which it is selected for an input/output operation. To select the peripheral device, the
program loads the proper unit address into the 6-bit Unit Address Register (UAR) in the
channel. This address selects both the device and, if appropriate, the function to be
performed. Placing a nonzero unit address in the unit address register connects the
peripheral unit addressed to the channel, and the unit becomes active. When the UAR
contains a zero address, or any time that a terminal or initial condition clears the
contents of UAR, the channel becomes inactive.

The 24-bit data Word Assembly Register (WAR) contains the data word actively being received
or transmitted during an input or output operation. During input, 6-bit characters (plus
parity) enter the Single-Character Register (SCR) where the channel buffer assembles them,
one at a time, into the WAR.

The channel interlace contains two working registers: the Word Count Register (WCR) and the
Memory Address Register (MAR). A channel may have these registers either in K or in Mp. In
the setup sequence for an interlaced input/output operation, the POT instruction transmits
to the interlace a data word made up of the word count (that is, length) and the starting
address of the data block. The 15-bit Word Count Register (WCR) contains the data word
count during a data transfer. The number of data words is decremented by 1, and the new
count replaces the old one in the WCR for each word transmitted.

The Memory Address Register (MAR) contains the starting destination or source address in
memory of the transmitted data. The memory locations to or from which data words are to be
transmitted enter the MAR at the same time the word count does. During transmission of
data, the interlace increments the MAR after each word as it decrements the contents of the
WCR. These two registers provide the interlace control of block transmissions. Obviously,
if the interlace control registers are in Mp, then two extra accesses are required for each
word transferred.

Memory interface connection link

Once a computer is equipped with a multiple-access-to-memory feature, one or more Memory
Interface Connections (MIC) can be attached. The MIC is a general interface to the computer
that allows special devices to access Mp. It preserves the integrity of the memory by
generating the parity of incoming data words and checking the parity of words read from
memory to indicate memory failures. The device that is connected to the MIC must hold both
the data and the address until the transmission to/from memory is completed (that is, MIC
does not have registers).

Conclusions

The SDS computers appear to be the first attempt to design several computers at the same
time with a common ISP. Over a longer time span other compatible computers were added to
the original 910 and 920 as technology (and marketing) dictated. The series is
characteristic of well-designed typical 24-bit computers. By increasing the arithmetic
capability, the series could also be used more generally.

References

Scientific Data Systems Reference Manuals for the 930 and 9300 computers
Appendix 1
SDS 930 ISP Description
Mp State
Memory[0:77777₈]<0:23>            32 kw primary memory
Two 3 bit map (or extension) registers extend the address space of Mp to 32 kw. EM2 holds a 4 kw block number when addresses
20000₈-27777₈ are used. EM3 holds the 4 kw block number for addresses 30000₈-37777₈.
EM2<0:2>                          Extension Memory registers
EM3<0:2>
Pc Console State
Individual registers in Pc can be read and written from the console.
BPT<1:4>                          Breakpoint or sense switches
Instruction Format
instruction/i<0:23>
relative := i<0>                  unused by ISP; software relocation bit
index_bit/xb := i<1>
op_code/op<2:8> := i<2:8>
pop_code<0:5> := i<3:8>           programmed operation code value
indirect_bit/ib := i<9>
y<10:23> := i<10:23>              address field for 16 kw
μ                                 microcoded instruction bits within an instruction
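
The field layout above can be checked with a small decoder. The following is only a sketch in Python, not part of the ISP description: the helper names `field` and `decode` are hypothetical, and it assumes that bit 0 is the leftmost (most significant) bit of the 24-bit word, which is consistent with the address occupying i<10:23>.

```python
WORD_BITS = 24  # width of an SDS 930 word

def field(word: int, first: int, last: int) -> int:
    """Return bits first..last of a 24-bit word, counting bit 0 as the leftmost bit (assumed)."""
    width = last - first + 1
    shift = (WORD_BITS - 1) - last
    return (word >> shift) & ((1 << width) - 1)

def decode(word: int) -> dict:
    """Split an instruction word into the fields named in the Instruction Format listing."""
    return {
        "relative": field(word, 0, 0),   # software relocation bit
        "xb": field(word, 1, 1),         # index bit
        "op": field(word, 2, 8),         # 7-bit operation code
        "pop_code": field(word, 3, 8),   # 6-bit programmed operator code
        "ib": field(word, 9, 9),         # indirect bit
        "y": field(word, 10, 23),        # 14-bit address field
    }

# Hypothetical example word: indexed, op code 02, indirect, address 1234 (octal).
example = (1 << 22) | (0o02 << 15) | (1 << 14) | 0o1234
fields = decode(example)
```

With this layout the op code 02 used by EOM in the chapter text occupies the low six bits of `op`, which is why `op` and `pop_code` agree for the example word.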
Effective Address Calculation Process
e<10:23> := (¬ib → (                              iterative process of indefinite indirect addressing until
    ¬xb → y;                                      no indirect bit, ib, is found
    xb → y + X);
  ib → (
    ¬xb → (i<0,9:23> ← M[y]<0,9:23>);
    xb → (i<0,9:23> ← M[y + X]<0,9:23>); next e))
e I <I 8:23> := e<l8: 2 9 s h i f t count
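
The effective-address process above is a loop: index if the index bit is set, then follow indirect words until one is found whose indirect bit is clear. A minimal Python sketch of that reading follows. It rests on several assumptions that are not in the listing: memory is modelled as a dict, addresses wrap at 14 bits, the memory extension registers EM2 and EM3 are ignored, the fetched indirect word replaces only the indirect bit and the address field (so the index bit of the original instruction stays in force), and a `max_levels` guard is added so the sketch always terminates.

```python
MASK14 = 0o37777  # 14-bit address field, y<10:23>

def effective_address(y, xb, ib, X, M, max_levels=64):
    """Index (if xb), then chase indirect words while the indirect bit is set."""
    for _ in range(max_levels):
        addr = (y + X) & MASK14 if xb else y & MASK14
        if not ib:
            return addr                 # no indirect bit: addr is the effective address
        word = M[addr]                  # fetch the indirect word
        ib = (word >> 14) & 1           # bit 9 of the fetched word (bit 0 leftmost, assumed)
        y = word & MASK14               # bits 10:23 of the fetched word
    raise RuntimeError("indirect chain did not terminate within max_levels")

# Hypothetical memory: location 100 points indirectly at 200, which points directly at 300.
memory = {0o100: (1 << 14) | 0o200, 0o200: 0o300}
two_level = effective_address(0o100, xb=0, ib=1, X=0, M=memory)
indexed = effective_address(0o100, xb=1, ib=0, X=5, M={})
```

The guard is a convenience of the sketch only; the listing describes the indirect chain as indefinite.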
Instruction Interpretation Process
¬Interrupt_interpretation → (                     normal interpretation
    instruction ← M[P]; P ← P + 1; next           fetch
    Instruction_execution);                       execute
Interrupt_interpretation → (                      interrupt interpretation
    instruction ← M[200₈ + 20₈ × K_address + I_address]; next
    Instruction_execution)
Microcoded Register Exchange Instruction
Each instruction can be formed from a series of microprogrammed operations. Compound microcoded instructions are shown below
without a μ.
CLA → (A ← 0);                    clear A
CLB → (B ← 0);                    μ, clear B
CLR → (AB ← 0);                   clear A and B
CLX → (X ← 0);                    μ, clear X
CAB → (B ← A);                    μ, copy A into B
CBA → (A ← B);                    μ, copy B into A
-)
(Ov,AB <-AB
(AB
(X t X
AB
-
x
X
zel
2
(rotatel);
normalize,exponent(AB)
left s h i f t
l e f t cycle
normalize, decrease X
AB t normalize(AB));
Skip Test Group
SKE → ((A = M[e]) → (P ← P + 1));                          skip if A = M
SKB → (((M[e] ∧ B) = 0) → (P ← P + 1));                    skip if B and M don't compare 1's
SKN → (M[e]<0> → (P ← P + 1));                             skip if M negative
SKR → (Ov,M[e] ← M[e] − 1; next M[e]<0> → (P ← P + 1));    reduce M, skip < 0
SKM → (((M[e] ∧ B) = (A ∧ B)) → (P ← P + 1));              skip on masked M
SKG → ((A > M[e]) → (P ← P + 1));                          skip if greater than M
SKD (XR<0:23> t a b s ( k l 5 : 2 3 > - M[e]<15:23>l; di.F,ference e m o n e n t s ami s k i p
(M[e]<l5:23>>B<l5:23>) i( P P + I)):
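
The skip tests above share one shape: a condition on A, B, and M[e] is evaluated, and P is advanced by one extra location when it holds. The Python sketch below illustrates four of them under stated assumptions: the function names are hypothetical, words are plain 24-bit integers, bit 0 is taken as the leftmost (sign) bit, and P is assumed to have been incremented already by the fetch step. SKG and SKD are left out because their comparison depends on number representation details that the garbled listing does not settle.

```python
MASK14 = 0o37777  # P is treated as a 14-bit register in this sketch

def ske(A, m):
    """SKE condition: skip if A equals M[e]."""
    return A == m

def skb(B, m):
    """SKB condition: skip if B and M[e] have no 1 bits in common."""
    return (m & B) == 0

def skm(A, B, m):
    """SKM condition: skip if M[e] equals A in the bit positions selected by the mask B."""
    return (m & B) == (A & B)

def skn(m):
    """SKN condition: skip if M[e] is negative, i.e. bit 0 (leftmost of 24) is set."""
    return bool((m >> 23) & 1)

def apply_skip(P, condition):
    """Return the program counter after a skip test; P already points past the skip instruction."""
    return (P + 1) & MASK14 if condition else P

next_p = apply_skip(0o100, skm(A=0b1010, B=0b1100, m=0b1001))
```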
-
t
t
e):
x + I: x<p i P t e);
branch uncomiitionally
increment Tndex, Rranch
BRM → (M[e]<0> ← Ov; M[e]<3:5> ← EM3; M[e]<1,2,9> ← 0;     mark place and branch;
    M[e]<6:8> ← EM2; M[e]<10:23> ← P; next                  used to call subroutines
    P ← e + 1);
BRR → (P ← M[e] + 1; Ov ← Ov ∨ M[e]<0>);                    branch return; used in terminating subroutines
Control Group
HLT → (Run ← 0);                                             halt
NOP → ;                                                      no operation
EXU → (instruction ← M[e];                                   execute
    Instruction_execution);
Overflow Test Group
OVT → (Ov → (P ← P + 1); (Ov ← 0));                          overflow test
ROV → (Ov ← 0);                                              reset overflow
REO 3 ( X < I b C8 X<15>) + (Dv t
record exponent
Breakpoint Test Group
((BPT_1 ∧ BPT<1>) ∨ (BPT_2 ∧ BPT<2>) ∨ (BPT_3 ∧ BPT<3>) ∨ (BPT_4 ∧ BPT<4>)) → (P ← P + 1);
Memory Extension Register Control Group
SET → (instruction<17> → (EM2 ← instruction<21:23>);
    instruction<16> → (EM3 ← instruction<18:20>));
EXT → condition → (P ← P + 1);
c o n d i t i o n := ( ( i n s t r u c t i o n Q Z > A (EM2 = 2 ) ) A ( i n s t r u c t i o n Q 3 > A (EM3 = 3 ) ) )
POP → (M[0]<0,9:23> ← Ov□1□P; P ← 100₈ + pop_code);          programmed operator; 64 user defined instructions called via
                                                             subroutine link in M[0]
EOM → IO_instruction_execution;                              see the definition of the IO instruction set below
POT → IO_instruction_execution;
PIN → IO_instruction_execution;
SKS → IO_instruction_execution;
) end Instruction_execution, not including Input Output instructions
Input-Output Control from the Pc
KT and KMs State
Devices consist of the following parts:
IO_Device[0:77777₈]               name (or address) of a specific IO device; the EOM command
                                  is first given to select the specific device; subsequent
                                  commands are implicitly to the selected device
IO_Output[0:77777₈]<0:23>         Input and Output Data buffers associated with specific
IO_Input[0:77777₈]<0:23>          devices
IO_Ready[0:77777₈]                bit for each device to denote when device is ready to transmit data
IO_Select[0:77777₈]               a bit within each device denoting it has been selected for
                                  an operation
io_unit<0:14>                     the particular io device selected by the EOM command
IO Instruction Set
EOM → (io_unit ← e);                                         command to select or address the device; energize output Pf
POT → (IO_Select[io_unit] ∧ IO_Ready[io_unit] → (            output data command
        IO_Output[io_unit] ← M[e]; io_unit ← 0);
    IO_Select[io_unit] ∧ ¬IO_Ready[io_unit] → (POT));        wait until ready
PIN → (IO_Select[io_unit] ∧ IO_Ready[io_unit] → (            input data command
        M[e] ← IO_Input[io_unit]; io_unit ← 0);
    IO_Select[io_unit] ∧ ¬IO_Ready[io_unit] → (PIN));        wait until ready
SKS → (io_unit ← e; next
    ¬(IO_Select[io_unit] ∧ IO_Ready[io_unit]) → (            skip if signal is not set
        P ← P + 1);
    io_unit ← 0);
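
The select-then-transfer protocol defined above can be exercised with a toy model. The Python sketch below is illustrative only: the class name `IOModel` and its methods are hypothetical, dicts stand in for the IO_* arrays, and two assumptions are made where the listing is unclear. EOM is taken to set the device's select bit (as the chapter text says, by placing a 1 in its select register), and SKS is modelled as a pure test that leaves io_unit untouched. Waiting is represented by returning False so that the caller can retry.

```python
class IOModel:
    """Toy model of the IO state; not the authors' definition."""

    def __init__(self, memory):
        self.M = memory        # Mp, a dict from address to 24-bit word
        self.io_unit = 0       # unit selected by the most recent EOM
        self.select = {}       # IO_Select[unit]
        self.ready = {}        # IO_Ready[unit]
        self.output = {}       # IO_Output[unit]
        self.input = {}        # IO_Input[unit]

    def _signal(self, unit):
        """True when the unit is both selected and ready."""
        return self.select.get(unit, False) and self.ready.get(unit, False)

    def eom(self, e):
        self.io_unit = e
        self.select[e] = True  # assumption from the prose: EOM sets the select bit

    def pot(self, e):
        if not self._signal(self.io_unit):
            return False       # not ready: caller retries ("wait until ready")
        self.output[self.io_unit] = self.M[e]
        self.io_unit = 0
        return True

    def pin(self, e):
        if not self._signal(self.io_unit):
            return False       # not ready: caller retries
        self.M[e] = self.input[self.io_unit]
        self.io_unit = 0
        return True

    def sks(self, e):
        """Return True when the next instruction should be skipped (signal not set)."""
        return not self._signal(e)

model = IOModel({0o100: 0o1234})
model.eom(3)                       # select hypothetical unit 3
blocked = model.pot(0o100)         # unit 3 is not ready yet
model.ready[3] = True              # the device raises Ready
skip = model.sks(3)                # signal is set, so no skip
sent = model.pot(0o100)            # the word is released to the device
```

Testing Ready with SKS before issuing POT is what lets a program avoid the Pc "hangup" mentioned in the chapter text.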
Interrupt System States
Interrupt                         controls whether interrupts will be processed
I_RQ[0:63]<0:15>                  array of 1024 interrupt requests
I_ON[0:63]<0:15>                  array of interrupt enable to enable or inhibit interrupt requests
I_Signal[0:63]<0:15> := I_RQ[0:63]<0:15> ∧ I_ON[0:63]<0:15>
K_address<0:5>                    group number
I_address<0:3>                    level number within a group of the active interrupt
(c = 3) +
+ lJN[al<0:15>
( c = 2 ) + I,ON[a]<O:15>
luON[a]4J:15>
c I$N[a]<0:15>
t IJN[al4:15>
tb<0:15>);
v
V -
B<0:15>)
R<Oi15>:
arm a channel l e v e l group
disarm a channel l e v e l group
s e t a channel l e v e l group
a<0:5> := M[e]<0:5>               group select or K_address
b<0:15> := M[e]<8:23>             data for I_address
c<0:1> := M[e]<6:7>               command control bits
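
The interrupt state defined in this appendix (I_RQ, I_ON, I_Signal, K_address, I_address) determines which memory location supplies the interrupt instruction. The Python sketch below shows one way to compute it, under assumptions that go beyond the listing: the function names are hypothetical, each group is a 16-bit integer with bit 0 as its leftmost bit, and the lowest-numbered group and level are taken to have the highest priority, following the chapter's statement that higher priority levels have smaller numbers.

```python
def pending_signal(i_rq, i_on):
    """I_Signal := I_RQ AND I_ON, group by group (lists of 64 sixteen-bit integers)."""
    return [rq & on for rq, on in zip(i_rq, i_on)]

def highest_priority(signal):
    """Return (K_address, I_address) of the lowest-numbered pending level, or None."""
    for k, bits in enumerate(signal):
        for i in range(16):
            if (bits >> (15 - i)) & 1:   # level i is bit i, counted from the left (assumed)
                return k, i
    return None

def interrupt_location(k_address, i_address):
    """Location of the interrupt instruction: 200 + 20 x K_address + I_address, all octal."""
    return 0o200 + 0o20 * k_address + i_address

# Hypothetical state: requests pending at group 2, level 3 and at group 5, level 0.
i_rq = [0] * 64
i_on = [0xFFFF] * 64
i_rq[2] = 1 << (15 - 3)
i_rq[5] = 1 << 15
winner = highest_priority(pending_signal(i_rq, i_on))
location = interrupt_location(*winner)
```

Clearing `i_on[2]` in this example inhibits the group 2 request, so the group 5 request would be chosen instead.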
Section 3
562 Part 6 1 Computer families Section 3 1 The IBM System/360-a series of planned machines which span a wide performance range
The architecture¹ of the newly announced IBM System/360 features four innovations:

1 An approach to storage which permits and exploits very large capacities, hierarchies of
  speeds, read-only storage for microprogram control, flexible storage protection, and
  simple program relocation.
2 An input/output system offering new degrees of concurrent operation, compatible channel
  operation, data rates approaching 5,000,000 characters/second, integrated design of
  hardware and software, a new low-cost, multiple-channel package sharing mainframe
  hardware, new provisions for device status information, and a standard channel interface
  between central processing unit and input/output devices.
3 A truly general-purpose machine organization offering new supervisory facilities,
  powerful logical processing operations, and a wide variety of data formats.
4 Strict upward and downward machine-language compatibility over a line of six models
  having a performance range factor of 50.

¹ The term architecture is used here to describe the attributes of a system as seen by the
programmer, i.e., the conceptual structure and functional behavior, as distinct from the
organization of the data flow and controls, the logical design, and the physical
implementation.

The above four featured innovations are all stated as IBM Corporation design results. It
seems better to analyze them in terms of design constraints and implementation results. It
appears that the design constraints, from marketing and management directions, were
compatibility (item 4 above) and the use of common peripheral equipment (item 2 above).
Thus we can measure the 360 design in terms of how well it meets these constraints. With
some minor exceptions, all the peripheral components existed at the time of the design and
had been used with other IBM computers; thus a goal was already realized. A measure of the
design can also be based on a comparison with alternative designs. In the following
sections we suggest that several forms of multiprocessing would yield higher performance at
lower cost. A difficult and important constraint, though not mentioned above, is the
necessity of program compatibility with almost all earlier IBM computers.

It should be noted that, at the outset of the IBM System/360 announcement, another company,
RCA, adopted the 360 ISP as a design constraint for its own future computer development.
Although some price-performance characteristics appear to be better in the RCA series, the
implementation scheme is similar. The lower RCA prices do not reflect entirely
implementation and technology but include RCA marketing and profit strategy. In addition,
of course, there should have been lower development costs.

An interesting aspect of the design is the method used to implement the individual computer
models (of the range) and their associated costs. From the standpoint of innovation, the
360 was the first computer series to cover a wide range. The more basic P's (Models 20-65)
were implemented via a microprogrammed processor. This is based on a computer program
within an M(read only), i.e., a Read Only Storage/ROS, to interpret the common ISP. A
payoff from this implementation strategy is a solution to the "compatibility design
constraint," which is the ability to provide compatibility with the customer's previous
(IBM) machine, which of course was not a member of the 360 series. This is undoubtedly the
most difficult constraint to meet in the P designs, and probably the most significant real
innovation. From the marketing viewpoint, it provided the user with a crutch to go from a
former IBM computer to the System/360. This is accomplished through "emulation," which (as
defined by IBM) means the ability of one C to interpret another's programs at a reasonable
performance level. These emulations are realized by various microprogrammed P's being
designed to interpret both the 360 ISP and one or more of IBM 704, 709, 1401, 1410, 1440,
1460, 1620, 7010, 7040, 7044, 7070, 7074, 7090, 7094.

Most of the above ISP's have a different structure from the 360 ISP. For example, the 1401
(Chap. 18) series instructions and data are variable-length character strings; the 1620 has
variable-length data strings; the 704 series process fixed- and floating-point data with
single-address instructions; and the 7070 is a fixed-word decimal computer. Thus the 360
C's represent the first machines to be two logical processors in the same physical
implementation.

The emulated speeds are often better than that of the original hardwired computer. This is
not surprising, considering the change in technology; it is a very attractive feature. The
360 Mp performance is often a factor of 5 to 10 times the "emulated" computers; and the
M(ROS) data rates are a factor of 25 times the Mp's. For example, the Model 65 emulating a
7090 runs faster than a hardwired 7090 (Table 1). The use of an M(ROS) for defining an ISP
is questionable if we ignore the emulation constraint. Note, by way of evidence, that the
hardwired models 91 and 44 have the lowest cost-to-performance ratios in the series.

There are minor deviations in the particular models, but all
Table 1† IBM System/360 Models, IBM 1130, and IBM 1800 computer characteristics
Pc (technology: (hybrid/hlp.rolp.rw)); h;h h;h p.ro rw p.ro p.ro h;h p.ro p.ro;h h; h h.p.ro.p.rw,h h; h
Pi0 (technology)
M (rol rw; t.cycle: ps/w; . . ... ? (MP) 1.0 0.625 ... 05 0.2 ... 0.08
size w; ... ... 4096 4096 2816 4000 ... 2000.500
b/w; ... ... . . 60 60 ... 90 100 ... 108
technology: (ind lcaplcore); ... ... core cap ind ... cap cap ... ro,rw
I S P s implemented in P.microprogram ... . .. ... 1401c 14011 14011 ... 1410) 7070 1 ... 7090
1620d 1410e 70701 70900
S (concurrency: (Mp;Pc)) 1;l 1.1 1;l 1;l 1;1 1;l 1;l 1,l 1;218;5 1.2.4;l (4.1) 1 ~ 1 . 1 ) 16;1
Mp (i.width: (by); (8, 1 parity) b/by; 2 2 1 2 1 2 4 4 8 8 16 8
t.cycle: ps/w; 3.6 214 7.2 0.9 1.5 2.5 1.0 20 0.75 0.75 (0.96.1.04)10.08h 0 75
size: log,(by), 13-14; 13-16; 12-14; 14-15; 13-16, 14-18 15-18 16-19, 17-(20 124); 18-20; 19-22; 20-22;
/-
† This table is presented as PMS expressions.
a Not IBM System/360 compatible, but made with hybrid technology.
b Similar, but not identical to System/360 ISP.
c C('IBM 1401, 1440, 1460).
d C('IBM 1620).
e C('IBM 1410, 7010).
f C('IBM 7070, 7074).
g C('IBM 709, 7040, 7044, 7090, 7094).
h Two M's; an M(content addressable) working with Mp.
i Estimated, see Chap. 44.
j See Conti [1968], based on running many programs.
k Models 85 and 91 are too difficult to predict because of instruction buffering based on Conti [1968].
l Cost derived from purchase cost/45.
m Varies depending on buffering and multiply options.
n Meaningless per se; Mp is used by microprogram defining System/360 ISP.
o 1130 and 1800 are not program-compatible. The very high penalty factor of 3 is used to compare them to System/360 ISP.
implementations belong to a common ISP subset. The Model 20 and the Model 91, the extremes
of the series, deviate most from the standard 360 ISP. The range of models (Table 1) shows
the comparative effects of implementation on the actual processing times. For example, the
designers of the various C's were constrained by memory bandwidths. Since the core memories
have about the same cycle time (0.75-2.0 microseconds), variation in bandwidth is obtained
by increasing the data path width from 8 to 64 bits and by increasing the number of
independent Mp's. By looking at just Mp bandwidth, for models 30-65, we obtain a range of
5.3 to 85 megabits/s, corresponding to a performance range of about 1 to 16. By doubling
the number of independent memories, this factor can be increased to 32. These models
correspond to a Pc performance range of 1 to 32. Although we might expect a narrower range
(based on Mp speed), the range can be increased by performance suppression (at the low
end). Power range can be increased by lowering the absolute performance of Model 30. This
is accomplished by making performance tradeoffs to lower cost.

Logic technology

The logic of the 360 series is realized in a hybrid technology, composed partly of
integrated-circuit techniques and partly of the solid-state techniques standard in
second-generation machines. It is a "thick-film" technology that deposits the circuitry on
a ceramic substrate. This is called Solid Logic Technology (SLT) and is used solely by IBM.
This production technique allows only for the fabrication of passive circuit elements on
the substrate. The semiconductor elements (diodes and transistors) are produced
independently, using standard semiconductor production techniques on a wafer. The
semiconductors are then cut and bonded to the substrate, and the complete SLT logic unit is
encapsulated. The substrates correspond roughly to logic elements (gates, inverters,
flip-flops, etc.). The SLT units are placed on larger printed-circuit boards.

Although SLT differs fundamentally from integrated-circuit technology, the overall size of
the final printed-circuit boards is about the same. At the time the decision was made to
develop the technology, it was unclear that integrated-circuit technology would reach
mass-production state. Thus the SLT program was an intermediate design prior to
integrated-circuit technology. The two approaches are about the same from the standpoint of
reliability, especially when one considers the soldered printed-circuit mounting. The
number of connections to the printed-circuit board are about the same. The production
technology of the 360 series is outstanding, perhaps surpassed only by the 360 marketing
plan.

The Instruction-set processor

The following discussion covers only the Pc. The instruction set consists of two classes,
Scientific ISP and Data Processing ISP, which operate on the different data-types. These
data-types correspond roughly to the IBM 7090 (Chap. 41) and IBM 1401 (Chap. 18). For the
scientific ISP they are half- and single-word integers, address integers, single, double,
and quadruple (Model 85) floating point, and logical words (boolean vectors); for the
data-processing ISP they are address or single-word integers, multiple byte strings, and
multiple digit decimal strings. These many data-types give the 360 strength in the minds of
its various types of users. The many data types may be of questionable utility and
constrain the ISP design by having to perform few operations, rather than having a more
complete operation set for a few basic data types. The viewpoint taken here is a biased
one; we feel that, unless a particular data-type adds significant processing and storage
capability, it should not be fundamental to the ISP. The decimal-string integers appear to
cost in storage and processing time. Their redeeming virtues are that little or no
conversion is required at input or output time, and their internal representation is easily
recognized by people.

Advantages of general-registers organization

The ISP uses a general-register organization. The ISP power can be compared with several
similar general-register ISP structures such as those of the UNIVAC 1107, 1108; the DEC
PDP-6, PDP-10; the SDS Sigma 5, Sigma 7; and the early general-registers-organized machine
Pegasus (Chap. 9). Of the above machines the 360 Scientific ISP appears to be the weakest
in terms of instructions and the completeness of the instruction set.

For example, in Pegasus, PDP-6, and the UNIVAC 1107 symmetry is provided in the instruction
set. For any binary operation b the following are possible:

GR ← GR b Mp
GR ← GR b GR
Mp ← GR b Mp
Mp ← Mp b Mp

The 360 ISP provides only the first two. Additional instructions (or modes) would increase
the instruction length.
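
The Mp bandwidth range quoted earlier in this section (5.3 to 85 megabits/s for Models 30 to 65) follows directly from data path width and cycle time. The short Python sketch below reproduces the arithmetic. The function name is hypothetical, and the two operating points are assumptions read from the text and from the Mp rows of Table 1: an 8-bit path with a 1.5-microsecond cycle at the low end, and a 64-bit path with a 0.75-microsecond cycle at the high end.

```python
def mp_bandwidth_mbits(width_bits, cycle_us, n_memories=1):
    """Bandwidth in megabits/s of n independent memories, each delivering width_bits per cycle."""
    return n_memories * width_bits / cycle_us

low = mp_bandwidth_mbits(8, 1.5)                    # assumed low-end point (Model 30)
high = mp_bandwidth_mbits(64, 0.75)                 # assumed high-end point (Model 65)
doubled = mp_bandwidth_mbits(64, 0.75, n_memories=2)

print(round(low, 1), round(high, 1))                # endpoints of the quoted range
print(round(high / low), round(doubled / low))      # range factors with one and two memories
```

Under these assumptions the ratio between the endpoints is 16, and doubling the number of independent memories at the high end raises it to 32, which matches the figures given in the text.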
In the System/360 the only advantage taken of general registers is to make them suitable
for use as index registers, base registers, and arithmetic accumulators (operand storage).
Of course, the commitment to extend the general-purposeness of these general registers
would require more operations. Chapter 3 (page 61) suggests advantages for general register
organizations.

The 360 has a separate set of general registers for floating-point data. This provides more
processor state and temporary storage but again detracts from the general-purpose ability
of the existing registers. Special commands are required to manipulate the floating-point
registers independent of the other general registers. Unfortunately the floating-point
instruction set is not quite complete (e.g., fixed- to floating-point conversion), and
several instructions are needed to move data between the fixed and floating registers.

When multiple data-types are available, it is desirable to have the ability to convert
among them unless the operations are complete in themselves. The System/360 might use more
data conversion instructions, for example, between the following:

1 Fixed precision integers and floating-point data
2 Address-size integers and any other data
3 Half-word integer and other data
4 Decimal and byte string and other data (decimal string to and from byte string conversion
  is provided)

Some of the facilities are redundant and might be handled by better but fewer instructions.
For example, decimal strings are not completely variable-length (they are variable up to 31
digits, stored in 16 bytes), and so essentially the same arithmetic results could be
obtained by using fixed multiple length binary integers. This would remove the special
decimal arithmetic and still give the same result. If a large amount of fixed field decimal
or byte data were processed, then the binary-decimal conversion instructions would be
useful.

The communication instructions between Pc and Pio are minimal. The Pc must set up Pio
program data, but there are inadequate facilities in Pc for quickly forming Pio
instructions (which are actually yet another data-type). There are, in effect, a large
number of Pio's as each device is independent of all others. However, signaling of all
Pio's is via a single interrupt channel to Pc.

The Pc state consists of 26 words of 32 bits each:

1 Program state word, including the instruction counter (2 words)
2 Sixteen general registers (16 words)
3 Four 2-word floating-point general registers (8 words)

Many instructions must be executed (taking appreciable time) to preserve the Pc state and
establish a new one. A single instruction would be preferable; even better would be an
instruction to exchange processor states, as in the CDC 6600 (Chap. 39).

Addressing and multiprogramming

The methods used to address data in Mp have some disadvantages. It is impossible to fetch
an arbitrary word in Mp in a single instruction. The address space is limited to a direct
address of only 2^12 bytes. Any Mp access outside the range requires an offset or base
address to be placed in a general register. Accesses to several large arrays may take
significant time if a base address has to be loaded each time. The reason for using a small
direct address is to save space in the instruction. We know of no published attempt to
analyze the tradeoffs, even of instruction efficiency alone, although undoubtedly such
comparisons were made within IBM.

Another difficulty of the 360 addressing is the inhomogeneity of the address space.
Addressing is to the nearest byte, but the system remains organized by words; thus, many
addresses are forced to be on word (and even double-word) boundaries. For example, a
double-precision data-type which requires two words of storage must be stored with the
first word beginning at a multiple of an 8-byte address. (However, the Model 85, which is a
late entry in the series, allows arbitrary alignment of data-types with word boundaries.)
When a general register is used as a base or index register, the value in the index
register must correspond to the length of the data-type accessed. That is, for the ith
value of a half integer, single integer, single floating, double floating (long), and
quadruple floating (extended), i must be multiplied by 2, 4, 4, 8, and 16, respectively, to
access the proper element.

A single instruction to load or store any string of bits in Mp (as provided in the IBM
Stretch) would provide a great deal of generality. Provided the length were up to 64 bits,
such an instruction might eliminate the need for the more specialized data-types.

A basic scheme for dynamic multiprogramming is nonexistent (i.e., although static
multiprogramming is done, relocation
566 Part 6 1 Computer families Section 3 1 The IBM System/360-a series of planned machines which span a wide performance range
hardware is not present). Only a simple method of Mp protection is provided, using protection keys (see Chap. 43, page 597). This scheme associates a 4-bit number (key) and a 1-bit write protect with each 2 kby block, and each Pc access must have the correct number. Both protection of Mp and assignment of Mp to a particular task (greater than 2⁴ tasks) are necessary in a dynamic multiprogramming environment. Although the architects of System/360 advocate its use for multiprogramming, the operating system does not enforce conventions to enable a program to be moved, once its execution is started. Indeed, the nature of the 360 addressing is based on absolute binary addresses within a program. The later experimental Model 67 does, however, have a very nice scheme for protection, relocation, and name assignment to program segments [Arden et al., 1966].

PMS structures and implementations of the computer

The PMS structures of the various models in System/360 are basically similar, except for the upper end of the series and for the Model 44 (complete compatibility can be purchased as an option). We take up the main group first and then discuss the others individually.

Models 30, 40, 50, and 65

The PMS of Models 30, 40, and 50 is the tree-structured Mp-Pc shown in Fig. 2.¹ They all use a P.microprogram, although with different ISP's. Some gross characteristics are given in Table 1. The Pc of Model 65 is also microprogrammed, but it has hardwired Pio's. A PMS diagram of Model 65 (and Model 75) is given in Fig. 3.

¹ The structure of the Mp's does not include the local M's used for access control, i.e., the storage protect key mechanism, which it is hoped the student will forget about (forever).

The C structures with M(ROS) use a single physical P.microprogram to realize the Pc, the Pio('Multiplexor Channel), and the Pio('Selector Channel). This technique of using a single shared physical P for multiple logical P's with fast changing of P.state is the same one that Pio('Multiplexor) uses. The
[Fig. 3. IBM System/360 Models 65 and 75 PMS diagram. Legible elements: T.console; Pc('2065; microprogrammed | '2075); K('Direct); Mp(#0:3); P('2870) := [S - Pio(#1:192), Pio(#1:4)]; P('2860) := [S - Pio(#1:3) - Sfx] - K(#0:7). See Table 1 for parameters.
Notes to Fig. 3:
1 Mp('2365-3) := (Mp(#0,1; '2365-2; core; .75 μs/w; 8 by/w; 16 kw; (8,1 parity) b/by)-S-)
2 Mp('2361-2 Large Capacity Store/LCS; 8 μs/w; t.access: 3.2 μs; 262 kw; 8 by/w; (8,1 parity) b/by)
3 S(8 M; 4 P; time multiplexed; concurrency: 1; 'Bus Control Unit/BCU)
4 Pio('2870 IO Multiplexor Channel)
5 Pio('2870 IO Selector Subchannel)
6 Pio('2860 Selector Subchannel)
7 Only 8 physical K's
8 See Figures 11 to 16.]
Pio('Multiplexor) is equivalent to multiple Pio's. Within the physical P both interrupts and polling are used to switch among the P's. Polling is used to service the several P's since the main program loop of the ISP interpreter returns to a common point each time the next instruction is fetched. That is, the interpretation cycle for the 360 ISP starts by fetching the instruction, proceeds to fetch the operands, executes the instruction, and then returns results to Mp. The instruction-interpretation process takes only a few Mp references for most instructions.

A few instructions require a long (or indefinite) interpretation time, e.g., character translate, edit, etc., since the operations are on character strings. Here, the iterative program loop which operates on each character of the string must test the attached K's to detect when the Pio interpreter is to be run for data transfers. The long instructions can take several hundred microseconds and cannot be interrupted; thus the response time for an interrupt can be very poor. Figure 4 gives a simplified picture of the register organization of a Model 50, but it is also typical of Models 30, 40, and 65.

The actual System/360 ISP interpretation program in each of the models is different. In addition, each model has microprograms for interpreting other ISP's through emulation. Tucker [1967] discusses how the models were changed as the emulation constraint was added. Table 1 gives the computers which each of the models can emulate. A register structure of the C('30) and the operation for the P.microprogram ISP are given in Chap. 32, page 386. Tables 2 and 3 in Chap. 44 give the additional parameters which influence the instruction interpretation rate of the P.microprogram. The significant parameters for a P.microprogram are the M(ROS) hardware characteristics (speed, size, and information width); the number of fields in the M(ROS) instructions, which gives an indication of the number of control functions performed in parallel; the M(general register) rates and their location in the structure; the Mp data rate; and the characteristics of M(temporary) within P. The activity of transferring data from a K, via the Pio('Selector), is done concurrently with normal instruction interpretation in Models 30, 40, and 50. A program in M(ROS) sets up the data transmission with Mp, and transmission is controlled by an independent hardware control.

Model 20

This model is a subset of the System/360. It has eight 16-bit general registers. It is possible to write programs which will run on both the Model 20 and other models. Model 20 does not have Pio's, and Pc issues instructions to control the attached K's.

Model 25

The Model 25 is an interesting C. Perhaps some of the interest of the authors is caused by the mystery (to the authors) as to what its ISP is. Its ISP is no doubt described in maintenance
[Fig. 4. IBM System/360 Model 50 data-flow diagram and system characteristics. (Courtesy of International Business Machines Corporation.) Labeled blocks: Read Only Storage (ROS) with micro-coded sequencing control; Main Storage; Multiplexer Channel control storage; Local Storage (general registers, floating-point registers); Selector Channel control storage; working registers.
Data transfers:
Processor to storage: 4 bytes
Storage to storage: 4 bytes
Selector channel to processor: 4 bytes
Multiplexer channel to processor: 1 byte
Control unit to channel: 1 byte]
manuals. We can make the following observations based on its characteristics taken from its manual of Functional Characteristics. These appear in Table 1. The observations are:

1 It has a very high-performance Mp, namely, Mp(core; .9 μs/w; 16|24|32|48 kby; 2 by/w); the Mp power is almost that of a Model 50.
2 There is a relatively straightforward Pc which is microprogrammed. The Pc uses Mp for its memory. The System/360 ISP is defined in conventional M(read,write). Of the Mp(48 kby) 16 kby is reserved for a microprogram.
3 Its performance is between that of Models 20 and 30, performing a 360 ISP instruction in about 80 μs.
4 The penalty paid (slowdown factor) to interpret the 360 ISP is therefore 80/1.8 ≈ 45.
5 A small 180-nanosecond local store is used for operands.
6 The Pc cost appears to be about the lowest in the series.

We should ask ourselves:

1 Why do we want an intermediate-level P.microprogram with its own M.read-only, as in the other processors? These P's just seem to waste power.
2 Why should we bother to implement an intermediate-level 360 ISP? We know the final user will write programs in a much higher level language. Thus two levels of interpretation are required instead of one. It is assumed that to program a given task will take, say, x μs if using the 360 ISP. We assume the same task programmed directly in the Pc could take as short a time as x/45 μs if the Pc were used directly.

We assume that if the P.microprogram, which is used to define the System/360 ISP, were used to interpret a FORTRAN ISP, the speed for a Model 25 FORTRAN ISP might easily approach that of the Model 50.

Model 44

Model 44 does not use M(ROS), but its Pc and Pio are hardwired (Models 75 and 91 are also hardwired). The PMS structure of the Model 44 is given in Fig. 5. Model 44 (and 91) stand out as having better performance per unit of cost than their nearest neighbors, which are implemented with M(ROS), as can be seen from Table 1. It must be noted that Models 44 and 91 are not strictly compatible with the 360 ISP since they do not process variable-string and variable-decimal-data formats, although Model 44 options can make it completely compatible. (Subroutines will probably perform satisfactorily for most applications.)

The PMS structure of the Model 44 (Fig. 5) is a tree. The C('44) structure indicates 2-Pio('High Speed Multiplexor Channels/HSMPX) which are between a P('Selector) and P('Multiplexor) in power, since a single physical P('HSMPX) with four subchannels can behave as four independent Pio's. The organization of the Model 44 Pc registers is given in Fig. 6, which reveals a straightforward implementation. The heavy lines in Fig. 6 indicate an ORing of register outputs to form a single data bus (usually 16 or 32 bits wide). The 16-bit crossover function box allows the right and left halves (16 bits) of the input to be exchanged when output. Almost all the units are registers (except the adders, parity generators, and ORers). The A, Ax, B, and Bx registers are used as the M.working for performing instructions, where the x indicates an extension register used in the 64-bit floating-point operations. The C register
[Fig. 5. PMS structure of the Model 44. Legible elements: T.console; Mp(core; 1 μs/w; 8192 ...); Pio('Multiplexor Channel)-Stm-K(#0:63); Pio(#1:4; 'High Speed Multi...)-Sfx-K(#0:1). Notes: Only 8 logical K's. See Figures 11 to 16.]

[Fig. 6. IBM System/360 data flow in Model 44 CPU. (Courtesy of International Business Machines Corporation.) Legible labels: data out to channels; sixteen-bit crossover function; address; from HSMPX. Legend: = bit numbers; * includes parity; ‡ high-speed general registers; † can be displayed on system control panel.]
is a second operand register used for arithmetic and logical operations.

Model 75

The PMS structure of Model 75 is given in Fig. 3. Models 65, 67, 75, and 91 all use the same basic Mp('2365; core). The S(n Mp; m P), which switches between the n Mp modules and the m Pc and Pio's, varies with model, however. C('65) and C('75) use a simple time-multiplexed S in Pc, called the S('Bus Control Unit/BCU). This S makes decisions about which P is to use which Mp, rather than having each Mp arbitrate the P requesting service locally. When the memories are all about the same speed, such an S is all right; however, it has severe limitations when slow-speed (8 microseconds for the large core store) and high-speed memories (0.75 microsecond) are intermixed. The principal difference between Models 65 and 75 is that C('75) is hardwired and, depending on the size of the configuration, may have lower cost/performance.

The simplified functional unit diagram of C('75) (Fig. 7) is more abstract than the register interconnection diagram of a C('44) (Fig. 6). From this description (Fig. 7) of the logic design, one is able to conjecture what is necessarily within the instruction, execution, variable field length, and decimal functional units. The diagram is presented at a nonuniform level at both the PMS and register-transfer levels. There is somewhat more detail than in the PMS structure (Fig. 3). The Model 75 is possibly the first System/360 to require an intermediate-level diagram between a PMS structure and a register-transfer diagram. The instruction unit contains the instruction location counter (part of the ISP) and is responsible for obtaining the next instruction and the operands. Since there can be overlap in the instruction fetching process, this unit is responsible for holding a number of instructions and stores up to 128 bits (2 double words) of instructions at a time. The execution unit and the variable field and decimal units carry out operations on data. The execution unit processes floating-point and fixed-point data.

Model 67

The Model 67 was introduced in April, 1965, for the purpose of time sharing. The entry was prompted by M.I.T.'s project MULTICS. M.I.T. had ordered a GE 645 for experimental research in time sharing. IBM formed a group for the development of a time-shared computer and responded with the Model 67. The Model 67 is essentially a Pc('65) with adequate S's for multiprocessing and a K between Mp and Pc for multiprogramming and memory mapping. Because of software uncertainties, the Model 67 ran as a Model 65 in most installations (in 1968). The University of Michigan and M.I.T.'s Lincoln Laboratory, the first two customers having considered the MULTICS proposal, were instrumental in outlining the specifications [Arden et al., 1966]. Several 67's have been delivered, and the software continues to evolve and be scheduled for completion (see Fig. 1). Questions of costs per console must wait until the system is stable enough to test and evaluate, although in April, 1969, IBM considered the system attractive (operational) enough to market. The most significant outcome of the experiment to date is:

1 The hardware seems capable of supporting a straightforward time-sharing system [Corbato et al., 1962]. Had IBM first developed a simple system based on proved concepts, they would be capable of undertaking research into more complex systems like the version to which they originally committed themselves. (Vendors should have some basis of actual operating experience before committing a product to market.)
2 The problems of building really large-scale software systems are not fully understood yet.
3 The idea of a virtual memory with a large address space (2³² w) is excellent. Many storage allocation problems are simplified by this concept. Unfortunately, the system software builders seem well on their way to filling such a memory. Thus the new freedom allows relaxation in this level of programming.
4 There is a problem of getting users into Mp.core so that Pc can be kept busy. Thus a swapping system is often found waiting for Ms.drum or Ms.disk information. Work at Carnegie-Mellon University using a Mp('LCS; core; .5 ~ 1 mw; 8 by/w; 8 μs/w) seems to indicate that a large number of users can have adequate response from the Model 67 if the users reside in core and are not subjected to swapping [Lauer, 1967; Fikes et al., 1968].

The above items relate to the software. The hardware (Fig. 8) is interesting from several aspects. First, there are adequate facilities for memory mapping and program segmentation. This general scheme is outlined in Fig. 9. In the Model 67 a user's segment and page maps are in Mp, and these maps point to physical Mp blocks of the program. Each time a reference is made, the map is checked for the actual reference. In order to avoid the accesses to Mp for each Mp reference, a K, with an M(content address), is located between Pc and Mp to trans-
[Fig. 7. IBM System/360 Model 75 data-flow diagram and system statistics. (Courtesy of International Business Machines Corporation.) Labeled blocks: multiplexor channel and selector channels; 2365 Processor Storage (main storage); 16 general registers; four floating-point registers; instruction unit; execution unit; variable field length and decimal unit; exponent adder. Data paths are marked as eight bytes, four bytes, or one byte wide. * One byte address bypass.]
[Fig. 8. PMS structure of C('67). Legible elements: T.console; 'Dynamic Address Translation; Pc(#0:1; '2067); S(#0:1; '2846 Channel Controller); Pio('2870; #(0:191),(1:4)); Pio('2860; #1:3); Pio('2860; #1:3).]
form a 24- or 32-bit virtual address in Pc into an actual 19- to 22-bit physical address in Mp. This K is not shown in Fig. 9 because it is not logically necessary. The scheme suggested in Fig. 9 uses control bits in the map to determine legal Mp accesses. In the Model 67 the storage key mechanism holds whether a given page can be accessed by a given numbered user (instead of associating the control with the mapping as shown in Fig. 9).

Second, the Model 67 is the first acknowledgment by IBM of multiprocessor computers, since it provides adequate switching to allow multiple Pc's. The C('65) multiprocessing configuration has been introduced based on Model 67 structure. Multiprocessors are necessary for reliability, not solely for performance reasons.

The PMS structure of C('67) in Fig. 8 does not have to use the S('Bus Control Unit/BCU),¹ as in the C('65). The C('67) can have an S in each Mp, so that four P's can communicate with an Mp, as shown in Fig. 8. Each Mp makes the decision about the P request to be honored next. Thus the problem of having an "all knowing" S('BCU) is solved by allowing each Mp to do local scheduling, rather than having a dialogue with another component (with time delays). The S('BCU) in a duplex C('67) is still present, but with less power, in the form of the S('2846 Channel Controller). It is used to arbitrate the Pio accesses to Mp.

¹ A system with only one port at Mp, controlled by BCU, is called a simplex. A system with multiport Mp is called a duplex.

Without multiprocessing, the Pc seems very badly mismatched with respect to Mp. Consider, for instance, the data rates on the C('67). From Fig. 8 its maximum possible Mp data rates are:

For 1 Mp('2365-12):

    64 bits / 0.75 μs = 171 megabits/sec

and for 1 Mp('2361 Large Core Store):

    64 bits / 8 μs = 8 megabits/sec

Thus the total data rate is

    171 × 8 + 8 × 4 = 1,368 + 32 megabits/sec ≈ 1,400 megabits/sec

The processing rate is approximately

    64 bits / 2.2 μs = 29 megabits/sec

An Ms.drum rate is approximately

    8 b / 0.8 μs = 10 megabits/sec
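The data-rate figures above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function and variable names are invented here, and it assumes that each quoted rate is simply bits per access divided by cycle time. Under that assumption 64 bits per 0.75 μs comes to about 85 megabits/sec, so the printed 171 megabits/sec corresponds to 128 bits per cycle; reading that as two 64-bit modules per storage unit is an assumption, not something the text states.

```python
def rate_megabits_per_sec(bits: float, cycle_us: float) -> float:
    """Bits moved per cycle divided by the cycle time in microseconds.

    One bit per microsecond is one megabit per second, so no scaling is needed.
    """
    if cycle_us <= 0:
        raise ValueError("cycle time must be positive")
    return bits / cycle_us


# Figures quoted in the text.
core_64 = rate_megabits_per_sec(64, 0.75)    # one 64-bit access per 0.75 us
core_128 = rate_megabits_per_sec(128, 0.75)  # assumption: two such accesses per cycle
lcs = rate_megabits_per_sec(64, 8.0)         # Large Core Store, 8 us cycle
processing = rate_megabits_per_sec(64, 2.2)  # Pc processing rate

# Total as printed: 8 fast storage units at 171 plus 4 LCS units at 8.
total_printed = 171 * 8 + 8 * 4

# Ratio of available Mp data rate to what one Pc absorbs.
mismatch = total_printed / processing

if __name__ == "__main__":
    print(f"core, 64 bits/cycle : {core_64:.1f} megabits/sec")
    print(f"core, 128 bits/cycle: {core_128:.1f} megabits/sec")
    print(f"LCS                 : {lcs:.1f} megabits/sec")
    print(f"processing          : {processing:.1f} megabits/sec")
    print(f"total (as printed)  : {total_printed} megabits/sec")
    print(f"Mp/Pc mismatch      : {mismatch:.0f}x")
```

The last ratio makes the mismatch argument concrete: a single Pc absorbs only a few percent of what the full complement of Mp modules could deliver.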
[Fig. 9. Memory allocation using pages and segments. Labeled elements: user segment table register; segment table; page tables for segments (with table length); address translation (user maps); primary memory component; address within page. Notes: "+" denotes an addition operation; the maps hold access and activity information (read, write, read only, etc.); the maps are located in primary memory during execution.]

Thus, for the several P's, an effective Mp request rate of 100 megabits/sec might be needed. The data-flow mismatch (between Mp and the P's) occurs because of the P's, the S (the L's connecting P and Mp), the lack of P's, and the fact that t.access = ~1/2 t.cycle.

The Pio('2870), used in Model 65 and above, is described at two structural levels in Fig. 3. The Pio includes a large M.working to store the state of each of the logical Pio's. This Pio state includes the instruction location counter, the control state bits (active, running, interpreting an instruction, process-
loss. Since the M.working is necessary to store the Pio state, the additional space for buffering is not expensive. An alternative design might use Mp for this buffering.

The four Pio('2860 Selector Channel)'s are implemented as independent Pio's, using conventional hardwired logic and buffering. However, they are packaged as one unit.

Model 85

The Model 85 was announced in February, 1968, with the goal of being the highest-performance Model 360 in production. Its performance is about 3 to 5 times that of the Model 65, and in some cases it outperforms a Model 91 [Conti et al., 1968].

The PMS diagram of the Model 85 is shown in Fig. 10. The Pio, T, Ms structure is identical to that of Models 65 and 75 (Fig. 3). The two interesting aspects of the structure in Fig. 10 are the M(content addressable; 'Buffer Storage; 16 | 32 page; 1024 by/page) and the Pc. The pages are filled in groups of 64 bytes, as references to a particular physical block in Mp.core are made. Conti [1968] gives running times for various programs as a function of buffer memory size. Multiprogramming may degrade the performance more than any other case. This process, which has been referred to as "look aside," or a "slave memory," was suggested by Wilkes [1965]. It is completely analogous to the Model 67 M(content addressable; 8 w) which is used to hold the segment-page map for a multiprogrammed time-sharing system. It is also analogous to a one-level storage system (Atlas; see Chap. 23) which is formed from two physical M's whose performance differs significantly. Here, the effect is to try to approximate a computer with a large Mp(80 ns/w) by using a large Mp(1 μs/w) and a small Mp(80 ns/w). The CDC 7600 (page 475) has a similar structure, but the Mp-Ms migration is under programmed control.

The P.microprogram used for controlling the Pc(K('Execution Unit)) allows for great flexibility in the definition of ISP's. An Mp(500 w) is available for the user: this may be loaded by a program, and it specifies an ISP. One standard option is to emulate the 704-7094 series.

The Model 85 removes the restriction of aligning words at particular boundaries. Thus any logical word, independent of its length, can be located at any physical location addressed in bytes.
[Fig. 10. IBM System/360 Model 85 PMS diagram. Legible elements: Mp; K('Storage Control); M('Buffer Storage); L(in: 16 by; out: (8,16) by); Pc; T.console; L('Direct); T(#1:3)-L(#1:3). Pc := (Mps(4 w; 8 by/w); microprogrammed; M.buffer; M.parameter(read only; 80 ns/w; 2000 w); M.parameter(read write; 80 ns/w; 500 w)).]
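The purpose of the buffer storage in Fig. 10, approximating a large Mp(80 ns/w) with a small fast M backed by a large Mp(1 μs/w), can be sketched numerically. The model below is a deliberately simple assumption: every reference costs either the fast time or the slow time, weighted by a hit ratio. The hit ratios are hypothetical and are not measurements from Conti [1968]; the page and block sizes are the ones quoted in the text.

```python
PAGE_BYTES = 1024   # bytes per buffer page, from the text
BLOCK_BYTES = 64    # bytes brought in per fill, from the text
PAGES = 16          # smaller of the two buffer sizes quoted (16 | 32 pages)

buffer_bytes = PAGES * PAGE_BYTES
blocks_per_page = PAGE_BYTES // BLOCK_BYTES


def effective_access_ns(hit_ratio: float,
                        fast_ns: float = 80.0,
                        slow_ns: float = 1000.0) -> float:
    """Weighted mean access time for a two-level store.

    Assumes a miss costs exactly one slow access; real fill traffic
    and overlap are ignored.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must lie between 0 and 1")
    return hit_ratio * fast_ns + (1.0 - hit_ratio) * slow_ns


if __name__ == "__main__":
    print(f"buffer size: {buffer_bytes} bytes, {blocks_per_page} blocks per page")
    for h in (0.80, 0.90, 0.95, 0.99):  # hypothetical hit ratios
        print(f"hit ratio {h:.2f}: {effective_access_ns(h):.0f} ns per access")
```

Even in this crude model the effective time approaches 80 ns only when nearly every reference is found in the buffer, which is consistent with the remark that multiprogramming may degrade performance.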
The Pc's data operation performance is impressive. A fixed-point multiply is done in 0.4 μs, and a floating-point multiply takes 0.56 μs (not including accesses).

The data-type, extended floating-point number, is used in Model 85. Thus a 24-, 56-, or 112-bit fraction part can be used.

Model 91

This model has a very low cost/performance ratio (see Table 1). Only about 20 Model 91's were produced before it was withdrawn from the market. It has the highest performance of the series. The Mp is 0.75 μs, but 16 are overlapped to provide a theoretically maximum bandwidth of 16 × 64/0.75 = 1,370 megabits/s. About 2.5 mega-instructions/s are executed; thus, a total of 160 megabits/s of Mp are absorbed by Pc.

There are other interesting models in the '90 series; the Model 92 was a paper machine,¹ and the Model 95 was unannounced but produced, a version of the Model 91 with an Mp(integrated circuit; 60 ns/w; 8 by/w). The Model 91 is not covered in any detail here because of space limitations. It is similar to other very large computers in that many techniques are employed to obtain parallelism. The January, 1967, IBM Journal of Research¹ is devoted to design issues of the Model 91.

¹ See bibliography at the end of this chapter.

Models 1130 and 1800

These computers are presented as reference points and have nothing to do with the C('360). They are implemented outside the System/360 framework but use its technology, and so cost comparisons are still somewhat meaningful. These computers
are straightforward, and for a given task which does not use floating-point arithmetic, they should perform as well as any System/360 model. The arguments we use for the intermediate Pc for the Model 25 apply equally well here, too. Namely, why have such a complex ISP when simple ones will do just as well?

[Fig. 11. Legible entries: K('Channel to Channel Adapter; used to transfer data among 2 C's); L(C(Pio)); P(block transfer; 'Storage to Storage Channel; used in place of regular channel); L(S('Selector Channel; Models 44, 65, 75; used in place of regular channel)); P(array; '2938; microprogrammed; Mps(~64 w; 32 b/w); operations: (vector move, vector multiplication, vector inner product, sum of vector elements, sum of squares, convolution, difference equation, fixed-floating conversion); data lengths: scalar, vector, matrix; data-types: fixed, floating).]

the same for Model 30 and the slower 1800. The cost/performance is especially low with the 1130 (Table 1). In Chap. 33 we

are not especially interesting, but they give an idea of the behavior and parameters. For example, the expression T('1403 Model 3; line; printer; 1100 line/min; 132 char/line; 8 bits/character; 64 ~ 240 character set) pretty well describes a typical line printer. From the above description one can deduce the data rate of a T(line printer). It is 132 char/line × 1100 line/min × 1/60 min/s × 8 b/char = 19.4 kb/s.
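The line-printer calculation is an instance of a general recipe: multiply the attributes in a PMS expression so that the units cancel down to bits per second. A minimal sketch follows; the function name and parameter names are invented for illustration, and the default values are the ones quoted for the T('1403 Model 3).

```python
def printer_data_rate_bps(chars_per_line: int = 132,
                          lines_per_min: int = 1100,
                          bits_per_char: int = 8) -> float:
    """Data rate of a line printer in bits per second.

    char/line x line/min x (1/60) min/s x b/char = b/s
    """
    chars_per_sec = chars_per_line * lines_per_min / 60
    return chars_per_sec * bits_per_char


if __name__ == "__main__":
    rate = printer_data_rate_bps()
    print(f"{rate:.0f} b/s = {rate / 1000:.1f} kb/s")
```

The same function applied to other parameter sets gives a quick way to compare the data rates of different T's on a common scale.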
in a bus (or chained) fashion. Such a single interface to handle a wide range of needs (high and low response and data rates) via a single set of electrical conductors requires a great deal of control information to be passed along the link. Therefore a K must have a great deal of knowledge of the dialogue in order to communicate. The hardware to attach to the I/O bus at a K is costly and must be designed carefully. The K('SCU) provides a rather simplified interface to the Pio. All I/O bus synchronization control, communication protocol control, buffering, and electrical isolation are within K('SCU). The K('SCU) is fairly flexible, in that devices connected to it can communicate with one another without Pio (see Fig. 11).

a multiprogrammed environment requiring programs to be moved.

The 2938 array processor. The P.array('2938) is an extremely interesting special P (Fig. 11). It can be connected to Models 44, 65, or 75. It has a limited instruction repertoire, but the instructions it interprets are more complex than those in the ISP of the Pc. The instructions are algorithms for operating on an array (a vector or a matrix). These instructions include:

1 Vector move, similar to the P('Storage to Storage) described above, with conversion either way between fixed and floating point
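To give a feel for what "instructions that are algorithms" means, here is a sketch of a few of the array operations named for the P.array('2938): vector move with conversion, inner product, sum of squares, and convolution. These are ordinary textbook definitions written in plain Python. The function names and signatures are invented for illustration, and nothing here reflects how the 2938 actually encodes or executes its instructions.

```python
from typing import List, Sequence


def vector_move(src: Sequence[float], to_fixed: bool = False) -> List:
    """Copy a vector, converting to fixed (integer, truncated) or floating form."""
    if to_fixed:
        return [int(x) for x in src]
    return [float(x) for x in src]


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-by-element products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def sum_of_squares(a: Sequence[float]) -> float:
    """Inner product of a vector with itself."""
    return inner_product(a, a)


def convolution(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Full discrete convolution: out[n] = sum over k of x[k] * h[n - k]."""
    if not x or not h:
        return []
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out


if __name__ == "__main__":
    print(inner_product([1, 2, 3], [4, 5, 6]))
    print(convolution([1, 2, 3], [1, 1]))
```

Each of these is a loop built around one multiply and one add, which is the kind of inner step a single array instruction replaces.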
[Fig. 12. IBM System/360 Ms(drum, disk, data cell) PMS diagrams. Legible elements: L(#1:2)-Sfx-K('2841)-Sfx; L(Pio('Selector | 'Multiplexor)); L(Pio('Selector)).]
[Fig. 13. M(s; magnetic tape) PMS diagrams. Legible elements: K('2403), K('2404), K('2803), K('2804), Ms('2401 | '2402), L(#1:2), L(to: Pio('Selector | 'Multiplexor)).

Ms('2415; magnetic tape; 18.75 in/s; area: .5 in × 1800 ft); (model; #; by/in; b/by):
(1; 1:2; 200,556,800; (6+1),(8+1))
(2; 1:4; 200,556,800; (6+1),(8+1))
(3; 1:6; 200,556,800; (6+1),(8+1))
(4; 1:2; 200,556,800,1600; (8+1))
(5; 1:4; 200,556,800,1600; (8+1))
(6; 1:6; 200,556,800,1600; (8+1))

Ms('2401; magnetic tape; area: .5 in × 1800 ft); (model; in/s; by/in; b/by):
(1; 37.5; 200,556,800; (6+1),(8+1))
(2; 75; 200,556,800; (6+1),(8+1))
(3; 112.5; 200,556,800; (6+1),(8+1))
(4; 37.5; 200,556,800,1600; (8+1))
(5; 75; 200,556,800,1600; (8+1))
(6; 112.5; 200,556,800,1600; (8+1))]
makes it possible to construct complex algorithms in a flexible manner. The hardware logic is capable of doing a combined floating-point multiplication and addition in 200 nanoseconds. The impressive results this P achieves in the interpretation of the algorithms are principally because the time to access the algorithm has gone to zero. A measure we might apply to a P is the ratio of the time it spends fetching the algorithm's data to the total time it spends executing the algorithm. In a conventional computer Pc we suggest that a ratio of nearly 1/2 is very good. Two fetches are usually required, one for data, one
Secondary-memory structure. Figures 12 and 13 present the Ms PMS structures. All the K's have an optional S, which can be placed between the K and the S(P;K) to allow two Pio's to access a common K (from either of two C's or two Pio's of the same C). The K('2841 Storage Control) is interesting only in being able to control a series of quite disparate devices, on a one-at-a-time basis.

Figure 13 presents all the M(s; magnetic tape)'s. The switch is interesting as it can be used for up to four K's to access simultaneously any of 16 M.tapes. (The vast array of very similar devices is due undoubtedly to marketing rather than production or engineering reasons.) It should be noted that there are two distinct M.tapes: conventional magnetic tape and

[Figure residue: T(display) PMS diagram. Legible elements: T(#1:24; typewriter; printer); M(buffer; 16384 by); L-K('2840-1)-Stm-T(#1:6; '2250-2; CRT; display; area: 12 × 12 in; 1024 × 1024 point/page; keyboard); T(#1:6; light pen; input); T('2280; film; writer; 35 mm; 4096 × 4096 point/page).]
[Figure residue: T(card, paper tape, optical and magnetic character readers) PMS diagrams. Legible entries:
T('2520; card; punch; ('Model B2; 500 card/min) | ('Model B3; 300 card/min))
K('2821)
T('2671-1; paper tape; reader; 1 kchar/s; 5,6,7,8 b/char; area: ~1 × .1 in²/char)
T('1231-N1; optical; pencil mark page; reader; area: (8.5 × 11) in²/page; 1.8 s/page)
T('1285; optical; printed character roll paper; reader; width: (.9375 ~ 3.5) in; 22 char/col; 300 char/s)
T('1287 Models 1 and 2; optical; reader; handprinted; roll, document; area: (2.25 × 3 in²) | (5.91 × 9 in²))
T('1418, '1428 Models 1, 2, 3; optical; typewritten character; reader; area: (2.75 × 3.66 in²) | (5.875 × 8.75 in²) | (2.33 × 4.18 in²) | (3 × 8.75 in²); 288 ~ 420 documents/min)
T(magnetic; character; reader; bank checks; ('1412; 950 document/min) | ('1419; 1600 document/min))
L(Pio('Selector | 'Multiplexor))]
made. The models use essentially the same technology, implement
the same ISP, and are probably constrained by a common
corporate profit goal. Even here, as we noted earlier, comparisons
are difficult to make.
In Table 3 we present the costs for various PMS component
primitives. From this table, costs (relative to other components)
can be obtained. These costs are expressed as dollars per
second ($/s) to rent the equipment. They have been derived
from the IBM monthly rental prices. The computer prices are
based on estimates of minimum, average, and maximum
configurations in the Adams Computer Characteristics Quarterly
[Adams Associates]. The conversion factors are
Fig. 16. IBM System/360 T(telephone line, analog, typewriter) PMS diagrams.
    $/s = 1/[(173.3 hour/month) x 3,600 s/hour]
        = 1.6 x 10^-6 $/month
    $/month = 0.625 x 10^6 $/s
The cost to buy, in dollars, is approximately
    $ = 45 x ($/month)
    $ = 45 x 0.625 x 10^6 x ($/s) = 2.82 x 10^7 x ($/s)
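The conversion factors above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the constants 173.3 hours/month and the factor of 45 come from the text, while the function names are hypothetical.

```python
# Constants taken from the text.
HOURS_PER_MONTH = 173.3     # rental hours in one month
SECONDS_PER_HOUR = 3600
PURCHASE_MONTHS = 45        # purchase price is roughly 45 monthly rentals

SECONDS_PER_MONTH = HOURS_PER_MONTH * SECONDS_PER_HOUR


def rent_per_second(monthly_rent):
    """Convert a monthly rental ($/month) into a rental rate in $/s."""
    return monthly_rent / SECONDS_PER_MONTH


def monthly_rent(rent_per_s):
    """Convert a rental rate in $/s back into $/month."""
    return rent_per_s * SECONDS_PER_MONTH


def purchase_price(rent_per_s):
    """Approximate purchase price in dollars for a given $/s rental rate."""
    return PURCHASE_MONTHS * monthly_rent(rent_per_s)


if __name__ == "__main__":
    # One dollar per month expressed in $/s (about 1.6e-6).
    print(rent_per_second(1.0))
    # Purchase price per ($/s) of rental (about 2.8e7).
    print(purchase_price(1.0))
```

Computed without intermediate rounding, the purchase factor comes out slightly below the rounded figure quoted in the text.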
Pc(cost: ($/s|$)) := c.Pc := cost of Pc alone
Mp(cost.avg) := c.Mp.avg := cost of average-size Mp for a model
C(cost.min:) := c.C.min := cost of minimum-size computer
configuration
C(cost.avg:) := c.C.avg := cost of average-size computer
configuration
Primary memory
[Figure: Mp cost plotted against Mp size, log2(bytes) from 12 to 24,
with points labeled by model and by memory cycle time and width.]
plotted in terms of $/(by/s) and allows us to compute the
purchase cost of a bit. The purchase cost of most Mp.core is
$0.25/bit, according to the line. The 8-μs Large Capacity
Storage/LCS cost is $0.032/bit. There appear to be slight cost
savings for large Mp's and a significant saving for lower
performance in the case of LCS, a factor of 8. A reasonable formula
for Mp cost is: c = (7 x 10^[illegible] x i)/[t.cycle: (μs)]. This
formula would account for Model 50 Mp and LCS costs, but not Model
25 and 30 Mp costs. We really need an i^(1/2) term in the formula
to make a good fit (and also a constant). The value i^(1/2) should
be present, if purchase prices are related to manufacturing
costs, because coincident-current selection cost is inherently
proportional to i^(1/2).
An odd pricing point is the Model 44; it was developed after
the other models and is either implemented better or priced
differently. The anomalies in Mp('65; 2^14 words), Mp('30; 2^14
words), Mp('40; 2^17 bytes), and Mp('44) are undoubtedly due
to pricing-strategy differences. In the case of the Model 30 the
incremental cost to increase the Mp size from 2^13 to 2^16 bytes
is the addition of only a different core array (with no change
in electronics), at a small incremental manufacturing cost of
goods.
The Mp size range within a model varies by a factor of 8
for Models 30, 40, 44, 50, 65, and 75, although by only a factor
of 4 at the ends of the line (Models 20 and 91). The Mp
implementation is usually a single common set of electronics to drive
2^14 (16,384) words in a square or coincident-current-selection
system of 2^7 by 2^7. These square points are indicated on the
graph, and they should be the most economical memories.
Smaller Mp's are implemented simply by using smaller core-memory
arrays, but with the same basic electronic configuration,
e.g., the Model 30 above. Larger Mp's are obtained by
replicating the whole Mp system including the core array and
the electronics.
An Mp size range of 8 for a given model presupposes a
certain structuring of problems. That is, the models assume
a fixed relationship between Pc capacity and Mp size
requirements. An ideal system might let Pc power, Pc quantity, Mp
power, and Mp size be completely variable. These parameters
would all be selected independently to match the work load.
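The square-root dependence mentioned above for coincident-current selection can be illustrated with a small sketch. It assumes a deliberately simplified model, not taken from the text, in which a square array of n words needs one X select line per row and one Y select line per column; the function names are hypothetical.

```python
import math


def square_array_side(words):
    """Side length of a square core array holding `words` words.

    Raises ValueError if `words` is not a perfect square.
    """
    side = math.isqrt(words)
    if side * side != words:
        raise ValueError("word count must be a perfect square")
    return side


def select_lines(words):
    """Number of X plus Y select lines in the simplified square model."""
    return 2 * square_array_side(words)


if __name__ == "__main__":
    # The 2^14-word array from the text is 2^7 by 2^7.
    print(square_array_side(2 ** 14), select_lines(2 ** 14))
    # Quadrupling the number of words only doubles the select lines.
    print(select_lines(2 ** 16))
```

Under this assumption the selection electronics grow as the square root of the array size, which is the reason given in the text for expecting an i^(1/2) term in the cost formula.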
584 Part 6 1 Computer families Section 3 1 The IBM System/360-a series of planned machines which span a wide performance range
Central processors
The relative Pc powers (in 360 instructions/s) and costs are
given in the graph of Fig. 19 and in Table 1. The most significant
fact from the graph is that the cost/power ratio is roughly
constant for each of the Pc's (especially if we ignore Model 44
and Model 50). Figure 19 gives the relative computing power
versus cost for various configurations. Table 1 also shows a
number of relationships. One interesting relationship (Table 1)
is the ratio of actual Pc power to maximum possible Pc power
for a model. This can be based on Mp utilization:
    Actual Pc power / Maximum Pc power
        = Mp cycles utilized by Pc / Mp cycles available
This ratio must be less than 1 unless there are many Pc's or
a single Pc has more power than Mp. In every case, the Pc is
far from fully utilizing the Mp. The technique of buffering
instructions in a local Pc memory can increase this ratio to be
>1 (although no computers ever do so). In the higher model
numbers the utilization is low because a large number of cycles
have to be available in order to avoid conflicts when a given
cycle is requested, using an Mp with a long t.cycle. In the case
of Model 25, the cycles are lost because the microprogram is
being executed from Mp. (A ratio of 0.045 indicates 21 cycles
are used for microprograms to every 1 of program.)
In the case of the Model 30 the power is limited by holding
the general registers in Mp. For example, by using an additional
fast M to hold the general registers and working data, the Pc
power could increase. Unfortunately, such a change might
cause the cost of other parts of the system to be increased,
so that it would not be just a simple incremental addition. The
C('30) performs well for the field-scan problem [Solomon, 1966]
(see Table 1). The data structure for the field-scan problem
coincides with the 1-byte Mp organization. C('65) and C('75)
perform the worst for field scan because of the mismatch
between Mp organization (8 bytes) and program data (1 byte).
C('65) and C('75) have the same Mp structure and hence
have the same potential power available from Mp. In the case
[Figure: computing power versus cost for the System/360 models, with
points plotted for average-size C, minimum-size C, and Pc only.]
of C('75) the power of the Mp is more nearly utilized.
Unfortunately for the more complex Mp structures, which have more
potential Mp cycles, the Pc is not able to utilize them. The C('65)
and C('75) have several registers concerned with obtaining the
next instruction and holding it for execution while other
instructions are obtained (look-ahead). The hardwired Model 75
Pc may account for the improvement over the Model 65
P.microprogrammed.
The performance of C('20) is inaccurately high since it is
a limited subset of the 360 ISP. (C('20) does not have
floating-point or fixed-point multiply and divide instructions, and it
has only eight 16-bit general registers.) The hardwired Model
44 has a better cost/power characteristic than any of the other
C's, by any measured criteria (see Fig. 19). In the case of the
Model 44, the Pc price also includes Ms.disk. Perhaps the Model
44, designed initially for real-time scientific problem solving,
is priced more competitively with similar machines (DEC PDP-10
and SDS Sigma 5, 7), whereas the other models compete in
a performance-insensitive, competition-free market for
general-purpose business data processing. Thus its anomalous
position may be due to external market pressures and not
manufacturing cost.
The design of the IBM System/360 models is undoubtedly
predicated on the basis that performance or computing power
is proportional to the cost raised to some power, g, greater than
1: power = k x cost^g, where g > 1.¹ Almost all models follow
the above relationship with g > 1. When g > 1 there is an
advantage to have large configurations since the cost/computation
will decrease. If g ≤ 1, then an alternative implementation
for the 360 C's would simply use multiple C's or Pc's to obtain
the same power. Unfortunately, such an approach does not
provide for the interconnection of the components to function
as a single unit. In many cases a single task cannot be broken
into a number of parallel and independent subtasks. If the
performance for the system varied by a factor of 100, then 100
Pc's or C's would be placed together. From Table 1 we see a
power range of about 314 corresponds to a cost range of 65
to 114 (which tells us g < 2).
The following discussion takes computing power to be
measured by instructions per second and Mp (size; t.cycle).
Costs are measured in dollars per second of rental time. The
graph (Fig. 20) shows the relationship to computing power p
and costs. The power (actually p.Pc) is taken from the measures
of instruction times for certain fixed work. Solomon observed
Grosch's law to hold for Models 30, 40, 50, 65, and 75.
This line is drawn in Fig. 20 for C(cost.average). Considering
Models 20, 25, 44, 85, and 91, a line with a less steep slope
might fit the points better. If we consider C(cost.minimum),
g < 2; considering only Pc, a g = 1 might be appropriate (see
Fig. 20) in which the power/cost is essentially constant with
cost.
Pc(cost)/Mp(cost.avg) := c.Pc/c.avg.Mp = ~1.1, the ratio
of processor to memory cost
C(cost.min)/C(cost.avg) := c.min.C/c.avg.C = ~0.47, the
ratio of the smallest computer configuration to an average
configuration
Pc(cost)/C(cost.avg) := c.Pc/c.avg.C = ~0.23, the ratio
of processor to computer cost
These are averages over all the series and can be rather
misleading. For example, in higher-numbered models the
C(cost.min)/C(cost.avg) := c.min.C/c.avg.C is about 0.6,
whereas in lower-numbered models the ratio is 0.3. We might
have expected this, since it indicates that a higher proportion
of system cost is in Ms and T on lower-number models.
An alternative computer series based on multiprocessing
In this section we suggest an alternative design providing a wide
range of computing power but using multiprocessing. That is,
rather than building a higher-performance model, we would
have multiple lower-performance models. On the surface, this
appears feasible only if the cost of the processor is a relatively
small part of the computer, and if for a particular configuration
there are memory cycles available in the system (so that a more
costly memory system is not required). It is also desirable that
the proposed multiprocessor configurations have rather large
Mp's so that it can be assumed there will be several jobs in
Mp waiting to run; i.e., we should be able to multiprogram rather
than do parallel processing. These conditions are satisfied with
the System/360 models. Although we do not address the question
of development cost, it is clear that a multiprocessor
system would have a lower development cost because fewer
processors would be required. Within IBM we can assume that
the development cost tends to go to zero because of the large
production; unfortunately, even for IBM, the training cost for
servicemen and salesmen does not go to zero but is proportional
to the number of products. Thus, we would anticipate
savings by having a smaller line.
The multiprocessor view is presented in Table 4; namely, we
suggest dropping Models 20, 30, 40, 50, 65, 75, 85, and 91.
¹Herb Grosch [Grosch, 1953] first noted this relationship and estimated g to be
2; thus we use g for this exponent. Adams suggested g = 1/2 [Adams, 1962].
See also The Economics of Computers [Sharpe, 1969].
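As a rough check on the exponent g in power = k x cost^g, the sketch below solves for g from a power ratio and a cost ratio. It assumes the "cost range of 65 to 114" quoted from Table 1 can be read as two bounding cost ratios for the power range of 314; that reading, and the function name, are assumptions made for illustration only.

```python
import math


def grosch_exponent(power_ratio, cost_ratio):
    """Solve power_ratio = cost_ratio ** g for the exponent g."""
    return math.log(power_ratio) / math.log(cost_ratio)


# Figures quoted in the text (Table 1).
POWER_RANGE = 314
COST_RANGE_LOW = 65
COST_RANGE_HIGH = 114

# A smaller cost ratio for the same power ratio implies a larger g.
g_high = grosch_exponent(POWER_RANGE, COST_RANGE_LOW)
g_low = grosch_exponent(POWER_RANGE, COST_RANGE_HIGH)

if __name__ == "__main__":
    print(f"g lies between {g_low:.2f} and {g_high:.2f}")
```

Both values fall between 1 and 2, consistent with the statement that g < 2 while still exceeding 1.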
These would be replaced with only Models 25 and 44. Note there
are Pc's in Table 4 (other than 25 and 44) which when
multiprocessed can perform better for lower cost, e.g., 2 Model 65's
are >1 Model 75, for about the same cost. Admittedly there
are major problems in multiprocessing with 11 Pc's, but other
existence proofs [Anderson, 1961] have shown that two to four
Pc's can be effective (Chap. 36). If we ignore Models 85 and 91,
the worst case is for a maximum of four Pc's needed to obtain
the power of model 40. Note that in the above cases the processor
cost is about one-half the cost of a single Pc. This factor
of 2 might be used to answer critics of the scheme. The reasons
against the scheme are: There have to be good switches between
Mp and Pc's; there has to be communication among the
Pc's (which is about the same as what the Pc-Pio communication
should be); and there has to be knowledge of the program
environment to split tasks apart to run in parallel.
A less radical suggestion is also presented in Table 4:
namely, examining the number of processor models which can
be used to provide processing power for the next highest model.
Actually, if we carry this view further and were forced to build
such a system, the view that the ideal machines are the Model
25 and 44 would undoubtedly change. Model 25 and 44 exist
and can be used for the argument. The reader should note that
there is a major flaw in our argument using a Model 25. The
microprogrammed Model 25 Pc cost should include a 16-kby
memory for the microprogram (actually one Mp should be
included for each Pc to avoid memory-request conflict).
Alternatively, if we use the Model 25 directly without a microprogram,
we would lose performance range. With our present knowledge
of multiprocessors, a responsible engineer would hardly suggest
building a multiprocessor system with 11 processors as a
sure-fire money-making venture. A more reasonable alternative
would be to use the multiprocessor Model 75 as an alternative
to Models 85 and 91. A reasonably safe alternative would be
three basic processors and a four-processor multiprocessor
structure. For a power range of 320:1, then the processors
could be 1, 20, 80, giving powers of 1, 2, 3, 4, 20, 40, 60, 80,
160, 240, 320. This structure would leave a gap of a factor of
Table 4 IBM System/360 Pc (power: cost) and an alternative design based on multiprocessors
5 between a 4 x 1 power processor and 20 power processor.
The largest gap in the System/360 is a factor of 3 between
Models 30 and 40.
Conclusions
The IBM System/360, by achieving a production record, has
fulfilled its principal design objective. The technical goals,
however, are of interest to us here. The most interesting aspect
of the design is achieving a performance range of 314 to 1 over
a series of models, with a primary-memory size range of 2,048
to 1 for various computer configurations. Thus a user is given
a very large set of configuration alternatives. The SLT
technology, though not integrated-circuit, is certainly of the third
generation. Using SLT the fabrication of the models is superb.
There is a vast array of secondary-memory and terminal
devices to couple with almost any other system. The
System/360 is the first computer to make extensive use of
microprogramming. Microprogramming is used for the definition of
the System/360 instruction-set processor, but, more important,
microprograms define previous IBM computers so that a user
can operate satisfactorily during the interim period when older
programs are being updated to use the System/360. There are
provisions for multicomputer structures. Within a single
computer structure there is adequate means of peripheral switching
so that reliable and high-performance structures can be
assembled. Early structures do not provide multiprocessing; we
have suggested multiprocessing as a technique to achieve the
same performance-range objectives. The io processor, though
rather elaborate, provides a certain commonality.
The instruction-set processor for the System/360, based on
a general-registers structure, appears to be overly complex, yet
incomplete, because there are so many data types. The
addressing mechanism and lack of multiprogramming ability make
the System/360 a hard machine to appreciate fully. Although
we praise microprogramming as a means of accomplishing
compatibility with the past, it appears to stand in the way of
getting the most performance from the hardware. Perhaps of
most significance, the System/360 may have a greater lifetime
than any past computer.
Selected Bibliography
Architecture and logical structure: AmdaG64a (TeagH65)¹, BlaaG64a²,
BlaaG64b²; General implementations: AmdaG64b², CartW64, PadeA64²,
StevW64²; Microprogramming: GreeJ64, TuckS67, WebeH67; Formal
description of Pc⁵: FalkA64²; Performance and reviews: HillJ66, SoloM66;
Model 40 modifications for multiprogramming: LindA66; Model 67:
ArdeB66, FikeR68, GibsC66, LaueH67; Model 85: ContC68³, LiptJ68³,
PadeA68³; Model 91 architecture and technology: AndeD67⁴, AndeS67⁴,
BolaL67⁴, FlynM67a⁴, LangT67⁴, LloyR67⁴, SechR67⁴, TomaR67⁴; Model
92 (proposed): ContC64 (GrimR65a), AmdaG64c (GrimR65b), ChenT64
(GrimR65c); Serviceability: CartW64; Other references: AdamC62,
CorbF62, GrosH53, SharW69, WilkM65; IBM reference manuals: IBM
System/360 Functional characteristics manuals for each model, IBM
System/360 Configurator (diagram) for each model, A22-6821-4 IBM
System/360 Principles of Operation, A22-6810-8 IBM System/360 System
Summary
¹( ) denotes the review of previous article.
²IBM Systems Journal, vol. 3, nos. 2 and 3, 1964.
³IBM Systems Journal, vol. 7, no. 1, 1968.
⁴IBM Journal of Research and Development, vol. 11, no. 1, January, 1967.
⁵Given in A Programming Language/APL [Iverson, 1962].
Chapter 43
Chapter 43 I The structure of SYSTEM/360 589
[Figure: functional schematic of the computer system, showing main
storage and large capacity storage; the processing unit
(instructions; indexed fixed-point operations, variable field-length
operations, floating-point operations; control and address
operations; 16 general registers); and input/output with multiple
low-speed subchannels and selector channels, each with a single
high-speed subchannel.]
tional conditions, loads and relocates programs and data, manages
storage, and supervises scheduling and execution of multiple
programs. To a problem programmer, the supervisory program and
the control equipment are indistinguishable.
The functional structure of SYSTEM/360, like that of most
computers, is most concisely described by considering the data
formats, the types of manipulations performed on them, and the
instruction formats by which these manipulations are specified.
Information formats
The several SYSTEM/360 data formats are shown in Fig. 3. An 8-bit
unit of information is fundamental to most of the formats. A
consecutive group of n such units constitutes a field of length n.
Fixed-length fields of length one, two, four, and eight are termed
bytes, halfwords, words, and double words, respectively. In many
instructions, the operation code implies one of these four fields
as the length of the operands. On the other hand, the length is
explicit in an instruction that refers to operands of variable length.
The location of a stored field is specified by the address of the
leftmost byte of the field. Variable-length fields may start on any
byte location, but a fixed-length field of two, four, or eight bytes
must have an address that is a multiple of 2, 4, or 8, respectively.
Some of the various alignment possibilities are apparent from
Fig. 3.
Storage addresses are represented by binary integers in the
system. Storage capacities are always expressed as numbers of
bytes.
Processing operations
The SYSTEM/360 operations fall into four classes: fixed-point
arithmetic, floating-point arithmetic, logical operations, and decimal
arithmetic. These classes differ in the data formats used, the
registers involved, the operations provided, and the way the field
length is stated.
Fixed-point arithmetic
The basic arithmetic operand is the 32-bit fixed-point binary word.
Halfword operands may be specified in most operations for the
sake of improved speed or storage utilization. Some products and
all dividends are 64 bits long, using an even-odd register pair.
Because the 32-bit words accommodate the 24-bit address, the
entire fixed-point instruction set, including multiplication, division,
[Fig. 3: SYSTEM/360 data formats, showing the halfword fixed-point
number (sign S and 15-bit integer), the fullword fixed-point number
(sign S and 31-bit integer), the decimal number, and 8-bit
characters, aligned against halfword and word boundaries.]
shifting, and several logical operations, can be used in address
computation. A two's complement notation is used for fixed-point
operands.
Additions, subtractions, multiplications, divisions, and
comparisons take one operand from a register and another from either
a register or storage. Multiple-precision arithmetic is made
convenient by the two's complement notation and by recognition of
the carry from one word to another. A pair of conversion
instructions, CONVERT TO BINARY and CONVERT TO DECIMAL,
provide transition between decimal and binary radices without
the use of tables. Multiple-register loading and storing instructions
facilitate subroutine switching.
Floating-point
Floating-point numbers may occur in either of two fixed-length
formats, short or long. These formats differ only in the length of
the fractions, as indicated in Fig. 3. The fraction of a floating-point
number is expressed in 4-bit hexadecimal (base 16) digits. In the
short format, the fraction has six hexadecimal digits; in the long
format, the fraction has 14 hexadecimal digits. The short length
is equivalent to seven decimal places of precision. The long length
gives up to 17 decimal places of precision, thus eliminating most
requirements for double-precision arithmetic.
The radix point of the fraction is assumed to be immediately
to the left of the high-order fraction digit. To provide the proper
magnitude for the floating-point number, the fraction is considered
to be multiplied by a power of 16. The characteristic portion, bits
1 through 7 of both formats, is used to indicate this power. The
characteristic is treated as an excess 64 number with a range from
-64 through +63, and permits representation of decimal numbers
with magnitudes in the range of ... to ...
Bit position 0 in either format is the fraction sign, S. The
fraction of negative numbers is carried in true form.
Floating-point operations are performed with one operand from
a register and another from either a register or storage. The result,
placed in a register, is generally of the same length as the operands.
Logical operations
Operations for comparison, translation, editing, bit testing, and
bit setting are provided for processing logical fields of fixed and
variable lengths. Fixed-length logical operands, which consist of
one, four, or eight bytes, are processed from the general registers.
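The short floating-point format described above (sign in bit 0, excess-64 characteristic in bits 1 through 7, six hexadecimal fraction digits with the radix point to the left of the high-order digit) can be decoded as in the following sketch. The function name is hypothetical, and the sketch ignores normalization and exception handling.

```python
def decode_short_float(word):
    """Decode a 32-bit short floating-point word into a Python float.

    Bit 0 (leftmost) is the fraction sign, bits 1-7 hold the
    excess-64 characteristic, and bits 8-31 hold six hexadecimal
    fraction digits.
    """
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("word must fit in 32 bits")
    sign = (word >> 31) & 0x1
    characteristic = (word >> 24) & 0x7F
    fraction_digits = word & 0xFFFFFF
    # The radix point sits to the left of the high-order hex digit.
    fraction = fraction_digits / 16 ** 6
    magnitude = fraction * 16.0 ** (characteristic - 64)
    return -magnitude if sign else magnitude


if __name__ == "__main__":
    print(decode_short_float(0x41100000))   # fraction 1/16, power 16^1
    print(decode_short_float(0xC1200000))   # negative, fraction 2/16
    print(decode_short_float(0x7FFFFFFF))   # largest positive magnitude
```

Evaluating the largest and smallest encodings with such a function is one way to see the magnitude range implied by a characteristic running from -64 through +63.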
[Figs. 4 and 5: character-code tables, indexed by bit positions, for
EBCDIC and for the eight-bit extension of the seven-bit code in the
ISO draft proposal for 6 and 7 bit coded character sets for
information processing interchange, International Standards
Organization, June 1964. Legible control-character legend: Null/Idle;
Start of heading; Start of text; End of text; DC2, DC3, DC4, Device
control; NACK, Negative acknowledge; SYNC, Synchronous idle; ETB,
End of transmission block; CNCL, Cancel; EM, End of medium; SS,
Start of special sequence; ESC, Escape; FS, File separator; GS,
Group separator; RS, Record separator; US, Unit separator; SP,
Space, normally nonprinting; CS2, Currency symbol; Grave accent;
DEL, Delete.]
Logical operations can also be performed on fields of up to 256
bytes, in which case the fields are processed from left to right,
one byte at a time. Moreover, two powerful scanning instructions
permit byte-by-byte translation and testing via tables. An
important special case of variable-length logical operations is the
one-byte field, whose individual bits can be tested, set, reset, and
inverted as specified by an 8-bit mask in the instruction.
Character codes
Any 8-bit character set can be processed, although certain
restrictions are assumed in the decimal arithmetic and editing operations.
However, all character-set-sensitive I/O equipment assumes either
the Extended Binary-Coded-Decimal Interchange Code (EBCDIC)
of Fig. 4 or the code of Fig. 5, which is an eight-bit extension
of a seven-bit code proposed by the International Standards
Organization.
Decimal arithmetic
Decimal arithmetic can improve performance for processes
requiring few computational steps per datum between the source
input and the output. In these cases, where radix conversion from
decimal to binary and back to decimal is not justified, the use of
registers for intermediate results usually yields no advantage over
storage-to-storage processing. Hence, decimal arithmetic is
provided in SYSTEM/360 with operands as well as results located in
storage, as in the IBM 1400 series. Decimal arithmetic includes
addition, subtraction, multiplication, division, and comparison.
The decimal digits 0 through 9 are represented in the 4-bit
binary-coded-decimal form by 0000 through 1001, respectively.
The patterns 1010 through 1111 are not valid as digits and are
interpreted as sign codes: 1011 and 1101 represent a minus, the
other four a plus. The sign patterns generated in decimal
arithmetic depend upon the character set preferred. For EBCDIC, the
patterns are 1100 and 1101; for the code of Fig. 5, they are 1010
and 1011. The choice between the two codes is determined by
a mode bit.
Decimal digits, packed two to a byte, appear in fields of variable
length (from 1 to 16 bytes) and are accompanied by a sign in the
rightmost four bits of the low-order byte. Operand fields can be
located on any byte boundary, and can have lengths up to 31 digits
and sign. Operands participating in an operation have independent
lengths. Negative numbers are carried in true form. Instructions
are provided for packing and unpacking decimal numbers. Packing
of digits leads to efficient use of storage, increased arithmetic
performance, and improved rates of data transmission. For purely
decimal fields, for example, a 90,000-byte/second tape drive reads
and writes 180,000 digits/second.
Instruction formats
Instruction formats contain one, two, or three halfwords,
depending upon the number of storage addresses necessary for the
operation. If no storage address is required of an instruction, one
halfword suffices. A two-halfword instruction specifies one address; a
three-halfword instruction specifies two addresses. All instructions
must be aligned on halfword boundaries.
The five basic instruction formats, denoted by the format
mnemonics RR, RX, RS, SI, and SS are shown in Fig. 6. RR denotes
a register-to-register operation, RX a register and indexed-storage
operation, RS a register and storage operation, SI a storage and
immediate-operand operation, and SS a storage-to-storage
operation.
In each format, the first instruction halfword consists of two
parts. The first byte contains the operation code. The length and
format of an instruction are indicated by the first two bits of the
operation code.
The second byte is used either as two 4-bit fields or as a single
8-bit field. This byte is specified from among the following:
Four-bit operand register designator (R)
Four-bit index register designator (X)
Four-bit mask (M)
Four-bit field length specification (L)
Eight-bit field length specification
Eight-bit byte of immediate data (I)
The second and third halfwords each specify a 4-bit base
register designator (B), followed by a 12-bit displacement (D).
Addressing
An effective storage address E is a 24-bit binary integer given,
in the typical case, by
    E = B + X + D
where B and X are 24-bit integers from general registers identified
by fields B and X, respectively, and the displacement D is a 12-bit
integer contained in every instruction that references storage.
The base B can be used for static relocation of programs and
data. In record processing, the base can identify a record; in array
calculations, it can specify the location of an array. The index X
can provide the relative address of an element within an array.
Together, B and X permit double indexing in array processing.
The displacement provides for relative addressing of up to 4095
bytes beyond the element or base address. In array calculations,
the displacement can identify one of many items associated with
an element. Thus, multiple arrays whose indices move together
are best stored in an interleaved manner. In the processing of
records, the displacement can identify items within a record.
In forming an effective address, the base and index are treated
as unsigned 24-bit positive binary integers and the displacement
as a 12-bit positive binary integer. The three are added as 24-bit
binary numbers, ignoring overflow. Since every address is formed
with the aid of a base, programs can be readily and generally
relocated by changing the contents of base registers.
A zero base or index designator implies that a zero quantity
must be used in forming the address, regardless of the contents
of general register 0. A displacement of zero has no special
significance. Initialization, modification, and testing of bases and indices
can be carried out by fixed-point instructions, or by BRANCH
AND LINK, BRANCH ON COUNT, or BRANCH ON INDEX
instructions. LOAD EFFECTIVE ADDRESS provides not only a
convenient housekeeping operation, but also, when the same
register is specified for result and operand, an immediate
register-incrementing operation.
Sequencing
Normally, the CPU takes instructions in sequence. After an
instruction is fetched from a location specified by the instruction
[Fig. 6: the five basic instruction formats.
RR format: OP CODE, R, R (two register operands).
RX format: OP CODE, R, X, B, D (register operand and storage operand).
RS format: OP CODE, R, R, B, D (register operands and storage operand).
SI format: OP CODE, I, B, D (immediate operand and storage operand).
SS format: OP CODE, L, L, B, D, B, D (operand lengths and two
storage operands).]
counter, the instruction counter is increased by the number of
bytes in the instruction.
Conceptually, all halfwords of an instruction are fetched from
storage after the preceding operation is completed and before
execution of the current operation, even though physical storage
word size and overlap of instruction execution with storage access
may cause the actual instruction fetching to be different. Thus,
an instruction can be modified by the instruction that immediately
precedes it in the instruction stream, and cannot effectively modify
itself during execution.

Branching
Most branching is accomplished by a single BRANCH ON CONDITION
operation that inspects a 2-bit condition register. Many
of the arithmetic, logical, and I/O operations indicate an outcome
by setting the condition register to one of its four possible states.
Subsequently a conditional branch can select one of the states
as a criterion for branching. For example, the condition code
reflects such conditions as non-zero result, first operand high,
operands equal, overflow, channel busy, zero, etc. Once set,
the condition register remains unchanged until modified by
an instruction execution that reflects a different condition code.
The outcome of address arithmetic and counting operations
can be tested by a conditional branch to effect loop control. Two
instructions, BRANCH ON COUNT and BRANCH ON INDEX,
provide for one-instruction execution of the most common
arithmetic-test combinations.
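
The sequencing rule described above (the instruction counter is increased by the number of bytes in the instruction, and the length is indicated by the first two bits of the operation code) can be sketched as follows. This is a minimal illustration, not a description of any model's hardware. The function names are hypothetical, and the mapping from the two high-order bits to format and length (00 for RR, 01 for RX, 10 for RS or SI, 11 for SS; one, two, two, and three halfwords) is an assumption that agrees with the format layouts and with the op-code groupings of Table 2.

```python
# Illustrative sketch only; function names are hypothetical.
# Assumed mapping from the two high-order op-code bits to (format, bytes):
#   00 -> RR (one halfword), 01 -> RX (two), 10 -> RS or SI (two), 11 -> SS (three)
FORMATS = {0b00: ("RR", 2), 0b01: ("RX", 4), 0b10: ("RS/SI", 4), 0b11: ("SS", 6)}


def instruction_length(opcode: int) -> int:
    """Return the instruction length in bytes from its 8-bit operation code."""
    return FORMATS[(opcode >> 6) & 0b11][1]


def advance(counter: int, opcode: int) -> int:
    """Sequential case: increase the instruction counter by the instruction length."""
    return (counter + instruction_length(opcode)) & 0xFFFFFF  # 24-bit address assumed


def walk(storage: bytes, start: int, count: int) -> list:
    """Return the addresses of `count` consecutive instructions (no branches taken)."""
    addresses = []
    counter = start
    for _ in range(count):
        addresses.append(counter)
        counter = advance(counter, storage[counter])
    return addresses
```

Only the first byte of each instruction is inspected, which is why the counter can be advanced before the remaining halfwords are interpreted.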
[Fig. 7. Program status word format. Legible field labels: KEY (storage protection key); PROGRAM MASK (including fixed-point overflow).]
Program status word
A program status word (PSW), a double word having the format
shown in Fig. 7, contains information required for proper execution
of a given program. A PSW includes an instruction address,
condition code, and several mask and mode fields. The active or
controlling PSW is called the current PSW. By storing the current
PSW during an interruption, the status of the interrupted program
is preserved.

Interruption
Five classes of interruption conditions are distinguished: input/
output, program, supervisor call, external, and machine check.
For each class, two PSW's, called old and new, are maintained
in the main-storage locations shown in Table 1. An interruption
in a given class stores the current PSW as an old PSW and then
takes the corresponding new PSW as the current PSW. If, at the
conclusion of the interruption routine, old and current PSW's are
interchanged, the system can be restored to its prior state and the
interrupted routine can be continued.
The system mask, program mask, and machine-check mask bits
in the PSW may be used to control certain interruptions. When
masked off, some interruptions remain pending while others are
merely ignored. The system mask can keep I/O and external
interruptions pending, the program mask can cause four of the
15 program interruptions to be ignored, and the machine-check
mask can cause machine-check interruptions to be ignored. Other
interruptions cannot be masked off.
Appropriate CPU response to a special condition in the channels
and I/O units is facilitated by an I/O interruption. The
addresses of the channel and I/O unit involved are recorded in
the old PSW. Related information is preserved in a channel status
word that is stored as a result of the interruption.
Unusual conditions encountered in a program create program
interruptions. Eight of the fifteen possible conditions involve
overflows, improper divides, lost significance, and exponent underflow.

Table 1 Permanent storage assignments

Address   Byte length   Purpose
      0        8        Initial program loading PSW
      8        8        Initial program loading CCW 1
     16        8        Initial program loading CCW 2
     24        8        External old PSW
     32        8        Supervisor call old PSW
     40        8        Program old PSW
     48        8        Machine check old PSW
     56        8        Input/output old PSW
     64        8        Channel status word
     72        4        Channel address word
     76        4        Unused
     80        4        Timer
     84        4        Unused
     88        8        External new PSW
     96        8        Supervisor call new PSW
    104        8        Program new PSW
    112        8        Machine check new PSW
    120        8        Input/output new PSW
    128                 Diagnostic scan-out area†

† The size of the diagnostic scan-out area is configuration dependent.
The remaining seven deal with improper addresses, attempted
execution of privileged instructions, and similar conditions.
A supervisor-call interruption results from execution of the
instruction SUPERVISOR CALL. Eight bits from the instruction
format are placed in the interruption code of the old PSW,
permitting a message to be associated with the interruption.
SUPERVISOR CALL permits a problem program to switch CPU control
back to the supervisor.
Through an external interruption, a CPU can respond to signals
from the interruption key on the system control panel, the timer,
other CPU's, or special devices. The source of the interruption
is identified by an interruption code in bits 24 through 31 of the
PSW.
The occurrence of a machine check (if not masked off) terminates
the current instruction, initiates a diagnostic procedure, and
subsequently effects a machine-check interruption. A machine
check is occasioned only by a hardware malfunction; it cannot
be caused by invalid data or instructions.

[...] by the program to await an interruption, for example, an I/O
interruption or operator intervention from the console. In the wait
state, no instructions are processed, the timer is updated, and I/O
and external interruptions are accepted unless masked. Running
versus waiting is determined by the setting of a bit in the current
PSW.
The CPU may be interruptable or masked for the system,
program, and machine interruptions. When the CPU is interruptable
for a class of interruptions, these interruptions are accepted.
When the CPU is masked, the system interruptions remain pending,
but the program and machine-check interruptions are ignored.
The interruptable states of the CPU are changed by altering mask
bits in the current PSW.
In the problem state, processing instructions are valid, but all
I/O instructions and a group of control instructions are invalid.
In the supervisor state, all instructions are valid. The choice of
problem or supervisor state is determined by a bit in the PSW.
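
The old/new PSW mechanism described under Interruption can be illustrated with a small model. This is a sketch under simplifying assumptions: PSWs are treated as opaque 8-byte values, main storage is a dictionary keyed by address, and the class and method names are hypothetical. Only the storage addresses come from Table 1. The text speaks of interchanging old and current PSW's at the end of the interruption routine; the sketch models that step simply as reloading the old PSW.

```python
# Illustrative model only: PSWs are opaque 8-byte values held in a dict
# keyed by main-storage address (addresses taken from Table 1).
OLD_PSW = {"external": 24, "supervisor call": 32, "program": 40,
           "machine check": 48, "input/output": 56}
NEW_PSW = {"external": 88, "supervisor call": 96, "program": 104,
           "machine check": 112, "input/output": 120}


class Machine:
    def __init__(self, current_psw: bytes, new_psws: dict):
        self.current_psw = current_psw
        # Place each supplied new PSW at its permanent storage location.
        self.storage = {NEW_PSW[cls]: psw for cls, psw in new_psws.items()}

    def interrupt(self, cls: str) -> None:
        """Store the current PSW as the old PSW, then take the new PSW."""
        self.storage[OLD_PSW[cls]] = self.current_psw
        self.current_psw = self.storage[NEW_PSW[cls]]

    def resume(self, cls: str) -> None:
        """Reload the old PSW, restoring the interrupted program's status."""
        self.current_psw = self.storage[OLD_PSW[cls]]
```

Because each class has its own pair of locations, an interruption of one class does not overwrite the old PSW saved for another.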
[...] comparand. The keys are said to match if equal or if either is zero.
A storage key is not part of addressable storage, and can be
changed only by privileged instructions. The protection key of the
CPU program is held in the current PSW. The protection key of
a channel is recorded in a status word that is associated with the
channel operation.
When a CPU operation causes a protection mismatch, its
execution is suppressed or terminated, and the program execution
is altered by an interruption. The protected storage location
always remains unchanged. Similarly, protection mismatch due to
an I/O operation terminates data transmission in such a way that
the protected storage location remains unchanged.

Multisystem operation
Communication between CPU's is made possible by shared control
units, interconnected channels, or shared storage. Multisystem
operation is supported by provisions for automatic relocation,
indication of malfunctions, and CPU initialization.
Automatic relocation applies to the first 4,096 bytes of storage,
an area that contains all permanent storage assignments and
usually has special significance for supervisory programs. The
relocation is accomplished by inserting a 12-bit prefix in each
address whose high-order 12 bits are zero. Two manually set
prefixes permit the use of an alternate area when storage malfunction
occurs; the choice between prefixes is preserved in a trigger
that is set during initial program loading.
To alert one CPU to the possible malfunction of another, a
machine-check signal from a given CPU can serve as an external
interruption to another CPU. By another special provision, initial
program loading of a given CPU can be initiated by a signal from
another CPU.

Input/output

Devices and control units
Input/output devices include card equipment, magnetic tape
units, disk storage, drum storage, typewriter-keyboard devices,
printers, teleprocessing devices, and process control equipment.
The I/O devices are regulated by control units, which provide
the electrical, logical, and buffering capabilities necessary for I/O
device operation. From the programming point of view, most
control-unit and I/O device functions are indistinguishable.
Sometimes the control unit is housed with an I/O device, as in
the case of the printer.
A control unit functions only with those I/O devices for which
it is designed, but all control units respond to a standard set of
signals from the channel. This control-unit-to-channel connection,
called the I/O interface, enables the CPU to handle all I/O
operations with only four instructions.

I/O instructions
Input/output instructions can be executed only while the CPU
is in the supervisor state. The four I/O instructions are START
I/O, HALT I/O, TEST CHANNEL, and TEST I/O.
START I/O initiates an I/O operation; its address field specifies
a channel and an I/O device. If the channel facilities are free,
the instruction is accepted and the CPU continues its program.
The channel independently selects the specified I/O device. HALT
I/O terminates a channel operation. TEST CHANNEL sets the
condition code in the PSW to indicate the state of the channel
addressed by the instruction. The code then indicates one of the
following conditions: channel available, interruption condition in
channel, channel working, or channel not operational. TEST I/O
sets the PSW condition code to indicate the state of the addressed
channel, subchannel, and I/O device.

Channels
Channels provide the data path and control for I/O devices as
they communicate with main storage. In the multiplexor channel,
the single data path can be time-shared by several low-speed
devices (card readers, punches, printers, terminals, etc.) and the
channel has the functional character of many subchannels, each
of which services one I/O device at a time. On the other hand,
the selector channel, which is designed for high-speed devices, has
the functional character of a single subchannel. All subchannels
respond to the same I/O instructions. Each can fetch its own
control word sequence, govern the transfer of data and control
signals, count record lengths, and interrupt the CPU on exceptions.
Two modes of operation, burst and multiplex, are provided
for multiplexor channels. In burst mode, the channel facilities are
monopolized for the duration of data transfer to or from a particular
I/O device. The selector channel functions only in the burst
mode. In multiplex mode, the multiplexor channel sustains several
simultaneous I/O operations: bytes of data are interleaved and
then routed between selected I/O devices and desired locations
in main storage.
At the conclusion of an operation launched by START I/O
or TEST I/O, an I/O interruption occurs. At this time a channel
status word (CSW) is stored in location 64. Figure 8 shows the
CSW format. The CSW provides information about the termination
of the I/O operation.
Successful execution of START I/O causes the channel to
[Fig. 8. Channel status word format; legible field labels: STATUS, COUNT.]
Bits 0 through 3 contain the storage-protection key used in the operation.
Bits 4 through 7 contain zeros.
Bits 8 through 31 specify the location of the last CCW used.
Bits 32 through 47 contain an I/O device status byte and a channel status
byte. The status bytes provide such information as data check, chaining
check, control-unit end, etc.
Bits 48 through 63 contain the residual count of the last CCW used.
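
The field layout listed for Fig. 8 can be unpacked as in the following sketch. It assumes that bit 0 is the high-order bit of the 64-bit word and that the device status byte precedes the channel status byte within bits 32 through 47; the function and key names are hypothetical and chosen only for this illustration.

```python
# Illustrative decoder for the CSW layout listed above.
# Assumptions: bit 0 is the most significant bit of the 64-bit word;
# device status occupies bits 32-39 and channel status bits 40-47.

def field(word: int, first: int, last: int, width: int = 64) -> int:
    """Extract bits first..last (inclusive, bit 0 = most significant)."""
    length = last - first + 1
    return (word >> (width - 1 - last)) & ((1 << length) - 1)


def decode_csw(csw: int) -> dict:
    return {
        "protection_key": field(csw, 0, 3),
        "ccw_address": field(csw, 8, 31),      # location of the last CCW used
        "device_status": field(csw, 32, 39),
        "channel_status": field(csw, 40, 47),
        "residual_count": field(csw, 48, 63),
    }
```

A program handling an I/O interruption would apply such a decoder to the double word stored at location 64.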
fetch a channel address word from main-storage location 72. This
word specifies the storage-protection key that governs the I/O
operation, as well as the location of the first eight bytes of
information that the channel fetches from main storage. These 64 bits
comprise a channel command word (CCW). Figure 9 shows the
CCW format.

Channel program
One or more CCW's make up the channel program that directs
channel operations. Each CCW points to the next one to be
fetched, except for the last in the chain which so identifies itself.
Six channel commands are provided: read, write, read backward,
sense, transfer in channel, and control. The read command
defines an area in main storage and causes a read operation from
the selected I/O device. The write command causes data to be
written by the selected device. The read-backward command is
akin to the read command, but the external medium is moved in
the opposite direction and bytes read backward are placed in
descending main storage locations.
The control command contains information, called an order,
that is used to control the selected I/O device. Orders, peculiar
to the particular I/O device in use, can specify such functions
as rewinding a tape unit, searching for a particular track in disk
storage, or line skipping on a printer. In a functional sense, the
CPU executes I/O instructions, the channels execute commands,
and the control units and devices execute orders.
The sense command specifies a main storage location and
transfers one or more bytes of status information from the selected
control unit. It provides details concerning the selected I/O device,
such as a stacker-full condition of a card reader or a file-protected
condition of a magnetic-tape reel.
A channel program normally obtains CCW's from a consecutive
string of storage locations. The string can be broken by a
transfer-in-channel command that specifies the location of the next
CCW to be used by the channel. External documents, such as
punched cards or magnetic tape, may carry CCW's that can be
used by the channel to govern the reading of the documents.
The input/output interruptions caused by termination of an
[Fig. 9. Channel command word format.]
Bits 0 through 7 specify the command code.
Bits 8 through 31 specify the location of a byte in main storage.
Bits 32 through 36 are flag bits.
Bit 32 causes the address portion of the next CCW to be used.
Bit 33 causes the command code and data address in the next CCW to be used.
Bit 34 causes a possible incorrect length indication to be suppressed.
Bit 35 suppresses the transfer of information to main storage.
Bit 36 causes an interruption.
Bits 37 through 39 must contain zeros.
Bits 40 through 47 are ignored.
Bits 48 through 63 specify the number of bytes in the operation.
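
As a companion to the Fig. 9 legend, the sketch below parses an 8-byte CCW into its fields. The attribute names given to the five flag bits are hypothetical labels for the behaviour the legend describes, and the big-endian byte order with bit 0 as the high-order bit is an assumption of this illustration.

```python
# Illustrative parser for the CCW layout listed above. Attribute names
# are hypothetical; bit 0 is assumed to be the most significant bit.
from dataclasses import dataclass


@dataclass(frozen=True)
class CCW:
    command: int          # bits 0-7
    data_address: int     # bits 8-31
    chain_data: bool      # bit 32: use address portion of next CCW
    chain_command: bool   # bit 33: use command code and address of next CCW
    suppress_length: bool # bit 34: suppress incorrect-length indication
    skip: bool            # bit 35: suppress transfer to main storage
    interrupt: bool       # bit 36: cause an interruption
    count: int            # bits 48-63

    @classmethod
    def from_bytes(cls, raw: bytes) -> "CCW":
        if len(raw) != 8:
            raise ValueError("a CCW is eight bytes long")
        flags = raw[4]
        if flags & 0b00000111:
            raise ValueError("bits 37 through 39 must contain zeros")
        return cls(
            command=raw[0],
            data_address=int.from_bytes(raw[1:4], "big"),
            chain_data=bool(flags & 0x80),
            chain_command=bool(flags & 0x40),
            suppress_length=bool(flags & 0x20),
            skip=bool(flags & 0x10),
            interrupt=bool(flags & 0x08),
            count=int.from_bytes(raw[6:8], "big"),   # raw[5] (bits 40-47) is ignored
        )
```

A channel program in this model is simply a sequence of such words; a word with neither chaining flag set would be the last in its chain.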
Table 2 SYSTEM/360 instructions by format and function (only the entries legible in this copy are shown)

RR format
        Branching and            Fixed-point fullword     Floating-point        Floating-point
        status switching         and logical              long                  short
xxxx    0000xxxx                 0001xxxx                 0010xxxx              0011xxxx
0000                             LPR  LOAD POSITIVE                             LPER LOAD POSITIVE
0001                             LNR  LOAD NEGATIVE                             LNER LOAD NEGATIVE
0010                             LTR  LOAD AND TEST                             LTER LOAD AND TEST
0011                             LCR  LOAD COMPLEMENT                           LCER LOAD COMPLEMENT
0100    SPM  SET PROGRAM MASK    NR   AND                                       HER  HALVE
0101    BALR BRANCH AND LINK     CLR  COMPARE LOGICAL
0110    BCTR BRANCH ON COUNT     OR   OR
0111    BCR  BRANCH/CONDITION    XR   EXCLUSIVE OR
1000    SSK  SET KEY             LR   LOAD                                      LER  LOAD
1001    ISK  INSERT KEY          CR   COMPARE                                   CER  COMPARE
1010    SVC  SUPERVISOR CALL     AR   ADD                                       AER  ADD N
1011                             SR   SUBTRACT            SDR  SUBTRACT N       SER  SUBTRACT N
1100                             MR   MULTIPLY            MDR  MULTIPLY         MER  MULTIPLY
1101                             DR   DIVIDE              DDR  DIVIDE           DER  DIVIDE
1110                             ALR  ADD LOGICAL         AWR  ADD U            AUR  ADD U
1111                             SLR  SUBTRACT LOGICAL    SWR  SUBTRACT U       SUR  SUBTRACT U

RX format
        Fixed-point halfword     Fixed-point fullword     Floating-point        Floating-point
        and branching            and logical              long                  short
xxxx    0100xxxx                 0101xxxx                 0110xxxx              0111xxxx
1011                             S    SUBTRACT            SD   SUBTRACT N       SE   SUBTRACT N
1100    MH   MULTIPLY            M    MULTIPLY            MD   MULTIPLY         ME   MULTIPLY
1101                             D    DIVIDE              DD   DIVIDE           DE   DIVIDE
1110    CVD  CONVERT-DECIMAL     AL   ADD LOGICAL         AW   ADD U            AU   ADD U
1111    CVB  CONVERT-BINARY      SL   SUBTRACT LOGICAL    SW   SUBTRACT U       SU   SUBTRACT U

RS, SI format
Column headings: Branching, status switching, and shifting; Fixed-point, logical, and input/output.

SS format
Column heading: Decimal.

[The remaining rows of the RR and RX sections and all entries of the RS, SI, and SS sections are not legible in this copy.]
I/O operation, or by operator intervention at the I/O device,
enable the CPU to provide appropriate programmed response to
conditions as they occur in I/O devices or channels. Conditions
responsible for I/O interruption requests are preserved in the I/O
devices or channels until recognized by the CPU.
During execution of START I/O, a command can be rejected
by a busy condition, program check, etc. Rejection is indicated
in the condition code of the PSW, and additional detail on the
conditions that precluded initiation of the I/O operation is
provided in a CSW.

Manual control
The need for manual control is minimal because of the design of
the system and supervisory program. A control panel provides the
ability to reset the system; store and display information in main
storage, in registers, and in the PSW; and load initial program
information. After an input device is selected with the load unit
switches, depressing a load key causes a read from the selected
input device. The six words of information that are read into main
storage provide the PSW and the CCW's required for subsequent
operation.

Instruction set
The SYSTEM/360 instructions, classified by format and function,
are displayed in Table 2. Operation codes and mnemonic
abbreviations are also shown. With the previously described formats in
mind, much of the generality provided by the system is apparent
in this listing.
Chapter 44
The structure of SYSTEM/360
W. Y. Stevens
CPU registers and data paths

Circuit speed
SYSTEM/360 has three families of logic circuits, as shown in Table
2, each using the same solid-logic technology. One family, having
a nominal delay of 30 nsec per logical stage or level, is used in
the data paths of Models 30, 40, and 50. A second and faster family
with a nominal delay of 10 nsec per level is used in Models 60
and 62. The fastest family, with a delay of 6 nsec, is used in Model
70.
The fundamental determinant of CPU speed is the time required
to take data from the internal registers, process the data
through the adder or other logical unit, and return the result to
a register. This cycle time is determined by the delay per logical
circuit level and the number of levels in the register-to-adder path,
the adder, and the adder-to-register return path. The number of
levels varies because of the trade-off that can usually be made
between the number of circuit modules and the number of logical
levels. Thus, the cycle time of the system varies from 1.0 µsec for
Model 30 (with 30-nsec circuits, a relatively small number of
modules, and more logic levels) and 0.5 µsec for Model 50 (also
with 30-nsec circuits, but with more modules and fewer levels)
to 0.2 µsec for Model 70 (with 6-nsec circuits).

Local storage
The speed of the CPU depends also on the speed of the general
and floating-point registers. In Model 30, these registers are located
in an extension to the main core storage and have a read-write
time of 2.0 µsec. In Model 40, the registers are located in a small
core-storage unit, called local storage, with a read-write time of
1.25 µsec. Here, the operation of the local storage may be
overlapped with main storage. In Model 50, the registers are in a local
storage with a read-write time of only 0.5 µsec. In Model 60/62,
the local storage has the logical characteristics of a core storage
with nondestructive read-out; however, it is actually constructed
as an array of registers using the 30-nsec family of logic circuits,
and has a read-write time of 0.25 µsec. In Model 70, the general
and floating-point registers are implemented with 6-nsec logic
circuits and communicate directly with the adder and other data
paths.
The two principal measures of size in the CPU are the width
of the data paths and the number of bytes of high-speed working
registers.

Data path organization
Model 30 has an 8-bit wide (plus parity) adder path, through which
all data transfers are made, and approximately 12 bytes of working
registers.
Model 40 also has an 8-bit wide adder path, but has an additional
16-bit wide data transfer path. Approximately 15 bytes of
working registers are used, plus about 48 bytes of working locations
in the local storage, exclusive of the general and floating-point
registers.
Model 50 has a 32-bit wide adder path, an 8-bit wide data path
used for handling individual bytes, approximately 30 bytes of
working registers, plus about 60 bytes of working locations in the
local storage.
Model 60/62 has a 56-bit wide main adder path, an 8-bit wide
serial adder path, and approximately 50 bytes of working registers.
Model 70 has a 64-bit wide main adder, an 8-bit wide exponent
adder, an 8-bit wide decimal adder, a 24-bit wide addressing adder,
and several other data transfer paths, some of which have
incrementing ability. The model has about 100 bytes of working registers
plus the 96 bytes of floating point and general registers which, in
Model 70, are directly associated with the data paths.
The models of SYSTEM/360 differ considerably in the number
of relatively independent operations that can occur simultaneously
in the CPU. Model 30, for example, operates serially: virtually all
data transfers must pass through the adder, one byte at a time.
Model 70, however, can have many operations taking place at the
same time. The CPU of this model is divided into three units that
operate somewhat independently. The instruction preparation unit
fetches instructions from storage, prepares them by computing
their effective addresses, and initiates the fetching of the required
data. The execution unit performs the execution of the instruction
prepared by the instruction unit. The third unit is a storage bus
control which coordinates the various requests by the other units
and by the channels for core-storage cycles. All three units
normally operate simultaneously, and together provide a large degree
of instruction overlap. Since each of the units contains a number
of different data paths, several data transfers may be occurring
on the same cycle in a single unit.
The operations of other SYSTEM/360 models fall between those
mentioned. Model 50, for example, can have simultaneous data
transfers through the main adder, through an auxiliary byte
transfer path, and to or from local storage.

Sequence control

Complex instruction sequences
Since the SYSTEM/360 has an extensive instruction set, the CPU's
must be capable of executing a large number of different sequences
of basic operations. Furthermore, many instructions require
sequences that are dependent on the data or addresses used. As
shown in Table 3, these sequences of operations can be controlled
by two methods: either by a conventional sequential logic circuit
that uses the same types of circuit modules as used in the data
paths or by a read-only storage device that contains a microprogram
specifying the sequences to be performed for the different
instructions.
Model 70 makes use of conventional sequential logic control
mainly because of the high degree of simultaneity required. Also,
a sufficiently fast read-only storage unit was not available at the
time of development. The sequences to be performed in each of
the Model 70 data paths have a considerable degree of independence.
The read-only storage method of control does not easily lend
itself to controlling these independent sequences, but is well
adapted where the actions in each of the data paths are highly
coordinated.

Read-only storage control
The read-only storage method of control is described elsewhere
[Peacock, 19??]. This microprogram control, used in all but the
fastest model of SYSTEM/360, is the only method known by which
an extensive instruction set may be economically realized in a
small system. This was demonstrated during the design of Model
60/62. Conventional logic control was originally planned for this
model, but it became evident during the design period that too
many circuit modules were required to implement the instruction
set, even for this rather large system. Because a sufficiently fast
read-only storage became available, it was adopted for sequence
control at a substantial cost reduction.
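
The read-only storage method of control can be pictured with a toy model in which each control word carries several fields, one word is read per CPU cycle, and the fields of that word gate the data path for the cycle. Everything specific in this sketch is invented for illustration: the field names, the register names, the three-word microprogram, and the 8-bit adder width do not correspond to the control words of any actual SYSTEM/360 model.

```python
# Toy model only: field names, registers, and the microprogram are invented
# for illustration and do not correspond to any actual model's control words.
from typing import NamedTuple


class ControlWord(NamedTuple):
    source: str      # register gated to the adder input
    operation: str   # adder function for this cycle: "pass" or "add"
    dest: str        # register that receives the adder output
    next_addr: int   # address of the next control word (-1 ends the sequence)


# Microprogram for "A <- A + B" using a temporary register T.
ROS = [
    ControlWord("A", "pass", "T", 1),    # T <- A
    ControlWord("B", "add", "T", 2),     # T <- T + B
    ControlWord("T", "pass", "A", -1),   # A <- T
]


def run(ros, regs, start=0):
    """Execute control words until next_addr is -1; return the cycle count."""
    addr, cycles = start, 0
    while addr != -1:
        word = ros[addr]
        value = regs[word.source]
        if word.operation == "add":
            value = (regs[word.dest] + value) & 0xFF   # 8-bit data path assumed
        regs[word.dest] = value
        addr = word.next_addr
        cycles += 1
    return cycles
```

In this picture, widening the control word adds fields that act in the same cycle, while lengthening the microprogram adds cycles.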
The three factors of speed, size, and simultaneity are applicable
to the read-only storage controls of the various SYSTEM/360 models.
The speed of the read-only storage units corresponds to the cycle
time of the CPU, and hence varies from 1.0 µsec per access for
Model 30 down to 0.25 µsec for Models 60 and 62.
The size of read-only storage can vary in two ways: in width
(number of bits per word) and in number of words. Since the bits
of a word are used to control gates in the data paths, the width
of storage is indirectly related to the complexity of the data paths.
The widths of the read-only storages in SYSTEM/360 range from
60 bits for Models 30 and 40 to 100 bits for Models 60 and 62.
The number of words is affected by several factors. First, of course,
is the number and complexity of the control sequences to be
executed. This is the same for all models except that Model 60/62
read-only storage contains no sequences for channel functions. The
number of words tends to be greater for the smaller models, since
these models require more cycles to accomplish the same function.
Partially offsetting this is the fact that the greater degree of
simultaneity in the larger systems often prevents the sharing of
microprogram sequences between similar functions.
SYSTEM/360 employs no read-only storage simultaneity in the
sense that more than one access is in progress at a given time.
However, a single read-only storage word simultaneously controls
several independent actions. The number of different gate control
fields in a word provides some measure of this simultaneity. Model
30 has 9 such fields. Model 60/62 has 16.

Input/output channels

Channel design
The SYSTEM/360 input/output channels may be considered from
two viewpoints: the design of a channel itself, or the relationship
of a channel to the whole system.
From the viewpoint of channel design, the raw speed of the
components does not vary, since all channels use the 30-nsec family
of circuits. However, the different channels do have access to
different speeds of main storage and, in the three smaller models,
different speeds of local storage.
The channels differ markedly in the amount of hardware devoted
exclusively to channel use, as shown in Table 4. In the Model
30 multiplexor channel, this hardware amounts only to three
1-byte wide data paths, 11 latch bits for control, and a simple
interface polling circuit. The channel used in Models 60, 62,
and 70 contains about 300 bits of register storage, a 24-bit wide
adder, and a complete set of sequential control circuits. The
amount of hardware provided for other channels is somewhere in
between these extremes.
The disparity in the amount of channel hardware reflects the
extent to which the channels share CPU hardware in accomplishing
their functions. Such sharing is done at the expense of increased
interference with the CPU, of course. This interference ranges
from complete lock-out of CPU operations at high data rates on
some of the smaller models, to interference only in essential
references to main storage by the channel in the large models.

Channel/system relationship
When the channels are viewed in their relationship to the whole
system, the three factors of speed, size, and simultaneity take on
a different aspect. The channel is viewed as a system component,
and its effect on system throughput and other system capabilities
is of concern. The speeds of the channels vary from a maximum
rate of about 16 thousand bytes per second (byte interleaved mode)
on the multiplexor channel of Model 30 to a maximum rate of
about 1250 thousand bytes per second on the channels of Models
60, 62, and 70. The size of each of the channels is the same, in
the sense that each handles an 8-bit byte at a time and each can
connect to eight different control units. A slight size difference
exists among multiplexor channels in terms of the maximum number
of subchannels.
The degree of channel simultaneity differs considerably among
the various models of SYSTEM/360. For example, operation of the
Model 30 or 40 multiplexor channels in burst mode inhibits all
Table 4

                                                        Model      Model      Model               Model      Model
                                                        30         40         50                  60/62      70
Selector channels
  Maximum number attachable                             2          2          3                   6          6
  Approximate maximum data rate on one                  250        400        800 (1250 on        1250       1250
    channel in Kbyps†                                                         high speed)
  Uses CPU data paths for:
    initiation and termination                          yes        yes
    byte transfers                                      no         no
    storage word transfers                              no         low speed
                                                                   only
    chaining                                            yes        yes        yes
  CPU and I/O overlap possible                          yes        yes        regular: yes;
                                                                              high speed: no
Multiplexor channels
  Maximum number attachable                             1          1          1                   0          0
  Minimum number of subchannels                         32         16         64
  Maximum number of subchannels                         96         128        256
  Maximum data rate in byte interleaved mode (Kbyps)    16         30         40
  Maximum data rate in burst mode (Kbyps)               200        200        200
  Uses CPU data paths for all functions                 yes        yes        yes
  CPU and I/O overlap possible in byte mode             yes        yes        yes
  CPU and I/O overlap possible in burst mode            no         no         yes

† Thousand bytes per second.
[Column headings are inferred from the accompanying text. Rows with fewer than five entries are given in source order; their column alignment is not certain in this copy.]
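
To give a feel for the spread of data rates in Table 4, the following sketch computes the best-case time to move a record at a given maximum rate. It is plain arithmetic on the figures quoted in the table and text; the function name and the 1,000-byte record are illustrative assumptions, and real transfers would also incur device and control-unit delays that are not modelled here.

```python
# Maximum rates in thousand bytes per second, as quoted in the text and Table 4.
RATES_KBYPS = {
    "Model 30 multiplexor channel, byte interleaved mode": 16,
    "multiplexor channel, burst mode": 200,
    "Models 60, 62, and 70 channel": 1250,
}


def transfer_time_ms(num_bytes: int, rate_kbyps: float) -> float:
    """Best-case time in milliseconds to move num_bytes at the given rate.

    bytes / (rate_kbyps * 1000 bytes per second) seconds = bytes / rate_kbyps ms.
    """
    return num_bytes / rate_kbyps


if __name__ == "__main__":
    for name, rate in RATES_KBYPS.items():
        print(f"{name}: {transfer_time_ms(1000, rate):.2f} ms per 1000 bytes")
```

On these figures the fastest channel is roughly 78 times faster than the Model 30 multiplexor channel in byte interleaved mode (1250 / 16).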
other activity on the system, as does operation of the special
high-speed channel on Model 50. At the other extreme, as many
as six selector channels can be operating concurrently with the
CPU on Models 60, 62, or 70. A second type of simultaneity is
present in the multiplexor channels available on Models 30, 40,
and 50. When operating in byte interleaved mode, one of these
channels can control a number of concurrently operating input/
output devices, and the CPU can also continue operation.

Differences in application emphasis
The models of SYSTEM/360 differ not only in throughput but also
in the relative speeds of the various operations. Some of these
relative differences are simply a result of the design choices
described in this paper, made to achieve the desired overall
performance. The more basic differences in relative performance of the
various operations, however, were intentional. These differences
in emphasis suit each model to those applications expected to
comprise its largest usage.
Thus the smallest system is particularly aimed at traditional
commercial data processing applications. These are characterized
by extensive input/output operations in relation to the internal
processing, and by more character handling than arithmetic. The
fast selector channels and character-oriented data paths of Model
30 result from this emphasis. But despite this emphasis, the
general-purpose instruction set of SYSTEM/360 results in much better
scientific application performance for Model 30 than for its
comparable predecessors.
On the other hand, the large systems are expected to find
particularly heavy use in scientific computation, where the
emphasis is on rapid floating-point arithmetic. Thus Models 60, 62,
and 70 contain registers and adders that can handle the full length
of a long format floating-point operand, yet do character
operations one byte at a time.
No particular emphasis on either commercial or scientific
applications characterizes the intermediate models. However,
Models 40 and 50 are intended to be particularly suitable for
communication-oriented and real-time applications. For example,
Model 50 includes a multiplexor channel, storage protection, and
a timer as standard features, and also provides the ability to share
main storages between two CPU's in a multiprocessing arrangement.

References
PeacA??
Appendix
“top down” development would thus embed ISP within PMS. Nevertheless, it appears
appropriate to present them as two distinct notations: it makes reference easier
and permits each to be organized around its own most important notions.
The style of presentation is moderately formal. Within a section, the syntax is
presented, followed by remarks on the interpretation to be given to these syntactic
forms (the semantics). Examples that help to pin down the notations are furnished
throughout. Although not a computer language, we present it as if it were; thus, a
number of elementary things are provided for in the definitions. (Part of the
motivation for this is to introduce abbreviations.)
A language can be realized in many media. In this book we have taken some advantage
of printing orthography insofar as it enhances communication. However, it may also
be necessary to map the notations into various restrictive character sets, e.g.,
those of the typewriter and the computer. For the sake of brevity, we do not
discuss this coding problem here.
The appendix is in three parts. The first part gives the general conventions common
to both PMS and ISP. The second and third parts give PMS (page 615) and ISP
(page 628), as discussed in Chap. 2.
means of a form in the variables x and y.
5 Indefinite expressions
a | b | c means one of a or b or c.
x - y means the interval from x up to and including y.
~x means an interval around x of undetermined scope.
6 Lists and sets
(3, 5, 1, 5) is a list of digits, which also could have been written (3; 5; 1; 5).
Digit-list refers to all possible lists of digits. Digit-set refers to all possible
sets of digits, unordered and without repetition.
7 Definite expressions
X := (size: integer; function: (primary | secondary); control: (yes | no)) defines
X to be an entity with an attribute, size, taking any integer as value; with an
attribute, function, taking primary or secondary as value; and with an attribute,
control, taking yes or no as value.
Y := X(size: 12 - 20; primary; ¬control) defines Y as an entity of type X which is
further specified by having size between 12 and 20, having the value of function be
primary and the value of control be no.
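The definitions of X and Y above can be mirrored in a small program. The following is only a sketch under stated assumptions: the names EntityX and is_y are hypothetical, the attribute control is modelled as a boolean (yes = True, no = False), and the interval 12 - 20 is taken as inclusive, following the reading of x - y given in item 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityX:
    """Hypothetical stand-in for X := (size; function; control)."""
    size: int        # attribute size: any integer
    function: str    # attribute function: "primary" or "secondary"
    control: bool    # attribute control: yes (True) or no (False)

    def __post_init__(self):
        # Reject values outside the domain that X allows for function.
        if self.function not in ("primary", "secondary"):
            raise ValueError("function must be primary or secondary")


def is_y(entity: EntityX) -> bool:
    """True when the entity also satisfies Y := X(size: 12 - 20; primary; not control)."""
    return 12 <= entity.size <= 20 and entity.function == "primary" and not entity.control


print(is_y(EntityX(size=15, function="primary", control=False)))
print(is_y(EntityX(size=21, function="primary", control=False)))
```

Every Y is an X, but not the reverse: Y only narrows the attribute domains that X already declares.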
11 Numbers
Numbers and arithmetic expressions are defined in the standard fashion.
12 Quantities, dimensions, and units
A quantity is just a dimensionalized number: a number of units along a given
dimension.
13 Booleans and relations
Logical expressions involving and (∧), or (∨), not (¬), implies (⊃), equivalence
(≡), and exclusive-or (⊕) are defined in standard fashion, as are expressions
involving the six basic relations (=, ≠, <, >, ≤, ≥).
1. Basic semantics
1.1 We will use the term “entity” to refer to all things designatable by
expressions in the language.
1.2 An entity is assumed to be fully characterizable by a set of attributes and
associated values, which are themselves entities.
COMMENT There will necessarily be entities with no further specification within the
system, that is, entities that in effect have only a name.
The semantics of the language consists in showing how expressions in the language
determine the various attributes and values.
2. Metanotation
2.1 The language itself is described by giving various classes of expressions and
assigning meanings to the members of these classes (i.e., telling what they
designate). We will generally do this in English but with a few special notations.
2.2 Expression-variables
1 Let a, b, . . . , A, B, . . . be variables whose domain is a set of expressions.
2 Let class(a) be the set of definite expressions defined by the indefinite
expression a. This is extended to definite expressions, x, by defining
class(x) = x.
COMMENT Normally lowercase variables (e.g., a) stand for any legal expression,
whereas uppercase variables (e.g., A) stand for any indefinite expression.
2.3 We will define the language by giving forms of expressions, that is, by writing
down sequences of expressions and expression-variables. These forms are to be
interpreted as permitting any expression that results from replacing the
expression-variables with expressions from their respective domains.
COMMENT Note that we have used the same variable several times, even though
independently selected values are meant at each occurrence. It will always be clear
from the context when this is being done.
3. Basic syntax
3.2 A name is a sequence of characters written without spaces.
3.3 A character is a member of one of the following alphabets:
1 Capital letters A B ... Z
3.4 One or more spaces (freely determined) occur between names. The only exceptions
are names that are single marks (alphabet 4, above) and can be disambiguated. For
these, spaces can be omitted.
EXAMPLES A,B instead of A , B
-3 instead of - 3
(A+B) instead of ( A + B )
3.5 Parentheses are used around any expression that would otherwise be ambiguously
interpreted. Conversely, parentheses can be dropped whenever there is no
possibility of ambiguity.
3.6 To avoid excess parentheses, an order of precedence exists for names used as
separators. The higher in the order, the greater the binding power, i.e., the
greater precedence in being interpreted first. The following order is consistent
with the alphabetical order:
: = I ~ ; l : l + l + / ~ & @3~ 1
l ,
v I A I1 =# I <><2 1 +- I x/
I
I - 1 f I .1 I C J 1 /(abbreviation), .-(hyphen)
3.7 Spacing on the page is freely determined (e.g., for legibility). An expression
may run freely on several consecutive lines (with no explicit continuation mark).
EXAMPLE z'(0:11) := (? ib+ z";    This ISP expression and also
ib + M[z"])                       this comment are on two lines.
3.8 Subscripting and superscripting may be used interchangeably with the marks ↓
and ↑ respectively.
EXAMPLE 10 ↓ 2 is the same as 10₂
x ↑ 2 is the same as x²
4.1 If x is a free name [as defined in General Conventions section 10 (GC 10)] and
y is any expression, then the command
x := y
assigns the name x to the corresponding expression y. In particular,
etc.; then x is assigned to be the name of the union of all the expressions:
class(x) = union( class(i) ), i = a, b, . . .
EXAMPLE M.1 := M(size: 1000 w) and M.1 := M(size: 2000 w) would define M.1 to be
memories of either 1,000 or 2,000 words.
4.3 If x is any name and y is any name, then the command
x / y
assigns y to be an abbreviation (a synonym) for x. Abbreviation may occur on any
occasion and not just when x is first defined. It may occur as a separate
expression or it may occur in an expression in which x occurs, thus establishing
the abbreviation in passing. A sequence of abbreviations may be defined in the same
expression.
COMMENT The abbreviation may not be a shorter phrase at all, but simply an
alternative phrasing (say, one commonly known).
EXAMPLE Memory / M, bit / b, second / sec / s
multiplex / many channeled
COMMENT / is also used for division, but no difficulties arise
4.4 If x is any name and D is any indefinite expression, then the command
x := D-variable
assigns x to be a variable with the set of entities of class(D) as the domain. If
there are no restrictions on the domain of the variable, then the D may be dropped.
EXAMPLES x := number-variable
y := component-variable
z := variable    no restricted domain
COMMENT Note that these variables are over entities, not over expressions (as are
the expression-variables x, y, z).
4.5 A form is any expression containing variables. If f is a form containing a
single free name x (in addition to variables and defined subexpressions) and g is a
form, then we extend the assignment command to include
f := g
which is taken as defining the name x. The variables occurring in f are called the
operands of x. An occurrence of the form f with variables replaced by expressions
designating in the domain of the variables is equivalent to the expression g with
these same variables replaced by their values from the occurrence of f. This
permits the definitions of functions and operations in which the operands (the
variables in f) can be identified by the form of their occurrence.
EXAMPLES x := number-variable    y := number-variable
5. Indefinite expressions
5.1 An indefinite expression is characterized completely by giving the class
associated with the expression.
5.2 The basic evaluation rule is the following:
If A contains an occurrence of another indefinite expression B, then class(A) is
the union of the classes of all the expressions formed by replacing the occurrence
of B by each member of class(B). In symbols,
class(Y) contains C(Mp: M(size: 1000 w; width: 12 b))
C(Mp: M(size: 1000 w; width: 16 b))
C(Mp: M(size: 1000 w; speed: 1000 o/s))
etc.
6.3 Indefinite expressions can be formed in five ways:
Postulation: an expression is given in the initial definition in this appendix.
EXAMPLE Entity is so defined in GC 7.
Specialization: If A contains an occurrence of another indefinite expression, B and
x is any expression for a subset of class(B); then the expression formed by
replacing the occurrence of B in A by x yields a legitimate expression. In symbols,
if A(. . . B . . .) is legal and x is legal and class(x) ⊂ class(B), then
A(. . . x . . .) is legal.
EXAMPLE In the example of GC 5.2, the expressions of the members of class(Y) are
legal expressions.
Alternation: If x, y, . . . are any expressions, then x | y . . . is the indefinite
expression “either x or alternatively y or alternatively. . . .” In symbols,
class(x | y . . .) = union( class(i) ), i = x, y, . . .
x - y
is the indefinite expression containing all members of the ordering starting with
x, up to and including y.
EXAMPLE 7 - 11 is equivalent to 7 | 8 | 9 | 10 | 11
Approximation: If x designates a member of an ordering, then ~x is an indefinite
expression containing x plus members of the order on both sides of x, without
specification of the exact limits.
EXAMPLE ~10 is a set of numbers around 10, possibly 8 | 9 | 10 | 11.
the attribute-list. This is an abbreviation technique that permits writing the
attribute names only once for a list of values, each of which has several
subattributes.
EXAMPLE operation-times := (add-time, store-time) has values (10 μs, 6 μs),
(20 μs, 20 μs), etc.
8.8 attribute := x-name
This is a single special attribute, defined for each entity x. See GC 10.10 for
definition.
8.9 attribute := index / #
where value(index) := +integer | -integer
if x is a list (more generally, of form z o z . . .)
The elements of a list (or other sequence) are automatically indexed by their
number from the front (+integer) or the end (-integer) of the list. This index can
be used as an attribute.
EXAMPLE x := (Ma, Mb, Mc, Md)
x(index: 3) = x(#: 3) = x(3) = Mc
x.4 = x.-1 = Md
9. Null symbol and optional expression
9.1 Let ∅ be the null expression
class(∅) = the null class
∅ may occur as the defining expression in an assignment or as a member of an
alternation:
x := ∅
x | ∅ | y
∅ may occur as a member of a set or list, in which case it may be deleted from the
set or list.
x, ∅, y is equivalent to x, y
10.2 There is a special class of expressions called name-expressions, which are
used to define names.
1 Name-expressions all have names that are of the form x-name, where x is a name.
2 Name-expressions are written with spaces, which are to be removed in generating
strings of characters from them.
3 Name-expressions occur only in conjunction with name-expression names, either as
an assignment:
x-name := name-expression
or as an attribute-value:
x-name: name-expression
Thus, it can always be determined when a name-expression occurs.
EXAMPLE Q-name := A B (1 | 2) defines Q-name
AB1 and AB2 are the two possible Q-names
10.3 Alphabets are defined as the alternates of their characters, e.g.,
digit := 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Capital letters, small letters, marks, and characters, as laid out in GC 3.3, are
defined similarly.
10.4 If x is any set of characters, then
x-string
is a string of such characters of indefinite length (at least one) with no spaces
between.
EXAMPLE digit-string contains 1, 1354, 65487, etc.
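The way a name-expression generates names (GC 10.2) and the digit-string rule (GC 10.4) can be sketched as follows. The helper names expand_name_expression and is_digit_string are hypothetical, and a name-expression is assumed to be given as a list of positions, each position listing its alternatives.

```python
from itertools import product


def expand_name_expression(positions):
    """Generate every name: pick one alternative per position, then drop the spaces."""
    return ["".join(choice) for choice in product(*positions)]


def is_digit_string(text):
    """A digit-string has at least one character, all digits, and no spaces."""
    return len(text) >= 1 and all(ch in "0123456789" for ch in text)


# Q-name := A B (1 | 2)
q_names = expand_name_expression([["A"], ["B"], ["1", "2"]])
print(q_names)  # -> ['AB1', 'AB2']
print(is_digit_string("1354"))
```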
(-) (hyphen-names). All simple-names function identically: they obtain their
designations through assignment (:=) or abbreviation (/). They may thus be definite
or indefinite, corresponding to the expressions they name. Any simple-name may be
used if it has not already been used for a different expression or is not excluded
by number-name or by a previously defined x-name (see below).
EXAMPLES AB3 SAM Baker Instruction-set input-register 13-B
ABBREVIATION If there is no chance for ambiguity, phrase-names may be written with
a space instead of the space-concatenation mark (-).
EXAMPLE skip condition = skip-condition
ABBREVIATION If the hyphen-name x-a is used within the scope of the definition of
the entity x, then the name may be abbreviated to just a.
COMMENT This permits the use of the same name in local contexts, where the name of
the context (the expression being defined) serves to disambiguate the name where
needed.
10.9 x-name. The names to be used in defining an immediate instance of the entity
x. If x is any entity and y is any name-expression, such that
x := (x-name: y; . . .)
then any z which is an instance of x,
z := x(. . .)
must be chosen from the name-expressions defined by y. This holds only for a single
level. If w := z(. . .), then w is not constrained as to the name used.
EXAMPLE component := (component-name: capital-letter)
M := component(. . .) is legal;
SAM := component(. . .) is not legal;
SAM := M(. . .) is legal.
11. Numbers
11.1 number := number-name | number-variable | number ↓ base |
arithmetic-expression | count-expression
system is different from 10, it may be given explicitly (for example,
10 ↓ 2 = 10₂ = 2). Arithmetic expressions are formed from various arithmetic
operations with numbers as operands. Operations are classified by their syntactic
form: unary operations ( -(3) or +(7) ); binary operations (7 - 6, 3/8 or
3 ↑ 2 = 3²); and n-ary operations (3 + 8 + 6 or 5 × 6 × 2 × 3). Functions are
defined as taking a list of numbers as operands (abs(3) or max(5, 7, -12) ). There
is a counting function that takes any set or list of entities as inputs and
produces their number (if X := (Ma, Mb, Mc) then number(X) = 3). Abbreviations are
introduced for many of the operations and functions.
11.2 number-set-name := (digit | @)-string
A special subset of (alternative) numbers may be defined by substituting a @ for a
digit. The @ stands for any digit (of the base of the number).
EXAMPLE 01@ = 010 | 011    01@ binary
7@ = 70 | 71 | . . . | 77    7@ octal
12. Quantities, dimensions, and units
quantity := number unit
unit := (dimension; conversion-list) | unit-name := multiplier unit | simple-name
13. Boolean and relations
boolean := true / t / 1 | false / f / 0 | boolean-variable |
boolean-expression | relational-expression
boolean-expression := unary-boolean-operation boolean |
boolean binary-boolean-operation boolean |
boolean n-ary-boolean-operation boolean . . .
unary-boolean-operation := ¬
binary-boolean-operation := ⊃ | ≡
n-ary-boolean-operation := ∨ | ∧ | ⊕
relational-expression := number relational-operator number
relational-operation := = | ≠ | < | > | ≤ | ≥
There are two primary boolean values, true and false. Boolean-variables,
boolean-expressions, and relational-expressions are expressions that evaluate
(potentially) to true or false. Boolean expressions are made up from the standard
operations on truth values: negation (¬), implication (⊃), equivalence (≡),
conjunction (∧), disjunction (∨), and exclusive-or (⊕). Relations are defined on
numbers.
COMMENT More general definitions for entities (for = and ≠) and for ordered sets
(for <, >, ≤, and ≥) are not needed.
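The number-set-name of GC 11.2, in which a placeholder stands for any digit of the base, can be expanded mechanically. The sketch below is illustrative only: expand_number_set is a hypothetical name, the placeholder is written @ as in the text above, and bases are assumed to lie between 2 and 10.

```python
def expand_number_set(pattern, base, wildcard="@"):
    """List every digit string obtained by replacing each wildcard with a digit of the base."""
    if not 2 <= base <= 10:
        raise ValueError("this sketch only handles bases 2 through 10")
    digits = "0123456789"[:base]
    results = [""]
    for ch in pattern:
        # A wildcard contributes every digit of the base; any other character is kept.
        choices = digits if ch == wildcard else ch
        results = [prefix + c for prefix in results for c in choices]
    return results


print(expand_number_set("01@", base=2))  # -> ['010', '011']
print(expand_number_set("7@", base=8))
```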
word := length × bits | length × character | length × base-unit
word-bit-length := 12 - 64
word-character-length := 2 - 8
block := length × word | length × character
record := length × word | length × character
file := +integer × block | +integer × record
IBM-card / card := column × row × card-hole
card-column / col := 80
card-row / row := 12
card-hole := 1 bit
print-line / line := print-column × character
print-column / col := 64 - 132 | 72 | 80 | 120 | 132    rarely <64
This single definition of a computer component contains all of the attributes
common to all components. All components can thus be given as further
specifications of this definition. (Such definitions can add attributes not in the
higher entity.) Examples are given in succeeding sections. We comment on some of
the attribute domains below and provide an extensive listing of values for some.
4.2 Component-name. All components that are immediate instances of this definition
are to have single-letter names, for example, P, M, S, etc. Names of instances of
P, M, S, etc., are arbitrary.
4.3 Manufacturer-names | Proper-name. We provide a very short abbreviation (') to
indicate that a string of characters is a manufacturer's name, since these names
are arbitrary and need to be distinguished from other values. A proper name can
also be given to a component.
EXAMPLES 'IBM System/360 Model 50, 'I/O Bus
direction: (from / out / output / X →) | (to / in / into / input / X ←);
turn-around-time / t.turn: [t]    only for half-duplex carrier;
carrier)
carrier := (
writability: (human / h | machine / mechanical process / m |
both machine and human / b);
readability: (human / h | machine / mechanical process / m |
both machine and human / b);
medium;
encoding)
medium := (electrical conduction := voltage | current) |
magnetic | electrostatic | radiowave | microwave | optical light |
(mechanical movement := tactile | linear position | angular position |
spatial position) | temperature / heat |
(acoustical / airpressure := high frequency audio) | memory technology
see PMS 6.2
encoding / modulation := continuous-modulation / analog |
digital / discrete-modulation
continuous-modulation := direct / null | amplitude / am |
pulse amplitude modulation / pam | pulse duration modulation / pdm |
time duration modulation | frequency modulation / fm
discrete-modulation := direct / pulse code modulation / pcm |
frequency shift keying / fsk | digital pulse | digital level | contact
The ports are the connection points (nodes or terminals) of a component at which
cocomponents connect. A port is not a component but simply an interface with a
characteristic i-unit that crosses it in one direction or the other. One can thus
associate two operations with a port, namely, the transmission operations of its
component and the cocomponent. The port introduces directionality: input is from
the cocomponent into the port’s component; output is from the port’s component to
the cocomponent.
The i-unit subcomponents usually correspond to physical subparts of the port. For
conventional information-carrying structures, the base-unit is the encoding of
information on a single wire of the port, i.e., a bit. The width is the number of
wires available per unit time. The length is the number of (width × base-unit)’s
which are necessary to transmit the i-unit. As such, the i-unit can be thought of
as a message normally with length × width × base-unit. More complex messages can
have multiple dimensional lengths (e.g., consider a record which is transmitted
serially, where the base-unit is a bit, the width is 1, the length is an 8-bit
byte, and the record length is 1,000 bytes).
The information rate as measured at the port is the flow of i-units per unit of
time. An equivalent measure is the time for the i-unit to pass through the port.
Concurrency is a measure of the number of simultaneous i-units the port can pass.
Concurrency-type denotes both the number of simultaneous messages and the message
direction. The simplex port allows only one message to enter or leave the port, not
both. The half-duplex port allows a message to either enter or leave the port, but
only on a time-multiplexed basis; that is, the port is simplex for one direction at
a time. In the case of the half-duplex port, the turnaround time is a significant
attribute that denotes the time taken to go from receiving to transmitting or vice
versa. A full-duplex port allows information to flow in both directions at once
(i.e., enter and leave the port simultaneously). Finally, the multiplex port
denotes multiple ports that can be decomposed into the more elementary structures
discussed above.
Direction is usually indicated on each port of a component to denote the direction
of information flow. Direction must be specified for simplex ports (using
arrowheads ←, →). Half- and full-duplex ports are shown with no arrowheads.
Carrier characterizes the form of information at a port. The two major attributes,
writability and readability, define whether human beings, machines, or both human
beings and machines are able to use (interpret) the carrier directly. Media denotes
the technology of the carrier. Information can be carried by any of the media
listed. It should be noted that memory technology is also listed as a media to
carry information. Unlike the media that are instantaneous carriers, memory holds
information over a long period of time. For each media, it is appropriate to encode
information in particular ways. The two basic methods are continuous and discrete
encoding (or modulation).
4.7 Logic-technology and technology. All devices have a logic technology and almost
always only a single one (though exceptions exist, especially in compound
components). They may also have other technology specific to the type of component
(e.g., disk-memory technology). The logic technology is given here; other
technologies are given with the specific component.
logic-technology := magnetic-core | cryogenic |
electro-mechanical | fluidic | hybrid-circuit |
monolithic integrated / integrated / ic | large scale integrated / LSI |
mechanical | integrated metal oxide silicon / MOS |
medium scale integrated / MSI | optical |
transistor | vacuum-tube
4.8 Reliability. Although of extreme importance, we list only two values for
reliability, the mean number of operations between failures, and the mean time
between failures. In essence, one can be derived from the other if the operation
rate is known.
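The relation between the two reliability values of PMS 4.8 is a single division or multiplication by the operation rate. The sketch below makes that explicit; the function names and the sample figures are hypothetical and are not taken from any machine described in this book.

```python
def mean_time_between_failures(mean_operations_between_failures, operations_per_second):
    """Seconds between failures, given operations between failures and the operation rate."""
    return mean_operations_between_failures / operations_per_second


def mean_operations_between_failures(mean_time_seconds, operations_per_second):
    """Inverse conversion: operations between failures from time between failures."""
    return mean_time_seconds * operations_per_second


# Hypothetical component: 3.6e9 operations between failures at 1e6 operations/s.
mtbf_seconds = mean_time_between_failures(3_600_000_000, 1_000_000)
print(mtbf_seconds / 3600, "hours between failures")
```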
4.9 Error rate. Usually a ratio of the number of erroneous operations per
error-free operations. Approximately 1/(probability of an error).
4.10 Cost. Only the two simplest cost numbers, purchase price and (monthly) rental
are listed as attributes. Conventionally, purchase price is taken as 45 times
monthly rental. In addition, one could list manufacturing costs, broken down into
materials, labor, etc., and more elaborate sales costs, such as lease-purchase
options. Most of these quantities are not relevant from an engineering viewpoint.
Some that are important are unobtainable in general.
4.11 lineage := (
manufacturer: Burroughs |
Control Data Corporation / CDC |
Digital Equipment Corporation / DEC |
English Electric |
Ferranti |
General Electric / GE |
Honeywell |
International Business Machines / IBM |
International Computers and Tabulators / ICT |
Hewlett-Packard / HP |
Olivetti |
Radio Corporation of America / RCA |
Remington-Rand / UNIVAC |
Scientific Data Systems / SDS / Xerox Data Systems / XDS |
Westinghouse;
manufacturer-type: government / g | industrial / i |
research-laboratory / r | university / u;
4.12 history := (
t.conception / t.start: date;
*t.announcement / t.paper: date
*t.birth / t.prototype / t.operational: date;
*t.scheduled: date;
*t.exhibited: date;
*t.delivery / t.production: date-list;
*t.first-delivery / t.first: date;
*t.last-delivery / t.last / t.withdrawal: date;
*t.death / t.last-use: date;
*production: number(t.delivery))
date := year | month year | day month year | quarter year
quarter / q := winter / 1 | spring / 2 | summer / 3 | fall / 4
The history of the component is viewed as a series of event dates, only the more
important being given above. Often the same essential function is served by a
variety of events (e.g., the announcement of a computer to the public can be made
either by formal announcement, as happens with commercial systems, or by a
technical paper). Delivery or production refers to the actual placing of systems
and consists of a series of dates, one for each instance produced. This series is
normally abbreviated to the first and last delivery, plus the number produced. None
of the attributes beyond t.start need exist, as a computer system can be aborted at
any time. For all attributes, the dates may be known only approximately.
4.13 Weight, power, volume, area, temperature. Since we concentrate on the
informational aspects of components, other attributes are mentioned only briefly
(and others, such as decor, are left out entirely). The values of these parameters
are especially important in aerospace applications. They also show the effects of
technology on packaging and computing power per unit volume.
The attributes are mostly self-descriptive. We have not attempted to list
manufacturers other than the principal industrial ones. Descendants and antecedents
are necessarily vague, since no precise notion of parenthood can be defined. It is
not limited to computers built as a series (as in the IBM 704 being a descendant of
the IBM 701) but includes any machine where the design bond is strong (e.g., IBM
709 and 7090).
cocomponents: (input: component, output: component, initiators:
input | output | both);
subcomponents: (*control; *input-buffer: M.i-unit; *output-buffer: M.i-unit);
concurrency: 1;
concurrency-type: simplex;
information-rate / i-rate: (i-unit/operation) × o-rate [i/t];
i-unit: i-unit(input) equals i-unit(output);
delay / t.delay / td: [t];
carrier)
A simple-link has the capability of moving an i-unit from the input cocomponent to
the output cocomponent. The simple-link has two simplex ports that connect to the
ports of the two cocomponents and are separated by a delay. In essence, as the
delay goes to zero, the input port and output ports become one. Initiation of the
transmission may be fixed at one end or the other or be from either end, depending
on the design of the link. The base-unit is usually a bit (i.e., two states), but
it may be more. The width of the i-unit is the number of base-units transmitted in
parallel; and the length is the number of widths serially transmitted in one
operation. A simple-link permits transmission in one direction only (from input to
output cocomponent); this is normally called a simplex link. The port-to-port delay
is the time from the initiation of the transmit operation at one port to the
arrival of the i-unit at the second port. (Occasionally, the arrival time between
widths can be relevant operationally, and then a more precise characterization of
the time structure would be required.) The rate of transmission (the information
rate) may be calculated by taking the operation rate times the information
transmitted per operation (i.e., the content of the i-unit). Links may, but need
not, contain buffering at either end for a single i-unit. There may be a distinct
control involved, especially if initiation and termination rituals must be
accomplished; but it is possible to have links that are simple wires and simply
present at the output terminal what was presented at the input.
EXAMPLE L(input: register A; output: register B; width: 36 b; 1 megawords/s)
5.3 compound-link := (
simple-link(concurrency: 1; concurrency-type: half-duplex) |
simple-link(concurrency: 2; concurrency-type: full-duplex) |
simple-link(concurrency: +integer; concurrency-type: broadcast;
output: component-set) |
simple-link(concurrency: +integer; concurrency-type: network broadcast;
input: component-set; output: component-set) |
simple-link(concurrency: +integer; concurrency-type: star) |
(simple-link)-set)
A compound-link is made up of several links, but such that no switching occurs. A
half-duplex link permits information to flow from either terminal to the other, but
transmission is possible in only one direction at a time, which thus leads to a
turnaround delay time. A full-duplex link permits simultaneous transmission in both
directions. Broadcast links permit transmission to many receivers; thus the output
components can be a set. Network broadcast permits more than one terminal to be a
source, though only one at a time. The star denotes all n components of a set to
simultaneously communicate with one another via (n/2) × (n-1) full-duplex links.
Finally, a set of disjoint links (that is, inputs disjoint and outputs disjoint)
can be considered to be a single link. This latter is essentially a convenience for
naming a multiplex link.
EXAMPLES L(Dataphone; 1800 b/s; half-duplex; i-unit: (length: 8, width: 1 b))
L(Telephone; i-rate: 110 b/s; direction: full-duplex)
Telephone := L(110 b/s; full-duplex)    alternative form
I/O Bus := L(half-duplex; i-unit: 1 w; 12 b/w; operation-rate: 500 ko/s)
L('I/O Bus; half-duplex; i-unit: 1 w; 12 b/w; 500 kw/s)    alternative form
L('I/O Bus; half-duplex; i-unit: (length: 12 b; width: 1 b); 6 megabits/s)
alternative form
6. Memory
6.1 Memory / M := simple-memory | compound-memory
6.2 simple-memory := component (
cocomponents: read: component, write: component;
functions: see Table 1;
subcomponent: control;
word / w: i-unit [i];
size: 1 word [i];
operations: (read | write | read, write);
information-rate / i-rate: [i] / word × operation-rate [i/t];
access-time / ta: constant | ~constant [t];
cycle-time / tc: time(read; next write) [t];
permanency: (decay | fast-read-slow-write / frsw | permanent / read-only / ro /
ros / ROS / read-only-memory / rom / ROM |
read-destruct | read-regenerate / rr | read-write / rw | write-only) [t];
portability: (portable / p | not portable / fixed / f);
technology: see Table 2)
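Two of the link calculations in PMS 5 are easy to check numerically: the information rate as operation rate times information per operation, and the number of full-duplex links in a star, (n/2) × (n-1). The following is a sketch only; the function names are hypothetical, and the figures are those of the I/O Bus example (12 b/w at 500 kw/s).

```python
def information_rate(bits_per_operation, operations_per_second):
    """Information rate in bits/s: operation rate times the content of the i-unit."""
    return bits_per_operation * operations_per_second


def star_link_count(n):
    """Number of full-duplex links joining every pair among n components."""
    return n * (n - 1) // 2


# I/O Bus: one 12-bit word per operation, 500 kilo-operations per second.
bus_rate = information_rate(bits_per_operation=12, operations_per_second=500_000)
print(bus_rate / 1_000_000, "megabits/s")

# A star among 4 components needs 6 links.
print(star_link_count(4))
```

The result for the bus agrees with the 6 megabits/s given in the alternative form of the example.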
portability: portability(simple-M);
technology: see Table 2)
7. Switch
7.1 Switch / S := gate-switch | simple-switch | compound switch
7.2 gate-switch := component (
cocomponents: (input: component, output: component; initiators: component);
subcomponents: (*control; *input-buffer: M.i-unit; *output-buffer: M.i-unit);
operation: (open | close);
concurrency: (1 | 2);
concurrency-type: (simplex | half-duplex | full-duplex / duplex);
i-rate: i-rate(link);
delay: delay(link);
hang-up-delay: [t];
access-time / ta: constant [t])
A gate-switch acts as a simple-link or as no connection. It is used to transmit
information conditionally between the ports of two components. It can be used as a
basic primitive to express the structure of other switches, including the
simple-switch. The parameters will be discussed under the simple-switch.
7.3 simple-switch := component (
cocomponents: (input/from: component-set, output/to: component-set,
initiator: component-set);
subcomponents: control, links: link-set, *address: memory;
operation: access;
size: size(output(cocomponents));
concurrency: +integer;
concurrency-type: (simplex | half-duplex | full-duplex / duplex |
dual-simplex | dual half-duplex | dual full-duplex / dual-duplex |
time-multiplexed-cross-point / 1 trunk | cross-point | dual-cross-point |
k-trunk);
hierarchy: (hierarchical | nonhierarchical / anarchical);
location: (central | distributed (cocomponent set));
distribution: (radial | bussed / bus / chain / daisy chain);
access-time / ta: switch-type(address / a, prior-address / p)
switch-type := (
†See PMS 6.2 for abbreviations, also c/cyclic, l/linear, r/random.
bilinear: constant + constant × abs(a - p) |
cyclic: constant + constant × (a - p) mod (size) |
interleave: (a interleave-relation p → random)-list |
linear: (a ≥ p → constant + constant × (a - p);
a < p → reset-time + constant × a) |
first-in-first-out / fifo / queue: (constant | ~constant) |
last-in-first-out / lifo / stack: (constant | ~constant) |
dequeue: (constant | ~constant));
permanency: (decay | transmit-destruct | time-multiplexed / tmx / tm |
moving | cyclic | permanent | irreversible | fixed until broken /
fixed | manual);
hang-up-delay: [t];
delay: delay(links);
L-initiator: initiator(links);
technology)
A simple-switch consists of a set of potential links between a set of input and
output components, with an operation (access) that can actualize some subset of the
links. This is done according to an instruction called the address (which may or
may not be held in a memory). For a switch, the cocomponent input and output ports
are sometimes listed to specify the size of the switch.
An important parameter is the concurrency-type, which describes the various subsets
that can be simultaneously realized. The values given correspond to practical
alternatives: simplex, in which only a single simplex link may be established at a
time; duplex, in which a single full-duplex link may be established; cross-point
(also dual-cross-point), which permits true simultaneity;
time-multiplexed-cross-point, in which functional simultaneity is established for
many links by means of rapid switching within the course of transmission of an
i-unit (in essence the time multiplexed-cross-point has 1-trunk, which permits 1
conversation); and finally k-trunks for k-simultaneous conversations. We often use
a duplex switch instead of simplex or half duplex switch in PMS diagrams, even
though the latter would be more accurate.
Hierarchy is a redundant attribute derived from the cocomponent set. As a rule, if
there are n identical cocomponents each of which communicates with one another,
there is no hierarchy. A telephone system is a typical nonhierarchical structure.
Usually the switches internal to a computer are hierarchical in that there are n
components of type a which communicate with m components of type b. The a’s only
communicate with the b’s and vice versa: hierarchy does not determine the component
initiating the dialogue.
The location of a switch refers to whether the hardware is localized within one of
the components using the switch, whether it is separate (called central), or
whether it is distributed through all the cocomponents.
An attribute that is not completely independent is distribution, which denotes
whether the physical structure is a continuous bus or chain or is
624 Appendix
fed radially from a centralized component. See Fig. 13, Chap. 3, page 67
for common alternative physical structures.
A major way of classifying simple-switches is by their access time:
cyclic, linear, random, etc. With each is given the type of formula that
determines the actual access time. The two critical parameters in most
switches are the address being sought (a) and the prior address (p), which
represents the existing state of the switch. Thus, in a bilinear switch the
access time consists of a start-up time plus a time proportional to the
magnitude of the difference between the prior address and the desired address.
This differs from a linear switch, which only permits movement in one
direction and must reset to an initial state if an address lower than the
existing address (p) is sought. An interleave memory is one that consists of
a collection of random-access memories, with access time depending on the
relationship between a and p (usually a modular one, such as (a = p mod 4) →
long access; (a ≠ p mod 4) → short access). Random access means that the
access time is independent of both a and p. This constancy may be only
approximate (as in using a drum with its cyclic character ignored). Queues and
stacks differ from the other switches in having a degenerate addressing
system such that the next link selected is determined by the state of the
switch itself. Dequeues allow either of the two ends of a queue to be
accessed.
Permanency refers to how long the switch maintains a link (or set of
them) after establishing the link by an access operation. The three common
values are (1) the destruction of the connection with the transmission
of the i-unit across the link, (2) the maintenance of the connection
permanently, and (3) the autonomous movement of the connection (as in disks
and drums). The latter two give rise to the p used in the access formulas.
Rarer is a decay function, in which the link remains established for some
period of time, or an irreversible connection, which can be set just once
and from then on operates like a simple-link.
Hang-up delay is the time taken to break a connection after the
appropriate i-unit has been transmitted. Hang-up delay is given only for certain
permanencies of fixed-until-broken and manual switches.
A number of parameters derive directly from the properties of the set
of ports or links: the size of the i-unit, the information-rate, the link
delay, the direction of data flow, and the component that can initiate data
transmission (as opposed to initiating accessing). Finally, there is
technology, which is not given in detail, since much of it is identical to
memory technology.

EXAMPLES S('I/O BUS; location: K; from: P; to: K; half-duplex; initiators:
P, K; switch-type: random; ta: 5 μs; concurrency: 1)
S(cross-point; 16 M; 6 (P + K); concurrency: 6; location: M)

7.4 compound-switch := simple-switch (
subcomponents: control, links: link-set, subswitches: switch-set,
*address: memory;
access-time: (cascade: sum(access-time(subswitches)) |
parallel: max(access-time(subswitches))))

A compound-switch is an array of switches whose links are connected
so that the outputs of some are inputs to others and thus effects a total
set of links, which go from output to input component-sets. It can be
defined as an extension of a simple-switch, since most parameters are
defined identically for both. Many combinations of accessing arrangements
are possible. The two most common are given above. A cascade-switch is
one in which each accessing of the next subswitch must take place after
the prior one so that the access times add. A parallel-switch makes all the
accesses simultaneously, so that the total access time is simply the access
time of the subswitch that takes longest. (In both cases, there can be
additional overhead time, but this can usually be allotted to the subswitches
and does not require separate terms in the expressions for access time.)

8. Control
8.1 Control / K := simple-control | compound-control
8.2 simple-control := component (
cocomponents: controlled / object: component-set, *instruction:
component-set, *data: component-set;
subcomponents: *instruction: memory, working / w: memory,
operations: data-operation;
operations: evoke / →, next-evoke / next, condition-operations;
controlled-operations: (controlled-component: operation)-list;
instruction-source: (none | data | instruction);
instruction-set)

A simple-control is a logical circuit (usually sequential) that evokes
operations in other components (the controlled, or object, components).
Thus, its main operations are those of evoking and evoking-next
(symbolized as → and next in ISP). However, it must also detect conditions on
which such evoking depends, so that it has available additional operations,
that are combined in an instruction-set (see ISP 2.1). These vary greatly
in complexity, from boolean operations to arithmetic operations (such as
counting the number of i-units processed).
A major distinction is the source of the external instructions that can
be given the control. At one extreme there may be none, as in a clock
whose function is to interrupt the system every millisecond. The common
case is that in which all the external instruction comes via the data itself.
More complex controls have a separate set of external instructions (often
called control characters or commands). A control does not obtain its own
next instruction, being dependent on an external component to set it into
action. This is the primary characteristic that distinguishes it from a
processor. It does have an instruction-set, which is the ISP expression that
shows what conditions evoke what actions.
No technology is given, since controls are all realized in a logic
technology, as given in the definition of component. Likewise, no function
parameter is given, since there exists no special vocabulary to designate
the different subspecies of control tasks.
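The distinction drawn above between a control and a processor (a control acts only when an external component gives it an instruction, whereas a processor determines its own next instruction) can be illustrated with a small sketch. This is not part of the PMS notation: the class names SimpleControl and Processor, the method names, and the two-instruction demo set are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# An action evokes an operation; here it just appends a note to a log.
Action = Callable[[List[str]], None]


@dataclass
class SimpleControl:
    """Hypothetical model of a simple-control K: it acts only when evoked."""
    instruction_set: Dict[str, Action]

    def evoke(self, instruction: str, log: List[str]) -> None:
        # The instruction is supplied from outside; K never fetches one itself.
        self.instruction_set[instruction](log)


@dataclass
class Processor:
    """Hypothetical model of a processor P built around a control."""
    control: SimpleControl
    program: List[str]   # stands in for the program held in primary memory
    pc: int = 0          # stands in for the instruction-address register

    def run(self, log: List[str]) -> None:
        while self.pc < len(self.program):
            instruction = self.program[self.pc]  # P picks its own next instruction
            self.pc += 1
            self.control.evoke(instruction, log)


def make_demo_control() -> SimpleControl:
    # Assumed two-entry instruction-set, purely for demonstration.
    return SimpleControl({
        "start": lambda log: log.append("motor on"),
        "stop": lambda log: log.append("motor off"),
    })
```

In this sketch the only difference between the two classes is who supplies the instruction: a caller of `evoke` for the control, the `run` loop for the processor.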
EXAMPLES K(Mp; input: Pc; output: Mp)
K(D(multiply))

8.3 compound-control := simple-control (
subcomponents: alternatives: simple-K-set, *instruction: memory,
working: memory;
instruction-source: mode-instructions)

A compound-control consists of a collection of alternative simple-controls
and can be given as an extension of the simple-control. At any time, the
control is one of these simple-controls. Determination of what simple-control
is operative (often called the mode the control is in) is by a
mode-instruction from some external component. This additional freedom
requires a subcomponent, the control-state, to hold the current specification.
(Thus it is possible, though rare, that the actual simple-K is determined
by a sequence of mode-instructions, each determining some part of the
control state.)

EXAMPLE K(Instruction set processor/ISP; input: M.processor_state;
output: D, K(Mp), K(L('I/O Bus)); M(read-write; 40 b; working);
M(read only; 100 w; 36 b/w; 1 μs/w))

9. Transducer
9.1 Transducer / T := simple-transducer | compound-transducer
9.2 simple-transducer := component (
cocomponents: input: component, output: component, initiator:
(input | output | both);
subcomponents: input: L, output: L, *control;
functional-name: (input: reader / sensor / pen / receiver; output:
writer / punch / perforator / display / printer / transmitter;
synchronizer isolator; transducer);
*portability: (portable | not portable / fixed);
concurrency-type: simplex;
concurrency: 1;
transduction-technology := (amplification | analog-digital |
angular-linear | attenuation | electroluminescence | electromagnetic |
electromechanical | electromechanical-acoustic | electro-optical |
mechanical-indentation | photochemical | xerographic)
transducer-technology := (analog-digital converter | bell | buzzer | TV
camera / vidicon | card reader | card punch | CRT display | storage
CRT display | plasma display | 3 D display | printed document
reader / document reader | document printer | magnetic character
document reader | film reader | film writer | gong | joystick | keys /
keyboard | light gun | light pen | continuous line plotter | line printer /
printer | linear actuator | SRI mouse | paper tape reader | paper tape
punch | incremental point plotter | pressure transducer | speech
synthesizer | Rand tablet | Sylvania tablet | telephone dial | push
button telephone dial | thermocouple | Lincoln Laboratory Wand) )

A simple-transducer is a pair of connected links that have different i-units
and/or underlying carriers. As defined above, transduction is a digital
operation, taking in an i-unit of the input link and producing an i-unit of the
output link. Meaning is preserved; that is, only the encoding has changed.
Preservation of meaning distinguishes transduction from data operation.
The amount of information need not be preserved, so that information
divergence is an additional characteristic of a transducer. It may be
positive or negative, as the net number of bits is either increased or decreased.
A simple-transducer is called a simplex, in that information flow is in
one fixed direction only (as in a simple-link).
Knowing the function of the transducer permits an inference of whether
one interface of the transducer involves a human being. This inference
can be derived from the port characteristics.

EXAMPLE T(line printer; 1000 lines/m; 132 char/line; 8 bit/char)
T(paper tape; reader; 300 char/s; 8 b/char; width: 1 in.)
T(sense amplifier; i-rate: .5 w/s; 24 b/w; input: M(memory
stack))

9.3 compound-transducer := (
simple-transducer-set:
concurrency-type: (half-duplex | full duplex);
compound-transducer-technology;

A compound-transducer consists of a set of simple-transducers. The two
simplest kinds are the half-duplex and the full-duplex, which are extensions
of the simple-transducer, wherein the direction of information flow can be
either way but only one way at a time (half-duplex) or can be both ways
simultaneously (full-duplex). The more general case is simply a set of
transducers with independent inputs and outputs (so that overall there is no
switching function). It is common to call this a multiplexed transducer in
which concurrency is specified by an integer.
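The transducer EXAMPLE entries above state their rates in mixed units (lines per minute, characters per second). As a worked check, the sketch below converts the line printer and paper tape reader entries to a common information rate in bits per second. The function names are hypothetical, and reading "lines/m" as lines per minute is an assumption.

```python
def line_printer_bits_per_second(lines_per_minute, chars_per_line, bits_per_char):
    """Information rate of a line printer, assuming 'lines/m' means lines per minute."""
    bits_per_minute = lines_per_minute * chars_per_line * bits_per_char
    return bits_per_minute / 60.0


def tape_reader_bits_per_second(chars_per_second, bits_per_char):
    """Information rate of a character-serial reader."""
    return chars_per_second * bits_per_char


# Values taken from the EXAMPLE entries for T.
printer_rate = line_printer_bits_per_second(1000, 132, 8)   # 1,056,000 bits/min
reader_rate = tape_reader_bits_per_second(300, 8)
```

Under these assumptions the printer works out to 17,600 b/s and the paper tape reader to 2,400 b/s, so the two devices differ by roughly a factor of seven in i-rate.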
A data-operation creates information (i.e., new instances of data-types)
that has new meaning. It usually does this as a function of input
information (e.g., a floating point multiply which creates a floating point number
that represents the product of the two input numbers). It may or may not
destroy some existing information (e.g., a tally operation, which modifies
the existing number in creating the new one). A data operation differs
from a transducer (T), since its output differs in meaning from its input.
The T preserves meaning, while changing representation.
The data-operation takes the data-type i-units at the input ports,
operates on the data, and presents the result at the output port. The
simple-data-operation can perform only one operation at a time. The simplest D
is just a set of transfer paths between registers for performing some
operation on a boolean vector (that is, A ∧ B, A ⊕ B, ¬A) or a combinational
network (that is, X = 0). Slightly more complex D's are the additive
operations on integers (+, -). Operations like ×, / are usually constructed
from more primitive D's, +, -, and (/2), with a subcontrol (K) to step
through the various substeps of the arithmetic algorithm. Finally, a
floating point multiply would be formed as a sequence of simple-data-operations
controlled by one or more common subcontrols.

EXAMPLE D(operation: +; data-type: fixed; i-unit: 32 b;
operation-time: .2 μs)
D(floating point multiplier; data-type: f; i-unit: 36 b;
operation-time: 2.0 μs; M.working (3 × 36 + 10) b)

cocomponents: primary: M-set, *secondary: M-set, controlled:
component-set;
function: (microprogram | central / general purpose / c | input-output /
io | display | array | vector move | special algorithm | language)
subcomponents: (interpreter: K; data-operations: D-set;
M.processor-state / ps: see PMS Table 1; M.non-processor-state: see PMS
Table 1;
operations: operations(data-operations), operations(cocomponents)
see ISP;
data-types: data-type(operations) see ISP;
cycle-time / tc: cycle-time(Mp);
i-rate: i-rate(Mp);
concurrency: (a-rate / cycle-time) [o];
program-switching-time: [t];
interrupt-response-time: [t];
instruction-set see ISP 2.1;
instruction-efficiency: (operations / instruction) / instruction-size [o/i];
algorithm-encoding-efficiency: (sum(data i-units)/[t]) /
(sum(data i-units + instructions)/[t]);
instruction-size: [i];
addresses-per-instruction: (0 address / stack | 1 address | 1 + index /
(1 + x) | 1 + general register address / (1 + g) | 2 address | 3 address |
n + 1 address | compound))

A simple processor is always associated with a memory (its primary
memory), which holds the program (and usually the data) for the processor.
In addition, there may be secondary memories and also other components
that are controlled by the processor.
The processor often functions as the main component of an essentially
isolated system (often called stand-alone); it is then a central processor, Pc.
Processors also occur as more specialized components in larger systems; e.g.,
to manage input/output (Pio) or display (P.display) or to do a subset of
data-operations efficiently (P.data, P.vector_move, P.array, or
P.special-algorithm). Processors are sometimes built in hierarchy, using one
processor to perform the interpretation and operations of another. Such
processors have become known as microprogram processors.
The distinguishing feature of a processor is that it determines its own
next instruction. The control that does this is called the interpreter. The
repertoire of operations of the processor is partly a set of data-operations
performed by its own subcomponents and partly the set of operations
proper to a set of transducers, memories, links, and switches external to
the processor but incorporated into its operation code. The operations are
largely determined by the set of data-types (see the ISP section).
A processor may have considerable internal memory (called the
processor state, Mps). Besides the instruction and instruction-address registers,
which are necessary for interpretation, there may be various amounts of
status information, accumulators, index registers, general registers, and
accumulator stacks. No one system has all of these memories, since they
often provide alternatives to each other (e.g., index registers and general
registers).
Each of the operations has its own operation time and its own
possibilities for being overlapped with other operations. Several parameters are
given that summarize this array of information: the cycle-time of Mp,
which in the long run limits the rate at which instructions and data can
be accessed (and also determines the maximum throughput); the
concurrency, which tells how many operations can be performed per cycle time
(this requires an averaging of the various possibilities as given in the
instruction set); and the program-switching time, which is the time required
to change context from one program to another. In simple operating
regimes (standard batch processing) program-switching time is not an
important parameter; it becomes so when interrupts are permitted. For
interrupts, the response time is critical. It is the time between when a request
is made and when the request is acknowledged by P. The instruction set
is really an entry point to the ISP description of the processor. One might
give here simply the number of instructions, but this can be a very
misleading number, since many variations of a basic instruction can be counted,
thus giving highly erroneous results. The algorithm-encoding-efficiency is
the ratio of i-units used for data per unit time to the number of accesses
for data + instructions per unit time. This efficiency is strongly affected
by the address size, which is usually the address size of the Mp but need
not be if a processor uses an incremental or relative addressing system.
The ratio can be measured at many levels of the ISP: instruction-by-
instruction, on a subroutine, or for a whole program. In a simple computer,
this ratio is near 1/2. Vector operations can allow a ratio much closer to 1.
Common measures for the instructions give the size of the operation
code, the address, and the instruction. The addresses per instruction is one
of the best parameters to indicate the overall structure of the instruction
set and is called the instruction-type. It ranges from 0 addresses (systems
which execute a sequence of operations) through 1, 2, and 3 addresses per
instruction to variable number of addresses. Between 1 and 2 addresses lie
index register (1 + x) and general register (1 + g) machines. In a special
class is the (n + 1) organization, which involves an additional address to
obtain the next instruction; it can be added to any other organization.

EXAMPLES Pc('DEC PDP-8; 1 address / instruction; -2 w / instruction;
12 b/w; 1.5, 3.0, 4.5 μs / instruction)
Pio('IBM 7909; 500 kw/s; data-types: words; integer; 1
address / instruction; 36 b/w)

11.3 complex-processor := simple-processor (
Mp-concurrency: (1 P | 1 P with interrupt | 1 program with multiple
concurrent subprograms | 1 Pc - n Pio | monitor + 1 user program |
monitor + 1 swapped program | fixed multiprogramming |
multiprogramming | segmented-programming);
multiprogramming := (no relocation | protect only | 1 segment | >
2 segment / pure | impure segments | 1 segments | paging)
segmented-programming := (fixed length page segments |
multiple length page segments | variable length page segments |
named segments);
P-concurrency: (serial / serial by bit | parallel / parallel by word |
multiple instruction streams | multiple data streams (arrays) |
pipeline processing | instruction-memory);
instruction-memory := (none | 1 instruction look ahead | n instruction
look ahead | cache / look aside / slave memory))

A complex processor is often an extension of a simple processor along the
dimension of memory mapping, since a processor is already a highly
structured and “complex” component.
Note that a collection of processors does not constitute a compound
processor in a way similar to other PMS components; hence, we denote a
general collection of processors as a computer. Thus, a complex processor
can be written in terms of a simple-P with new values. The central
processor using a microprogrammed processor contains a specialized processor
as a subcomponent (P.microprogram).
Three attributes separate a simple processor from a complex processor:
Mp-concurrency, P-concurrency, and instruction-memory. In essence, the
simple processor has no Mp concurrency (interpreting a single program)
and serial or parallel P concurrency, with no instruction-memory (buffer-
ing for multiple instructions). These attributes are independent of one
another and are discussed in Chap. 3.

12. Computer
12.1 Computer / C := simple-computer | compound-computer | network
12.2 simple-computer := component (
structure: 1 Pc | 1 Pc.interrupt;
subcomponents: Pc, Mp-set, *controlled: component-set(Pc);
cocomponents: none;
function: (scientific | business data processing | general purpose | process
control / control | communication := (switching | store and forward) |
terminal control / input-output / io | display | file processing / file
control | time-sharing);
access-time: access-time(Mp);
cycle-time: min(cycle-time(Mp));
access-type: access-type(Mp.min);
instruction-type: instruction-type(Pc))

A simple computer consists of a single Pc (possibly with interrupt
capability) with an Mp (possibly a set of them) plus some set of transducers,
Ms's, switches, and controls. It is a complete system that can stand alone
and accomplish processing for a wide variety of functions.
Almost all of its significant parameters are derived from those of the
Pc or the Mp (using the Mp with the minimum cycle time if there are
several Mp's).

EXAMPLES C('Whirlwind I; Mp(core; 8 μs/w; 2048 w; 16 b/w);
Pc(M.processor_state: -2 w; 1 instruction/w; 1 address/
instruction); 1948 - 1966)
C('LGP-30; technology: vacuum tubes; power: 1500 watts;
Mp(drum, 4096 w; 31 b/w; t.access: .260 ~ 16.6 ms);
Pc(1 address/instruction; 1 instruction/word; Mps: -2 w))

12.3 compound-computer := simple-computer(
structure: ((1 Pc, n Pio) | (1 Pc, n Pio, P.display) | (2 Pc) | (n Pc
multiprocessor) | (n Pc, P(array)) | (n Pc, special algorithm) | (n Pc parallel
processor));
subcomponents: Pc-set, Mp-set, *controlled: component-set(Pc-set))

The essential feature of compound computers is to have more than one
processor. This is indicated primarily by the structure parameter but
requires augmenting the subcomponents to include a set of Pc's. Other than
this, compound-C's are the same as simple-C's, although some parameters
(such as instruction-type) may not have simple values if several Pc's differ
radically.
The simpler compound-C's retain a single Pc, but add input/output
processors (Pio's and then P.display's). The next step is to limited
multiprocessing, with 2 Pc's, and on to n Pc's operating on many programs, and
finally to parallel processing operation on many tasks of a single program.
A parallel processor is distinguished from a network; namely, there is no
way to decompose a parallel processor into disjoint C's (with Pc's and
Mp's). In both multiprocessing and parallel processing there may or may
not be Pio's, P.display's, and other special-function processors.

EXAMPLES C(1 Pc-8 Pio; 'IBM 7094 II; Mp(32768 w; 1.4 μs/w; 36 b/w);
Pc(1 address; 1 instruction / w; M.processor_state: 12 w;
data-types: (integer, word, bv, sf, suf, df, duf, fri); 1962 - 1966)
C(multiprocessor; 'Burroughs D-825; Mp(65 kw; 4.8 μs/w; 48
b/w); 16 (Pc, Kio); Pc(stack; 12 b/syllable; 1 - 7 syllable /
instruction; data-types: integer, floating, single character,
boolean vector))

12.4 network/N := dual-C | network-C | C-set.
A network is any collection of two (dual-C) or more computers not
interconnected through primary memory. The network-C is a special case of a
single physical structure which is usually called a single C but by its
structure is a network (for example, CDC 6600). Finally, a set of
interconnected computers that are physically separate are the most general
case of networks.

ISP conventions
Making use of the prior general conventions and the PMS definitions, ISP
is developed systematically. We do this only for the processor and not for
controls (though the system might be adapted to that end). Several
notations are added to make ISP conform with currently existing notations.
The top-level entities of ISP (data-types, operations, the interpreter,
and the instruction-set) are values of corresponding attributes in the PMS
definition of a processor. An image of all the PMS structure for a computer
system exists in the instruction set of the processors that control the PMS
components. PMS notation is assumed for this. In ISP the primary memory
(Mp) is usually named M; all other memories must be specifically
declared and named.
1 Data-types
2 Instructions
3 Operations
4 Processors
A data-type specifies the encoding of a meaning into an information
medium. The meaning of the data-type (that which it designates or refers to)
is called its referent (or value). The referent may be an entity, ranging
from highly abstract (the uninterpreted bit) to highly concrete (the
payroll account for a specific type of employee). The encoding of this
referent either is directly understood (as when a bit encodes a bit) or must be
given by the referent expression in terms of the component data-types.

EXAMPLE binary-floating-point-number := data-type(
referent: number;
component-list: mantissa, exponent;
referent-expression: mantissa × 2 ↑ exponent)

COMMENT Note that in the referent expression the component data-types
are taken to designate their values, i.e., a mantissa is a signed fraction
and an exponent is an integer. This avoids a clumsier notation in which
one could write:
referent(mantissa) × 2 ↑ referent(exponent).

Associated with every data-type is an i-unit, called its carrier, into
which all its component data-types can be mapped. The carrier is used in
storing the data-type in memories and in transmitting it over links. It must
be extensive enough to hold all the component data-types, but it may be
larger (having error-checking and -correcting bits, or even unused bits).
It need not hold disjointly all the carriers of the component data-types,
since packing may occur. However, the component data-types must all
have their relative structures preserved (or they cannot be processed). The
mapping of the component data-types into the carrier is called the format.
It is given as a list that associates to each component a memory expression
involving the carrier (see ISP 2 for definition of memory-expression).

1.3 data-type := i-unit
The simplest data-types are i-units. An i-unit as a data-type implicitly
determines the five defining parameters given in ISP 1.2. The referent is
the uninterpreted i-unit itself (i.e., a word is to be handled only as an
uninterpreted unit of information). There is no need for a referent
expression. The carrier is the i-unit itself, if it is an i-unit capable of independent
storage and transmission in the system. If not, then the carrier is the
smallest such i-unit that contains the given i-unit. The component
data-types are the first sublevel of structures of the i-unit. There are no
components if the i-unit is a base-unit (bit or undecomposable character). If
the i-unit is the carrier, no format is needed. If a larger carrier is required,
then a mapping is usually implicit (e.g., 1 bit in a word goes into the
low-order position; 1 word in a block goes into the first word, etc.). If not, a
format must then be given in the regular way.

1.4 data-type := data-type-name
data-type-name := i-unit-name | simple-name |
component-name . length-type | precision . data-type-name |
component . component . . .
length-type := array / a | string / st | vector / v
precision := +integer | multiple / m | quadruple / q | triple / t |
double / d | *single / s | half / h | fractional / fr

A naming scheme is provided for data-types, which can be used as a basis
for abbreviations. Some data-types have arbitrary simple names (e.g.,
character, floating point numbers); others are named by their value (e.g.,
integer). Data-types that are iterations of a basic component can be named
by the component suffixed by a length-type. The length-type can be array/
a, implying a multidimensional array of fixed but unspecified dimensions;
a string/st, implying a single sequence of variable length (on each occur-
complex := data-type(components: real, imaginary; usually floating
complex)
field := data-type(carrier: word; components: i-unit-list; format:
(element-range))
12, 101, 5; + 125, - 126; unsigned; and signed integers
+72, -999; sign-magnitude
]()I2, 77,, AS,,; binary, octal and hexadecimal
+ 6.257; 6.257 X 10”; mixed, and floating point
(1, 2, 2.7); complex
digit set specification: stands for
l O 2 I 1 1 , ; and 70,171, I . . . 177,
respectively

2. instruction
2.1 instruction := data-type(referent: instruction-expression;
operation-code: field; operand-list; operand: data-type)
instruction-expression := condition → action-sequence
action-sequence := (step | next step)-list

data-expression n-ary operation data-expression . . . |
function(data-expression-list) / f(data-expression-list) |
operation-expression * { operation-modifier }
operation-modifier := data-type | name See GC 10

2.3 The instruction-expression, when interpreted, takes the processor
through a sequence of steps which result (possibly) in some change of state
of the computer system that holds past the period of interpretation, thus
constituting a new initial condition for the next instruction. The action
sequence has two structural features. First, steps (and subsequences of steps)
may be conditional on a boolean value, developed according to a
condition. Second, steps may be accomplished in parallel or in series. Any set
of steps between two occurrences of the term “next,” are to have all their
data expressions developed prior to any transmission of data. Thus, all
their data is a function of the existing state at the start of the sequence.
At the occurrence of the term “next,” all pending transmissions are made,
so that the state for the following sequence of steps is now different (if
there were in fact transmissions to be made).

2.4 All permanent changes in state are accomplished by means of actions,
which take data developed according to a data expression and transmit it
for storage in a memory, as designated by a memory expression.

EXAMPLES
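As an illustration of the rule in 2.3 (all data expressions between two occurrences of "next" are developed from the existing state before any transmission is made), the following is a minimal simulation sketch. It is not ISP and is not taken from the original examples: the function name run_action_sequence, the dictionary representation of state, and the register names A and B are assumptions made only for the illustration.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
# A step pairs a destination memory name with a data expression over the state.
Step = Tuple[str, Callable[[State], int]]


def run_action_sequence(state: State, groups: List[List[Step]]) -> State:
    """Run groups of steps; each inner list is the set of steps between two 'next's."""
    current = dict(state)  # leave the caller's state untouched
    for group in groups:
        # Develop every data expression from the state at the start of the group.
        pending = [(dest, expr(current)) for dest, expr in group]
        # Only now make the pending transmissions, as at an occurrence of 'next'.
        for dest, value in pending:
            current[dest] = value
    return current


start = {"A": 1, "B": 2}

# A <- B; B <- A with no 'next' between them: both read the old state (a swap).
parallel = run_action_sequence(
    start, [[("A", lambda s: s["B"]), ("B", lambda s: s["A"])]])

# A <- B; next B <- A: the second step sees the already-updated A.
serial = run_action_sequence(
    start, [[("A", lambda s: s["B"])], [("B", lambda s: s["A"])]])
```

In this sketch the parallel form exchanges the two registers, while the serial form leaves both holding the original value of B, which is the distinction the term "next" is meant to express.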
2.5 The memory expression specifies the contents of a memory (an
instance of a data-type) by giving the memory switch (possibly compound),
as seen from PMS. However, all that is represented in ISP is the address
that is used to control the switch. The address is a data-type, usually
represented as a positive integer. The element-range is a field. In both cases
it is possible to specify an arbitrary list of contents (addresses and fields),
although in most processors this can never arise. The address-range x:y
means from address x to address y inclusive.

EXAMPLES OF REGISTERS

expression may imply the use of memory if it involves nested parentheses;
such memory is assumed to be temporary with no permanent effect on
the memory state.

2.7 A condition is given as a boolean, that is, as either true or false
(equivalently, 1 or 0), or the result of a boolean expression involving the
logical connectives or relations among data-expressions (see Table 4, ISP 3,
and also GC 13). A condition can also be given as a memory-expression,
in which case the memory contents are normally evaluated as a boolean
vector with all 0s being false, and not all 0s being true.
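Two conventions stated in 2.5 and 2.7 are small enough to check mechanically: the address-range x:y includes both end points, and a memory-expression used as a condition is false only when every bit is 0. The sketch below assumes a list of 0/1 integers as the representation of a boolean vector; the function names are hypothetical.

```python
def address_range(x, y):
    """Addresses denoted by x:y, from x to y inclusive."""
    return list(range(x, y + 1))


def condition_from_memory(bits):
    """Evaluate memory contents as a condition: all 0s is false, otherwise true."""
    return any(bit != 0 for bit in bits)
```

For instance, `address_range(3, 6)` covers four addresses, and a register holding `[0, 1, 0]` counts as a true condition.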
binary-arithmetic-operations
add +
subtract -
inverse subtract
multiply X
divide / where only n, or n2 may be used to
give n4 or n5
inverse divide see divide similar to inverse subtract
modulo mod i1 mod i2 i1 - (i1 / i2) × i2 remainder
conversion-arithmetic-operations
fix-to-float f2 float(i1) integer or fixed to floating
float-to-fix i2 fix(f1) floating number to integer
unary-vector-operations radix r; note if r = 2, the character
is a bit
end-around.shift (rotate) v3 x r (rotate} v1 x ri2(rotate}
v3 ,' r (rotate} v, / r'X{rotate)
logical-shift v3 x r (logical} v1 x r'z{logical} the most or least significant digits
v3 ,' r {logical} v1 x ri.{logical} receive 0's in the shift
tally/count 12 tally(b.v) count 1's in a vector
sign extend n2 sign,extend(b.v,) copy sign of b.v to fill vector in n2
n-ary-arithmetic-operations
minimum min smallest of n1 . . . n,
maximum max largest of n, . . . nm
summation sum n, + +
n 2 . . . nm
average avg n, + n2 . . . nm) / m
product prod n, x n 2 . . . x nm
relational-i-unit-operations comparison of two i-units
identical b3
not identical b3
relational-arithmetic-operations comparison of two numbers
equality
h e quaIity
less than
greater than
less than or equal to
greater than or equal to
boolean-operations
false (0) 0 all 16 possibilities are listed
and A
null
null
exclusive or;
inclusive or
nor/ Pierce stroke
coincidence or
Appendix 635
not 1 b3 1b2
implication inverse b3 bl V l b
not 1 b3 l b l
implication 3 b3 7bl V b2or b, 3 b,
nand/Sheffer stroke t b3 i b 1 V 7b2 o r l ( b 1 A b2)
true (1) 1 b3 1
boolean-operations (common set)
not 1 b3 1bl
and A b3 b l A b,
or V b3 b l V b,
exclusive or 0 b3 b l 0 b,
boolean-operations (sufficient sets)
nand t b3 l ( b i A bz)
nor 1 b3 l ( b i V bz)
not 1 b3 b
ll this pair of operations are required
and A b3 b, A b,) for sufficient set
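Several of the tabulated operations can be checked with a short Python sketch. This is only an illustration under stated assumptions: vectors are Python lists with the most significant digit first, shifts are shown in the left direction only, operands of `modulo` are non-negative integers, and all function names are hypothetical rather than ISP primitives.

```python
def modulo(i1, i2):
    """Remainder as defined in the table: i1 - (i1 / i2) x i2.

    Assumes non-negative i1 and positive i2, so integer division is unambiguous.
    """
    return i1 - (i1 // i2) * i2


def rotate_left(v, count):
    """End-around shift: digits leaving the left end re-enter on the right."""
    if not v:
        return []
    k = count % len(v)
    return v[k:] + v[:k]


def logical_shift_left(v, count):
    """Logical shift: vacated least significant positions receive 0's."""
    k = min(count, len(v))
    return v[k:] + [0] * k


def tally(bits):
    """Count the 1's in a bit vector."""
    return sum(1 for b in bits if b == 1)


def sign_extend(bits, width):
    """Copy the sign (leftmost) bit to fill a wider vector."""
    return [bits[0]] * (width - len(bits)) + list(bits)


# nand alone is a sufficient set: not, and, or can all be built from it.
def nand(b1, b2):
    return 1 - (b1 & b2)


def not_(b):
    return nand(b, b)


def and_(b1, b2):
    return not_(nand(b1, b2))


def or_(b1, b2):
    return nand(not_(b1), not_(b2))
```

The last four functions illustrate the "sufficient sets" rows: every common boolean operation is expressed here using nand only.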
4.3 Instruction Format. The instruction formats are usually declared in the same fashion as memory and are not distinguishable as special non-memory entities. Normally, the instructions are carried in registers; it is thus natural to give declarations in this fashion. Usually only a single declaration is made, the instruction/i, followed by the declarations of the parts of the instruction: the operation code, the address fields, indirect bit, etc.

EXAMPLE

i/instruction[0:4]<0:7>                     five 8 bit byte instruction
op<0:4> := i[0]<0:4>                        opcode

Conditional register definition

z<0:11> := z';                              effective address
i → (M[z'] + 1;                             with side effects
← M[z'] + 1))
G := M[g]                                   operand definition
shift-count/SC(0:ZS) :=
process
(TF ← e'; F → z)

index convention

E'<21:35> :=
((T = 0) → y; (T ≠ 0) → XR[T] + y)

Declarations in terms of a variable parameter

Mp[z] := ((z < FL) → Mp[z + RA];
(z ≥ FL) → (Run ← 0;                        only side effects,
violation ← 1))                             no value

Evaluated expressions

add-instruction := (op = 5)                 boolean
z<0:6> := (a<0:5,7> + b<1:7>)               7 bit value
skip-condition := (?Q ∧ d<15> ∨ z<6>)

4.5 Data-type Format and Special Data-Operation Definitions. The component parts of the data-types are named, and their element ranges are first defined, so that the data-operation definitions can use them. For example, a precise definition of an ISP would include the data-type formats (for example, floating-point), followed by a definition of each data operation (for example, +, -, ×, /). Normally, we do not give enough information about the data-type and its appropriate operation implementation in our description of machines, since the information for these descriptions is obtained from the programming manuals. If we were actually to use the ISP descriptions, as an interpreter using a compiled or interpreted language, then only a few well-defined primitives would exist in the language and all other operations would have to be defined in terms of these primitives for each ISP. ISP 2 and ISP 3 describe how the various data-types and operations are declared.

EXAMPLES

Run → (instruction ← M[PC];                 fetch (PC/program counter)
PC ← PC + 1; next
Instruction-execution)                      execute

In more complex processors the conditions for trapping and interrupting must be described. Also, in the interpretation process it is often more descriptive to carry out part of effective address calculation prior to Instruction-execution. See below.

EXAMPLE

¬interrupt ∧ Run →
(op[0] ← M[PC]; PC ← PC + 1; next           fetch
long instruction →
(op[1] ← M[PC];                             fetch more instruction
PC ← PC + 1); next                          if a long instruction
Instruction-execution)                      execute
interrupt ∧ Run → (M[0] ← PC; PC ← 1;       interrupt, save
interrupt ← 0)                              PC and go to M[1]

The IBM 1401 interpreter (Chap. 18) requires a separate process to fetch the operand addresses prior to execution in a variable-length instruction. The fetch is based on the specific instruction to be executed next.
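The interpretation process with an interrupt, as described above, can be sketched in Python. This is a minimal illustration, not an ISP interpreter: the `Machine` class, its `execute` method, and the convention that instruction word 0 halts the machine are all assumptions made for the sketch. Only the register names (M, PC, Run, interrupt) and the two guarded actions come from the text.

```python
class Machine:
    """Toy processor state: memory M, program counter PC, Run and interrupt flags."""

    def __init__(self, memory):
        self.M = list(memory)
        self.PC = 0
        self.Run = True
        self.interrupt = False
        self.trace = []  # executed instruction words, kept for inspection

    def execute(self, instruction):
        # Hypothetical instruction set: word 0 halts, any other word is recorded.
        if instruction == 0:
            self.Run = False
        else:
            self.trace.append(instruction)

    def step(self):
        """Perform one interpretation step; does nothing when Run is off."""
        if not self.Run:
            return
        if self.interrupt:
            # interrupt AND Run -> (M[0] <- PC; PC <- 1; interrupt <- 0)
            self.M[0] = self.PC
            self.PC = 1
            self.interrupt = False
        else:
            # NOT interrupt AND Run -> (instruction <- M[PC]; PC <- PC + 1;
            #                           next Instruction-execution)
            instruction = self.M[self.PC]
            self.PC += 1
            self.execute(instruction)
```

A possible use: build `Machine([0, 7, 0, 5, 6, 0])`, set `PC` to 3, call `step()` once, raise `interrupt`, and step again; the old PC is saved in `M[0]` and execution resumes at `M[1]`.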
Bibliography

Journals

ACM                      Association for Computing Machinery
ADC                      Automatic Digital Computation
AFIPS                    American Federation of Information Processing Societies
AIEE-IRE Conf.           American Institute of Electrical Engineers-Institute of Radio Engineers Conference
Appl. Sci. Res.          Applied Scientific Research
EJCC                     Eastern Joint Computer Conference
FJCC                     Fall Joint Computer Conference
SJCC                     Spring Joint Computer Conference
WJCC                     Western Joint Computer Conference
IBM J. of Res. and Dev.  IBM Journal of Research and Development
IBM Sys. J.              IBM Systems Journal
ICIP                     International Conference on Information Processing
IEE                      Institution of Electrical Engineers, London
IEEE                     Institute of Electrical and Electronics Engineers
IFIP                     International Federation for Information Processing
IRE                      Institute of Radio Engineers
Psychology Res.          Psychology Review

General

Bull.                    Bulletin
Comm.                    Communications
Conf.                    Conference
Cong.                    Congress
J.                       Journal
Proc.                    Proceedings
Pt.                      Part
Res. Rept.               Research Report
Supp.                    Supplement
Symp.                    Symposium
Trans.                   Transactions

Reports, manuals, and miscellaneous

“Study of a Computer Directly Implementing an Algebraic Language,” AD633-727, Air Force Office of Scientific Research Contract AF19(628)-2798.
Control Data 6600 Computer System Reference Manual, 1st ed. Publ. 450, Copyright © 1963, Control Data Corporation, Minneapolis 20, Minn.
“Digital Small Computer Handbook,” 1967 Edition, Copyright © 1967, all rights reserved, Digital Equipment Corporation, Maynard, Mass.
Programmed Buffered Display 338 Programming Manual-PDP-8, DEC-08-G61C-D, Copyright © 1967, all rights reserved, Digital Equipment Corporation, Maynard, Mass.
A22-6703, IBM 7094 Principles of Operation, Data Processing System, Copyright © 1959, 1960, 1961, 1962, International Business Machines Corporation.
A22-6821-4 IBM System/360 Principles of Operation.
A22-6810-8 IBM System/360 System Summary.
IBM System/360 Functional Characteristics Manuals for each Model.
IBM System/360 Configurator (diagram) for each Model.
IBM OS/360: PL/I Language Specification, Form C28-6571, p. 74.
H20-0223-0, IBM System/360 Attached Support Processor System (ASP) System Description, Copyright © 1966, International Business Machines Corporation.
A24-1403-5, IBM 1401 Reference Manual, Data Processing System, Copyright © 1960, 1961, 1962, International Business Machines Corporation.
225-6487-3, IBM 1401 Customer Engineering Reference Manual, Copyright © 1960, 1961, 1962, 1963, International Business Machines Corporation.
A26-5919-4, IBM 1800 Data Acquisition and Control System Configurator.
A26-5918-5, IBM 1800 Functional Characteristics, Copyright © 1966, International Business Machines Corporation.
IBM 1620 FORTRAN: Preliminary Specifications, Form J29-4200-2, April, 1960.
FORTRAN Specifications and Operating Procedures, IBM 1401, IBM Systems Ref. Lib. C24-1455.2.
International Business Machines Corporation, General Information Manual FORTRAN, Form F28-807401, December, 1961.
Type 650 Magnetic Drum Data-processing Machine (Manual of Operations), Form 22-60 60-1, International Business Machines Corporation, New York, 1955.
Librascope LGP-30, Manual, Librascope, Inc., 80 Western Ave., Glendale, Calif.
Olivetti Underwood Programma 101 General Reference Manual, Olivetti Underwood Corporation, One Park Avenue, New York, 10016.
Pegasus Maintenance Manuals, Ferranti Ltd., London.
Pegasus Programming Manual, Ferranti Ltd., London.
Proceedings Conference on Spaceborne Computer Engineering, Anaheim, Calif., Oct. 30-31, 1962.
Scientific Data Systems Reference Manual, SDS 930 Computer, Copyright © 1965, 1966, 1967, Scientific Data Systems, Inc., 1649 Seventeenth Street, Santa Monica, Calif.
Scientific Data Systems Reference Manual, SDS 9300 Computer, Copyright © 1963, 1964, 1965, 1966, 1967, Scientific Data Systems, Inc., 1649 Seventeenth Street, Santa Monica, Calif.
Symposium on Multi-programming (Concurrent Programs), Information Processing, 1962 Proc. IFIP Congress, pp. 570-575, North-Holland Publishing Company, Amsterdam, 1963.
Univac Scientific Electronic Computing System Model 1103A, Form EL338, Remington-Rand Corporation, 1902 West Minnehaha Ave., St. Paul W4, Minn.
“Comprehensive System Manual, A System of Automatic Coding for the Whirlwind Computer,” Digital Computer Laboratory, Massachusetts Institute of Technology, Cambridge 39, Mass., August, 1955; revised, December, 1955.

Books and periodicals

AdamA60 Adams Associates: Computer Characteristics Quarterly, summary of the characteristics of computers being currently manufactured, Cambridge, Mass. Specific quarterlies used: January, 1966, vol. 6; no. 1; 1st and 2nd quarters, 1967, vol. 7, nos. 1, 2; 4th quarter, 1967, and 1st quarter, 1968, vol. 7, no. 4, vol. 8, no. 1, (first published in 1960).
AdamC60 Adams, C. W.: A Chart for EDP Experts, Datamation, vol. 6, pp. 13-17, November-December, 1960. See AdamA60.
AdamC62 Adams, Charles W.: Grosch’s Law Repealed, Datamation, vol. 8, no. 7, pp. 38-39, July, 1962.
AinsE52 Ainsworth, Ernest: SEAC Input-Output Operating Experience, AIEE-IRE-ACM Conf., pp. 44-47, December, 1952.
AlexS51 Alexander, S. N.: The National Bureau of Standards Eastern Automatic Computer (SEAC), AIEE-IRE Conf., pp. 84-89, December, 1951.
AllaR64 Allard, R. W., K. A. Wolf, and R. A. Zemlin: Some Effects of the 6600 Computer on Language Structures, Comm. ACM, vol. 7, no. 2, pp. 112-119, February, 1964.
AlleM63 Allen, M. W., T. Pearcey, J. P. Penny, G. A. Rose, and J. G. Sanderson: CIRRUS, An Economical Multiprogram Computer with Microprogram Control, IEEE Trans., vol. EC-12, no. 6, pp. 663-671, December, 1963.
AllmR62 Allmark, R. H., and J. R. Lucking: Design of an Arithmetic Unit Incorporating a Nesting Store, Proc. IFIP Cong. 1962, pp. 694-698, 1962.
AlonR60 Alonso, R. L., and J. H. Laning, Jr.: Design Principles for a General Control Computer, Institute of Aeronautical Sciences, New York, S. M. Fairchild Publ. Fund Paper FF-29, April, 1960.
AlonR61 Alonso, R. L., J. H. Laning, Jr., and H. Blair-Smith: Preliminary MOD 3C Programmers Manual, M.I.T. Instrumentation Lab., Rept. E-1077, 1961.
AlonR62 Alonso, R. L., A. Green, H. Maurer, and R. Oleksiak: A Digital Control Computer; Development Model 1B, M.I.T. Instrumentation Lab., Rept. R-358 (confidential), April, 1962.
AlonR63 Alonso, R. L., H. Blair-Smith, and A. L. Hopkins: Some Aspects of the Logical Design of a Control Computer, A Case Study, IEEE Trans., vol. EC-12, no. 6, pp. 687-697, December, 1963.
AmdaG62 Amdahl, Gene M.: New Concepts in Computing
System Design, Proc. IRE, vol. 50, no. 5, pp. 1073-1077, May, 1962.
AmdaG64a Amdahl, G. M., G. A. Blaauw, and F. P. Brooks, Jr.: Architecture of the IBM System/360, IBM J. of Res. and Dev., vol. 8, no. 2, pp. 87-101, April, 1964. Review TeagH65
AmdaG64b Amdahl, G. M.: Processing Unit Design Considerations, IBM Sys. J., vol. 3, no. 2, pp. 144-164, 1964.
AmdaG64c Amdahl, G. M.: The Model 92 as a Member of the System 360 Family, AFIPS Proc. FJCC, Pt. II, vol. 26, pp. 69-72, 1964. Review GrimR65b
AndeD67 Anderson, D. W., F. J. Sparacio, and R. M. Tomasulo: The IBM System/360 Model 91: Machine Philosophy and Instruction Handling, IBM J. of Res. and Dev., vol. 11, no. 1, pp. 8-24, January, 1967.
AndeJ61 Anderson, James P.: A Computer for Direct Execution of Algorithmic Languages, AFIPS Proc. EJCC, vol. 20, pp. 184-193, 1961.
AndeJ62 Anderson, James P., Samuel A. Hoffman, Joseph Shifman, and Robert J. Williams: D825-A Multiple Computer System for Command and Control, AFIPS Proc. FJCC, vol. 22, pp. 86-96, 1962.
AndeJ65 Anderson, James P.: Program Structures for Parallel Processing, Comm. ACM, vol. 8, no. 12, pp. 786-788, December, 1965.
AndeS67 Anderson, S. F., J. G. Earle, R. E. Goldschmidt, and D. M. Powers: The IBM System/360 Model 91: Floating-point Execution Unit, IBM J. of Res. and Dev., vol. 11, no. 1, pp. 34-53, January, 1967.
ArbuR66 Arbuckle, R. A.: Computer Analysis and Thruput Evaluation, Computers and Automation, p. 13, January, 1966.
ArdeB66 Arden, B. W., B. A. Galler, T. C. O’Brien, and F. H. Westervelt: Program and Addressing Structure in a Time-sharing Environment, J. ACM, vol. 13, no. 1, pp. 1-16, January, 1966.
AstrM52 Astrahan, M. M., and N. Rochester: The Logical Organization of the New IBM Scientific Calculator, Proc. ACM, Pittsburgh Conf., pp. 79-83, May, 1952.
BaldF62 Baldwin, F. R., W. B. Gibson, and C. B. Poland: A Multiprocessing Approach to a Large Computer System, IBM Sys. J., vol. 1, pp. 64-76, September, 1962.
BarnG68 Barnes, George H., Richard M. Brown, Maso Kato, David J. Kuck, Daniel L. Slotnick, and Richard A. Stokes: The ILLIAC IV Computer, IEEE Trans., vol. C-17, no. 8, pp. 746-757, August, 1968.
BartR61 Barton, R. S.: A New Approach to the Functional Design of a Digital Computer, Proc. WJCC, pp. 393-396, 1961.
BashT64 Bashkow, T. R.: A Sequential Circuit for Algebraic Statement Translation, IEEE Trans., vol. EC-13, no. 2, pp. 102-105, April, 1964.
BashT67 Bashkow, Theodore, Azra Sasson, and Arnold Kronfeld: System Design of a FORTRAN Machine, IEEE Trans., vol. EC-16, no. 4, pp. 485-499, August, 1967.
BasiI57 Basilewskii, Iu. Ia.: The Universal Electronic Digital Machine (URAL) for Engineering Research, J. ACM, vol. 4, no. 2, pp. 511-519, 1957.
BeckF61 Beckman, F. S., F. P. Brooks, Jr., and W. J. Lawless, Jr.: Developments in the Logical Organization of Computer Arithmetic and Control Units, Proc. IRE, vol. 49, no. 1, pp. 53-66, January, 1961.
BernA58 Bernstein, A., M. De V. Roberts, T. Arbuckle, and M. A. Belsky: A Chess Playing Program for the IBM 704, Proc. WJCC, pp. 157-159, 1958.
BhusA67 Bhushan, A., R. H. Stotz, and J. E. Ward: Recommendations for an Intercomputer Communications Network for M.I.T. Memorandum MAC-M-355, July, 1967.
BlaaG59 Blaauw, G. A.: Indexing and Control-word Techniques, IBM J. of Res. and Dev., vol. 3, no. 2, pp. 288-301, July, 1959.
BlaaG64a Blaauw, G. A., and F. P. Brooks, Jr.: The Structure of System/360, Part I-Outline of the Logical Structure, IBM Sys. J., vol. 3, no. 2, pp. 119-135, 1964.
BlaaG64b Blaauw, G. A.: Multisystem Organization, IBM Sys. J., vol. 3, no. 2, pp. 181-195, 1964.
BlocE59 Bloch, Erich: The Engineering Design of the Stretch Computer, Proc. EJCC, pp. 48-58, 1959.
BlosR60 Blosk, R. T.: The Instruction Unit of the STRETCH Computer, Proc. EJCC, pp. 299-324, 1960.
BockR63 Bock, R. V.: An Interrupt Control for the B 5000 Data Processor System, AFIPS Proc. FJCC, vol. 24, pp. 229-241, 1963.
BolaL67 Boland, L. J., G. D. Granito, A. U. Marcotte, B. U. Messina, and J. W. Smith: The IBM System/360 Model 91: Storage System, IBM J. of Res. and Dev., vol. 11, no. 1, pp. 54-68, January, 1967.
BoutE63 Boutwell, E., Jr., and E. A. Hoskinson: The Logical Organization of the PB 440 Microprogrammable Computer, AFIPS Proc. FJCC, vol. 24, pp. 201-213, 1963.
BowdB53 Bowden, B. V., editor: “Faster than Thought,” Sir Isaac Pitman and Sons, Ltd., London, 1953.
BrigH64 Bright, H. S.: A Philco Multiprocessing System, AFIPS Proc. FJCC, Pt. II, vol. 26, pp. 97-141, 1964.
BrooF57a Brooks, F. P., Jr.: A Program-controlled Program Interruption System, Proc. EJCC, pp. 128-132, 1957.
BrooF57b Brooks, F. P., Jr., A. L. Hopkins, Jr., P. G. Neumann, and M. V. Wright: An Experiment in Musical Composition, IRE Trans., vol. EC-6, no. 3, pp. 175-182, September, 1957.
BrooF59 Brooks, F. P., Jr., G. A. Blaauw, and W. Buchholz: Processing Data in Bits and Pieces, IRE Trans., vol. EC-8, no. 2, pp. 118-124, June, 1959.
BrooF60 Brooks, F. P.: The Execute Operations, A Fourth Mode of Instruction Sequencing, Comm. ACM, vol. 3, no. 3, pp. 168-170, March, 1960.
BrooR60 Brooker, R. A.: Some Techniques for Dealing with Two-level Storage, Computer J., vol. 2, pp. 189-194, 1960.
BuchW53 Buchholz, Werner: The System Design of the IBM Type 701 Computer, Proc. IRE, vol. 41, no. 10, pp. 1262-1275, October, 1953.
BuchW57 Buchholz, W.: Design Objectives for the IBM STRETCH Computer, New Computers, Rept. from the Manufacturers ACM Conf., pp. 99-104, 1957.
BuchW58 Buchholz, W.: The Selection of an Instruction Language, Proc. WJCC, pp. 128-130, 1958.
BuchW62 Buchholz, Werner, (ed.): “Planning a Computer System,” McGraw-Hill Book Company, New York, 1962.
BurdE53 Burdette, E. W.: Characteristics of the Oracle, Argonne Natl. Lab., Proc. Symp. on Large Scale Digital Computing Machines, pp. 194-201, August, 1953.
BurkA62a Burks, Arthur W., Herman H. Goldstine, and John von Neumann: Preliminary Discussion of the Logical Design of an Electronic Computing Instrument, Part I, Datamation, vol. 8, no. 9, pp. 24-31, September, 1962.
BurkA62b Burks, Arthur W., Herman H. Goldstine, and John von Neumann: Preliminary Discussion of the Logical Design of an Electronic Computing Instrument, Part II, Datamation, vol. 8, no. 10, pp. 36-41, October, 1962.
BurkA63 Burks, Arthur W., Herman H. Goldstine, and John von Neumann: Preliminary Discussion of the Logical Design of an Electronic Computing Instrument (Pt. I, vol. 1), Rept. prepared for U.S. Army Ordnance Dept., 1946, in A. H. Taub (ed.), “Collected Works of John von Neumann,” vol. 5, pp. 34-79, The Macmillan Company, New York, 1963.
BussB63 Bussell, B., and G. Estrin: An Evaluation of the Effectiveness of Parallel Processing, IEEE Pacific Computer Conf., pp. 201-220, 1963.
CampR52 Campbell, Robert V. D.: Evolution of Automatic Computing, Proc. ACM, Pittsburgh Conf., pp. 29-32, May, 1952.
CarlC63 Carlson, C. B.: The Mechanization of a Pushdown Stack, AFIPS Proc. FJCC, vol. 24, pp. 243-250, 1963.
CarrJ56 Carr, J. W., III, and N. R. Scott (eds.): “Notes on the Special Summer Conference on Digital Computers,” Special Summer Conferences on Digital Computers, University of Michigan, Ann Arbor, Mich., 1956.
CarrJ59 Carr, John W., III: Programming and Coding, in Eugene M. Grabbe, Simon Ramo, and Dean E. Wooldridge (eds.), “Handbook of Automation, Computation, and Control,” vol. 2, chap. 2, pp. 77-83, 93-98, 111-115, 115-121, John Wiley & Sons, Inc., New York, 1959.
CartW64 Carter, W. C., H. C. Montgomery, R. J. Preiss, and H. J. Reinheimer: Design of Serviceability Features for the IBM System/360, IBM J. of Res. and Dev., vol. 8, no. 2, pp. 115-125, April, 1964.
CasaC62 Casale, Charles T.: Planning the CDC 3600, AFIPS Proc. FJCC, vol. 22, pp. 73-85, 1962.
ChasG52 Chase, George C.: History of Mechanical Computing Machinery, Proc. ACM, Pittsburgh Conf., pp. 1-28, May, 1952.
ChenT64 Chen, T. C.: The Overlap of the IBM System/360 Model 92 Central Processing Unit, AFIPS Proc. FJCC, Pt. II, vol. 26, pp. 73-80, 1964. Review GrimR65c
ChuC52 Chu, J. C.: The Oak Ridge Automatic Computer, Proc. ACM, Toronto Conf., pp. 142-148, September, 1952.
ClarW57 Clark, Wesley A.: The Lincoln TX-2 Computer Development, Proc. WJCC, pp. 143-145, 1957.
ClayB64 Clayton, B. B., E. K. Dorff, and R. E. Fagen: An Operating System and Programming Systems for the 6600, AFIPS Proc. FJCC, Pt. II, vol. 26, pp. 41-57, 1964.
CochD68 Cochran, David S.: Internal Programming of the 9100A Calculator, Hewlett-Packard J., vol. 20, no. 1, pp. 14-16, September, 1968.
CoddE59 Codd, E. F., E. S. Lowry, E. McDonough, and C. A. Scalzi: Multiprogramming STRETCH Feasibility Considerations, Comm. ACM, vol. 2, no. 11, pp. 13-17, November, 1959.
CoddE62 Codd, E. F.: Multiprogramming, “Advances in Computers,” vol. 3, pp. 78-153, Academic Press, Inc., New York, 1962.
ComfW65 Comfort, W. T.: A Computing System Design for User Service, AFIPS Proc. FJCC, Pt. I, vol. 27, pp. 619-626, 1965.
ContC64 Conti, Carl: System Aspect: System/360 Model 92, AFIPS Proc. FJCC, Pt. II, vol. 26, pp. 81-95, 1964. Review GrimR65a.
ContC68 Conti, C. J., D. H. Gibson, and S. H. Pitkowsky: Structural Aspects of the System/360 Model 85, I. General Organization, IBM Sys. J., vol. 7, no. 1, pp. 2-14, 1968.
ConwM58 Conway, Melvin E.: Proposal for an UNCOL, Comm. ACM, vol. 1, no. 10, pp. 5-8, October, 1958.
ConwM63 Conway, M. E.: A Multiprocessor System Design, AFIPS Proc. FJCC, vol. 24, pp. 139-146, 1963.
CorbF62 Corbato, Fernando J., Marjorie Merwin-Daggett, and Robert C. Daley: An Experimental Time-sharing System, AFIPS Proc. SJCC, vol. 21, pp. 335-344, 1962.
CorbF65 Corbato, F. J., and V. A. Vyssotsky: Introduction and Overview of the MULTICS System, AFIPS Proc. FJCC, Pt. I, vol. 27, pp. 185-196, 1965.
CoxJ68 Cox, Jerome R., Jr.: Economy of Scale and Specialization in Large Computing Systems, Computer Design, vol. 7, no. 11, pp. 77-80, November, 1968.
CrawP?? Crawford, P.: Thesis for Master’s Degree, Massachusetts Institute of Technology, Cambridge, Mass.
CritA63 Critchlow, A. J.: Generalized Multiprocessing and Multiprogramming Systems, AFIPS Proc. FJCC, vol. 24, pp. 107-126, 1963.
DaleR65 Daley, R. C., and P. G. Neumann: A General-purpose File System for Secondary Storage, AFIPS Proc. FJCC, Pt. I, vol. 27, pp. 213-229, 1965.
DaleR68 Daley, Robert C., and Jack B. Dennis: Virtual Memory, Processes, and Sharing in MULTICS, Comm. ACM, vol. 11, no. 5, pp. 306-312, May, 1968.
DarrJ69 Darringer, John A.: The Description, Simulation, and Automatic Implementation of Digital Computer Processors, Thesis for Ph.D. degree, Carnegie-Mellon University, College of Engineering and Science, Department of Electrical Engineering, Pittsburgh, Pa., May, 1969.
DaviD67 Davies, D. W., K. A. Bartlett, R. A. Scantlebury, and P. T. Wilkinson: A Digital Communication Network for Computers Giving Rapid Response at Remote Terminals, ACM Symp. on Operating System Principles, Gatlinburg, Tenn., Oct. 1-4, 1967.
DaviG60 Davis, G. M.: The English Electric KDF9 Computer System, Computer Bull., pp. 119-120, December, 1960.
DennJ65 Dennis, J. B.: Segmentation and the Design of Multiprogrammed Computer Systems, J. ACM, vol. 12, no. 4, pp. 589-602, October, 1965.
DennJ66 Dennis, J., and E. C. Van Horn: Programming Semantics for Multiprogrammed Computations, Comm. ACM, vol. 9, no. 3, pp. 143-155, March, 1966.
DesmW64 Desmonde, W. H.: "Real Time Data Processing Systems," Prentice-Hall, Inc., Englewood Cliffs, N.J., 1964.
Dijkstra, E. W.: Solution of a Problem in Concurrent Programming Control, Comm. ACM, vol. 8, no. 9, p. 569, September, 1965.
DreyP58 Dreyfus, P.: System Design of the Gamma 60, Proc. WJCC, pp. 130-133, May, 1958.
DunwS56 Dunwell, S. W.: Design Objectives for the IBM STRETCH Computer, Proc. EJCC, pp. 20-22, 1956.
EcclW19 Eccles, W. H., and F. W. Jordan: A Trigger Relay, Radio Rev., pp. 143-146, October, 1919.
EckeJ51 Eckert, J. Presper, Jr., James R. Weiner, H. Frazer Welsh, and Herbert F. Mitchell: The UNIVAC System, AIEE-IRE Conf., pp. 6-16, December, 1951.
EckeJ59 Eckert, J. P., J. C. Chu, A. B. Tonik, and W. J. Schmitt: Design of Univac-LARC System, Part I, Proc. EJCC, pp. 59-65, 1959.
EdwaD60 Edwards, D. B. G., M. J. Lanigan, and T. Kilburn: Ferrite-core Memory Systems with Rapid Cycle Times, Proc. IEE, Pt. B, vol. 107, pp. 585-598, November, 1960.
ElboR53 Elbourne, R. D., and R. P. Witt: Dynamic Circuit Techniques Used in SEAC and DYSEAC, IRE Trans., vol. EC-2, no. 1, pp. 2-9, 1953.
ElliW51 Elliott, W. S.: Circuit Standardization in Series Working, High-speed Digital Computers, Elliott J., vol. 1, no. 2, p. 49, September, 1951; also in Proc. ACM, March, 1950.
ElliW52 Elliott, W. S., H. G. Carpenter, and C. E. Owen: Development of Computer Components and Systems, Proc. ACM, Toronto Conf., September, 1952.
ElliW53 Elliott, W. S., H. G. Carpenter, and A. St. Johnston: The Elliott-NRDC Computer 401, A Demonstration of Computer Engineering by Packaged Unit Construction, Symp. ADC, pp. 273-276, 1953.
ElliW56a Elliott, W. S., C. E. Owen, C. H. Devonald, and B. G. Maudsley: The Design Philosophy of Pegasus, A Quantity-production Computer, Proc. IEE, Pt. B, vol. 103, Supp. 2, pp. 188-196, 1956.
ElliW56b Elliott, W. S., R. C. Robbins, and D. S. Evans: Remote Position Control and Indication by Digital Means, Proc. IEE, Pt. B, vol. 103, Supp. 3, pp. 437-446, 1956.
EnglW62 England, W. A.: Subminiature Computer Designed for Space Environments, Proc. Conf. on Spaceborne Computer Engineering, Anaheim, Calif., pp. 95-101, October, 1962.
ErnsH63 Ernst, H. A.: TCS, An Experimental Multiprogramming System for the IBM 7090, IBM Res. Rept. RJ248, 41 pp., Yorktown Hts., N.Y., June, 1963.
EstrG52 Estrin, G.: A Description of the Electronic Computer at the Institute for Advanced Studies, Proc. ACM, Toronto Conf., pp. 95-109, September, 1952.
EstrG60 Estrin, Gerald: Organization of Computer Systems, the Fixed Plus Variable Structure Computer, Proc. WJCC, pp. 33-40, 1960.
EstrG63 Estrin, G., B. Bussell, R. Turn, and J. Bibb: Parallel Processing in a Restructurable Computer System, IEEE Trans., vol. EC-12, no. 6, pp. 747-755, December, 1963. Article reviewed by E. G. Newman in IEEE Trans., vol. EC-13, no. 5, p. 649, October, 1964.
EverR51 Everett, R. R.: The Whirlwind I Computer, AIEE-IRE Conf., pp. 70-74, 1951.
EverR57 Everett, R. R., C. A. Zraket, and H. D. Benington: SAGE-A Data-processing System for Air Defense, Proc. EJCC, pp. 148-155, 1957.
EwinR64 Ewing, R. G., and P. M. Davies: An Associative Processor, AFIPS Proc. FJCC, Pt. I, vol. 26, pp. 147-158, 1964.
FaggP64 Fagg, P., J. L. Brown, J. A. Hipp, D. T. Doody, J. W. Fairclough, and J. Greene: IBM System/360 Engineering, AFIPS Proc. FJCC, Pt. I, vol. 26, pp. 205-231, 1964.
FairJ56 Fairclough, J. W.: A Sonic Delay-line Storage Unit for a Digital Computer, Proc. IEE, Pt. B, vol. 103, Supp. 3, pp. 491-496, 1956.
FalkA64 Falkoff, A. D., K. E. Iverson, and E. H. Sussenguth: A Formal Description of System/360, IBM Sys. J., vol. 3, no. 3, pp. 198-261, 1964.
FikeR68 Fikes, Richard E., Hugh C. Lauer, and Albin L. Vareha, Jr.: Steps toward a General-purpose Time-sharing System Using Large Capacity Core Storage and TSS/360, Proc. 23rd Natl. Conf. of
ACM, Las Vegas, Nevada, pp. 7-18, August, 1968.
FlynM66 Flynn, Michael J.: Very High-speed Computing Systems, Proc. IEEE, vol. 54, no. 12, pp. 1901-1909, December, 1966.
FlynM67a Flynn, M. J., and P. R. Low: The IBM System/360 Model 91: Some Remarks on System Development, IBM J. of Res. and Dev., vol. 11, no. 1, pp. 2-7, January, 1967.
FlynM67b Flynn, Michael J., and M. Donald MacLaren: Microprogramming Revisited, Argonne Natl. Lab., Appl. Math. Div., Tech. Mem. 134, pp. 1-17, Argonne, Ill., 1967.
ForgJ65 Forgie, James W.: A Time- and Memory-sharing Executive Program for Quick Response, On-line Applications, AFIPS Proc. FJCC, Pt. II, vol. 27, pp. 127-139, 1965.
ForrJ51 Forrester, J. W.: Digital Information Storage in Three Dimensions Using Magnetic Cores, J. Appl. Phys., vol. 22, pp. 44-48, January, 1951.
FothJ61 Fotheringham, John: Dynamic Storage Allocation in the Atlas Computer, Including an Automatic Use of a Backing Store, Comm. ACM, vol. 4, no. 10, pp. 435-436, October, 1961.
FranJ57 Frankovich, J. M., and H. P. Peterson: A Functional Description of the Lincoln TX-2 Computer, Proc. WJCC, vol. 19, pp. 146-155, February, 1957.
GoldH63b Goldstine, H. H., and John von Neumann: Planning and Coding Problems for an Electronic Computing Instrument (Pt. II, vol. 1), Rept. prepared for U.S. Army Ordnance Dept., 1947, in A. H. Taub (ed.), “Collected Works of John von Neumann,” vol. 5, pp. 80-151, The Macmillan Company, New York, 1963.
GoldH63c Goldstine, H. H., and John von Neumann: Planning and Coding of Problems for an Electronic Computing Instrument (Pt. II, vol. 2), Rept. prepared for U.S. Army Ordnance Dept., 1948, in A. H. Taub (ed.), “Collected Works of John von Neumann,” vol. 5, pp. 152-214, The Macmillan Company, New York, 1963.
GoldH63d Goldstine, H. H., and John von Neumann: Planning and Coding of Problems for an Electronic Computing Instrument (Pt. II, vol. 3), Rept. prepared for U.S. Army Ordnance Dept., 1948, in A. H. Taub (ed.), “Collected Works of John von Neumann,” vol. 5, pp. 215-235, The Macmillan Company, New York, 1963.
GreeJ57 Greenstadt, J. L.: The IBM 709 Computer, New Computers, Rept. from the Manufacturers ACM Conf., pp. 92-98, 1957.
GreeJ64 Greene, J. E., R. F. Dean, and B. M. Updike: Micro-programmed Implementation of the IBM System/360 Machine Organization, IBM General Products Div., Development Lab., Engineering Publ., Dept. PTP792, Endicott, N.Y., April, 1964.
GrimR65c Grimsdale, R. L.: A Review of ChenT64, Com- HillJ66 Hillegass, John R.: Auerbach on Equipment IBM
puting Rev., vol. 6, no. 6, pp. 429-430, Novem- System 360-The First Two Years, Data Proc-
ber, December, 1965. essing Mag., VOI.8, no. 5, pp. 44-51, May, 1966.
GrosH53 Grosch, H. R. J.: High Speed Arithmetic: The HodgD64 Hodges, Donald: IPL-VC, A Proposal for a Com-
Digital Computer as a Research Tool, J. Optical puter System Having the IPL-V Instruction Set,
Society of America, vol. 4, no. 4, pp. 306-310, Argonne Natl. Lab., Appl. Math. Diu., Tech. M e m .
April, 1953. 66, 22 pp., January, 1964.
G rue F68 Gruenberger, F. J.: The History of the JOHN- HollJ59 Holland, John: A Universal Computer Capable
NIAC, Mem. RM-5654-PR, prepared for United of Executing an Arbitrary Number of Sub-
States Air Force Project Rand, The Rand Cor- programs Simultaneously, Proc. EJCC, pp. 108-
poration, Santa Monica, Calif., October, 1968. 113, 1959.
GrumM58 Grumette, Murray: IBM 704-Code Nundrums, HopkA63 Hopkins, A. L., R. L. Alonso, and H. Blair-Smith:
Comm. ACM, vol. 1, no. 3, pp. 3-13, March, Logical Description of the Apollo Guidance
1958. Computer (AGC 4), M.I.T. Instrumentation Lab.,
Ha inL65 Haines, L. H.: Serial Compilation and the 1401 Re@. A-393 (confidential), Cambridge, Mass.,
FORTRAN Compiler, IBM Sys. J., vol. 4, no. 1, March, 1963.
pp. 73-80, January, 1965. HowaD61 Howarth, D. J., R. B. Payne, and F. H. Sumner:
HaleA62 Haley, A. C. D.: The KDF9 Computer System, The Manchester University Atlas Operating
AFIPS Proc. FJCC, VOI.22, pp. 108-120, 1962. System, Part II: User’s Description, ComputerJ.,
vol. 4, no. 3, pp. 226-229, October, 1961.
Ham bC62 Hamblin, C. L.: Translation to and from Polish
Notation, Computer]., vol. 5, pp. 210-213, Octo- HowaD62 Howarth, D. J., P. D. Jones, and M. T. Wyld: The
ber, 1962. ATLAS Scheduling System, Computer J., vol. 5,
no. 3, pp. 238-244, October, 1962.
Ha neF68 Haney, Frederick M.: Using a Computer to De-
sign Computer Instruction Sets, Thesis for Ph.D. HowaD63 Howarth, D. J.: Experience with the Atlas
degree, Carnegie-Mellon University, College of Scheduling System, AFIPS Proc. SJCC, vol. 23,
Engineering and Science, Department of Com- pp. 59-67, 1963.
puter Science, Pittsburgh, Pa., May, 1968. HughE54 Hughes, E. S., Jr.: The IBM Magnetic Drum
HartD68 Hartley, D. F., B. Landy, and R. M. Needham: Calculator Type 650, Engineering and Design
The Structure of a Multiprogramming Super- Considerations, Proc. WJCC,pp. 140-154, 1954.
visor, Computer J., vol. 11, no. 3, pp. 247-255, lverK62 Iverson, Kenneth E.: A Common Language for
November, 1968. Hardware, Software, and Applications, AFIPS
Ha uc E68 Hauck, E. A., and B. A. Dent: Burroughs B PTOC.FJCC, VOI. 22, pp. 121-129, 1962.
6500/B 7500 Stack Mechanism, AFIPS Proc. JohnD52 Johnston, D. L.: Standardized Printed Circuit
SJCC, VOI. 32, pp. 245-251, 1968. Units for Digital Computers, Proc. ACM, Pitts-
HaueR52 Haueter, R. C.: Auxiliary Equipment to SEAC burgh Conf., pp. 135-141, May, 1952.
Input-Output, AIEE-IRE-ACM Conference, pp. KampT60 Kampe, Thomas W.: The Design of a General-purpose
39-44, December, 1952. Microprogram-controlled Computer
HellH61 Hellerman, H.: On the Organization of a Multiprogramming-Multiprocessing with Elementary Structure, IRE Trans., vol. EC-9,
System, IBM no. 2, pp. 208-213, June, 1960.
Res. Rept. RC-522, 52 pp., Yorktown Hts., N.Y., KatzJ66 Katz, J. H.: Simulation of a Multiprocessor
September, 1961. Computing System, AFIPS Proc. SJCC, vol. 28,
HellH66 Hellerman, H.: Parallel Processing of Algebraic pp. 127-139, 1966.
Expressions, IEEE Trans., vol. EC-15, no. 1, pp. KilbT56 Kilburn, T., D. B. G. Edwards, and C. E. Thomas:
82-91, February, 1966. The Manchester University Mark II Digital Computing
HerwP60 Herwitz, Paul S., and James H. Pomerene: The Machine, Proc. IEE, Pt. B, vol. 103, Supp.
Harvest System, Proc. WJCC, pp. 23-32, 1960. 2, pp. 247-268, 1956.
KilbT60a Kilburn, T., and R. L. Grimsdale: A Digital Com- and Engineering Support Division, November,
puter Store with a Very Short Read Time, Proc. 1961.
IEE, Pt. B, vol. 107, pp. 567-572, November, KuckD68 Kuck, D. J.: ILLIAC IV Software and Application
1960. Programming, IEEE Trans., vol. C-17, no. 8, pp.
KilbT60b Kilburn, T., D. B. G. Edwards, and D. Aspinall: 758-770, August, 1968.
A Parallel Arithmetic Unit Using a Saturated
LampB65 Lampson, B. W.: Interactive Machine Language
Transistor Fast-Carry Circuit, Proc. IEE, Pt. B, Programming, AFIPS Proc. FJCC, Pt. I, vol. 27,
vol. 107, pp. 573-584, November, 1960. pp. 473-481, 1965.
KilbT61a Kilburn, T., D. J. Howarth, R. B. Payne, and F. H. Sumner: The Manchester University Atlas Operating System, Part I: Internal Organization, Computer J., vol. 4, pp. 222-225, October, 1961.
LampB66 Lampson, B. W., W. W. Lichtenberger, and M. W. Pirtle: A User Machine in a Time-sharing System, Proc. IEEE, vol. 54, no. 12, pp. 1766-1774, December, 1966.
KilbT61b Kilburn, T., R. B. Payne, and D. J. Howarth: The LangJ67 Langdon, J. L., and E. J. Van Derveer: Design
Atlas Supervisor, AFIPS Proc. EJCC, vol. 20, pp. of a High-speed Transistor for the ASLT Current
279-294, 1961. Switch, IBM J. of Res. and Dev., vol. 11, no. 1,
KilbT62 Kilburn, T., D. B. G. Edwards, M. J. Lanigan, pp. 69-73, January, 1967.
and F. H. Sumner: One-level Storage System, LaueH67 Lauer, Hugh C.: Bulk Core in a 360/67 Time-
IRE Trans., vol. EC-11, no. 2, pp. 223-235, April, sharing System, AFIPS Proc. FJCC, vol. 31, pp.
1962. 601-609, 1967.
KinsH64 Kinslow, H. A.: The Time-sharing Monitor System, AFIPS Proc. FJCC, Pt. I, vol. 26, pp. 443-454, 1964.
LebeS56 Lebedev, S. A.: The High-speed Calculating Machine of the Academy of Sciences of the USSR, J. ACM, vol. 3, pp. 129-133, 1956.
KistJ57 Kister, J., P. Stein, S. Ulam, W. Walden, and M. Wells: Experiments in Chess, J. ACM, vol. 4, no. 2, pp. 174-177, April, 1957.
LehmM63a Lehman, M., R. Eshed, and Z. Netter: SABRAC, A Time-sharing Low-cost Computer, Comm. ACM, vol. 6, no. 8, pp. 427-429, August, 1963.
KitoA56 Kitov, A. I.: Elektronnie Tsifrovie Mashiny (Electronic Digital Machines), Izdatelstvo Sovetskoe Radio, Moscow, partial translation available, 1956.
LehmM63b Lehman, M., R. Eshed, and Z. Netter: SABRAC - A New Generation Serial Computer, IEEE Trans., vol. EC-12, no. 6, pp. 618-628, December, 1963.
KleiR53 Klein, R. J., Jr.: The Oracle Memory System, Argonne Natl. Lab., Proc. Symp. on Large Scale Digital Computing Machines, pp. 47-58, August, 1953.
LehmM65 Lehman, M.: Serial Mode Operation and High-speed Parallel Processing, Proc. IFIP Cong. 1965, Pt. 2, pp. 631-633, 1965.
KnigK66 Knight, Kenneth E.: Changes in Computer Performance, Datamation, vol. 12, no. 9, pp. 40-54, September, 1966.
LehmM66 Lehman, M.: A Survey of Problems and Preliminary Results Concerning Parallel Processing and Parallel Processors, Proc. IEEE, vol. 54, no. 12, pp. 1889-1901, December, 1966.
KnigK68 Knight, Kenneth E.: Evolving Computer Performance 1963-1967, Datamation, vol. 14, no. 1, pp. 31-35, January, 1968.
KnutD66 Knuth, D. E.: Additional Comments on a Problem in Concurrent Programming Control, Comm. ACM, vol. 9, no. 5, pp. 321-322, 1966.
LeinA54 Leiner, A. L., and S. N. Alexander: System Organization of the DYSEAC, Professional Group on Electronic Computers, Institute of Radio Engineers, vol. EC-3, no. 1, pp. 1-10, March, 1954.
LeinA57 Leiner, A. L., W. A. Notz, J. L. Smith, and A. Weinberger: Organizing a Network of Computers to Meet Deadlines, Proc. EJCC, pp. 115-128, 1957.
KrogM61 Kroger, Marlin G., et al.: Computers in Command and Control, TR61-12, prepared for DOD:ARPA by Digital Computer Application
Study, Institute for Defense Analyses, Research LeinA58 Leiner, A. L., W. A. Notz, J. L. Smith, and A.
Weinberger: PILOT, The NBS Multicomputer formance of the Census Univac System, AIEE-
System, Proc. EJCC, pp. 71-75, 1958. IRE Conf., pp. 16-22, December, 1951.
LeinA59 Leiner, A. L., W. A. Notz, J. L. Smith, and A. MaheR61 Maher, R. J.: Problems of Storage Allocation in
Weinberger: PILOT, A New Multiple Computer a Multiprocessor Multiprogrammed System,
System, J. ACM, vol. 6, no. 3, pp. 313-335, Comm. ACM, vol. 4, no. 10, pp. 421-422, Octo-
1959. ber, 1961.
LichW65 Lichtenberger, W., and M. W. Pirtle: A Facility MarcM63 Marcotty, M. J., F. M. Longstaff, and A. P. M.
for Experimentation in Man-Machine Inter- Williams: Time-sharing on the Ferranti-Packard
action, AFIPS Proc. FJCC, Pt. I, vol. 27, pp. FP6000 Computer System, AFIPS Proc. SJCC,
589-598, 1965. vol. 23, pp. 29-40, 1963.
LindA66 Lindquist, A. B., R. R. Seeber, and L. W. Com- MeadR63 Meade, R. M.: 604 Machine Description, IBM
eau: A Time-sharing System Using an Asso- Internal Mem., 38 pp., December, 1963.
ciative Memory, Proc. IEEE, vol. 54, no. 12, pp.
MeagR51 Meagher, R. E., and J. P. Nash: The Ordvac,
1774-1779, December, 1966.
AIEE-IRE Conf., pp. 37-43, December, 1951.
LiptJ68 Liptay, J. S.: Structural Aspects of the Sys-
MelbA65 Melbourne, A. J., and J. M. Pugmire: A Small
tem/360 Model 85, II. The Cache, IBM Sys. J.,
Computer for the Direct Processing of FORTRAN
vol. 7, no. 1, pp. 15-21, 1968.
Statements, Computer J., vol. 8, pp. 24-27, April,
LloyR67 Lloyd, R. H. F.: ASLT: An Extension of Hybrid 1965.
Miniaturization Techniques, IBM J. of Res. and
MendM66 Mendelson, M. J., and A. W. England: The SDS
Dev., vol. 11, no. 1, pp. 86-92, January, 1967.
SIGMA 7: A Real-time, Time-sharing Computer,
LoneW61 Lonergan, William, and Paul King: Design of the AFIPS Proc. FJCC, vol. 29, pp. 51-64, 1966.
B 5000 System, Datamation, vol. 7, no. 5, pp.
MercR57 Mercer, Robert J.: Micro-programming, J. ACM,
28-32, May, 1961.
vol. 4, no. 2, pp. 157-171, 1957.
LonsK56 Lonsdale, K., and E. T. Warburton: Mercury: A High Speed Digital Computer, Proc. IEE, Pt. B, vol. 103, Supp. 2, pp. 174-183, 1956.
MerrI56 Merry, I. W., and B. G. Maudsley: The Magnetic-drum Store of the Computer Pegasus, Proc. IEE, Pt. B, vol. 103, Supp. 2, pp. 197-202, 1956.
LourN59 Lourie, N., H. Schrimpf, R. Reach, and W. Kahn: Arithmetic and Control Techniques in a Multiprogram Computer, Proc. EJCC, pp. 75-81, 1959.
MetrN52 Metropolis, N., E. F. Klein, W. Orvedahl, J. R. Richardson, H. B. Demuth, and J. B. Jackson: MANIAC, Proc. ACM, Toronto Conf., pp. 13-17, September, 1952.
McCaJ62 McCarthy, J.: “Time Sharing Computer Systems in Management and the Computer of the Future,” The M.I.T. Press, Cambridge, Mass., 1962.
MillW63 Miller, W. F., and R. A. Aschenbrenner: The GUS Multicomputer System, IEEE Trans., vol. EC-12, no. 6, pp. 671-676, December, 1963.
McCaJ63 McCarthy, J., S. Boilen, E. Fredkin, and J. C. MiraW67 Miranker, W. L., and W. M. Liniger: Parallel
R. Licklider: A Time-sharing Debugging System Methods for the Numerical Integration of Ordi-
for a Small Computer, AFIPS Proc. SJCC, vol. 23, nary Differential Equations, Math. of Computa-
pp. 51-57, 1963. tion, vol. 21, no. 99, pp. 303-320, July, 1967.
McCoB63 McCormick, Bruce H.: The Illinois Pattern Recognition Computer-ILLIAC III, IEEE Trans., vol. EC-12, no. 5, pp. 791-813, December, 1963.
McCuJ65 McCullough, J. D., K. H. Speierman, and F. W. Zurcher: Design for a Multiple User Multiprocessing System, AFIPS Proc. FJCC, Pt. I, vol. 27, pp. 611-617, 1965.
MolnC67 Molnar, Charles E., Severo M. Ornstein, and Antharvedi Anne: The CHASM: A Macromodular Computer for Analyzing Neuron Models, AFIPS Proc. SJCC, vol. 30, pp. 393-401, 1967.
MonnR68 Monnier, Richard E.: A New Electronic Calculator with Computerlike Capabilities, Hewlett-Packard J., vol. 20, no. 1, pp. 3-9, September,
McPhJ51 McPherson, J. L., and S. N. Alexander: Per- 1968.
MorrD67 Morris, Derrick, Frank H. Sumner, and Michael PapiW57 Papian, W. N.: High-speed Computer Stores 2.5
T. Wyld: An Appraisal of the Atlas Supervisor, Megabits, Electronics, vol. 30, no. 10, pp. 162-
Proc. ACM Natl. Meeting, pp. 67-75, 1967. 167, October, 1957.
MuntC62 Muntz, C. A.: A List Processing Interpreter for AGC4, M.I.T., Instrumentation Lab., AGC LMem. 2, Cambridge, Mass., January, 1962.
PatzW67 Patzer, William J., and Gilbert C. Vandling: Systems Implications of Microprogramming, Computer Design, vol. 6, no. 12, pp. 62-66, December, 1967.
MurtJ66 Murtha, J. C.: Highly Parallel Information Processing Systems, in “Advances in Computers,” vol. 7, pp. 2-116, Academic Press, Inc., New York, 1966.
MyerT68 Myer, T. H., and I. E. Sutherland: On the Design of Display Processors, Comm. ACM, vol. 11, no. 6, pp. 410-414, June, 1968.
PeacA?? Peacock, A.: Read-only Memory and Computer Control,¹ to be published.
PennJ62 Penny, J. P., and T. Pearcey: Use of Multiprogramming in the Design of a Low Cost Digital Computer, Comm. ACM, vol. 5, no. 9, pp. 473-476, September, 1962.
NeweA56 Newell, A., and H. A. Simon: The Logic Theory PikeJ52 Pike, James L.: Input-Output Devices Used with
Machine, IRE Trans., vol. IT-2, no. 3, pp. 61-79, SEAC, AIEE-IRE-ACM Conf., pp. 36-38, Decem-
September, 1956. ber, 1952.
NeweA57a Newell, A., and J. C. Shaw: Programming the PlugW61 Plugge, W. R., and M. N. Perry: American Air-
Logic Theory Machine, Proc. WJCC, pp. 230- lines’ “SABRE” Electronic Reservations System,
240, February, 1957. Proc. WJCC, pp. 593-602, May, 1961.
NeweA57b Newell, A., J. C. Shaw, and H. A. Simon: Empirical PortR60 Porter, R. E.: The RW-400-A New Polymorphic
Explorations of the Logic Theory Machine, Data System, Datamation, vol. 6, no. 1, pp.
Proc. WJCC, pp. 218-230, February, 1957. 8-14, January/February, 1960.
NeweA58 Newell, A., J. C. Shaw, and H. A. Simon: The RajcJ43 Rajchman, J., Snyder, and Rudnick: RCA Labo-
Elements of a Theory of Human Problem Solv- ratories Report, under terms of OSRD contract
ing, Psychology Rev., vol. 65, pp. 151-166, OEM-sr-591.
March, 1958. RandB68 Randell, B., and C. J. Kuehner: Dynamic Storage
NievJ64 Nievergelt, J.: Parallel Methods for Integrating Allocation Systems, Comm. ACM, vol. 11, no. 5,
Ordinary Differential Equations, Comm. ACM, pp. 297-306, May, 1968.
vol. 7, no. 12, pp. 731-733, December, 1964. RichR55 Richards, R. K.: “Arithmetic Operations in Digi-
NiseN66 Nisenoff, N.: Hardware for Information Process- tal Computers” D. Van Nostrand Company, Inc.,
ing Systems: Today and in the Future, Proc. Princeton, N.J., 1955.
IEEE, vol. 54, no. 12, pp. 1820-1835, Decem- RobeJ58 Robertson, J. E.: A New Class of Digital Division
ber, 1966. Methods, IRE Trans., vol. EC-7, no. 3, pp. 218-
OsboT68 Osborne, Thomas E.: Hardware Design of the 222, September, 1958.
Model 9100A Calculator, Hewlett-Packard J., vol. RobeL67 Roberts, Lawrence G.: Multiple Computer Networks
20, no. 1, pp. 10-13, September, 1968. and Intercomputer Communication, ACM
OssaJ65 Ossanna, J. F., L. E. Mikus, and S. D. Dunten: Symp. on Operating System Principles, Gatlinburg,
Communications and Input-Output Switching in Tenn., Oct. 1-4, 1967.
a Multiplex Computing System, AFIPS Proc. RoseG67 Rose, Gordon A.: “Intergraphic,” A Microprogrammed
FJCC, Pt. I, vol. 27, pp. 231-241, 1965. Graphical-Interface Computer, IEEE
PadeA64 Padegs, A.: Channel Design Considerations, Trans., vol. EC-16, no. 6, pp. 773-784, Decem-
IBM Sys. J., vol. 3, no. 2, pp. 165-180, 1964. ber, 1967.
PadeA68 Padegs, A.: Structural Aspects of the System/360 Model 85, III. Extensions to Floating-point Architecture, IBM Sys. J., vol. 7, no. 1, pp. 22-29, 1968.
¹According to E. F. Codd, this article has not been published as of Jan. 23, 1968. However, “Microprogram Control for System/360” by S. G. Tucker, IBM Sys. J., vol. 6, no. 4, 1967, has and covers the material that we think was intended to be in PeacA??.
RoseJ65 Rosenfeld, J.: Marbles and Boxes, IBM Res. nization for Array Processing, AFIPS Proc. FJCC,
Project Rept., Yorktown Hts., N.Y., November, Pt. I, vol. 27, pp. 117-128, 1965.
1965.
RoseS67 Rosen, Saul: “Programming Systems and Languages,” McGraw-Hill Book Company, New York, 1967.
SerrR62 Serrell, R., M. M. Astrahan, G. W. Patterson, and I. B. Pyne: The Evolution of Computing Machines and Systems, Proc. IRE, vol. 50, no. 5, pp. 1039-1058, May, 1962.
RoseS69 Rosen, Saul: Electronic Computers: A Historical Survey, Computing Surveys, vol. 1, no. 1, pp. 7-36, March, 1969.
ShanC38 Shannon, E. C.: A Symbolic Analysis of Relay and Switching Circuits, Trans. AIEE, vol. 57, pp. 713-723, 1938.
RosiR69 Rosin, Robert F.: Contemporary Concepts of SharW69 Sharpe, William F.: “The Economics of Com-
Microprogramming and Emulation, Computing puters,” Columbia University Press, New York,
Surveys, vol. 1, no. 4, pp. 197-212, December, 1969.
1969. ShawJ58 Shaw, J. C., A. Newell, H. A. Simon, and T. O.
RossH53 Ross, Harold D., Jr.: The Arithmetic Element of Ellis: A Command Structure for Complex Infor-
the IBM Type 701 Computer, Proc. IRE, vol. 41, mation Processing, Proc. WJCC, pp. 119-128,
no. 10, pp. 1287-1294, October, 1953. 1958.
RothS59 Rothman, S.: R/W 40 Data Processing System, ShedG66a Shedler, G. S., and M. Lehman: Parallel Compu-
Intern. Conf. on Information Processing and tation and the Solution of Polynomial Equa-
Auto-math 1959, Ramo-Wooldridge, Div. of tions, IBM Res. Rept. 1550, Yorktown Hts., N.Y.,
Thompson Ramo Wooldridge, Inc., Los Angeles, February, 1966.
Calif., June, 1959.
ShedG66b Shedler, G. S.: Parallel Numerical Methods for
SaltJ66 Saltzer, J. H.: Traffic Control in a Multiplexed the Solution of Equations, IBM Res. Rept. RC
Computer System, M.I.T. Tech. Rept. MAC-TR-30, 1619, Yorktown Hts., N.Y., June, 1966.
July, 1966.
ShupP53 Shupe, P. D., and R. A. Kirsch: SEAC, Review
SamuA57 Samuel, Arthur L.: Computers with European of Three Years of Operation, Proc. EJCC, pp.
Accents, Proc. WJCC, pp. 14-17, 1957. 83-90, 1953.
SaxoJ63 Saxon, J. A.: “Programming the IBM 7090,” SlotD62 Slotnick, Daniel L., W. Carl Borck, and Robert
Prentice-Hall, Inc., Englewood Cliffs, N.J., 1963. C. McReynolds: The SOLOMON Computer,
SchlH?? Schlaeppi, H. P.: Extensions of PL/I-like Languages AFIPS Proc. FJCC, vol. 22, pp. 97-107, 1962.
for Parallel Processing, with Programming SlutR51 Slutz, Ralph J.: Engineering Experience with the
Examples, in preparation. SEAC, AIEE-IRE Conf., pp. 90-94, December,
1951.
SchwJ64 Schwartz, J. I.: A General-purpose Time-sharing
System, AFIPS Proc. SJCC, vol. 25, pp. 397-411, SmitR64 Smith, R. V., and D. N. Senzig: Computer Orga-
1964. nization for Array Processing, IBM Res. Rept. RC
1330, Yorktown Hts., N.Y., December, 1964.
SechR67 Sechler, R. F., A. R. Strube, and J. R. Turnbull: ASLT Circuit Design, IBM J. of Res. and Dev., vol. 11, no. 1, pp. 74-85, January, 1967.
SoloM66 Solomon, Martin B., Jr.: Economies of Scale and the IBM System/360, Comm. ACM, vol. 9, no. 6, pp. 435-440, June, 1966.
SeebR63 Seeber, R. R., and A. B. Lindquist: Associative Logic for Highly Parallel Systems, AFIPS Proc. FJCC, vol. 24, pp. 489-493, 1963.
SquiJ63 Squire, J. S., and S. M. Polais: Programming and Design Considerations of a Highly Parallel Computer, AFIPS Proc. SJCC, vol. 23, pp. 395-400, 1963.
SegaR61 Segal, R. J., and H. P. Guerber: Four Advanced
Computers-Key to Air Force Digital Data Communication SteeT61 Steel, T. B., Jr.: A First Version of UNCOL, Proc.
System, AFIPS Proc. EJCC, vol. 20, WJCC, pp. 371-377, 1961.
pp. 264-278, 1961. StevL52 Stevens, L. D.: Engineering Organization of
SenzD65 Senzig, D. N., and R. V. Smith: Computer Orga- Input and Output for the IBM 701 Electronic
TaylN51 Taylor, Norman H.: Evaluation of the Engineering Aspects of Whirlwind I, AIEE-IRE Conf., pp. 75-78, December, 1951.
WareW63b Ware, W. H.: “Digital Computer Technology and Design,” vol. 2, “Circuits and Machine Design,” John Wiley & Sons, Inc., New York, 1963.
TeagH65 Teager, Herbert M.: A Review of AmdaG64a; Computing Rev., vol. 6, no. 5, pp. 355-356, September-October, 1965.
WebeH67 Weber, Helmut: A Microprogrammed Implementation of EULER on IBM System/360 Model 30, Comm. ACM, vol. 10, no. 9, pp. 549-558, September, 1967.
ThomR63 Thompson, R. N., and J. A. Wilkinson: The D825 Automatic Operating and Scheduling Program, AFIPS Proc. SJCC, vol. 23, pp. 41-49, 1963.
ThorJ64 Thornton, James E.: Parallel Operation in the Control Data 6600, AFIPS Proc. FJCC, Pt. II, vol. 26, pp. 33-40, 1964.
TomaR67 Tomasulo, R. M.: An Efficient Algorithm for Exploiting Multiple Arithmetic Units, IBM J. of Res. and Dev., vol. 11, no. 1, pp. 25-33, January, 1967.
WeikM55 Weik, M. H.: A Survey of Domestic Electronic Digital Computing Systems, Ballistic Research Laboratories, Aberdeen, Md., Rept. 971, December, 1955.
WeikM61 Weik, Martin H.: A Third Survey of Domestic Electronic Digital Computing Systems, Ballistic Research Laboratories, Aberdeen, Md.; report supersedes BRL Rept. 1010, Department of the Army Project No. 5B03-06-002 (1961).
TuckS67 Tucker, S. G.: Microprogram Control for System/360, IBM Sys. J., vol. 6, no. 4, pp. 222-241, 1967.
TuriS59 Turing, Sara: “Alan M. Turing,” W. Heffer and Sons, Ltd., Cambridge, England, 1959.
WeikM64 Weik, Martin H., Jr.: A Fourth Survey of Domestic Electronic Digital Computer Systems, Ballistic Research Laboratories, Aberdeen, Md., Rept. 1227; processed by Defense Documentation Agency, Defense Supply Agency No. 42900, January, 1964.
UngeS58 Unger, S. H.: A Computer Oriented toward Spatial Problems, Proc. IRE, vol. 46, no. 10, pp. 1744-1750, October, 1958.
WestG60 West, George P., and Ralph J. Koerner: Communications within a Polymorphic Intellectronic System, Proc. WJCC, pp. 225-230, 1960.
VandW52 Van der Poel, W. L.: A Simple Electronic Digital Computer, Appl. Sci. Res., Sec. B, vol. 2, pp. 367-400, 1952.
VandW56 Van der Poel, W. L.: The Logical Principles of Some Simple Computers, Thesis, Amsterdam, 1956.
VandW59 Van der Poel, W. L.: ZEBRA, A Simple Binary Computer, Proc. ICIP, UNESCO, pp. 361-365, June, 1959.
WilkJ53 Wilkinson, J. H.: “The Pilot ACE,” pp. 5-14, Automatic Digital Computation, National Physical Laboratory, Teddington, England, March 25-28, 1953.
WilkM51a Wilkes, M. V.: The Best Way to Design An Automatic Calculating Machine, Manchester University Computer Inaugural Conf., July, 1951. Published by Ferranti Ltd., London.
WilkM51b Wilkes, M. V.: The Edsac Computer, AIEE-IRE Conf., pp. 79-83, December, 1951.
WilkM52 Wilkes, M. V., D. J. Wheeler, and S. Gill: “The Operating Experience, Proc. EJCC, pp. 91-95,
Preparation of Programs for a Digital Compu- 1953.
ter,” Addison-Wesley Publishing Company, Inc., WillF49 Williams, F. C., and T. Kilburn: A Storage System
Reading, Mass., 1952. for Use with Binary-Digital Computing Ma-
WilkM53 Wilkes, M. V., and J. B. Stringer: Micro- chines, Proc. IEE, Pt. 3, vol. 96, pp. 81-100,
programming and the Design of the Control March, 1949. Same paper in Pt. 2, vol. 96, pp.
Circuits in an Electronic Digital Computer, Proc. 183-202, April, 1949.
Cambridge Phil. Soc., Pt. 2, vol. 49, pp. 230-238, WirsJ66 Wirsching, Joseph E.: NOVA: A List-oriented
April, 1953. Computer, Datamation, vol. 12, no. 12, pp.
WilkM58a Wilkes, M. V., W. Renwick, and D. J. Wheeler: 41-43, December, 1966.
The Design of the Control Unit of an Electronic WirtN66a Wirth, N., and H. Weber: EULER: A Generaliza-
Digital Computer, Proc. IEE, Pt. B, vol. 105, pp. tion of ALGOL, and Its Formal Definition: Part
121-128, March, 1958. I, Comm. ACM, vol. 9, no. 1, pp. 13-25, Janu-
ary, 1966.
WilkM58b Wilkes, M. V.: Microprogramming, Proc. EJCC,
pp. 18-20, 1958. WirtN66b Wirth, N., and H. Weber: EULER: A Generaliza-
tion of ALGOL, and Its Formal Definition: Part
WilkM65 Wilkes, M. V.: Slave Memories and Dynamic II, Comm. ACM, vol. 9, no. 2, pp. 89-99, Febru-
Storage Allocation, IEEE Trans., vol. EC-14, no. ary, 1966.
2, pp. 270-271, 1965.
WirtN66c Wirth, N.: A Note on “Program Structures” for
WilkM69 Wilkes, M. V.: The Growth of Interest in Micro- Parallel Processing, Comm. ACM, vol. 9, no. 5,
programming: A Literature Survey, Computing pp. 320-321, May, 1966.
Surveys, vol. 1, no. 3, pp. 139-145, September,
ZadeL63 Zadeh, Lotfi A., and Charles A. Desoer: “Linear
1969.
System Theory,” McGraw-Hill Book Company,
WillC53 Williams, Charles R.: A Review of ORDVAC New York, 1963.
Name Index
Adams, Charles W., 42, 585 Burdette, E. W., 119 Edwards, D. B. G., 276-290
Adams Associates, 42, 257, 580 Burks, Arthur W., 86-119 Elbourne, R. D., 172, 212
Ainsworth, Ernest, 212 Bussell, B., 469 Elliott, W. S., 171-183
Alexander, S. N., 165, 212 Ellis, T. O., 257, 349-362
Allard, R. W., 496 England, A. W., 396
Allen, M. W., 469 Campbell, Robert V. D., 42 England, W. A., 149
Allmark, R. H., 257, 262-266 Carlson, C. B., 257, 273 Ernst, H. A., 469
Alonso, R. L., 146-156 Carpenter, H. G., 171 Eshed, R., 469
Amdahl, Gene M., 259, 469, 561 Carr, J. W., 111, 205-215, 220-224 Estrin, Gerald, 119, 469
Anderson, D. W., 587 Carter, W. C., 387 Evans, D. S., 171
Anderson, James P., 257, 348, 447-455, 469, Casale, Charles T., 69, 155, 156, 396 Everett, R. R., 137-145, 504
586 Chase, George C., 42 Ewing, R. G., 469
Anderson, S. F., 587 Chen, E. C. Y., 274
Anne, Antharvedi, 73 Chen, T. C., 587
Arbuckle, R. A., 50 Chu, J. C., 119, 396 Fagen, R. E., 496
Arbuckle, T., 349 Clark, Wesley A., 274 Fagg, P., 385
Arden, B. W., 81, 275, 469, 566, 571 Clayton, B. B., 496 Fairclough, J. W., 171, 174, 176, 385
Aschenbrenner, R. A., 469 Cochran, David S., 243-256, 439 Falkoff, A. D., 13, 458, 587
Aspinall, D., 277 Codd, E. F., 397, 439, 469 Fikes, Richard E., 571
Astrahan, M. M., 42, 119, 144, 212, 223, 515 Comeau, L. W., 587 Flynn, Michael J., 83, 340, 587
Comfort, W. T., 291, 469 Forgie, James W., 291, 469
Conti, Carl J., 563, 574 Forrester, J. W., 75
Babbage, Charles, 46 Conway, Melvin E., 295, 457 Fotheringham, John, 190
Backus, John, 9 Corbato, Fernando J., 295, 457, 469, 517, 523, Frankovich, J. M., 469
Baldwin, F. R., 46 571 Fredkin, E., 291
Baldwin, R. R., 469 Couleur, J., 469 Fried, 45
Barnes, George H., 320-333 Cox, Jerome R., Jr., 50 Frizzell, Clarence E., 525
Bartlett, K. A., 504 Crawford, P., 111
Barton, R. S., 257, 273 Cray, Seymour, 471
Bashkow, Theodore R., 363-381 Critchlow, A. J., 469 Galler, B. A., 81, 275, 469, 566, 571
Basilewskii, Iu. Ia., 213 Culler, Glen, 45 Gibson, C. T., 81, 587
Beckman, F. S., 146 Gibson, D. H., 574
Belsky, M. A., 349 Gibson, W. B., 469
Benington, H. D., 504 Daley, Robert C., 275, 297, 469, 517, 523, 571 Gill, S., 456
Bernstein, A., 349 Darringer, John A., 13 Glaser, E. L., 469
Bhushan, A., 507 Davies, D. W., 504 Goldschmidt, R. E., 587
Bibb, J., 469 Davies, P. M., 469 Goldstine, Herman H., 87-119
Blaauw, G. A., 259, 426, 428, 464, 561, Davis, G. M., 257 Grabbe, E. M., 205-215, 220-224
588-601 Dean, R. F., 340, 587 Graham, R. M., 469
Blair-Smith, H., 146-156 Demuth, H. B., 119 Granito, G. D., 587
Bloch, Erich, 421-439 Dennis, Jack B., 81, 275, 295, 457, 469 Green, A., 156
Blosk, R. T., 439 Dent, B. A., 257 Green, J., 392
Bock, R. V., 257 Desmonde, W. H., 456 Greene, J., 340, 587
Boilen, S., 291 Desoer, Charles A., 7 Greenstadt, J. L., 525
Boland, L. J., 587 Devonald, C. H., 171-183 Greenwald, Sidney, 212
Borck, W. Carl, 320, 463 Dijkstra, E. W., 469 Gregory, J. G., 315, 463
Bouchon, Falcon, Jacques, 46 Doody, D. T., 385 Grimsdale, R. L., 277, 587
Boutwell, E., Jr., 334 Dorff, E. K., 496 Grosch, H. R. J., 585
Bowden, B. V., 42 Dreyfus, P., 456 Gruenberger, F. J., 89, 119
Bright, H. S., 291, 456 Dunten, S. D., 469 Grumette, Murray, 525
Brooker, R. A., 279 Dunwell, S. W., 421 Guerber, H. P., 509
Brooks, F. P., Jr., 146, 259, 349, 423, 428, 464,
561, 588-601
Brown, J. L., 385 Earle, J. G., 587 Haines, L. H., 392
Brown, Richard M., 320-333 Eccles, W. H., 46 Haley, A. C. D., 266
Buchholz, Werner, 396, 421, 428, 469, 515 Eckert, J. Presper, Jr., 91, 157-169, 396 Haley, G., 274
Hamblin, C. L., 257 Lebedev, S. A., 213 Neumann, P. G., 297, 349
Haney, Frederick M., 9 Lehman, M., 393, 446, 456-469 Newell, A., 257, 349-362
Hartley, D. F., 290 Leibniz, Gottfried Wilhelm, 46 Nievergelt, J., 463
Hauck, E. A., 257 Leiner, A. L., 212, 440-445, 449, 456 Nisenoff, N., 42
Haueter, R. C., 212 Lichtenberger, W. W., 291-300 Notz, W. A., 440-445, 449, 456
Hayata, Tomo, 344 Licklider, J. C. R., 291
Hellerman, H., 469 Lindquist, A. B., 469, 587
Herwitz, Paul S., 397 Liniger, W. M., 463
Hillegass, John R., 587 O’Brien, T. C., 81, 275, 469, 566, 571
Liptay, J. S., 587
Hipp, J. A., 385 Lloyd, R. H. F., 587 Oleksiak, R., 156
Hodges, Donald, 257 Oliver, G., 469
Lonergan, William, 257, 267-273
Hoffman, Samuel A., 257, 447-455, 469 Longstaff, F. M., 469 Ornstein, Severo M., 73
Holland, John, 315, 320 Lonsdale, K., 279 Orvedahl, W., 119
Hollerith, H., 46 Lourie, N., 469 Osborne, Thomas E., 243-256
Hopkins, A. L., 146-156, 349 Low, P. R., 587 Ossanna, J. F., 469
Hoskinson, E. A., 334 Lowry, E. S., 397 Owen, C. E., 171-183
Howarth, D. J., 274 Lucking, J. R., 257, 262-266
Hughes, E. S., Jr., 223 Lukasiewicz, J., 270
Huskey, H. D., 191, 193 Padegs, A., 587
Papian, W. N., 270
Parnes, David L., 13
Iverson, Kenneth E., 13, 587 McCarthy, J., 291, 469 Pascal, Blaise, 46
McCormick, Bruce H., 315
Patterson, G. W., 42, 119, 144, 212, 223
McCullough, J. D., 291, 456, 469
Patzer, William J., 340
Jackson, J. B., 119 McDonough, E., 397
Payne, R. B., 274
Jacquard, Joseph Marie, 46 MacLaren, M. Donald, 587
Peacock, A., 604
Johnston, A. St., 171 McPherson, J. L., 165, 169
Pearcey, T., 469
Johnston, D. L., 171 McReynolds, Robert C., 315, 320, 463
Penny, J. P., 469
Jones, P. D., 290 Maher, R. J., 273
Perry, M. N., 504
Jordan, F. W., 46 Marcotte, A. U., 587
Peterson, H. P., 469
Marcotty, M. J., 469
Pike, James L., 212
Mauchly, John W., 91 Pirtle, M. W., 291-300
Kahn, W., 469 Maudsley, B. G., 171-183
Pitkowsky, S. H., 574
Kampe, Thomas W., 71, 334, 341-347 Mauer, H., 156
Plugge, W. R., 504
Kato, Maso, 320-333 Meade, R. M., 469
Polais, S. M., 469
Katz, J. H., 463 Meagher, R. E., 119
Melbourne, A. J., 392 Poland, C. B., 469
Kepler, Johannes, 46 Pomerene, James H., 397
Kilburn, T., 75, 274-290 Mendelson, M. J., 396
Porter, R. E., 449, 477-488
King, Paul, 257, 267-273 Mercer, Robert J., 340
Powers, D. M., 587
Kinslow, H. A., 469 Merry, I. W., 171, 176
Preiss, R. J., 587
Kirsch, R. A., 212 Merwin-Daggett, Marjorie, 469, 517, 523,
571 Pugmire, J. M., 392
Kister, J., 349 Messina, B. U., 587 Pyne, I. B., 42, 119, 144, 212, 223
Kitov, A. I., 213 Metropolis, N., 119
Klein, E. F., 119 Mikus, L. E., 469
Klein, R. J., Jr., 119 Miller, W. F., 469
Miranker, W. L., 463 Rajchman, J., 111
Knight, Kenneth E., 50-51 Ramo, S., 205-215, 220-224
Knuth, D. E., 469 Mitchell, Herbert F., 157-169
Molnar, Charles E., 73 Randell, B., 77, 274
Koerner, Ralph J., 485 Reach, R., 469
Kroger, Marlin G., 448 Monnier, Richard E., 243-256
Montgomery, 11. C., 587 Reinheimer, H. J., 587
Kronfeld, Arnold, 363-381 Renwick, W., 346
Kuck, David J., 320-333 Morris, Derrick, 274
Mueller, 46 Richards, R. K., 146, 150
Kuehner, C. J., 77, 274 Richardson, J. R., 119
Muntz, C. A., 155, 156
Murtha, J. C., 320 Robbins, R. C., 171
Myer, T. H., 303 Roberts, Lawrence G., 45, 504
Lampson, B. W., 291-300 Roberts, M. De V., 349
Landy, B., 290 Robertson, J. E., 431
Langdon, J. L., 581 Rochester, N., 515
Lanigan, M. J., 276-290 Nash, J. P., 119 Rose, Gordon A., 304, 469
Laning, J. H., Jr., 146-156 Naur, Peter, 9 Rosen, Saul, 3, 42
Lauer, Hugh C., 571 Needham, R. M., 290 Rosenfeld, J., 468
Lawless, W. J., Jr., 146 Netter, Z., 469 Rosin, Robert F., 340, 649
Ross, Harold D., Jr., 525 Stein, P., 349 von Neumann, John, 86-119
Rothman, S., 470, 485 Stevens, L. D., 525 Vyssotsky, V. A., 295, 457, 469
Rudnick, 111, 119 Stevens, W. Y., 563, 587, 602-606
Stokes, Richard A., 320-333
Stotz, R. H., 507 Walden, W., 349
Saltzer, J. H., 295 Strachey, C., 469 Walendziewicz, E. T., 148, 156
Samuel, Arthur L., 42, 119, 144, 257 Stringer, J. B., 200, 335-340, 344 Warburton, E. T., 279
Sanderson, J. G., 469 Strube, A. R., 587 Ward, J. E., 507
Sasson, Azra, 363-381 Sumner, Frank H., 274-290 Ware, W. H., 650
Saxon, J. A., 525 Sussenguth, E. H., 13, 587 Weber, Helmut, 257, 340, 348, 382-392, 469,
Scalzi, C. A., 397 Sutherland, I. E., 303 587
Scantlebury, R. A., 504 Weik, Martin H., Jr., 42
Schickhardt, Wilhelm, 46 Weinberger, A., 440-445, 449, 456
Schlaeppi, H. P., 457, 463 Weiner, James R., 157-169
Schmitt, W. J., 396 Taub, A. H., 92 Wells, M., 349
Schrimpf, H., 469 Taylor, Norman H., 144 Welsh, H. Frazer, 157-169
Teager, Herbert M., 587 West, George P., 485
Schwartz, J. I., 291
Scott, N. R., 209 Thomas, C. E., 279 Westervelt, F. H., 81, 275, 469, 566, 571
Sechler, R. F., 587 Thomas, L. X., 46 Wheeler, D. J., 346
Thompson, R. N., 455 Wilkes, M. V., 84, 139, 200, 214, 334-340,
Seeber, R. R., 469, 587
Thornton, James E., 489-503 344, 345, 396, 574
Segal, R. J., 500
Tomasulo, R. M., 587 Wilkinson, J. A., 455
Senzig, D. N., 463, 469
Serrell, R., 42, 119, 144, 212, 223 Tonik, A. B., 396 Wilkinson, J. H., 193-199
Shannon, E. C., 46, 649 Tucker, S. G., 340 Wilkinson, P. T., 504
Sharpe, William F., 585 Turing, Alan M., 23, 191, 193 Williams, A. P. M., 469
Shaw, J. C., 257, 349-362 Turing, Sara, 191, 199 Williams, Charles R., 119
Shedler, G. S., 463 Turn, R., 469 Williams, F. C., 75
Shifman, Joseph, 257, 447-455, 469 Turnbull, J. R., 587 Williams, Robert J., 257, 447-455, 469
Shupe, P. D., 212 Wirsching, Joseph E., 316-319
Simon, H. A., 257, 349-362 Wirth, N., 257, 348, 383, 389, 392, 469
Slotnick, Daniel L., 315, 320-333, 463 Ulam, S., 349 Witt, R. P., 172, 212
Slutz, Ralph J., 210 Unger, S. H., 320 Wolf, K. A., 496
Smith, J. L., 440-445, 449, 456 Updike, B. M., 340, 587 Wooldridge, D. E., 205-215, 220-224
Smith, J. W., 587 Wright, M. V., 349
Smith, R. V., 463, 469 Wyld, Michael T., 274
Snyder, 111, 119
Solomon, Martin B., Jr., 561 Van der Poel, W. L., 200-204
Sparacio, F. J., 587 Van Derveer, E. J., 587 Zadeh, Lotfi A., 7
Speierman, K. H., 291, 456, 469 Vandling, Gilbert C., 340 Zemlin, R. A., 496
Squire, J. S., 469 Van Horn, E. C., 295, 457 Zraket, C. A., 504
Steel, T. B., Jr., 8 Vareha, Albin L., Jr., 571 Zurcher, F. W., 291, 456, 469
Machine and Organization Index
Page references in boldface refer to the Appendix, ISP descriptions, and PMS diagrams.
Aberdeen Proving Grounds (see EDVAC; B 160, 170, 180, 250, 260, 263, 270, 273, 280, CDC 1604, 44, 89
ENIAC; IAS) 283, and 300 (Burroughs), 43-44 CDC 1700, 44
ACE (NPL/National Physical Laboratory), 39, B 2500, 2501, and 3500 (Burroughs), 43 CDC 3400, 3600, and 3800, 43-44, 348, 396
43, 44, 74, 190, 193-199, 216 B 5000 (Burroughs), 43, 44, 79, 81, 257-261, CDC 6400,6416, 6500, 6600, 6700, and 7600,
introduction, 193 267-273 43-45, 47, 71, 76, 79, 83, 120, 170, 397,
ISP, 193-199 design, 267 470-476, 489-503
PMS, 191, 193, 198 ISP, 268-273 circuits, 41-14-495
T(io), 197-199 operating system, 267-268 circuits, 494-495
ADU/Accumulation and Distribution Unit PMS, 258-260, 268 history, 470, 489
(see ComLogNet) B 5500 (Burroughs), 43-45 ISP, 472, 491-493, 497-503
AEC/Atomic Energy Commission, 396 B 6500 and B 7500 (Burroughs), 43, 45, operating system, 472-475
AGC/Apollo Guidance Computer (M.I.T. 257-261, 325, 328 packaging, 494-496
Instrumentation Laboratory), 44, 89, B 8500 and B 8501 (Burroughs), 43-44, 64, 257 PMS, 470, 471-475, 476, 489-494
146-156 Babbage’s Analytic Engine, 42, 46, 53 RT, 491-494
D(arithmetic), 150-152 Babbage’s Difference Engine, 46 CDC 8090 and 8092 (see CDC 160, A, G)
design and construction, 148 Baldwin Calculator, 46 CDP/Communications Data Processor (see
interpreter, 147-148 BASIC (Dartmouth College), 45, 236 ComLogNet)
introduction, 146 Bell System, 303 C.E.C.E., 39
ISP, 152-155 Bell Telephone Laboratory computers, 39, Census, Bureau of, 157, 164-165
PMS, 146-148 42-43, 45-46 CG24 (Lincoln Laboratory), 43
Air Force, 137 Bendix → CDC (see under CDC; G-15; G-20) Chasm special purpose computer, 73
ALGOL language, 13, 45, 73, 257, 267, 348 BESK, 39, 89 COBOL 60 and 61 language, 45
ALPAK language, 45 BESM, 213 Columbia University Calculator, 46
ALWAC IIIE, 11, 44 BINAC (Eckert-Mauchly), 43, 91, 163 COMIT language, 33, 45
AMBIT language, 45 BIT 480 (Business Information Technology), ComLogNet, 45, 509-510
AN/FSQ-27 (see RW-40 and 400) 44 CORC language, 45
AN/GYK-3(V) (see D825 and D830) Bitran 6 (Fabri-Tek), 44 CPC/Card Programmed Calculator (IBM), 43,
AN/UYK (RW → TRW), 71 BIZMAC I, II (RCA), 39-43 CPC/Card Programmed Calculator (IBM), 43,
AOSP/Automatic Operating and Scheduling BTL MACRO language, 45 CSIRAC, 89
program (see D825, operating system) BTSS/Berkeley Time Sharing System Culler-Fried on line language, 45
APEXC, 39 (University of California, Berkeley), 44,
APL/A Programming Language, 13, 45 45, 274-275, 291-300
Apollo (see AGC) input-output, 297-300 D825 and D830 (Burroughs), 44, 45, 257-260,
Argonne Laboratory, 257 introduction, 291-292 446-455
Arithmometer (L. X. Thomas), 46 ISP, 291-297 design philosophy, 447-450
ARPA/Advanced Research Projects Agency, M(files), 297-300 input-output, 454-455
291-300, 315 multiprogramming, 291-295 ISP, 453
network, 510-512 operating system, 292-300 operating system, 450-455
Arrow (see Strela) PMS, 275, 292 PMS, 260, 450-455
ASI 6000 (EMR), 44 T(io), 297 DASK, 89
Atlas (Manchester University, Ferranti), 43-45, Burroughs (see B 2500; B 5000; B 5500; Datamatic 1000 (Honeywell), 39, 43
82, 91, 274-290 B 6500; B 8500; D825; Datatron 204, 205, DATANET 30 (GE), 43
input-output, 274-283, 285-289 and 220; E 101, 102, and 103; ILLIAC Datatron 204, 205, and 220 (Burroughs), 39,
interrupt, 274, 276-277 IV) 43, 44
introduction, 276 IV) DDP-19 (Honeywell), 43
ISP, 276-279, 283-285 DDP-24, 224, and 124 (Honeywell), 43-44
M(core), 280-283, 289-290 California, University of, Berkeley (see BTSS) DDP-116, 316, 416, and 516 (Honeywell),
multiprogramming, 274-283 Carnegie-Mellon University, 120, 571 43-44, 512
operating system, 279, 285-287 CDC/Control Data Corporation (see G-15; DEC/Digital Equipment Corporation (see
PMS, 277, 279-283, 289-290 G-20) PDP-1)
RT, 287-289 CDC 160, A, G, 43, 44, 120 DEC 338, 260, 303-314, 396
ATLAS-1 and 2 (Ferranti), 43 CDC 924, 3100, 3200, 3300, and 3500, 43-44, interpreter, 305
AVIDAC, 39, 89 79 introduction, 305
DEC 338, ISP, 305-309, 310-314 FORTRAN Machine, 44, 348, 363-381 IBM 702, 39, 43, 47, 87
PMS, 121 interpreter, 366-379 IBM 705, 705 III, 708, and 7080, 39, 43-44,
(See also PDP-8) introduction, 363-364 47, 87, 433
Deuce (English Electric), 39, 43-45, 191 ISP, 363-365 IBM 1130 (see IBM 1800)
(See also ACE) logical design, 365-381 IBM 1401, 1440, and 1460, 43-45, 47, 61, 188,
DMI/Data Machine Inc. Varian Associates, PMS, 365-366 224-234, 562-564
44 RT, 364-368, 375-381 history, 225
DMI 520/I (Varian), 44 FX-1 (Lincoln Laboratory), 43-45 interpreter, 229
DMI 620 (Varian), 44 introduction, 225-226
Dutch Postal and Telecommunications ISP, 226-229, 231-234
Services, 200 PMS, 226
Dynamo language, 45 G-15 (Bendix → CDC), 39, 43-44, 74, 191 RT, 229-230
DYSEAC (National Bureau of Standards), 39, G-20 (Bendix → CDC), 44, 57, 152 IBM 1410 and 7010, 43, 44
43, 172, 440 Gamma 60 (Machines Bull), 44, 456 IBM 1620, 111, and 1710, 43-44, 225
GARDE 312 (GE), 43 IBM 1800 and 1130, 43-45, 48, 55, 90, 396,
GE 100/ERMA, 43 399-420, 470, 575-576, 579, 583-586
E 101, 102, and 103 (Burroughs), 43, 44 GE 115, 43 input-output, 405, 409-411
EAI/Electronic Associates Inc., 44 GE 205, 210, 215, 225, 235, 255, and 265, interpreter, 408-409
EAI 640, 44 43-44 introduction, 399-400
Eccles-Jordan Flip-Flop, 46 GE 412, 435, 43-44 ISP, 407-416, 417-420
Eckert-Mauchly Computer GE 635, 625, 43 PMS, 400-405, 404
Corporation → UNIVAC, 91 GE 645 (General Electric), 43, 45, 79, 275 RT, 405-409, 411-413
EDSAC I and II (Cambridge University), 39, GE 4040, 4050, 4060, 4020, and 4050 II, 43 IBM 2938, 45, 72
42-45, 58, 89, 139, 144, 196, 398 General Automation (see SPC-8) IBM 7030 (see Stretch)
EDVAC/Electronic Discrete Variable General Precision CDC (see LGP-30) IBM 7070, 7072, 7074, 43, 44
Automatic Computer (University of Genie project (see BTSS) IBM 7094 I, II, 7044, 7040, 7090, 709, and
Pennsylvania) 39, 42-45, 95 George (University of New South Wales), 257 704, 30-32, 39, 43-45, 47, 54, 64, 70, 79,
Eight-bit character computer, 170, 184-187, Gott Sei Danke, 346 91, 149, 303, 306, 422, 433, 515-541,
224 GPS language, 45 562-564
introduction, 184 history, 515-517
ISP, 184, 185, 186-187 interpreter, 522-523
EMR 6130, 44 H-200 series: 110, 120, 125, 200, 400, 1200, ISP, 523, 526-541
English Electric → ICT/International 1250, 2200, 3200, 4200, and 8200 multiprogramming, 523
Computers and Tabulators (see KDF 9) (Honeywell), 43, 44, 58, 225 P(io), 524-525
ENIAC/Electronic Numerical Integrator and H-1400 and 1800 (Honeywell), 43 PMS, 517-519
Computer (University of Pennsylvania), Harvard (see Marks) RT, 520-522
39, 42-43, 45-47, 88, 113 Hollerith Punched Cards, 46 IBM Multiplying Calculator, 46
ERA/Engineering Research Honeywell (see Datamatic 1000; DDP-19; IBM Stretch (see Stretch)
Associates → UNIVAC, 43, 192 DDP-24; DDP-116; H-200; H-1400) IBM System/360, 43-45, 61, 64, 303, 396
(See also UNIVAC 1101, 1102; UNIVAC Host computer (see ARPA network) addressing, 565-566, 594
1103A) HP/Hewlett-Packard (see HP 9100A) array processor, 576-579
ERMA (see GE 100) HP 9100A, 44, 235-236, 243-256 base register, 594
ESS/Electronic Switching System (Bell D, 243-244, 254-256 (See also addressing above)
System), 303 ISP, 243-249 bibliography, 587
EULER, 44, 73, 257, 348, 382-392 microprogram, 254-256 branch instructions, 595
interpreter (microprogram), 385-392 packaging, 250, 252-253 channel-to-channel adapter, 576
introduction, 382-383 PMS, 235, 240-254 circuits, 564, 603-604
ISP, 383-385, 388-391 RT, 250 cost, 579-585
PMS, 382-392 T, 243, 248, 253 critique by authors, 561-587
data types, 564-565, 590-594
design, 561-564, 588
Fabri-Tek (see Bitran 6) direct control, 597
FACT language, 45 IAS/Institute for Advanced Studies machine (See also input-output below)
Ferranti Corp. Ltd. → ICT/International (see von Neumann) emulation, 562-563
Computers and Tabulators, 39 IBM ASP/Attached-Support Processor, 506 floating point, 591-592
(See also Atlas; Mercury; Pegasus) IBM 305 (disk), 43, 45 functional schematic, 589
FLAC (Florida), 39 IBM 650, 39, 43, 44, 91, 216, 220-223 general registers, 564-565
FOCAL (DEC) language, 236 ISP, 220-223 history, 561
FORMAC (IBM) language, 45 IBM 701, 39, 43-45, 47, 89, 515-516 (See also design above)
FORTRAN (IBM), FORTRAN II, FORTRAN PMS, 515 information formats (see data types above)
IV language, 45, 50, 73, 348 (See also IBM 7094) innovations, 562
IBM System/360, input-output, 588, 598-601 IBM System/360, PMS and PMS diagrams, LARC (UNIVAC), 43-44, 86, 396-397
[See also P(io; data channels) below; PMS T(print, punch, read), 580 Lehman Computer example (IBM Research),
and PMS diagrams below] T(telephone, typewriter), 579, 581 44-45, 446, 456-469
interpreter, 594-595, 604-605 processor state, 564-565, 588, 596-598 application, 464-469
interrupts, 596-597 RT, 568, 570, 572, 603-604 design philosophy, 456-457
introduction, 561, 588 S(cross-point; time-multiplexed; BCU), 573 instructions, 457-461
ISP, 564-566, 588-601 SLT/Solid Logic Technology, 564, 603-604 interrupt, 458-461
logical structure, 588-601 storage protection (see multiprogramming introduction, 456
(See also ISP above) above) operating system, 461-463
M(content addressable), 571, 573-574 storage-to-storage channel, 576-577 performance, 456-457, 463-469
M(Large capacity store), 571-572, 582-583 SVC/Supervisor Call, 597 PMS, 459-461
M(read only), 604-605 system implementations, 602-606 simulation, 463-469
(See also microprogramming below; timer, 597 Leibniz Calculator, 46
Models 30, 40, and 50 below) variable-length character strings, 591 LEO I and II, 39
microprogramming, 563-564, 604-605 ASCII, 593 Leprechaun (Bell Telephone Laboratories), 43
(See also Models 30, 40, and 50 below) decimal, 593-594 LGP-30, and LGP-21 (General
model range, 561-564, 588, 602-606 EBCDIC, 592 Precision → CDC), 44, 45, 74, 91, 192,
(See also performance below) ICT/International Computers and Tabulators, 216-219
Model 20, 563-567 91, 274 ISP, 217, 218-219
Model 25, 184, 563, 567, 569 (See also Atlas; KDF 9) PMS, 217
Model 30, 236, 348, 382-392, 566-568, ILLIAC I (University of Illinois), 39, 43-45, LINC/Laboratory Instrument Computer
602-606 89 (M.I.T. Lincoln Laboratory), 43, 44, 120
ISP, 385-388 ILLIAC II (University of Illinois), 43 LINC-8 (DEC) (see PDP-8)
microprogram, 382-385, 388-392 ILLIAC III (University of Illinois), 43, 351 Lincoln Laboratory (M.I.T.), 571
RT, 386 ILLIAC IV (University of Illinois), 43-45, 47, (See also CG24; FX-1; LINC; MTC; TX-0,
Model 67, 76, 79, 275, 561, 563, 571, 66, 72, 315, 320-330 TX-2)
573-574 input-output, 322, 327-328 LISP 1.0 and 1.5 language, 45
Model 75, 561, 563, 571 interpreter, 322-325 Lockheed Electronics (see MAC-16)
Model 85, 76, 561, 563, 574-575 introduction, 320-321 Los Alamos (see AEC)
Model 91, 561, 563, 575 ISP, 322-325, 330-333 LRL/Lawrence Radiation Laboratory,
Models 30, 40, and 50, 561, 563, 566, 568, PMS, 321-322, 327-329 Livermore, California, 396-397
602-603 K(P), 322-323 LRL network, 507
Mp, 563, 571-572, 582-583, 602-603 RT, 326
multiprocessing, 456-469, 585-587 Illinois, University of, 43
multiprogramming, 565-566, 571, 573-574, (See also under ILLIAC) MAC-16 (Lockheed Electronics), 44
597-598 IMP computer (see ARPA, network) MAD language (University of Michigan), 45
networks, 576-579, 581, 598 Instrumentation Laboratory, M.I.T. (see AGC) MADM/Manchester Automatic Digital
(See also IBM ASP) Interdata, Model 3 and 4, 44, 184 Machine, 39, 58
performance, 563, 579-587, 602-606 IPL I, II, III, IV, and V, 45, 257 Manchester University, 39, 45, 340
P(io; data channels), 573-574, 576-577, IPL VI/Information Processing Language, 44, (See also Atlas; MADM; Mark I; Muse)
598-601, 605-606 45, 73, 257, 348-362 MANIAC I and II (University of California,
PMS and PMS diagrams, 563, 566-579, design, 349-350 Los Alamos), 39, 43, 89
602-606 interpreter, 351, 354-355, 359-362 Mark I (Manchester University), 43
K(special controls), 576 ISP, 354-359, 361-362 Mark I, II, III, and IV (Harvard), 39, 42-43,
Model 20, 567 RT, 352-354 46
Model 44, 569-571, 569 IPL VC, 257 Mathmatic language, 45
Model 67, 571, 573-574, 573 MEG, 39
Model 75, 567, 571-572 Mercury (Ferranti), 39, 279
Model 85, 574-575, 575 Michigan, University of, MAD, MIDAC, 192,
Jacquard Punched Card Loom, 46
Model 91, 575 209-212, 571
JOHNNIAC (RAND), 43-44, 78, 89
Models 30, 40, and 50, 65, 566-568,
JOSS (RAND) language, 45, 78
566-567 209-212
Ms (data cell, disk, drum), 577, 579 JOVIAL (SDC) language, 45
ISP, 209-212
Ms(magnetic tape), 578-579, 578 MILSMAC, 347
P(array), 576-578, 576 MISTIC, 43
P(special), 576-578, 576 KDF 9 (English Electric), 44, 257-266 M.I.T. CTSS operating system, 45
S(c), 579, 581 D, 263-266 M.I.T./Massachusetts Institute of Technology
(See also networks above; IBM ASP) introduction, 282 (see AGC; GE 645; Lincoln Laboratory;
T(analog), 581 ISP, 262-263 MULTICS project; Whirlwind I)
T(audio), 579 PMS, 260 M.I.T. network, 507
T(display), 579 RT, 264 Monorobot, Monorobot XI, 39, 44
Monroe Calculator, 46 PDP-8, 8S, 8I, 8L, and 5, M(core), 128-129 SD-2 (Librascope), 44, 334, 341-347
Monroe Corporation, 46 PMS, 20-21, 123-131, 121, 124, 126, 128 design, 341-343
Moore School of Electrical Engineering (see RT, 125, 127-131 interpreter, 550-552
Pennsylvania, University of) (See also DEC 338) introduction, 341
MOSAIC, 39 PDP-10 and 6, 43-45, 79, 170, 275, 564 ISP, 343-347
Motorola 1O(M), 44-45 PDP-12, LINC-8 (see PDP-8) microprogram, 345-346
MTC/Memory Test Computer (M.I.T. Lincoln Pegasus (Ferranti), 44, 62, 170-183, 564 packaging, 341-343
Laboratory), 39, 45, 89 circuits, 171-174, 176 PMS, 343
Mueller's Difference Engine, 46 introduction, 181 RT, 343-345
MULTICS project (M.I.T.), 45, 571 ISP, 176-179, 182-183 SDC/Systems Development Corp., 45
Muse (Manchester University), 43, 277 logical design, 172-175, 179-181 SDS/Scientific Data Systems → XDS/Xerox
packaging, 174-176, 179-182 Data Systems (see SDS 910; SDS 940 and
Pennsylvania, University of (Moore School), 945; Sigma 2 and 3; Sigma 5 and 7)
NBS/National Bureau of Standards (see 43, 46, 95 SDS 92, 44, 120
DYSEAC; PILOT; SEAC) (See also EDVAC; ENIAC) SDS 910, 920, 925, 930, 9300, 43, 44, 91, 291,
Neher Laboratory, 200 Philco 212, 44 542-560
Network of Computers, 504-512, 505-512 PILOT (National Bureau of Standards) 39, 43, history, 542-543
ARPA, 510-512 44, 75, 397-398, 440-445, 449 input-output, 543-545, 552-555
ComLogNet, 509-510 applications, 440 interpreter, 551-552
IBM ASP, 506 input-output, 444-445 interrupt, 553-555
LRL, 507 ISP, 442-444 introduction, 542-543
M.I.T., 507 performance, 440-442 ISP, 544-545, 548-550, 556-560
SABRE, 504 PMS 398, 440-442 PMS, 275, 543, 546-548, 546
SAGE, 504 Polymorphic (RW) (see RW-40 and 400) RT, 550-552
Texas, University of, 506-507 Programma 101 Desk Calculator (See also BTSS)
typical, 508-509 (Olivetti-Underwood), 44, 216, 235, SDS 940 and 945 (SDS, University of
NORC, 39, 44 237-242 California, Berkeley), 43-44, 79, 275,
NOVA (LRL/Lawrence Radiation ISP, 237-242 291-300, 542
Laboratory), 44, 66, 315-319 PMS, 237-238, 237 (See also BTSS)
applications, 316-317 Programmed Console (Washington University), SEAC (National Bureau of Standards), 39,
introduction, 316 120 43-45, 172, 192, 209-212, 440
ISP, 317-318 PUFFT, compiler, 45 SEL/Systems Engineering Laboratories, 44
RT, 318 SEL 810, 44
NPL/National Physical Laboratory, 45 Sigma 2 and 3 (SDS → XDS), 43-44, 78
(See also ACE) RAND Corporation (see JOHNNIAC; JOSS) Sigma 5 and 7 (SDS → XDS), 43, 170, 396, 564
RAYDAC (Raytheon), 39 SILLIAC, 89
RCA/Radio Corporation of America (see SIMSCRIPT language, 45
Olivetti-Underwood (see Programma 101 Desk BIZMAC I, II; SPECTRA 70 Series) SIMULA language, 45
Calculator) RCA 110, 43, 44 SNOBOL language, 45
ONR/Office of Naval Research, 137 RCA 301 and 3301, 43 SOL language, 45
ORACLE, 89 RCA 501 and 601, 43, 44, 225 SOLOMON, 315, 320
ORDVAC (University of Illinois), 39, 43, 89 RCA 1600, 184 Soviet Academy of Sciences, 213
RCA Spectra 70, 561-562 SPC-8 and 12, 44
Recomp I, II, and III, 44 SPECTRA 70 Series (RCA), 43
Pascal Calculator, 46 Rice University computer, 45, 53 SS 80 I and II (UNIVAC), 43
PB/Packard Bell → Raytheon (see PB-250; RW/Ramo Wooldridge (see AN/UYK) Strela/Arrow (Russian), 44, 192, 213-215
PB-440) RW-40 and 400 (Thompson, Ramo, ISP, 213-215
PB-250, 44, 74, 101 Wooldridge), 44, 53, 192, 400, 470-471, Stretch/IBM 7030, 43-45, 47, 91, 396-397,
PB-440, 334 477-488 421-439
PDC 808, 816, 44 design philosophy, 477 arithmetic, 428-431
PDP-1 (DEC), 44-45 interrupt, 481-482 circuits, 433-438
PDP-4, 7, 9, and 15, 43-45 ISP, 470, 480-482 D, 427-431
PDP-8, 8S, 8I, 8L, and 5, 20-32, 43-44, 49, 90, ISP language, 486-488 input-output, 421-422
120-136, 396 PMS, 471, 477-480, 482-485 interrupt, 423
applications, 120 introduction, 421
circuits, 132-133 ISP, 422-424
input-output, 123 SABRE network (American Airlines), 45, 504 K(P), 424-428
interpreter, 131 SAGE/Semi-Automatic Ground Environment look-ahead, 426-428
interrupt, 123 network, 45, 504 packaging, 432, 438-439
ISP, 22-33, 120-123, 127, 134-136 SCC/Scientific Control Corp. 650, 120 performance, 421-423, 425-426, 431-433
Logical design, 127-133 Schickhardt Calculator, 46 PMS, 421-423, 425-426
Stretch/IBM 7030, RT, 426-431 UNIVAC II and III, 39, 43-45 WEIZAC, 43, 89
Subscriber Station (see ComLogNet) UNIVAC 418, 1218, and 1818, 43-44 Whirlwind I (M.I.T.), 10, 39, 43-45, 55,
SWAC, 39, 43 UNIVAC 490, 491, 492, and 494, 43-44 58, 90, 137-145, 303, 470
System/360 (see IBM System/360) UNIVAC 1004 I, II, III, 1005 I, II, and III, applications, 138
43, 44 D, 142
UNIVAC 1050, 43, 44 interpreter, 140-141
Texas, University of, network, 506-507 UNIVAC 1101 and 1102, 39, 43 introduction, 137-139
Toronto University Computer, 44 UNIVAC 1103A, 39, 43, 44, 48, 62, 192, ISP, 145
TRAC language, 45 205-208 K, 139-143
TRE, 39 ISP, 205-208 M, 141
TRW/Thompson, Ramo, Wooldridge (see RW-40 UNIVAC 1105, 39, 43 packaging, 141-143
and 400) UNIVAC 1108, 1107, and 1106, 10, 43-45, 62, PMS, 90, 138-139
Turing machine, 23 170, 192, 564 Wilkes’ microprogrammed computer example,
TX-2 and TX-0 (Lincoln Laboratory, M.I.T.), UNIVAC 1206, 43 44, 335-340
39, 43-45, 274 UNIVAC 1212 (Military), 43 design, 335-337
UNIVAC 9200 and 9300, 43 introduction, 335
ISP, 337-340
UNCOL language, 8-9, 13 microprogram, 339-340
U.S. Army Ordnance Department, 92 RT, 336
UNIVAC, 39, 43-45, 48, 91, 157-160 Varian Associates (see under DMI)
applications, 164-165 von Neumann/IAS/Institute for Advanced
design constraints, 163 Studies, 39, 42, 44, 58, 89, 92-119, 152, XDS/Xerox Data Systems (see SDS)
input-output, 158, 161-162 398
interpreter, 159-161 applications, 92-93
ISP, 157-160 checking, 118
performance, 164-168 D, 96-111 ZEBRA (Standard Telephones and Cables,
PMS, 158 design constraints, 92-93 Ltd.), 44, 191-192, 200-204, 216
reliability, 165-169 input-output, 92, 117, 119 introduction, 200
RT, 157-160 interpreter, 111-119 ISP, 200-204
T(io), 161-163 ISP, 111-119 PMS, 201
(See also SS 80 I and II) M, 94-96 ZUSE Company, 39, 42
Subject Index
Page references in boldface refer to the Appendix, ISP descriptions, and PMS diagrams.
abbreviation/, 19, 607, 609 arithmetic element, Whirlwind, 142 boolean-operations E @ 3 V A -,, 608-609,
acceptance test, UNIVAC, 165-166 arithmetic expression, 614 633-635
access-i-unit-operation, 633 arithmetic-function-operation, 614 branch instruction, 595
access-time, 620-622 arithmetic organ, von Neumann, 98 breakouts, IPL VI, 350-351
accessing algorithm, 41 (See also D/data-operation) buffer module, RW-400, 482-484
accumulator, ZEBRA, 202 arithmetic unit, KDF 9, 263-266 bulk core memory (see M/memory)
accumulator register, 59-60, 98 array instructions, NOVA, 316-319 bus, 10 (See also S/switch)
accuracy, HP 9100 A, 246, 256 array processor [see P(array)] business computer (see function)
acoustic delay line, 96 ASCII/American Standards Code for buzzer, ACE, 198
[See also under M(delay line)] Information Interchange, 593 by/byte, 616
action t,23-24, 631-632 assemble instruction, 457-458 IBM Stretch, 423
action-sequence, 23, 631 assignment: =, 23, 607, 609 IBM System/360, 591
actual address, 76-81 associative memory [see look-aside memory;
(See also physical address) M(associative)]
adaptability: attribute, 19, 607, 612-613 C/computer (see computer)
D825, 447-448 attribute-list, 612-613 C(1 Pc), 40-41, 63-70, 395
RW-400, 477-479 attribute: value pair (see attribute; value) C(1 Pc-nPio), 40-41, 63-70, 396-398
adder, Pegasus, 174 auto index register, 120-122, 134 capital letters, 609
addition, von Neumann, 98-99 availability, 447 card, IBM, 617
address-expression, 631-632 Lehman computer, 456-457 carrier, 618
address-range [ 1, 24, 631-633 available space list, IPL VI, 352-353 data-type, 629-631
address-size, P, 626-627 carry, 98-99
addresses-per-instruction, P, 57-63, 627 casting out three, Stretch, 431
(See also instruction format) central processor [see P(c)]
addressing (see memory addressing; memory b (see bit) channels [see P(io)]
mapping; multiprogramming) B line: character-base, 631-632
addressing system, memory, 16 Atlas, 277-278 character/char, 616
aerospace computer, 146-156 Manchester University, 340 character generation instruction, 308
algorithm-encoding-efficiency, P, 627 (See also index register) character string, 184-185
alias/, 19, 607, 609 barrel, CDC 6600, 474, 489-491 (See also variable-length character string)
alphabet, 609, 613 base, 24, 55-56, 614, 616, 631 checking:
alternation 1, indefinite expression, 17, 610 base-data-type, 630-631 Stretch, 431
and A , 25 base register, MIDAC, 210 UNIVAC, 160-161, 168-169
(See also n-ary-boolean-operation) bench-mark, 52 Whirlwind, 143-144
antecedent, 619 bilinear switch, 623-624 circuit level, 4
applications: binary-arithmetic-operation + - , 614, 633-635 circuits:
Lehman computer, 464-469 binary-boolean-operation, 615, 633-635 CDC 6600, 494-495
NOVA, 316-317 binary-decimal conversion, 211 component count, 470-471
PDP-8, 120 (See also ISP, IBM System/360) PDP-8, 132-133
PILOT, 440 binary machine, 87-88 Pegasus, 171-174, 176
UNIVAC I, 164-165 binary-operation, 28, 633 Stretch, 433-438
von Neumann, 92-93 binary-value, 611 component count, 431-432
Whirlwind I, 138 bit/binary-digit, 611, 616-617 class, 609-610
approximation-, 607-608, 610 bit string, 317-318 cocomponent, 617
architecture, 562 (See also data-type, Stretch) co-incident current memory [see M(core)]
(See also ISP; under PMS) block, 617 colon : , 19, 612-613, 631
archival memory [see M(archival)] block diagram (see PMS diagram; PMS level; (See also attribute:value pair)
area, 617, 619 RT) combinatorial circuits, 5
arithmetic: block transfer, ZEBRA, 204 comma, 611
multiple-precision, AGC, 151-152 BNF/Backus-Normal Form (Backus-Naur commands, 608-610
parallel, 429-4,M Form), 9 (See also abbreviation; assignment, form;
serial, 428-429 boolean, 608, 615 variable)
Stretch, 428-431 boolean-expression, 615 COMMENT, 608
floating point, Stretch, 429-431, 433 information length, 16 instruction interpreter (see interpreter)
UNIVAC 1103A, 208 information-rate, 617-618 instruction look-ahead (see look-ahead)
Wilkes example, 335 information units, 616-618 instruction-memory, P, 627-628
fork instruction, 325, 457 inhibit drivers [see M(core)] instruction modification, 209-210
form, 607, 610 input-output: instruction-set, 25
format, data-type, 629-631 ACE, 197-199 ISP, 636-637
full-duplex, 617-618 Atlas, 274-283, 285-289 K, 624-625
function, 37, 40, 46-49 BTSS, 297-300 P, 626
business, 47-48 D825, 454-455 (See also ISP)
C, 618 IBM 1800, 405, 409-411 instruction-size, P, 626-627
communication, 48 IBM 7094, 524-525 instruction-source, K, 624-625
component, 617 ILLIAC IV, 322, 327-328 instruction unit, Stretch, 426-427
control, 48 PDP-8, 123 integer-data-type, 630-631
file control, 48 PILOT, 444-445 + integer-data-type, 630-631
operation, 28 SDS 900 series, 543-545, 552-555 integer-name, 614
P, 626-627 Stretch, exchange, 421-422 + integer-name, 614
scientific, 47 UNIVAC I, 158, 161-162 - integer-name, 614
T, 625-626 input and output organ, van Neumann, 91, integrated circuit memory (see M/memory)
terminal, 48-49 117, 119 interaction controller, Lehman computer, 460
time-sharing, 49 instruction: interaction function, Lehman computer,
functional units, CDC 6600, 473, 494 control, DEC 338, 308-309 458-461
data, DEC 338, 307-308 interference, processor-memory, 463-4611
ISP, 601-632 interflow, 151
gate tubes, 112-119 special, Lehman computer, 457-461 interlace (see data channel, SDS 900 series)
general conventions, PMS and ISP, 607-615 instruction backup register, IBM 7094, interleaving (see memory interleaving)
general registers: 520-522 interpretation-cycle, 22-36
8-bit character computer, 184-187 instruction buffers, 84 (See also interpreter)
Pegasus, 176-179 ILLIAC IV, 323-324 interpreter, 22-36
generations (first, second, third, and fourth), (See also look-ahead; look-aside) AGC, 147-148
39-40, 43-46 instruction decoding diagram, 122-123, 184 DEC 338, 305
Gibson mix, 49-50 instruction-efficiency, P, 626-627 EULER microprogrammed, 385-392
graph-plot instructions, 308 instruction examples, ISP, 632, 635-637 FORTRAN Machine, 366-379
Instruction-execution, ISP, 25-36, 637 IBM 1401, 229
instruction execution process, ISP, 637 IBM 1800, 408-409
Half-duplex, 617-618 instruction-expression, 23, 631-632 IBM 7094, 522-523
hexa-decimal-digit/hex, 616 instruction format: ILLIAC IV, 322-325
hierarchy (see structure) 0 address/stack, 62-64, 257-261 IPL VI, 351, 354-355, 359-362
switch, 623-624 stack: B 5000, 268-273 ISP, 636-637
high-level language, B 5000, 267 KDF 9; 262-266 PDP-8, 131
high-speed core memory (see M/memory) 1 address, 58-60, 64, 87-91 SDS 900 series, 550-552
history, 38-46, 617, 619 AGC, 145)-150 Stretch (see instruction unit)
hyphen-, 25, 607 1 + general register (see general registers) UNIVAC, 159-161
hyphen-name, 613-614 1 + 1 address, IBM 650, 220-223 von Neumann, 111-119
1 + index address, 58-60, 87-91 Whirlwind I, 140-141
2 address, 60-61 interprocess communication, 41
i-rate, 617-618 RW-400, 470, 480-482, 486-488 interprogram communication, 81-83
i-unit/information unit, 16, 616-618 UNIVAC 1103A, 205-208 interrupt/interprocess interrupts, 82-83, 411
base-unit, 616 3 address, 60-61 Atlas, 274-283
data-type, 629-631 MIDAC, 209-212 B 5000, 267-272
length, 616 Strela, 213-215 D825, 452-453
i-unit-name, 616 general registers, 61, 64 Lehman computer, 458-461
i-unit-prefix (See also general registers) PDP-8, 123
IBM-card, 617 IBM 1800, 407-408, 410-411 RW-400, 481-482
iconoscope tube, 84 ISP, 25, 636-637 SDS 900 series, 553-555
illegal instruction, BTSS, 293 n + 1 address, 61, 191 Stretch, 423
indefinite expression, 607-608, 610 SDS 900 series, 544-545, 548-552 interrupt-response-time, P, 626-627
index#, 20, 613 variable number of addresses per intraprocess interrupt/trap, 82-83
index register, 59-80 instruction, 63 (See also extra codes; trap)
information, 616 instruction highway, ACE, 197 I/O Bits:
information base, 24, 55-56, 614, 616, 631 instruction interpretation process, ISP, PDP-8, 124-126
information-content, data-type, 629-631 636-637 SDS 900 series (see input-output)
ISP/Instruction-set Processor, 12, 22-33 large capacity store/LCS, 571-572, 582-583 M(magnetic tape; Uniservo), 157
ACE, 193-199 lattice (see structure) M(moving head diskpak), 74
AGC, 152-155 length, 616 M(p/primary memory), 17, 24, 74
Atlas, 276-279, 283-285 length-type, data-type, 629-631 M(p; concurrency), 41, 76-81
B 5000, 268-273 level, system, 3-4 M(p; size), 41
BTSS, 292-297 LINCtape, 124-126 M(photostore; IBM), 507
CDC 6600, 472, 491-493, 497-503 lineage, 617, 619 M(punched card), 74
DEC 338, 305-309, 310-314 linear switch, 623-624 [See also T(punch)]
D825, 453 link, 619-620 M(queue), 73
8-bit character computer, 184, 186-187 delay, 620 M(random), 75
EULER, 383-385, 388-391 port-to-port delay, 620 M(read only), 604-605
FORTRAN, 363-365 list, 607, 611 M(read only; capacitor; System/360; Model
HP 9100A, 243-249 list processing, EULER, 384 30), 385-387
IBM 650, 220-223 list structure, IPL VI, 350 M(read only; HP 9100A), 235, 250-253
IBM 1401, 226-229, 231-234 literal syllable, B 5000; 272 M(read only; rope; AGC), 146-147
IBM 1800, 407-416, 417-420 location, S, 623-624 M(s/secondary), 74
IBM 7094, 523, 526-541 logic diagrams, PDP-8, 127-133 M(stack), 73
IBM System/360, Model 30, 385-388 logic equations, PDP-8, 127-133 M(stack; B 5000), 269-271
ILLIAC IV, 322-325, 330-333 logic technology, 40, 617-618 M(thin film; D825), 453-454
IPL VI, 354-358, 361-362 logical address, 76-81 M(toggle switch; Whirlwind I), 142-143
KDF 9, 262-263 BTSS, 291 M(UNIVAC), 158, 164
LGP-30, LGP-21, 217, 218-219 (See also memory mapping; machine-independent language, B 5000, 267
MIDAC, 209-212 multiprogramming) macro-parallelism, 456, 463
NOVA, 317-318 logical design level, 5 magnetic card [see M(magnetic card)]
PDP-8, 22-25, 26-27, 28-33, 120-123, 127, FORTRAN Machine, 365-381 magnetic tape [see M(magnetic tape)]
134-136 PDP-8, 127-133 magnetic wire memory, 96
Pegasus, 176-179, 182-183 Pegasus, 172-175, 179-181 main line of computers, 87-91
PILOT, 442-444 logical structure (see ISP, IBM System/360) maintenance:
Programma, 237-242 look-ahead: ILLIAC IV, 328-329
RW-40, RW-400, 470, 480-482, 486-488 Atlas, 281-285, 287-289 Pegasus, 181-182
SD-2, 343-347 CDC 6600, 492-494 UNIVAC, 165-169
SDS 900 series, 544-545, 548-550, 556-560 IBM 7094, 550-552 Whirlwind I, 138-139, 142-143
Strela, 213-215 ILLIAC IV, 323-324 manufacturer catalog number, 617
Stretch, 422-424 Stretch, 397, 422, 424-428 manufacturer name (see proper-name)
UNIVAC, 157-160 look-aside memory, 84, 574 manufacturer-type, 619
UNIVAC 1103A, 205-208 [See also M(content addressable)] map (see memory map; multiprogramming)
von Neumann, 111-119 marks, 609
Whirlwind, 140-141, 145 master control program, B 5000, 267-268
Wilkes example, 337-339 M/memory, 16-22 master slave schemes, D825, 449
ZEBRA, 200-204 (See also memory) matrix multiply problem, Lehman computer,
ISP conventions, 628-637 M(associative), 76 464-466
italics, 24, 608 [See also M(content addressable)] medium, 618
M(bulk core), 74 memory, 620-622
M(content addressable), 74 access-time, 620-622
join instruction, 457 (See also look-aside) cycle-time, 620-622
M(core), PDP-8; 128-130 function, 620-622
M(cyclic), 73-74 information-rate, 620-622
K/control, 16-22 M(delay line; ACE, Deuce), 191, 193-199 operations, 620-622
(See also control) M(delay line; Pegasus), 173-174, 177 permanency, 620-621
k/kilo, 616 M(delay line; UNIVAC), 163 portability, 620-621
kernels, 464 M(drum), 74 primary, 621
keyboard: M(electrostatic; Whirlwind I), 141 [See also M(p)]
HP 9100A, 235, 244-249, 251-253 M(fixed-head disk), 74 processor state, 621
Programma 101; 237-242 M(fixed-head disk; ILLIAC IV), 322, 327-328 secondary, 621
[See also T(keyboard)] M(large storage; Whirlwind), 137-138, 141 [See also M(s)]
M(magnetic card), 74 size, 620-622
M(magnetic card; HP 9100A), 248-249, 253 technology, 620-622
L/link, 16-22, 619-620 M(magnetic card; Programma 101), 237-242 (See also M/memory; memory organ)
label, 612 M(magnetic tape), 74 memory access algorithm, 73
labeled-entity, 612 M(magnetic tape; IBM format), 126 memory addressing:
language, 9 M(magnetic tape; RW-400), 483 AGC, 155-156
w/word, 617 word length, CDC 6600, 489, 492 word size, 40
wait instruction, 458 PILOT, 442-444 writability, 618
weight, 617, 619 Stretch, 414-421
Wideband Communication Center, 507 (See also performance)
wiring (see packaging) Whirlwind, 137 x-list, 611
word/w, 617 (See also data-type; design philosophy) x-name, 614
word length, 56-57 word mark character, IBM 1401, 226 x-set, 611
AGC, 146, 148-152