Evolution of Computers
!"Under the right conditions the shift to the new technology can lead to possible
increase in processor speed of hundred to thousand times
• Electromechanical computer
• All-electronic computer with vacuum tubes
• Fully transistorized computer
• Scalable massive parallelism
• Abacus (from China): a machine for computational assistance.
• 1642: Blaise Pascal made the first machine that could add.
• 1672: Leibniz made a machine that could perform all four basic functions (+, -, *, /).
• 1822: Charles Babbage had the idea of a programmable machine that could add, subtract, multiply, and divide, and solve polynomial equations. He never succeeded in building it, but designed the "Analytical Engine".
SCALAR vs. PIPELINED PROCESSING
Task: multiply 100 numbers and output the result.
• Scalar processing: roughly 9 instructions per element, or about 9 * 100 instructions in total.
• Pipelined (vector) processing: about 100 + 9 instructions, since the pipeline is filled once (roughly 9 steps) and then delivers one result per element.
Computer Architectures
Taxonomy of Architectures
For computer architectures, Flynn proposed that the two dimensions be termed Instruction and
Data, and that, for both of them, the two values they could take be Single or Multiple.
Single Instruction, Multiple Data (SIMD)
• Synchronous (lock-step)
• Deterministic
Multiple Instruction, Multiple Data (MIMD)
• Synchronous or asynchronous: execution can proceed in lock-step, or each processor can run in a "do your own thing" mode. Some kinds of algorithms require one or the other, and different kinds of MIMD systems are better suited to one or the other; optimum efficiency depends on making sure that the system you run your code on reflects the style of synchronicity required by your code.
• Non-deterministic
• Multiple Instruction or Single Program
Parallel Tasks : Tasks whose computations are independent of each other, so that
all such tasks can be performed simultaneously with correct results.
Parallelizable Problem : A problem that can be divided into parallel tasks. This
may require changes in the code and/or the underlying algorithm.
Types of Parallelism: There are two basic ways to partition computational work among
parallel tasks:
Data parallelism: each task performs the same series of calculations, but applies them to
different data. For example, four processors can search census data looking for people
above a certain income; each processor does the exact same operations, but works on
different parts of the database.
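As a concrete illustration, here is a minimal sketch (not from the original text; the synthetic data, threshold, and array size are invented) of the census-style search written with MPI, where every rank runs the same filter over its own slice of the records:

/* Data parallelism sketch: same operation on every rank, different data. */
#include <mpi.h>
#include <stdio.h>

#define N 1000          /* total number of records (assumed)  */
#define THRESHOLD 50000 /* income cutoff (assumed)             */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* For brevity every rank builds the whole array; in practice each
     * rank would hold only its own slice of the database. */
    int income[N];
    for (int i = 0; i < N; i++)
        income[i] = (i * 37) % 100000;   /* synthetic data */

    /* Block decomposition: rank r handles indices [lo, hi). */
    int chunk = (N + size - 1) / size;
    int lo = rank * chunk;
    int hi = (lo + chunk > N) ? N : lo + chunk;

    int local_count = 0;
    for (int i = lo; i < hi; i++)        /* same operation, different data */
        if (income[i] > THRESHOLD)
            local_count++;

    int total = 0;
    MPI_Reduce(&local_count, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("%d records above threshold\n", total);

    MPI_Finalize();
    return 0;
}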
Functional parallelism: each task performs different calculations, i.e., carries out
different functions of the overall problem. This can be on the same data or different data.
For example, 5 processors can model an ecosystem, with each processor simulating a
different level of the food chain (plants, herbivores, carnivores, scavengers, and
decomposers).
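A comparable sketch of functional parallelism (again hypothetical; the simulation routines are only placeholders) assigns a different function of the ecosystem model to each MPI rank:

/* Functional parallelism sketch: each rank runs a different function. */
#include <mpi.h>
#include <stdio.h>

static void simulate_plants(void)      { puts("simulating plants"); }
static void simulate_herbivores(void)  { puts("simulating herbivores"); }
static void simulate_carnivores(void)  { puts("simulating carnivores"); }
static void simulate_scavengers(void)  { puts("simulating scavengers"); }
static void simulate_decomposers(void) { puts("simulating decomposers"); }

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* The rank decides which part of the overall model this process runs. */
    switch (rank) {
    case 0: simulate_plants();      break;
    case 1: simulate_herbivores();  break;
    case 2: simulate_carnivores();  break;
    case 3: simulate_scavengers();  break;
    case 4: simulate_decomposers(); break;
    default: /* any extra ranks idle */ break;
    }

    MPI_Finalize();
    return 0;
}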
Parallel Overhead
Shared Memory
Distributed Memory
!"First we cover the ideal goals for a parallel solution. We review functional and
data parallelism, and SPMD and Master Worker.
!"Then we walk through 5 problem examples showing diagrams of possible parallel
solutions.
!"Problems faced in prallel programming
Goals (ideal)
Ideal (read: unrealistic) goals for writing a program with maximum speedup and
scalability:
• Each process has a unique bit of work to do, and does not have to redo any other work in
order to get its bit done.
• Each process stores the data needed to accomplish that work, and does not require anyone
else's data.
• A given piece of data exists only on one process, and each bit of computation only needs
to be done once, by one process.
• Communication between processes is minimized.
• Load is balanced; each process should be finished at the same time.
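As one small illustration of the "unique work, balanced load" goals above, here is a sketch (plain C, all names invented) of a block decomposition in which every process derives its own index range, no index is owned twice, and the ranges differ in size by at most one element:

/* Balanced block decomposition sketch: unique, non-overlapping work. */
#include <stdio.h>

/* Compute the half-open range [lo, hi) owned by `rank` out of `size`
 * processes when there are `n` items in total. */
static void owned_range(int rank, int size, int n, int *lo, int *hi)
{
    int base = n / size;   /* every process gets at least this many   */
    int rem  = n % size;   /* the first `rem` processes get one extra */
    *lo = rank * base + (rank < rem ? rank : rem);
    *hi = *lo + base + (rank < rem ? 1 : 0);
}

int main(void)
{
    int lo, hi;
    for (int rank = 0; rank < 4; rank++) {   /* pretend there are 4 processes */
        owned_range(rank, 4, 10, &lo, &hi);
        printf("rank %d owns [%d, %d)\n", rank, lo, hi);
    }
    return 0;
}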
Functional Parallelism?
Data Parallelism?
Distributed memory architectures are fertile ground for many different styles of parallel programming, from those emphasizing homogeneity of process but heterogeneity of data, to full heterogeneity of both.
Data parallel
Many significant problems, over the entire computational
complexity scale, fall into the data parallel model, which basically
stands for "do the same thing to all this data":
Explicit data distribution (via directives)
The data is assumed to have some form of regularity, some
geometric shape or other such characteristic by which it may be
subdivided among the available processors, usually by use of
directives commonly hidden from the executable code within
program comment statements.
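The comment-directive style described above comes from Fortran dialects such as HPF; the nearest C analogue is an OpenMP pragma, which a compiler without OpenMP support simply ignores. A minimal sketch (the arrays and their contents are made up):

/* Directive-based data parallelism: "do the same thing to all this data". */
#include <stdio.h>

#define N 1000

int main(void)
{
    double a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* The directive tells the compiler/runtime how to subdivide the
     * regular index space among the available processors. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[N-1] = %f\n", c[N - 1]);
    return 0;
}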
We're now going to discuss some general issues relevant to the construction of well-
designed distributed applications which rely on explicit message passing for data- and
control-communications. These principles are largely concerned with issues you should
be focusing on as you consider the parallelization of your application:
• How is memory going to be used, and from where?
• How will the different parts of the application be coordinated?
• What kinds of operations can be done collectively?
• When should communications be blocking, and when non-blocking? (See the sketch after this list.)
• What kinds of synchronization considerations need to be addressed, and when?
• What kinds of common problems could be encountered, and how can they be
avoided?
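On the blocking versus non-blocking question, a common pattern is to post a non-blocking receive early, overlap it with local work, and complete it later. A minimal MPI sketch (assuming exactly two processes; the tag and buffers are invented):

/* Non-blocking receive overlapped with a blocking send. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, other, sendbuf, recvbuf;
    MPI_Request req;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    other   = 1 - rank;   /* the only other process in this two-rank sketch */
    sendbuf = rank;

    MPI_Irecv(&recvbuf, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &req);
    MPI_Send(&sendbuf, 1, MPI_INT, other, 0, MPI_COMM_WORLD);

    /* ... useful local computation could go here ... */

    MPI_Wait(&req, MPI_STATUS_IGNORE);   /* complete the receive */
    printf("rank %d received %d\n", rank, recvbuf);

    MPI_Finalize();
    return 0;
}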
As has been mentioned before, and as will be mentioned again:
There's no substitute for a good design ... and the worse your design, the more time you'll spend debugging it.
"It must be emphasized that the machine does not think for itself. It may exercise some degree of
judgment and discrimination, but the situations in which these are required, the criteria to be
applied, and the actions to be taken according to the criteria, have all to be foreseen in the
program of operating instructions furnished to the machine. Use of the machine is no substitute
for thought on the basic organization of a computation, only for the labour of carrying out the
details of the application of that thought."
Douglas R. Hartree, Moore School lecture, Univ. of Penn., 9 July 1946
Addressability
As one module in a distributed application, knowing what you know, and knowing whom to ask for what you don't, is one of the central issues in message-passing applications. "What you know" is the data resident on your own processor; "what you don't know" is anything that resides elsewhere but that you've discovered you need to find out.
The CPU can issue load/store operations involving its local memory space only.
Requests for any data stored in a remote processor's memory must be converted, by the programmer or by a run-time library, into message-passing calls that copy the data between local memories.
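A minimal sketch of that conversion (the variable names and tag are invented): the value lives only in rank 0's memory, so rank 1 obtains it by receiving an explicit copy rather than by issuing a load against remote memory:

/* Remote data must be copied with an explicit send/receive pair. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double remote_value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        double local_value = 3.14;            /* lives only on rank 0 */
        MPI_Send(&local_value, 1, MPI_DOUBLE, 1, 42, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Rank 1 cannot load from rank 0's memory; it must receive a
         * copy into its own address space. */
        MPI_Recv(&remote_value, 1, MPI_DOUBLE, 0, 42, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 got %f\n", remote_value);
    }

    MPI_Finalize();
    return 0;
}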
You not only have to know that you don't know something, or that something that you
used to know is now out-of-date and needs refreshing ... you also need to know where to
go to get the latest version of the information you're interested in.
Synchronization is going to cost you, because there's no easy way to get this kind of information to everybody quickly ... that's just one of the defining characteristics of this model of operation, and if its implications are too detrimental to the effectiveness of your application, that's a good enough reason to explore other alternatives.
Keep your synchronization requirements to the absolute minimum, and code them to be lean and mean, so that as little time as possible is taken up in synchronization (and consequently away from meaningful computation).
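As a small sketch of what "lean" synchronization can look like (the phase routines are placeholders, not from the original text), each timestep does its independent work with no synchronization at all and uses a single barrier only at the one point where every process genuinely must have finished:

/* One synchronization point per timestep, and only where it is needed. */
#include <mpi.h>
#include <stdio.h>

static void compute_phase(int rank)    { printf("rank %d computing\n", rank); }
static void exchange_results(int rank) { printf("rank %d exchanging\n", rank); }

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int step = 0; step < 3; step++) {
        compute_phase(rank);             /* independent work: no sync needed */
        MPI_Barrier(MPI_COMM_WORLD);     /* the single sync point per step   */
        exchange_results(rank);
    }

    MPI_Finalize();
    return 0;
}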
All messages must be explicitly received (sends and receives must be paired)
Just like the junk mail that piles up in your mailbox and obscures the really important stuff
(like your tax return, or the latest edition of TV-Guide), messages that are sent but never
explicitly received are a drain on network resources.
Grain Size
Grain size loosely refers to the amount of computation that is done between communication or synchronization points.
With an equally shared load, the time per task is roughly (T + S), where T is the computation time and S is the communication/synchronization overhead, so keeping S small is important.
The grain should not be so small that the number of tasks exceeds the number of processors; if this happens, the forward execution of the program will be severely impaired. Dynamic switching is a technique that might be used to jump between the two.
Starvation
The amount of time a processor is interrupted to report its present state should not be large, or the processor will not have time to compute.
Deadlock
A set of processes is deadlocked if each process in the set holds a resource and will not release it until it has been granted the other resources it is waiting for. You can try to detect a deadlock and kill a process, but this requires a monitoring system.
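In a message-passing setting the same cycle of mutual waiting appears when two processes each block waiting for a message the other has not yet sent. The fragment below is a deliberately broken sketch (two processes assumed; names and tag are made up) that deadlocks for exactly that reason:

/* Deliberately deadlocking sketch: run with exactly 2 processes.
 * Each rank blocks in MPI_Recv waiting for a message that the other
 * rank never gets the chance to send. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, other, out, in;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    other = 1 - rank;   /* the only other process in this two-rank sketch */
    out   = rank;

    /* Both ranks receive first, then send: each holds its "resource"
     * (progress) while waiting on the other, which is a deadlock. */
    MPI_Recv(&in, 1, MPI_INT, other, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Send(&out, 1, MPI_INT, other, 0, MPI_COMM_WORLD);

    printf("rank %d: never reached\n", rank);
    MPI_Finalize();
    return 0;
}

Reversing the receive/send order on one rank, or posting a non-blocking receive first, breaks the cycle.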
By this point, I hope you will have gotten the joint message that:
Parallel processing can be extremely useful, but...
There Ain't No Such Thing As A Free Lunch
Programmer's time
As the programmer, your time is largely going to be spent doing the following:
Recoding
Having discovered the places where you think parallelism will give results, you now have
to put it in. This can be a very time-consuming process.
Complicated debugging