PPC Unit 5 Question Bank and Answers

The document discusses the architecture of CC-NUMA and vector-parallel machines such as the Cray Y-MP. It also covers memory models in MIMD machines and the butterfly network used in the BBN Butterfly parallel computer.

1. What is CC-NUMA machine?

Ans. Stanford University’s directory architecture for shared memory (DASH) project of
the early 1990s had the goal of building an experimental cache-coherent
multiprocessor. The 64-processor prototype resulting from this project, along with the
associated theoretical developments and performance evaluations, contributed insight
and specific techniques to the design of scalable distributed shared-memory machines.
DASH has a two-level processor-to-memory interconnection structure and a
corresponding two-level cache coherence scheme (Fig. below). Within a cluster of 4–16
processors, access to main memory occurs via a shared bus. Each processor in a cluster
has a private instruction cache, a separate data cache, and a Level-2 cache. The
instruction and data caches use the write-through policy, whereas write-back is the
update policy of the Level-2 cache. The clusters are interconnected by a pair of
wormhole-routed 2D mesh networks: a request mesh, which carries remote memory
access requests, and a reply mesh, which routes data and acknowledgments back to
the requesting cluster. Normally, a processor can access its own cache in one clock
cycle, the caches of processors in the same cluster in a few tens of clock cycles, and
remote data in hundreds of clock cycles. Thus, data access locality, which is the norm in
most applications, leads to better performance. Inside a cluster, cache coherence is
enforced by a snoopy protocol, while across clusters, coherence is maintained by a
write-invalidate directory protocol built on the release consistency model for improved
efficiency. The unit of data sharing is a block or cache line. The directory entry for a
block in the home cluster holds its state (uncached, shared, or dirty) and includes a bit-
vector indicating the presence or absence of the cache line in each cache. Remote
memory accesses, as well as exchanges required to maintain data coherence, are
orchestrated via point-to-point wormhole-routed messages that are sent between
cluster directories over 16-bit-wide channels.
When the required data are not found in the local cluster, an access request is
sent to the cluster holding the home directory, which then initiates appropriate actions
based on the type of request and the state of the requested data. In the case of a read
request, the following will happen:

1. For a shared or uncached block, the data are sent to the requester from the home cluster and the directory entry is updated to include the requester as a sharer (the only sharer, in the uncached case).

2. For a dirty block, a message is sent to the cluster holding the single up-to-date copy.
This remote cluster then sends a shared copy of the block to the requesting cluster
and also performs a sharing write-back to the home cluster.

A write (read-exclusive) request will trigger the following actions by the home
directory, with the end result of supplying the requester with an exclusive copy of the
block and invalidating all other copies, if any:
3. For a shared or uncached block, the data are sent to the requester and the directory
entry is updated to indicate that the block is now dirty. Additionally, for a shared
block, invalidation messages are sent to all caches that hold copies of the block, with
the expectation that they will acknowledge the invalidation to the requester (new
owner).

4. For a dirty block, the request is forwarded to the appropriate cluster. The latter then sends an exclusive copy of the block to the requesting cluster, invalidates its own copy, and informs the home cluster of the ownership transfer.
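The following C sketch illustrates, in schematic form, the home directory's decision logic for the read and read-exclusive cases described above. The data structures, the 32-bit presence vector, and the printed "messages" are assumptions invented for this sketch; it is not the actual DASH protocol implementation.

#include <stdio.h>
#include <stdint.h>

typedef enum { UNCACHED, SHARED, DIRTY } BlockState;

typedef struct {
    BlockState state;    /* uncached, shared, or dirty               */
    uint32_t   sharers;  /* presence bit-vector, one bit per cluster */
    int        owner;    /* owning cluster when the state is DIRTY   */
} DirEntry;

static void send(const char *msg, int cluster) {
    printf("  -> %s to cluster %d\n", msg, cluster);
}

/* Read request from cluster req arriving at the home directory. */
void handle_read(DirEntry *d, int req) {
    if (d->state == DIRTY) {
        /* Owner supplies a shared copy to the requester and performs a
           sharing write-back to the home cluster (case 2 above). */
        send("forwarded read request", d->owner);
        d->state    = SHARED;
        d->sharers |= (1u << d->owner) | (1u << req);
    } else {  /* SHARED or UNCACHED: home supplies the data (case 1 above) */
        send("data reply from home", req);
        d->state    = SHARED;
        d->sharers |= 1u << req;
    }
}

/* Write (read-exclusive) request from cluster req. */
void handle_write(DirEntry *d, int req) {
    if (d->state == DIRTY) {
        /* Owner supplies an exclusive copy to the requester (case 4 above). */
        send("forwarded read-exclusive request", d->owner);
    } else {
        send("data reply from home", req);
        /* Invalidate all other sharers; acks go to the requester (case 3 above). */
        for (int c = 0; c < 32; c++)
            if (((d->sharers >> c) & 1u) && c != req)
                send("invalidate (ack to requester)", c);
    }
    d->state   = DIRTY;
    d->sharers = 1u << req;
    d->owner   = req;
}

int main(void) {
    DirEntry d = { UNCACHED, 0, -1 };
    puts("cluster 2 reads:");  handle_read(&d, 2);
    puts("cluster 5 reads:");  handle_read(&d, 5);
    puts("cluster 7 writes:"); handle_write(&d, 7);
    puts("cluster 2 reads:");  handle_read(&d, 2);
    return 0;
}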

2. Describe the architecture of vector parallel Cray Y-MP machine.

Ans: The Cray Y-MP series of vector-parallel computers was introduced in the late 1980s, following several earlier Cray vector supercomputers, including the Cray-1, Cray-2, and Cray X-MP. Subsequently, the Cray C-90 series of machines was introduced as an enhanced and scaled-up version of the Y-MP. The Cray Y-MP consisted of a relatively
small number (up to eight) of very powerful vector processors. A vector processor
essentially executes one instruction on a large number of data items with a great deal
of overlap. Such vector processors can thus be viewed as time-multiplexed
implementations of SIMD parallel processing. With this view, the Cray Y-MP, and more
generally vector-parallel machines, should be classified as hybrid SIMD/MIMD
machines.
Figure below shows the Cray Y-MP processor and its links to the central memory and inter-processor communication network. Each processor has four ports to access central memory, with each port capable of delivering 128 bits per clock cycle (4 ns).
Thus, a CPU can fetch two operands (a vector element and a scalar), store one value,
and perform I/O simultaneously. The computation section of the CPU is divided into
four subsystems as follows:

1. Vector integer operations are performed by separate function units for add/subtract,
shift, logic, and bit-counting (e.g., determining the weight or parity of a word).

2. Vector floating-point operations are performed by separate function units for add/subtract, multiply, and reciprocal approximation. The latter function unit is used in the first step of a division operation x/y: the approximation to 1/y that it provides is refined in a few iterations to derive an accurate value for 1/y, which is multiplied by x in the final step to complete the division operation (a numerical sketch of this scheme appears after this list).

3. Scalar integer operations are performed by separate integer function units for
addition/subtraction, shift, logic, and bit-counting.

4. The add/subtract and multiply operations needed in address computations are performed by separate function units within an address subsystem that also has two sets of eight address registers (these 32-bit address registers and their associated function units are not shown in Fig. above).
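The reciprocal-approximation scheme of item 2 can be sketched numerically in C. Newton-Raphson refinement (r <- r*(2 - y*r)) is one standard way to carry out the iterative refinement; the low-precision seed and the iteration count below are illustrative assumptions, not the actual Cray hardware algorithm.

#include <stdio.h>

double divide_via_reciprocal(double x, double y) {
    /* A low-precision reciprocal stands in for the hardware's rough 1/y estimate. */
    double r = (double)(1.0f / (float)y);
    for (int i = 0; i < 3; i++)      /* each step roughly doubles the number of accurate bits */
        r = r * (2.0 - y * r);
    return x * r;                    /* the final multiply completes x/y */
}

int main(void) {
    printf("%.15f\n", divide_via_reciprocal(355.0, 113.0));  /* approx. 3.14159292 */
    return 0;
}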

The eight vector registers, each holding a vector of length 64 or a segment of a longer
vector, allow computation and memory accesses to be overlapped. As new data are
being loaded into two registers and emptied from a third one, other vector registers can
supply the operands and receive the results of vector instructions. Vector function units
can be chained to allow the next data-dependent vector computation to begin before
the current one has stored all of its results in a vector register. For example, a vector
multiply–add operation can be done by chaining of the floating-point multiply and add
units. This will cause the add unit to begin its vector operation as soon as the multiply
unit has deposited its first result in a vector register.
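A back-of-the-envelope C sketch shows why chaining pays off. The functional-unit latencies below are assumed values for illustration only, not actual Y-MP figures; the point is that without chaining the add unit waits for the entire product vector, whereas with chaining it starts as soon as the first product element appears.

#include <stdio.h>

int main(void) {
    int n = 64;             /* vector length (one vector register)         */
    int mul_latency = 7;    /* assumed pipeline depth of the multiply unit */
    int add_latency = 6;    /* assumed pipeline depth of the add unit      */

    int unchained = (mul_latency + n) + (add_latency + n);  /* add waits for all products          */
    int chained   = mul_latency + add_latency + n;          /* add consumes results as they appear */

    printf("multiply-add on %d elements without chaining: %d cycles\n", n, unchained);
    printf("multiply-add on %d elements with chaining:    %d cycles\n", n, chained);
    return 0;
}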

3. Explain various memory models of MIMD machine.


Ans:
On MIMD machines, a method is required to allow the independent processors to
exchange data when needed. In the early days of parallel computing, each
manufacturer would produce a series of communication routines that could be called
from within a program to carry out message passing tasks. Nowadays, the situation has
dramatically improved with the implementation of standard message passing routines
across a range of platforms. This allows parallel programs to be run without significant changes on everything from workstation clusters to large-scale parallel supercomputers. The two most common (and useful) interfaces are PVM (Parallel Virtual Machine) and MPI (Message Passing Interface), both of which work with Fortran or C.
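As a minimal illustration (in C, one of the languages mentioned above), the following MPI program has process 0 send an integer to process 1. It is a generic sketch, typically compiled with an MPI wrapper such as mpicc and run with mpirun -np 2; details vary by platform.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I? */

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);             /* to rank 1, tag 0 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}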

4. Explain MIN-based butterfly network with the help of a diagram.

Ans: The Butterfly parallel processor of Bolt, Beranek, and Newman became available
in 1983. It is a general-purpose parallel computer that is particularly suitable for signal
processing applications. The BBN Butterfly was built of 2–256 nodes (boards), each
holding an MC68000 processor with up to 4 MB of memory, interconnected by a 4-ary
wrapped butterfly network. Typical memory referencing instructions took 2 μs to
execute when they accessed local memory, while remote accesses required 6 μs. The
relatively small difference between the latencies of local and remote memory accesses
leads us to classify the BBN Butterfly as a UMA machine. The structure of each node is
shown in Fig. below. A microcoded processor node controller (PNC) is responsible for
initiating all messages sent over the switch and for receiving messages from it. It also
handles all memory access requests, using the memory management unit for
translating virtual addresses to physical addresses. PNC also augments the
functionality of the main processor in performing operations needed for parallel
processing (such as test-and-set, queuing, and scheduling), easily enforcing the
atomicity requirements in view of its sole control of memory.
The wrapped 4-ary butterfly network of the BBN Butterfly required four stages
of 4×4 bit-serial switches, implemented as custom VLSI chips, to accommodate the
largest 256-processor configuration. A small, 16-node version of the network is
depicted in Fig. below. Routing through the network was done by attaching the binary
destination address as a routing tag to the head of a packet, with each switch using and
discarding 2 bits of this tag.
For example, to send a message to Node 9 = (1001)₂ in Fig. below, the least-significant 2 bits would be used to select the second output of the switch at the first level, and the most-significant 2 bits would indicate the choice of the third output in the second-level switch. In typical applications, message collision did not present any
problem and the latency for remote memory accesses was dominated by the bit-serial
transmission time through the network. Because the probability of some switch failing
increases with the network size, BBN Butterfly systems with more than 16 processing
nodes were configured, through the inclusion of extra switches, to have redundant
paths. Besides improving the reliability, these redundant paths also offered
performance benefits by reducing message collisions.
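The destination-tag routing just described can be sketched in a few lines of C. The sketch assumes, as in the text, that each 4×4 switch consumes 2 bits of the destination address, least-significant digit first, and it numbers output ports from 0 (so "port 1" is the second output).

#include <stdio.h>

int main(void) {
    int dest   = 9;   /* node 9 = (1001) in binary                            */
    int stages = 2;   /* 16-node, 4-ary butterfly: two levels of 4x4 switches */

    for (int s = 0; s < stages; s++) {
        int port = (dest >> (2 * s)) & 3;   /* next 2-bit digit of the routing tag */
        printf("stage %d switch: take output port %d\n", s, port);
    }
    return 0;
}

For destination 9 this prints port 1 at the first stage and port 2 at the second, matching the second and third outputs mentioned above.
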
5. Explain with the help of a diagram:
a) UMA
b) NUMA
c) CC-NUMA
d) COMA

Ans: Shared-memory implementations vary greatly in the hardware architecture that they use and in the programming model (logical user view) that they support. With
respect to hardware architecture, shared-memory implementations can be classified
according to the placement of the main memory modules within the system (central or
distributed) and whether or not multiple copies of modifiable data are allowed to
coexist (single- or multiple-copy). The resulting four-way classification is depicted in
Fig. below.

With a central main memory, access to all memory addresses takes the same amount
of time, leading to the designation uniform memory access (UMA). In such machines,
data distribution among the main memory modules is important only to the extent that
it leads to more efficient conflict-free parallel access to data items that are likely to be
needed in succession. If multiple copies of modifiable data are to be maintained within
processor caches in a UMA system, then cache coherence becomes an issue and we
have the class of CC-UMA systems. Simple UMA has been used more widely in practice.
An early, and highly influential, system of this type was Carnegie-Mellon University’s C.mmp, built of 16 PDP-11 minicomputers in the mid-1970s. It had
both a crossbar and a bus for inter-processor communication via shared variables or
message passing. When memory is distributed among processing nodes, access to
locations in the global address space will involve different delays depending on the
current location of the data. The access delay may range from tens of nanoseconds for locally available data, through somewhat higher values for data in nearby nodes, to several microseconds for data located in distant nodes. This variance of
access delay has led to the designation nonuniform memory access (NUMA).

6. Explain Interconnection network for Message Passing Scheme.

Ans: When more than one processor needs to access a memory structure,
interconnection networks are needed to route data:
• from processors to memories (concurrent access to a shared memory structure), or
• from one PE (processor + memory) to another (to provide a message-passing
facility).

Interconnection networks for message passing are of three basic types:

1. Shared-medium networks. Only one of the units linked to a shared-medium network is allowed to use it at any given time. Nodes connected to the network typically have
request, drive, and receive circuits. Given the single-user requirement, an arbitration
mechanism is needed to decide which one of the requesting nodes can use the shared
network. The two most commonly used shared-medium networks are backplane buses
and local area networks (LANs). The bus arbitration mechanism is different for
synchronous and asynchronous buses. In bus transactions that involve a request and a
response, a split-transaction protocol is often used so that other nodes can use the bus
while the request of one node is being processed at the other end. To ease the
congestion on a shared bus, multiple buses or hierarchical bus networks may be used.
For LANs, the most commonly used arbitration protocol is based on contention: the nodes detect the idle/busy state of the shared medium, transmit when they observe the idle state, and consider the transmission to have failed when they detect a “collision” (a toy sketch of this contention scheme appears after this list). Token-based protocols, which implement some form of rotating priority, are also used.

2. Router-based networks. Such networks, also known as direct networks, are based on
each node (with one or more processors) having a dedicated router that is linked
directly to one or more other routers. The local node(s) connected to the router inject
messages into the network through the injection channel and remove incoming
messages through the ejection channel. Locally injected messages compete with
messages that are passing through the router for the use of output channels. The link
controllers handle interfacing considerations of the physical channels. The queues hold
messages that cannot be forwarded because of contention for the output links. Various
switching strategies (e.g., packet or wormhole) and routing algorithms (e.g., tag-based
or use of routing tables) can be implemented in the router.

3. Switch-based networks. Such networks, also known as indirect networks, are based on
crossbars or regularly interconnected (multistage) networks of simpler switches.
Typically, the communication path between any two nodes goes through one or more
switches. The path to be taken by a message is either predetermined at the source
node and included as part of the message header or else it is computed on the fly at
intermediate nodes based on the source and destination addresses. Switch-based
networks can be classified as unidirectional or bidirectional. In unidirectional networks,
each switch port is either input or output, whereas in bidirectional networks, ports can
be used for either input or output. By superimposing two unidirectional networks, one
can build a full-duplex bidirectional network that can route messages in both directions
simultaneously. A bidirectional switch can be used in forward mode, in backward mode,
or in turnaround mode, where in the latter mode, connections are made between
terminals on the same side of the switch.
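The contention-based arbitration described under shared-medium networks (item 1) can be sketched in C as follows. The medium model, probabilities, and back-off limit are invented for illustration; this is not a faithful Ethernet implementation.

#include <stdio.h>
#include <stdlib.h>

/* Toy stand-ins for carrier sensing and collision detection. */
int medium_busy(void)        { return rand() % 4 == 0; }
int collision_detected(void) { return rand() % 3 == 0; }

void send_frame(int node) {
    for (int attempt = 1; ; attempt++) {
        while (medium_busy())                 /* 1. listen until the medium is idle   */
            ;
        printf("node %d transmitting (attempt %d)\n", node, attempt);
        if (!collision_detected())            /* 2. no collision: transmission stands */
            return;
        int cap = attempt < 10 ? attempt : 10;
        int backoff = rand() % (1 << cap);    /* 3. random back-off, then retry       */
        printf("node %d: collision detected, backing off %d slots\n", node, backoff);
    }
}

int main(void) {
    srand(1);
    send_frame(3);
    return 0;
}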

7. Explain Data Parallel SIMD Machine

Ans: Data-parallel SIMD machines occupy a special place in the history of parallel
processing. The first supercomputer ever built was a SIMD machine. Some of the most
cost-effective parallel computers in existence are of the SIMD variety. You can now buy
a SIMD array processor attachment for your personal computer that gives you
supercomputer-level performance on some problems for a workstation price. However,
because SIMD machines are often built from custom components, they have suffered a
few setbacks in recent years. Example SIMD machines range from the pioneering ILLIAC IV, through early massively parallel processors (the Goodyear MPP and DAP), to more recent general-purpose machines (the TMC CM-2 and MasPar MP-2).
Figure below depicts the functional view of an associative memory. There are m
memory cells that store data words, each of which has one or more tag bits for use as
markers. The control unit broadcasts data and commands to all cells. A typical search
instruction has a comparand and a mask word as its parameters. The mask specifies
which bits or fields within the cells are to be searched and the comparand provides the
bit values of interest.
Each cell has comparison logic built in and stores the result of its comparison in
the response or tag store. The tag bits can be included in the search criteria, thus
allowing composite searches to be programmed (e.g., searching only among the cells
that responded or failed to respond to a previous search instruction). Such searches,
along with the capability to read, write, multiwrite (write a value into all cells that have
a particular tag bit set), or perform global tag operations (e.g., detecting the presence
or absence of responders or their multiplicity), allow search operations such as the
following to be effectively programmed:

• Exact-match search: locating data based on partial knowledge of contents
• Inexact-match searches: finding numerically or logically proximate values
• Membership searches: identifying all members of a particular set
• Relational searches: determining values that are less than, less than or equal, and so forth
• Interval searches: marking items that are between limits or not between limits
• Extrema searches: min- or max-finding, next higher, next lower
• Rank-based selection: selecting kth or k largest/smallest elements
• Ordered retrieval: repeated max- or min-finding with elimination (sorting)
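The comparand/mask search described above can be sketched in C as follows: every cell compares its masked contents with the masked comparand and raises its tag bit on a match (the loop stands in for what the hardware does in all cells at once). The cell contents and field layout are made-up illustrations.

#include <stdio.h>
#include <stdint.h>

#define CELLS 8

int main(void) {
    uint16_t cell[CELLS] = { 0x12A4, 0x56A4, 0x9BC1, 0x00A4,
                             0x77F2, 0x30A4, 0x56A5, 0x1111 };
    uint16_t comparand = 0x00A4;   /* bit values of interest               */
    uint16_t mask      = 0x00FF;   /* search only the low-order byte field */
    int tag[CELLS];

    for (int i = 0; i < CELLS; i++) {              /* conceptually done in parallel */
        tag[i] = ((cell[i] ^ comparand) & mask) == 0;
        if (tag[i])
            printf("cell %d responds (contents 0x%04X)\n", i, cell[i]);
    }
    return 0;
}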

Additionally, arithmetic operations, such as computing the global sum or adding two
fields in a subset of AM cells, can be effectively programmed using bit-serial algorithms.
Associative processors (APs) are AMs that have been augmented with more flexible processing logic. From an architectural standpoint, APs can be divided into four classes:

1. Fully parallel (word-parallel, bit-parallel) APs have comparison logic associated with
each bit of stored data. In simple exact-match searches, the logic associated with each
bit generates a local match or mismatch signal. These local signals are then combined
to produce the cell match or mismatch result. In more complicated searches, the bit
logic typically receives partial search results from a neighboring bit position and
generates partial results to be passed on to the next bit position.

2. Bit-serial (word-parallel, bit-serial) systems process an entire bit-slice of data, containing 1 bit of every word, simultaneously, but go through multiple bits of the
search field sequentially. Bit-serial systems have been dominant in practice because
they allow the most cost-effective implementations using low-cost, high-density, off-
the-shelf RAM chips.

3. Word-serial (word-serial, bit-parallel) APs based on electronic circulating memories represent the hardware counterparts of programmed linear search. Even though
several such systems were built in the 1960s, they do not appear to be cost-effective
with today’s technology.

4. Block-oriented (block-parallel, word-serial, bit/byte-serial) systems represent a compromise between bit-serial and word-serial systems in an effort to make large
systems practically realizable. Some block-oriented AP systems are based on
augmenting the read/write logic associated with each head of a head-per-track disk so
that it can search the track contents as they pass underneath. Such a mechanism can
act as a filter between the database and a fast sequential computer or as a special-
purpose database search engine.

8. Explain Processor and memory technologies

Ans: Commodity microprocessors are improving in performance at an astonishing rate. Over the past two decades, microprocessor clock rates have improved by a factor of 100, from a few megahertz to hundreds of megahertz. Gigahertz processors are not far off. In the same time frame, memory chip capacity has gone up by a factor of 10⁴, from
16 Kb to 256 Mb. Gigabit memory chips are now beginning to appear. Along with
speed, the functionality of microprocessors has also improved drastically. This is a
direct result of the larger number of transistors that can be accommodated on one chip.
In the past 20 years, the number of transistors on a microprocessor chip has grown by a
factor of 10³, from tens of thousands (Intel 8086) to a few tens of millions (Intel Pentium
Pro). Older microprocessors contained an ALU for integer arithmetic within the basic
CPU chip and a floating-point coprocessor on a separate chip, but increasing VLSI
circuit density has led to the trend of integrating both units on a single microchip, while
still leaving enough room for large on-chip memories (typically used for an instruction
cache, a data cache, and a Level-2 cache). As an example of modern microprocessors,
we briefly describe a member of Intel’s Pentium family of microprocessors: the Intel
Pentium Pro, also known as Intel P6 (Fig. below).

The primary design goal for the Intel P6 was to achieve the highest possible
performance, while keeping the external appearances compatible with the Pentium
and using the same mass production technology. The Intel P6 has a 32-bit architecture,
internally using a 64-bit data bus, 36-bit addresses, and an 86-bit floating-point format.
In the terminology of modern microprocessors, P6 is superscalar and superpipelined:
superscalar because it can execute multiple independent instructions concurrently in its
many functional units; superpipelined because its instruction execution pipeline with
14+ stages is very deep. The Intel P6 is capable of glueless multiprocessing with up to
four processors, operates at 150–200 MHz, and has 21M
transistors, roughly one-fourth of which are for the CPU and the rest for the on-chip
cache memory. Because high performance in the Intel P6 is gained by out-of-order and
speculative instruction execution, a key component in the design is a reservation
station that is essentially a hardware-level scheduler of micro-operations.

Each instruction is converted to one or more micro-operations, which are then executed in arbitrary order whenever their required operands are available. The result
of a micro-operation is sent to both the reservation station and a special unit called the
reorder buffer. The latter unit is responsible for making sure that program execution
remains consistent by committing the results of micro-operations. There is a full
crossbar between all five ports of the reservation station so that any returning result
can be forwarded directly to any other unit for the next clock cycle. Fetching, decoding,
and setting up the components of an instruction in the reservation station takes eight
clock cycles and is performed as an eight-stage pipelined operation. The retirement
process, mentioned above, takes three clock cycles and is also pipelined. Sandwiched
between the above two pipelines is a variable-length pipeline for instruction execution.
For this middle part of instruction execution, the reservation station needs two cycles
to ascertain that the operands are available and to schedule the micro-operation on an
appropriate unit.
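The interplay of the reservation station and the reorder buffer can be caricatured in C: micro-operations execute in arbitrary order once their operands are ready, while results are committed strictly in program order. The micro-op list, dependencies, and single-cycle timing are assumptions for this sketch, not the actual P6 design.

#include <stdbool.h>
#include <stdio.h>

#define NUOPS 4

typedef struct {
    const char *name;
    int  src1, src2;   /* indices of producing micro-ops, -1 if none */
    bool executed;     /* result produced (possibly out of order)    */
} MicroOp;

int main(void) {
    /* micro-op 2 depends on 0 and 1; micro-op 3 depends on 2 and 0 */
    MicroOp rs[NUOPS] = {
        { "load r1",       -1, -1, false },
        { "load r2",       -1, -1, false },
        { "mul  r3,r1,r2",  0,  1, false },
        { "add  r4,r3,r1",  2,  0, false },
    };

    int committed = 0, cycle = 0;
    while (committed < NUOPS) {
        cycle++;
        /* Reservation station: any micro-op whose operands are available may
           execute this cycle, regardless of program order. */
        for (int i = NUOPS - 1; i >= 0; i--) {
            bool ready = !rs[i].executed
                      && (rs[i].src1 < 0 || rs[rs[i].src1].executed)
                      && (rs[i].src2 < 0 || rs[rs[i].src2].executed);
            if (ready) {
                rs[i].executed = true;
                printf("cycle %d: executed  %s\n", cycle, rs[i].name);
            }
        }
        /* Reorder buffer: results are committed strictly in program order. */
        while (committed < NUOPS && rs[committed].executed) {
            printf("cycle %d: committed %s\n", cycle, rs[committed].name);
            committed++;
        }
    }
    return 0;
}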

The operation itself takes one cycle for register-to-register integer add and
longer for more complex functions. The multiplicity of functional units with different
latencies is why out-of-order and speculative execution (e.g., branch prediction) are
crucial to high performance. With a great deal of functionality plus on-chip memory
already available, a natural question relates to the way in which additional transistors
might be utilized. One alternative is to build multiple processors on the same chip.
Custom microchips housing several simple processors have long been used in the
design of (massively) parallel computers. Commercially available SIMD parallel systems
of the late 1980s already contained tens of bit-serial processors on each chip and more
recent products offer hundreds of such processors per chip (thousands on one PC
board). Microchips containing multiple general-purpose processors and associated
memory constitute a plausible way of utilizing the higher densities that are becoming
available to us. From past experience with parallel computers requiring custom chips, it
appears that custom chip development for one or a few parallel computers will not be
economically viable. Instead, off-the-shelf components will likely become available as
standard building blocks for parallel systems. No matter how many processors we can
put on one chip, the demand for greater performance, created by novel applications or
larger-scale versions of existing ones, will sustain the need for integrating multiple
chips into systems with even higher levels of parallelism.

With tens to tens of thousands of processors afforded by billion-transistor chips, small-scale parallel systems utilizing powerful general-purpose processors, as well as
multi-million-processor massively parallel systems, will become not only realizable but
also quite cost-effective. Fortunately, the issues involved in the design of single-chip
multiprocessors and massively parallel systems, as well as their use in synthesizing
larger parallel systems, are no different from the current problems facing parallel
computer designers. Given that interconnects have already become the limiting factor,
regardless of the number of processors on a chip, we need to rely on multilevel
hierarchical or recursive architectures.

9. Explain coarse grain and fine grain:

Ans: Depending on the complexity of processing nodes, three categories of message-passing MIMD computers can be distinguished:
1. Coarse-grain parallelism. Processing nodes are complete (perhaps large, multi-board)
computers that work on sizable sub-problems, such as complete programs or tasks, and
communicate or synchronize with each other rather infrequently.

2. Medium-grain parallelism. Processing nodes might be based on standard micros that execute smaller chunks of the application program (e.g., subtasks, processes, threads)
and that communicate or synchronize with greater frequency.

3. Fine-grain parallelism. Processing nodes might be standard micros or custom-built processing elements (perhaps with multiple PEs fitting on one chip) that execute small
pieces of the application and need constant communication or synchronization.

10. What is Shared Medium network?

Ans: A shared-medium network is a local area network (LAN) that shares its total available bandwidth among all transmitting stations. Ethernet is the primary example, although Token Ring and FDDI networks were earlier examples. In the past, when shared-medium LANs ran out of capacity to serve their users effectively, they were upgraded by replacing the network hubs with switches.
