Unit V 2

The Cray Y-MP was a supercomputer sold by Cray Research in 1988 that used a vector processor design. It consisted of a small number of powerful vector processors that could each fetch operands, store values, and perform I/O simultaneously. Computation was divided among vector integer, floating-point, and scalar integer units. The processors were connected to central memory via a multistage crossbar network. The DASH project at Stanford University aimed to build an experimental cache-coherent multiprocessor (CC-NUMA) with a two-level processor-to-memory interconnect. Within clusters of 4-16 processors, memory was accessed via a shared bus, while clusters were interconnected by a mesh network.

Vector-Parallel Cray Y-MP
General Info
- Cray Research, Inc.: American supercomputer manufacturer.
- Cray Y-MP: supercomputer sold by Cray Research in 1988 as the successor to its X-MP.
- Vector processor: a processor that executes one instruction on a large number of data items with a great deal of overlap.
Cray Y-MP
- Consists of a very small number (up to 8) of very powerful vector processors.
- Vector processors can be viewed as time-multiplexed implementations of SIMD parallel processing.
- Classified as a hybrid SIMD/MIMD machine.
Cray Y-MP
- Each processor has 4 ports to access central memory, with each port delivering 128 bits per clock cycle (4 ns).
- Thus a CPU can fetch 2 operands (a vector and a scalar), store 1 value, and perform I/O simultaneously.
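The port figures above imply a peak memory bandwidth per CPU. The short Python sketch below works through that arithmetic using only the numbers on the slide (4 ports, 128 bits per port per 4 ns cycle). The function and constant names are made up for illustration, and the result is a theoretical peak, not a measured rate.

```python
# Figures from the slide: 4 ports per CPU, 128 bits per port per 4 ns clock cycle.
PORTS_PER_CPU = 4
BITS_PER_PORT_PER_CYCLE = 128
CLOCK_CYCLE_NS = 4


def port_bandwidth_gb_per_s(bits_per_cycle=BITS_PER_PORT_PER_CYCLE,
                            cycle_ns=CLOCK_CYCLE_NS):
    """Peak bandwidth of one memory port in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_cycle = bits_per_cycle / 8
    # bytes per nanosecond is numerically equal to GB per second
    return bytes_per_cycle / cycle_ns


def cpu_bandwidth_gb_per_s(ports=PORTS_PER_CPU):
    """Peak bandwidth of one CPU with all of its ports busy at once."""
    return ports * port_bandwidth_gb_per_s()


if __name__ == "__main__":
    print("per port:", port_bandwidth_gb_per_s(), "GB/s")
    print("per CPU: ", cpu_bandwidth_gb_per_s(), "GB/s")
```

With the slide's numbers, each port peaks at 4 GB/s and the four ports together at 16 GB/s per CPU.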
The computation section of the CPU is divided into 4 subsystems, including the vector integer unit, the floating-point unit, and scalar integer operations:
- Vector integer unit: performs vector integer operations through separate functional units for add/subtract, shift, logic, and bit-counting.
- Floating-point unit: performs vector floating-point operations through separate functional units for add/subtract, multiply, and reciprocal approximation.
- Scalar int. Operations performs all
- As new data are being loaded into two registers and emptied from a third one, other vector registers can supply the operands and receive the results of vector instructions.
- Vector function units can be chained to allow the next data-dependent vector computation to begin before the current one has stored all of its results in a vector register.
- For example, a vector multiply-add operation can be done by chaining the floating-point multiply and add units. The add unit then begins its vector operation as soon as the multiply unit has deposited its first result in a vector register.
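The benefit of chaining can be seen with a rough cycle count. The Python sketch below compares a multiply followed by an add with and without chaining, assuming each unit is fully pipelined and delivers one result per cycle after an initial latency. The latencies (7 and 6 cycles) and the vector length (64) are hypothetical values chosen for illustration, not Cray Y-MP specifications.

```python
def unchained_cycles(n, mul_latency, add_latency):
    """Add starts only after the multiply has stored all n results."""
    multiply_done = mul_latency + (n - 1)   # last product leaves the multiplier
    add_done = add_latency + (n - 1)        # then the add streams all n elements
    return multiply_done + add_done


def chained_cycles(n, mul_latency, add_latency):
    """Add starts as soon as the first product is available."""
    first_sum = mul_latency + add_latency   # first element passes through both units
    return first_sum + (n - 1)              # remaining elements follow one per cycle


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    n, mul_latency, add_latency = 64, 7, 6
    print("unchained:", unchained_cycles(n, mul_latency, add_latency))
    print("chained:  ", chained_cycles(n, mul_latency, add_latency))
```

Under these assumptions the chained version overlaps almost the entire add with the multiply, so for long vectors the total time approaches that of a single vector operation.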
Processor-to-memory interconnection of the Cray Y-MP
- A multistage crossbar network built of 4×4 and 8×8 crossbar switches and 1×8 demultiplexers.
- The network uses circuit switching and ensures that multiple access requests from the same port are satisfied in presentation order.
CC-NUMA Stanford DASH
Introduction
- Stanford University aimed to build an experimental cache-coherent multiprocessor in its Directory Architecture for Shared Memory (DASH) project.
- DASH can be classified as a cache-coherent NUMA (CC-NUMA) architecture.
- DASH has a 2-level processor-to-memory interconnection structure. Within a cluster of 4-16 processors, memory is accessed via a shared bus.
- Each processor in a cluster has:
1. A private instruction cache (write-through policy)
2. A separate data cache (write-through policy)
3. A level-2 cache (write-back policy)
- The clusters are interconnected by a pair of wormhole-routed 2-D mesh networks:
1. A REQUEST mesh (which carries remote memory access requests)
2. A REPLY mesh (which routes data and acknowledgments back to the requesting cluster)

- Data access locality leads to better performance, since accesses that stay within a cluster use the shared bus and avoid the mesh networks.
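The effect of locality can be sketched as a weighted average of local and remote access times. In the Python example below, the latencies (30 and 100 cycles) are hypothetical placeholders rather than DASH measurements, and the function name is invented for illustration.

```python
def average_access_cycles(local_fraction, local_cycles, remote_cycles):
    """Mean memory access time when a given fraction of accesses is local."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_cycles + (1.0 - local_fraction) * remote_cycles


if __name__ == "__main__":
    # Hypothetical latencies: 30 cycles within the cluster, 100 cycles remote.
    for fraction in (0.5, 0.9, 1.0):
        print(fraction, average_access_cycles(fraction, 30, 100))
```

The higher the fraction of accesses served within the cluster, the closer the average gets to the local latency.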

- Inside a cluster, cache coherence is enforced by a snoopy protocol, while across clusters, it is maintained by a write-invalidate directory protocol.

- Unit of data sharing: block or cache line.
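The write-invalidate directory idea mentioned above can be illustrated with a toy model. The Python sketch below keeps, for each memory block, the set of clusters that hold a copy, and invalidates the other copies when one cluster writes. It is a simplified illustration and not the actual DASH protocol: the class and method names are hypothetical, and details such as dirty-block write-back and message ordering are left out.

```python
class Directory:
    """Toy directory: tracks which clusters cache each memory block."""

    def __init__(self):
        self.sharers = {}  # block address -> set of cluster ids holding a copy

    def read(self, block, cluster):
        """Record that `cluster` now holds a shared copy of `block`."""
        self.sharers.setdefault(block, set()).add(cluster)

    def write(self, block, cluster):
        """Give `cluster` the only copy; return the clusters to invalidate."""
        holders = self.sharers.get(block, set())
        to_invalidate = sorted(holders - {cluster})
        self.sharers[block] = {cluster}  # the writer is now the sole holder
        return to_invalidate


if __name__ == "__main__":
    directory = Directory()
    for cluster_id in (0, 1, 2):
        directory.read(0x40, cluster_id)
    print("invalidate:", directory.write(0x40, 1))
```

Tracking sharers per block lets the directory send invalidations only to the clusters that actually hold a copy, instead of broadcasting to every cluster.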


- Clusters are modified in minor ways and augmented with two special boards that hold the directory and network interface subsystems.

- The processor board modifications consist of the addition of a bus retry signal and provision of masking capability for the bus arbiter. The retry signal is used when a request involves service from a remote node. The masking capability allows the directory to hold off a processor's retry (via the bus arbiter) until the requested remote access has been completed.

- Thus, effectively, a split-transaction bus protocol is used for performing remote accesses. The added boards contain memory for the directory entries, buffers, and a piece of the global interconnection network.
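The retry-and-mask mechanism described above can be sketched as a small state model. In the Python example below, a remote request is answered with a retry and the requesting processor is masked until the reply arrives, so the bus stays free for other processors in the meantime. The class, method names, and return strings are hypothetical, and the model ignores arbitration and timing details.

```python
class ClusterBus:
    """Toy model of one cluster's bus with retry and arbiter masking."""

    def __init__(self):
        self.masked = set()   # processors the arbiter is currently holding off
        self.arrived = set()  # (processor, block) pairs whose remote data is back

    def request(self, proc, block, is_remote):
        if proc in self.masked:
            return "masked"   # arbiter will not grant the bus to this processor yet
        if not is_remote or (proc, block) in self.arrived:
            self.arrived.discard((proc, block))
            return "served"   # satisfied within the cluster
        self.masked.add(proc) # remote service needed: hold off further retries
        return "retry"        # bus is released for other processors

    def remote_reply(self, proc, block):
        """Remote data has arrived: unmask so the retried request can proceed."""
        self.arrived.add((proc, block))
        self.masked.discard(proc)


if __name__ == "__main__":
    bus = ClusterBus()
    print(bus.request(0, 0x100, True))   # remote access starts
    print(bus.request(1, 0x200, False))  # another processor uses the bus
    bus.remote_reply(0, 0x100)
    print(bus.request(0, 0x100, True))   # retried request completes
```

Because the bus is not held while the remote access is outstanding, the request and its completion act as two separate bus transactions, which is the split-transaction behavior the slide refers to.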
Thank You!
