
AN INTERCONNECT ARCHITECTURE FOR NETWORKING SYSTEMS ON CHIPS

NETWORK PROCESSOR SYSTEMS ON CHIPS MEET THE SPEED AND FLEXIBILITY REQUIREMENTS OF NEXT-GENERATION INTERNET ROUTERS. THE OCTAGON ON-CHIP COMMUNICATION ARCHITECTURE, WITH ITS COST, PERFORMANCE, AND SCALABILITY ADVANTAGES, SUPPORTS THESE NETWORK PROCESSOR SOCS.

Faraydon Karim
Anh Nguyen
STMicroelectronics

Sujit Dey
University of California, San Diego

To meet the demands of ever-increasing Internet traffic, the next generation of Internet backbone routers must deliver ultrahigh performance over an optical infrastructure. At the current Internet traffic growth rate, network service providers will likely deploy OC-768 routers in the foreseeable future. At the same time, as Internet and application service providers attempt to provide more diverse and differentiated services, routers will have to take on new tasks. In addition to routing and packet forwarding, routers will likely perform packet classification, distinguishing packets and grouping them according to their requirements; buffer management, determining buffer allocation and admission control for packets; and packet scheduling, determining how to sequence packets to meet service level agreements (SLAs) [1].

Traditionally, routers have used general-purpose reduced-instruction-set computer (RISC) processors or application-specific ICs (ASICs). Although general-purpose, processor-based router architectures provide the flexibility to upgrade to new router tasks, they will not satisfy the growing speed requirements for new, complex, packet-processing tasks. On the other hand, ASIC-based router implementations can provide the speed but not the required programming flexibility. These shortcomings of traditional RISC and ASIC designs mean that designers must develop new high-speed network processors that permit flexible programmability and work at OC-768 speed.

At OC-768 (40 Gbps), the IP packet arrival rate could reach approximately 114 × 10^6 packets per second (assuming 44 bytes per packet). To ensure that the worst-case time to process a packet does not exceed the packet interarrival time and thus violate SLAs, packet-processing time should be at most 9 ns per packet. Within this budget, a network processor must perform approximately 500 instructions on each arriving packet to enable packet forwarding and classification on packet flows. Hence, an OC-768 network processor must process 57 billion instructions per second, a performance level a multiprocessor system-on-a-chip (SOC) architecture can provide.

Octagon is a novel on-chip communication architecture that can meet the performance requirements of network processor SOCs. Octagon’s cost, performance, and scalability advantages make it suitable for the aggressive on-chip communication demands of not only networking SOCs, but also SOCs in several other domains.

36 0272-1732/02/$17.00 © 2002 IEEE

Authorized licensed use limited to: AMRITA VISHWA VIDYAPEETHAM AMRITA SCHOOL OF ENGINEERING. Downloaded on August 25,2023 at 17:59:41 UTC from IEEE Xplore. Restrictions apply
High-performance communications
Consider the on-chip communication requirements that typical network processor applications impose. Using T.V. Lakshman and D. Stiliadis’ packet classification algorithm with an estimated 10,000 classification rules and 16-bit on-chip memory width [2], we must perform 625 memory accesses per packet arrival, or 71.3 × 10^9 memory accesses per second (in the worst case). Clearly, this necessitates the use of multiple memory components, and an on-chip communication architecture that enables highly concurrent, high-speed communication between the multiple processor and memory components.

Recent studies have demonstrated the significant role an on-chip communication architecture plays in determining a SOC’s overall performance [3]. Several techniques let us design and synthesize on-chip communication to satisfy components’ interface and communication needs in an application-specific system [4–6]. However, because one of a network processor’s primary goals is to efficiently execute multiple applications (including evolving networking applications), synthesizing an application-specific interconnect architecture for a network processor SOC will not work.

Rather than synthesize a custom on-chip interconnect architecture for a given application, K. Lahiri and colleagues propose to optimally map a system’s communication requirements to a given communication architecture [7]. In other work, they describe a technique that allows reconfiguration of the selected communication architecture’s protocols according to the application’s changing communication demands [8]. Proposed communication mapping and reconfiguration techniques provide up to an order of magnitude improvement in system performance. Taken together, mapping and reconfiguration techniques show promise for efficiently mapping multiple applications to the same interconnect fabric [7,8]. For these techniques to be successful, however, developers must select the appropriate on-chip communication architecture.

Despite recent advances in the analysis and design of high-performance on-chip communication architectures, commercial SOCs commonly use simple bus-based topologies and protocols [9,10]. Even the Virtual Socket Interface Alliance’s efforts have focused on bus interface standards.

Although bus-based on-chip communication might be suitable for many applications, it clearly cannot satisfy an OC-768 network processor’s very demanding on-chip communication needs. For high-performance computing systems, interconnect architectures based on crossbars, or crossbars mixed with buses, deliver the ultrahigh-performance communication needed among components [11]. Many switching architectures in high-performance routers also use crossbars [1]. An on-chip crossbar can satisfy the on-chip communication needs of an OC-768 network processor SOC. Theoretically, crossbar performance (in terms of throughput or delay) is high enough to permit development of efficient network processing tasks [12]. In reality, crossbar implementation costs are high: crossbars require many on-chip wires and relays to minimize clock skew across the chip. In addition, crossbars do not scale well as the number of nodes to be connected increases. Although crossbar-based interconnects might be justified in high-performance computing systems and routers, they might not be the best economic choice for lower-cost and higher-volume network processor SOCs.

Octagon
The Octagon on-chip architecture is simpler to implement than a crossbar yet has much higher throughput than either shared buses or traditional crossbars. Unlike crossbars, Octagon’s implementation complexity increases linearly with the number of nodes (processor or memory components) that the network must connect.

Architecture
As Figure 1 shows, a basic Octagon unit consists of eight nodes and 12 bidirectional links.

Figure 1. Basic Octagon configuration includes eight nodes and 12 bidirectional links.
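The basic Octagon unit can be modeled as a small graph to check the link count and the hop bound. This is a minimal sketch that assumes the connectivity Figure 1 suggests: each node links to its two ring neighbors and to the node directly across the ring. The function names are illustrative and not taken from the article.

```python
from collections import deque

NUM_NODES = 8

def octagon_links(n=NUM_NODES):
    """Return the set of bidirectional links (assumed: ring plus across links)."""
    links = set()
    for i in range(n):
        links.add(frozenset((i, (i + 1) % n)))        # ring neighbor
        links.add(frozenset((i, (i + n // 2) % n)))   # node directly across
    return links

def hop_counts(src, links, n=NUM_NODES):
    """Breadth-first search: minimum hop count from src to every node."""
    neighbors = {i: set() for i in range(n)}
    for link in links:
        a, b = tuple(link)
        neighbors[a].add(b)
        neighbors[b].add(a)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

links = octagon_links()
diameter = max(max(hop_counts(s, links).values()) for s in range(NUM_NODES))
print(len(links), "links, maximum distance", diameter, "hops")
```

Under this assumed connectivity the model has 12 links, and no pair of nodes is more than two hops apart.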

SEPTEMBER–OCTOBER 2002 37
NETWORK SOC COMMUNICATIONS

Figure 2. Octagon (a) and crossbar (b) physical-layout schematic examples. Octagon consists of 12 horizontal and 12 vertical 32-bit tracks, with each horizontal track upper-bounded by 8 mm and each vertical track upper-bounded by 0.156 mm (13 × 12 µm). The crossbar has eight horizontal and 32 vertical 32-bit tracks, with the horizontal tracks upper-bounded by 8 mm as in the Octagon, and the vertical tracks upper-bounded by 0.108 mm (9 × 12 µm).
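The vertical track bounds in the Figure 2 caption follow from the 12-µm width of one 32-bit link. The short check below only redoes that unit conversion; the multipliers 13 and 9 are taken from the caption as given, and the helper name is illustrative.

```python
LINK_WIDTH_UM = 12.0  # width of one 32-bit link, in micrometers

def track_bound_mm(link_widths):
    """Convert a length expressed in link widths to millimeters."""
    return link_widths * LINK_WIDTH_UM / 1000.0

octagon_vertical_mm = track_bound_mm(13)   # 13 x 12 um
crossbar_vertical_mm = track_bound_mm(9)   # 9 x 12 um
print(f"Octagon vertical bound : {octagon_vertical_mm:.3f} mm")
print(f"Crossbar vertical bound: {crossbar_vertical_mm:.3f} mm")
```

Both values are a fraction of a millimeter, so the vertical tracks are short in either layout compared with the 8-mm horizontal tracks.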

The Octagon architecture has several desirable properties:

• two-hop communication between any pair of nodes;
• higher aggregate throughput than a shared bus or crossbar interconnect under certain implementation conditions;
• simple, shortest-path routing algorithm; and
• less wiring than a crossbar interconnect.

Octagon operates in packet- or circuit-switched mode. An Octagon packet is data that must be transferred from the source Octagon node to the destination Octagon node as a result of a communication request by the source node. An Octagon packet can be fixed or variable length. In packet-switched mode, the network nodes buffer packets at intermediate nodes if there is contention at the egress link.

In circuit-switched mode, a network arbiter allocates the entire path between source and destination nodes of a communicating node pair for a number of clock cycles. Nonoverlapping communication paths can occur concurrently; that is, the arbiter permits spatial reuse. In this mode, system performance is a function of the chosen connection schedule. The question is, then, given the set of pending communication requests, how should the arbiter schedule connections to optimize throughput (or some other metric)? We have developed a simple connection scheduler, called the best-fit algorithm (described later), to enable Octagon’s circuit-switched mode.

Packet routing
We can code Octagon node addresses into a three-bit field and route an Octagon packet as follows. We prepend a three-bit tag to each packet. Each node compares the tag (Packet_addr) to its own address (Node_addr) to determine the next action. The node computes the relative address of a packet as

Rel_addr = Packet_addr − Node_addr (modulo 8)

At each node on the Octagon, packet routing is a function of Rel_addr:

• Rel_addr = 0, process at node
• Rel_addr = 1 or 2, route clockwise
• Rel_addr = 6 or 7, route counterclockwise
• Route across otherwise

Consequently, a predetermined, simple routing scheme for each network packet permits at most two hops to separate any two nodes.

Implementation cost
Figure 2 illustrates the physical layout of the Octagon and crossbar interconnect architectures. In our network processor, each Octagon node consists of a processor-memory pair with an estimated size of 2 mm × 2 mm. Let us assume that the minimum wire spacing is 0.2 µm, and the width of a 32-bit link is 12 µm (including individual wire width, spacing, and shielding).
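The relative-address rule from the Packet routing section can be sketched in a few lines. This is an illustration rather than the authors' implementation; it assumes that clockwise means increasing node address and that the across link joins node n to node n + 4 (modulo 8). The function names are hypothetical.

```python
def next_hop(node_addr, packet_addr):
    """Return the next node for a packet, or None if it has arrived."""
    rel_addr = (packet_addr - node_addr) % 8
    if rel_addr == 0:
        return None                      # process at this node
    if rel_addr in (1, 2):
        return (node_addr + 1) % 8       # route clockwise (assumed: +1)
    if rel_addr in (6, 7):
        return (node_addr - 1) % 8       # route counterclockwise
    return (node_addr + 4) % 8           # route across

def route(src, dst):
    """List the nodes visited from src to dst, inclusive."""
    path = [src]
    hop = next_hop(src, dst)
    while hop is not None:
        path.append(hop)
        hop = next_hop(hop, dst)
    return path

longest = max(len(route(s, d)) - 1 for s in range(8) for d in range(8))
print(route(0, 3), route(0, 5), "longest path:", longest, "hops")
```

Because each node recomputes Rel_addr locally, a packet bound for relative address 3 first crosses the ring and then takes one counterclockwise hop, which is how the two-hop bound arises.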

38 IEEE MICRO
As Figure 2a shows, the Octagon architecture consists of 12 horizontal and 12 vertical 32-bit tracks. Each horizontal track is upper-bounded by 8 mm, the total width of the four nodes. Each vertical track is upper-bounded by 0.156 mm (13 × 12 µm). As Figure 2b shows, the crossbar needs eight horizontal and 32 vertical 32-bit tracks. Although the horizontal tracks are upper-bounded by 8 mm as in the Octagon, the vertical tracks are upper-bounded by 0.108 mm (9 × 12 µm). Thus, wiring in Octagon is less complex than in a crossbar.

Octagon versus crossbars and buses
Consider a typical SOC communication. Node processes continuously generate requests for service; examples include memory read and write requests. We classify requests according to their source-destination pair; thus, we denote requests that originate from node i with destination node j as type ij. Requests of type ij arrive at the system following a Poisson process with parameter λij, which is the requests’ arrival rate. Service time is the time a destination node requires to complete all requested tasks if it processes the request in isolation. We assume the communication links between source and destination nodes are locked until service is completed.

Service and response times
The required service time is equal to packet size divided by link rate, where packet size is the number of bytes of data transfer that result from the communication request, and link rate is the communication link’s data transfer rate. We assume that the service time for request ij is exponentially distributed with parameter µij, with 1/µij as the average service time per request. This is a reasonable assumption because packet length varies and, in the most general case, could range from one to thousands of bytes. For example, if a node issues a read request for an 8-byte data block, then for a 1-MHz 8-bit wide data bus, the request service time is 8 microseconds.

We can easily extend these assumptions to accommodate other discrete-time distributions such as Bernoulli arrivals, and geometric or deterministic service time. We consider a symmetric system where λij = λ, ∀ i, j and µij = µ, ∀ i, j. The utilization of requests of type ij is ρij = λij/µij = λ/µ = ρ. Utilization is the average amount of service demand arriving within one time unit. The aggregated arrival rate is λtot = Σij λij. Total utilization is ρtot = λtot/µ.

We model the shared bus as a single-server queue with Poisson arrivals and exponential service time. Consider the aggregated request arrival process for the eight nodes. Because each individual arrival process is a Poisson process, the superposition is also a Poisson process [12]. In memory access applications, service rate corresponds to memory access speed. We ignore all propagation delays. The server serves queued requests in first-in, first-out (FIFO) order. An arriving request’s response time is the difference between the arrival time and the time the bus completes the service. Because the server is work conserving [13], the expected response time, denoted by EWbus, for an arbitrary request arriving at the single-server queue is identical to that for a single-server multiple-queue system. The expected response time of a shared bus modeled by a single-server queue is [13]

$$EW_{bus} = \frac{\rho_{tot}}{\lambda_{tot}\,(1 - \rho_{tot})}.$$

For crossbar throughput, we use the model presented by J. Chen and T. Stern [12]. They showed that for a large switch (approximately 20 nodes) having a speedup factor of one, response time is

$$EW_{xbar} = EW_{s} + \frac{\lambda_{tot}\,E^{2}W_{s}}{2\,(8 - \lambda_{tot}\,EW_{s})},$$

where

$$EW_{s} = p_{o}\,\frac{8\lambda_{tot}}{(8\mu - \lambda_{tot})^{2}} + \frac{1}{\mu} \quad \text{and} \quad p_{o} = 1 - \frac{\rho_{tot}}{8}.$$

Chen and Stern also investigated the impact of various arbitration policies on switch performance. They found that arbitration policies do not affect maximum throughput because it is only a function of the average service. Different arbitration policies result in different response times, however.

Communication request scheduling
We investigate the Octagon architecture performance through simulation, using a simple request-response traffic model.
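The service-time example and the shared-bus response time given earlier can be evaluated numerically. The sketch below is only an illustration of those two formulas; the function names and the sample values of λtot and µ are assumptions chosen for the example, not results from the article.

```python
def service_time_s(packet_bytes, link_bytes_per_s):
    """Service time = packet size / link rate."""
    return packet_bytes / link_bytes_per_s

def bus_response_time(lambda_tot, mu):
    """Expected response time of the single-server bus model:
    EW_bus = rho_tot / (lambda_tot * (1 - rho_tot)), valid for rho_tot < 1."""
    rho_tot = lambda_tot / mu
    if not 0 < rho_tot < 1:
        raise ValueError("the bus model requires 0 < rho_tot < 1")
    return rho_tot / (lambda_tot * (1 - rho_tot))

# 8-byte read over a 1-MHz, 8-bit bus (one byte per cycle, 1e6 bytes/s).
print(service_time_s(8, 1e6) * 1e6, "microseconds")

# Illustrative load: mu = 0.5 per clock cycle, lambda_tot = 0.25 per clock cycle.
print(bus_response_time(0.25, 0.5), "clock cycles")
```

For these sample values the expression reduces to 1/(µ − λtot), the familiar single-server result, and it grows without bound as ρtot approaches 1.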


That is, a source node generates a request to send to a destination node. It eventually establishes a connection for the communication. For each connection, the source node sends a request and receives a response. After the communication, the participating nodes sever the connection.

We associate a processor and memory module with each node, as Figure 3 shows. Applications of this traffic model exist in routing table lookup, Internet protocol packet classification, and other networking functions where each node generates memory access requests. If the requested memory location is attached to the local node, then it generates no Octagon communication requests. Otherwise, it must forward the request to the appropriate node via Octagon using the routing algorithm previously presented. At the destination node, the memory request consumes several clock cycles before spawning a response, which it returns to the originating node.

Figure 3. High-level application model. Each node is associated with a processor and a memory module.

The best-fit scheduler is a connection-oriented communication protocol that can simultaneously accommodate nonoverlapping connections. Each node maintains three queues of outstanding requests, one for each egress link. With respect to the overall network, this global scheduler gives priority to the head-of-line requests in arrival time order (lower arrival time implies higher priority). At every service completion time, the scheduler checks to see if it can make new connections based on the previously described priority scheme. The scheduler sets up connections until it can accommodate no more without violating the nonoverlapping rule. Note that we only consider head-of-line requests at each node. When a connection is torn down, the scheduler reactivates to check if it can set up new connections.

Figure 4 is a detailed view of the node model. In addition to the request generator, processor, and memory, each node has three ingress and three egress ports labeled left, across, or right, consistent with its associated neighbor. Logically, the node emulates a simple 4 × 4 nonblocking switch (plus processor and memory). The switch has neither input nor output buffering. The central scheduler performs switch arbitration.

Figure 4. Node model. Each node has a request generator, processor, memory, and three ingress and three egress ports. Switch arbitration is through the central scheduler.

Performance results
Figure 5 compares the expected response time of Octagon, a bus, and a crossbar. We fix µ = 0.5 per clock cycle and vary λtot. The horizontal axis represents ρtot = λtot/µ, and the vertical axis represents expected response time in clock cycles.
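The best-fit connection scheduling described earlier can be illustrated with a greedy pass over the head-of-line requests. This is a rough sketch under stated assumptions, not the authors' scheduler: it treats "nonoverlapping" as meaning that two connections share no link, models each bidirectional link as a single resource, and reuses the relative-address routing rule with clockwise taken as increasing address. All names are hypothetical.

```python
def next_hop(node, dst):
    """Relative-address routing on one Octagon (clockwise assumed to be +1)."""
    rel = (dst - node) % 8
    if rel in (1, 2):
        return (node + 1) % 8
    if rel in (6, 7):
        return (node - 1) % 8
    return (node + 4) % 8

def path_links(src, dst):
    """Links (as unordered node pairs) used by a connection from src to dst."""
    links, node = [], src
    while node != dst:
        nxt = next_hop(node, dst)
        links.append(frozenset((node, nxt)))
        node = nxt
    return links

def best_fit(head_of_line, busy_links=()):
    """Grant requests in arrival-time order while their paths stay disjoint.

    head_of_line: iterable of (arrival_time, src, dst) tuples.
    busy_links:   links already held by active connections.
    """
    busy = set(busy_links)
    granted = []
    for arrival, src, dst in sorted(head_of_line):
        links = path_links(src, dst)
        if not any(link in busy for link in links):
            busy.update(links)
            granted.append((arrival, src, dst))
    return granted

pending = [(0, 0, 2), (1, 1, 3), (2, 4, 6)]
print(best_fit(pending))
```

In this example the request from node 1 to node 3 waits because it needs the link between nodes 1 and 2, which the earlier request from node 0 already holds, while the request from node 4 proceeds concurrently; that concurrency is the spatial reuse the circuit-switched mode relies on.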

Recall that λtot is the system’s total packet arrival rate; 1/µ, its average packet size; and ρtot, the average number of packets the system can service concurrently. Octagon has significantly higher maximum throughput than both the bus and crossbar. We obtain similar results for fixed-size packets.

Figure 5. Throughput comparison of Octagon, a bus, and a crossbar for randomized packet sizes. Horizontal axis: utilization, ρtot; vertical axis: latency (clock cycles); curves: Octagon best-fit scheduler, crossbar, and bus.

As these results show, the bus saturates at ρtot = 1 because a single server (the bus bandwidth) can provide at most one service unit per time unit. For the crossbar, we assume a single queue per crossbar node; hence we can model the crossbar as a system of eight queues sharing eight servers. Contention and head-of-line blocking mean that the eight available servers provide approximately four work units per time unit [12]. Therefore, crossbar saturation occurs at ρtot ≅ 4. We could have considered eight queues per crossbar node, but the implementation cost would be prohibitive.

On the other hand, it might be reasonable for each Octagon node to have three queues. Hence we model the Octagon architecture as a system of 24 queues and 24 servers (three egress queues and three outgoing links per node). This means we incur more cost per node for Octagon than for the crossbar. However, saturation for Octagon occurs at ρtot ≅ 12, which is significantly higher than for a crossbar, as Figure 5 shows. Note that the effective server utilization is about 50 percent (12 of 24 servers), the same as for the crossbar.

Some packet service approaches achieve high throughput by compromising service latency. That is, system efficiency and throughput increase as the workload at each queue builds. Figure 6 shows that at relatively high utilization of ρtot = 12 and at 10^-4 packet loss probability, a system using an Octagon architecture requires fewer than 50 packet buffers. Thus, the average queue occupancy is not excessive.

Figure 6. Packet loss probability versus egress queue size for a system using the Octagon architecture.

For Octagon’s best-fit connection scheduling, a node (process) is not blocked if the scheduler cannot schedule its communication request immediately. Instead, the requesting node queues the request in its egress queue. This strategy can improve system performance and node utilization more than some communication protocols (especially most bus protocols), which stall the requesting node until its request can be granted. However, each Octagon node must have a queue large enough to avoid packet loss. Figure 6 shows the packet


loss probability as the size of each node’s egress queue increases. Note that a queue size of 25 results in a nominal packet loss of 0.1 percent, while a queue size of 35 reduces the packet loss probability to 0.01 percent. If needed, a system designer can enable a zero packet loss guarantee in Octagon by having the packet scheduler refuse requests if the egress queue is at full or near-full capacity, thereby stalling the requesting node (as many existing buses do).

Scalability
The increasing performance demands of programmable network processors make scalability an important factor in Octagon’s design. Next-generation network processors will likely have 16 or more processors and many distributed memory components holding tables for Internet protocol lookup and classification. The need for interconnecting greater numbers of on-chip components in network processors and other SOCs will accelerate in the foreseeable future, increasing the need for scalable, on-chip communication architectures.

Strategy 1: Low wiring complexity
One of the Octagon architecture’s strengths is its ability to scale linearly. Figure 7a shows a scaling strategy that requires two different node types: bridge and member. As the name implies, bridge nodes connect adjacent Octagons and perform hierarchical packet routing (for example, node y in Figure 7a). Member nodes attach to only one Octagon (node x, for example). Consider a network consisting of eight interconnected Octagons. The Octagon address field of each packet is 6 bits wide: three high-order bits to identify the local Octagon and three low-order bits to identify the node within the Octagon. Each bridge node performs static routing based on the entire field, and each member node performs routing based only on the three low-order bits.

Figure 7. Scaling strategy 1. Bridge nodes (node y) connect adjacent Octagons (a) and perform hierarchical packet routing. Member nodes attach to only one Octagon (node x). The tables give the maximum distance for Octagon (b) and crossbar (c) networks of various sizes.

(b) Octagon
Nodes         Horizontal links           Vertical links             Maximum distance
              (no./max length in mm)     (no./max length in mm)     (hops)
8             12/8                       12/0.156                   2
15            24/8                       24/0.156                   4
22 (shown)    36/8                       36/0.156                   6 (shown)

(c) Crossbar
Nodes         Horizontal links           Vertical links             Maximum distance
              (no./max length in mm)     (no./max length in mm)     (hops)
8             8/8                        32/0.108                   1
15            15/16                      120/0.192                  1
22            22/11                      242/0.276                  1

Figure 7 also shows the estimated wiring cost as a function of increasing network size for the Octagon and crossbar architectures. To arrive at these estimates, we assumed the SOC layout in Figure 2a for each Octagon. We extended the layout to the configuration of Figure 2b for crossbars with more than eight nodes. Octagon scales linearly and the crossbar does not because in the crossbar, every node is wired to every other node.

Figure 8. Scaling strategy 2. We extend the Octagon to the multidimensional space by linking corresponding nodes in adjacent Octagons according to the Octagon configuration (a). The table (b) indicates the increased wiring complexity (number of links), and the greater maximum distance needed as the number of connected nodes increases.

(b)
Nodes         Links     Maximum distance (hops)
8             12        2
64 (shown)    192       4 (shown)
512           2,304     6
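The entries in the Figure 8 table can be checked by building the multidimensional Octagon explicitly. The sketch below assumes that within each dimension a node links to the coordinates one step clockwise, one step counterclockwise, and directly across (the basic Octagon pattern), and it measures distance by breadth-first search from a single node, which is sufficient because the structure looks the same from every node. The function names are illustrative.

```python
from collections import deque
from itertools import product

def neighbors(node):
    """Neighbors of a node given as a tuple of coordinates in 0..7."""
    for dim, coord in enumerate(node):
        for step in (1, -1, 4):   # clockwise, counterclockwise, across
            yield node[:dim] + ((coord + step) % 8,) + node[dim + 1:]

def summarize(dims):
    """Return (nodes, links, maximum distance) for a dims-dimensional Octagon."""
    nodes = list(product(range(8), repeat=dims))
    links = {frozenset((n, m)) for n in nodes for m in neighbors(n)}
    origin = nodes[0]
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        u = queue.popleft()
        for v in neighbors(u):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return len(nodes), len(links), max(dist.values())

for dims in (1, 2, 3):
    print(summarize(dims))
```

Under these assumptions the construction reproduces all three rows of the table, with each added dimension contributing three links per node and two hops to the maximum distance.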

As the number of nodes N grows, the number of required wires is of order O(N^2). In this Octagon scaling strategy, on the other hand, each node requires either three or six links to its neighbors, resulting in wiring complexity of O(cN).

The tables in Figure 7 show the maximum distance (the maximum number of hops between any two nodes) for networks of various sizes. The maximum distance increases linearly as Octagon grows, while the crossbar has a constant maximum distance irrespective of the number of nodes. Although a higher maximum distance can degrade performance, the performance results in Figure 5 indicate that this might not always be the case: an 8-node Octagon with maximum distance 2 performs better than an 8-node crossbar with maximum distance 1.

In this strategy, the maximum distance between nodes grows much more slowly, but it does not remain constant as for the crossbar. This is fine for SOCs where low wire complexity is the dominant consideration. However, this characteristic might not suit systems where high throughput is the primary concern. For example, consider a SOC with 15 nodes. A bridge node connects Octagons 1 and 2. If all network traffic is across networks, then a bottleneck occurs at the bridge because it must transmit all traffic from Octagon 1 over the three links a, b, and c. Therefore, it can concurrently transfer at most three packets in each direction, one measure of a communication architecture’s maximum throughput.

Strategy 2: High performance
For systems in which high performance is the dominant consideration, we propose a second scaling strategy that performs better than the first but has more complex wiring. In this strategy, we extend Octagon to multidimensional space. Figure 8a illustrates this scaling strategy in a 64-node Octagon. We index each SOC node by the 2-tuple (i, j), i, j ∈ [0, 7]. For each i = I, I ∈ [0, 7], we construct an Octagon using nodes {(I, j), j ∈ [0, 7]}, which results in eight individual Octagon structures. We then connect these Octagons to each other by linking corresponding i nodes according to the Octagon configuration. That is, each node (I, J) belongs to two Octagons: one consisting of nodes {(I, j), j ∈ [0, 7]}, and the other consisting of nodes {(i, J), i ∈ [0, 7]}. The table in Figure 8 indicates the increase in wiring complexity (number of links) and maximum distance (number of hops) as the number of connected nodes increases. The maximum distance between nodes scales much better under strategy 2 than strategy 1. However, strategy 2’s better performance scalability comes at the cost of greater wiring complexity.

Figure 9 shows a natural scaling approach for networks with fewer nodes, and therefore fewer Octagons to connect (each node represents an Octagon). We scale the network as follows. To connect two Octagons, we construct a link between corresponding nodes; this bidirectional link connects node i from


Octagon 1 to node i from Octagon 2. As the number of Octagons to be connected increases, we link corresponding nodes according to the Octagon rule. That is, to connect eight Octagons, 12 bidirectional links are needed to connect each node i of Octagons 0, 1, 2, …, 7. For a network with more nodes, we start adding links in a new dimension. As Figure 9 shows, by increasing the network size we can maintain the maximum hop count at three while increasing the number of nodes to 32. These nodes represent a network of four Octagons with two hops to the corresponding intra-Octagon node and one hop to the destination Octagon.

Figure 9. Growing the network using scaling strategy 2.

Figure 10 shows Octagon’s advantage over the crossbar architecture as the number of nodes increases. Although the crossbar’s implementation cost (measured in number of links) increases prohibitively with increasing nodes, the Octagon scaling strategies make scaling feasible.

Figure 10. Comparison of Octagon scaling strategies to a crossbar architecture: number of nodes versus wiring complexity. Horizontal axis: number of nodes; vertical axis: number of links; curves: strategy 1, strategy 2, and crossbar.

Our analysis shows that Octagon significantly outperforms shared bus and crossbar on-chip communication architectures in terms of performance, implementation cost, and scalability. We are currently investigating the use of Octagon to satisfy the on-chip communication needs of other application-specific multiprocessor SOCs.

Acknowledgments
We acknowledge Naresh Soni for his encouragement and support, and Razak Hossain for his help on physical layout and implementation issues.

References
1. V.P. Kumar, T.V. Lakshman, and D. Stiliadis, “Beyond Best Effort: Router Architectures for the Differentiated Services of Tomorrow’s Internet,” IEEE Comm., vol. 36, no. 5, May 1998, pp. 152-164.
2. T.V. Lakshman and D. Stiliadis, “High-Speed Policy-Based Packet Forwarding Using Efficient Multidimensional Range Matching,” Proc. ACM SIGCOMM, ACM Press, New York, 1998, pp. 191-202.
3. K. Lahiri, A. Raghunathan, and S. Dey, “Evaluation of the Traffic Performance Characteristics of System-on-Chip Communication Architectures,” Proc. 14th Int’l Conf. VLSI Design, IEEE CS Press, Los Alamitos, Calif., 2001, pp. 29-35.
4. J.A. Rowson and A. Sangiovanni-Vincentelli, “Interface-Based Design,” Proc. 34th Ann.

Design Automation Conf., ACM Press, New York, 1997, pp. 178-183.
5. R.B. Ortega and G. Borriello, “Communication Synthesis for Distributed Embedded Systems,” Proc. Int’l Conf. Computer-Aided Design (ICCAD 98), IEEE CS Press, Los Alamitos, Calif., 1998, pp. 437-444.
6. M. Gasteier and M. Glesner, “Bus-Based Communication Synthesis on System Level,” ACM Trans. Design Automation Electronic Systems, vol. 4, no. 1, Jan. 1999, pp. 1-11.
7. K. Lahiri, A. Raghunathan, and S. Dey, “Communication Architecture Tuners: A Methodology for the Design of High-Performance Communication Architectures for System-on-Chips,” Proc. 37th Design Automation Conf., ACM Press, New York, 2000, pp. 513-518.
8. K. Lahiri, A. Raghunathan, and S. Dey, “Efficient Exploration of the SOC Communication Architecture Design Space,” Proc. Int’l Conf. Computer-Aided Design (ICCAD 00), IEEE CS Press, Los Alamitos, Calif., 2000, pp. 424-430.
9. Sonics Integration Architecture, www.sonicsinc.com.
10. D. Flynn, “AMBA: Enabling Reusable On-Chip Designs,” IEEE Micro, vol. 17, no. 4, July-Aug. 1997, pp. 20-27.
11. A. Charlesworth, “The Sun Fireplane Interconnect,” IEEE Micro, vol. 22, no. 1, Jan.-Feb. 2002, pp. 36-45.
12. J. Chen and T. Stern, “Throughput Analysis, Optimal Buffer Allocation, and Traffic Imbalance Study of a Generic Nonblocking Packet Switch,” IEEE J. Selected Areas in Comm., vol. 9, no. 3, Apr. 1991, pp. 439-449.
13. D. Gross and C. Harris, Fundamentals of Queueing Theory, 3rd ed., John Wiley & Sons, New York, 1998, pp. 297-300.

Faraydon Karim is an ST Fellow at STMicroelectronics’ Advanced System Technology, Advanced Computing Lab in La Jolla, California. His research interests include computer and embedded system architecture. Karim has a PhD in computer engineering from La Salle University. He is a member of the IEEE.

Anh Nguyen is a research engineer at the STMicroelectronics Advanced Systems Technology, Advanced Computing Lab in La Jolla, California. His research interests include performance analysis, resource allocation, and algorithm development. Nguyen has a PhD in electrical engineering from the University of California, Los Angeles. He is a member of the IEEE.

Sujit Dey is a professor in the Electrical and Computer Engineering Department at the University of California, San Diego. His research interests include configurable platforms consisting of adaptive wireless protocols and algorithms, and deep-submicron adaptive SOCs for next-generation wireless appliances and network infrastructure devices. Dey has a PhD in computer science from Duke University. He is a member of the IEEE.

Direct questions and comments to Faraydon Karim at STMicroelectronics, Advanced System Technology, San Diego, CA 92121; [email protected].

For further information on this or any other computing topic, visit our Digital Library at http://computer.org/publications/dlib.
