
EECS 578 Final Project Report 1

Fault-Tolerant Adaptive Routing Algorithm for Network-on-Chip
Tan Bie, Yang Jiao, Zixin Wang and Rong Xu

Abstract: For many-core chip multiprocessors (CMPs), Network-on-Chip (NoC) provides high-performance on-chip communication and great scalability, and the choice of routing algorithm plays a vital role in the performance of on-chip interconnection networks. In general, adaptive routing uses information about the network state to select among alternative paths and offers better latency and throughput. However, recently published adaptive routing algorithms are not equipped with a well-designed fault-tolerant mechanism to handle potential link failures in the network, which are induced by the rapid increase in circuit density and by extreme transistor scaling.

In our project, we therefore propose and implement an adaptive routing algorithm that uses global congestion information, together with a runtime fault-tolerant algorithm that handles multiple permanent link errors in the network. Escape virtual channels and an Up/Down restriction are applied to guarantee deadlock freedom.

Index Terms: NoC, adaptive routing algorithm, global congestion, fault tolerance, deadlock-free

I. INTRODUCTION
Network-on-Chip (NoC) has become the most significant communication fabric for many-core chip multiprocessors (CMPs), and the routing algorithms used in these networks play a vital role in determining processor performance. Meanwhile, on-chip circuits are vulnerable to errors due to shrinking transistor geometries and performance improvements, leading to serious reliability issues.

To address this problem, some good fault-tolerant routing algorithms have been proposed, but they do not adequately consider the load balance of the network [6], which leads to longer packet latencies and potential performance loss. Routing algorithms such as Regional Congestion Awareness (RCA) [3] and Destination-based Adaptive Routing (DAR) [1], on the other hand, improve network load balance and packet latency by using congestion information, but they are not equipped with a fault-tolerance mechanism. When a link failure occurs, such routing algorithms may incur a huge performance overhead and may even drive the whole system into an error state. We therefore aim to design a routing algorithm that is not only fault tolerant but also considers the network congestion state, improving routing performance through adaptive path selection.

In our project, we proposed and implemented a congestion-aware adaptive routing algorithm based on network spatial information, while also ensuring deadlock-free and fault-tolerant operation. The proposed routing algorithm collects global congestion information for each node in the network and adjusts path selection according to the downstream link delay. Applying our routing algorithm yields a better-balanced network and shorter packet latency. In addition, we add a runtime on-chip fault-tolerant mechanism that handles permanent link failures by deploying routing tables and logic that are updated upon each fault occurrence. Our routing algorithm also ensures a deadlock-free configuration by using escape virtual channels. Finally, our project is verified and analyzed on the BookSim simulator.

II. PROPOSED ROUTING ALGORITHM
Modern Network-on-Chip routing algorithms can be classified into deterministic routing and adaptive routing. Unlike deterministic routing, where packets from a source to a destination always follow the same fixed path, adaptive routing uses information about the network state to select among alternative paths. Using this information, a good selection function can spread the traffic and balance the network load.

In our project, we focus on adaptive routing using only minimal paths in a 2D mesh topology because of its simplicity and lower latency. The generation of congestion information, the fault-tolerant reconfiguration and the implementation of deadlock avoidance are discussed in detail below.

2.1 Global-Congestion Adaptive Routing
Traditional adaptive routing algorithms rely on local or regional congestion state to adjust the path selection function. However, such methods still face the difficult challenge of balancing remote and local congestion state, and they may not always accurately reflect the load on the actual paths a packet can take to its destination.

We therefore implement a Global-Congestion Adaptive (GCA) routing algorithm in which every node in the network measures and maintains per-destination congestion state, in the form of average delays to all other destination nodes through the output ports allowed by minimal routing. The measured delays propagate from the destination back towards the sources through the permitted paths, updating every node along the way and thus estimating the congestion along paths more accurately. Then, for every node, a set of traffic split ratios for each destination is calculated by combining the measured delays propagated from downstream routers with the node's local delay. The selection function of our routing algorithm uses these ratios to decide which path a packet follows to a specific destination when it arrives at this node.
Here we assume that:
1. A router only decides the distribution of traffic to its next-hop routers.
2. The ratios are defined on a per-destination basis, i.e., at a given node, all arriving packets destined for the same node use the same ratio, while packets that use the same output ports but go to different destinations are distributed independently according to different ratios.
3. Minimal routing is used in our algorithm. Thus, for every node, there are at most two ports to a destination, and the port ratios for a destination sum to one. If there is only one permitted output port, all traffic is forced onto that port.

A. Distributed delay measurement and propagation
Next, we illustrate the measurement and propagation of the global delay information using an example in a 4x4 mesh topology, shown in Figure 1 (a). Assume all nodes in the network need to measure the delay to node 9.

First, each node periodically estimates the local waiting time in the input queues for all five output ports. For every output port p, this time is taken as the local queuing delay $l[p]$ through port p and is approximated by counting the number of flits in the input buffers that have already requested a virtual channel to the next-hop router.

At the 1st clock cycle, the delay from node 9 to itself is just the queuing delay on the ejection port of node 9. $\mathrm{Avg}_9[9]$ stands for the average delay from node 9 to itself and is equal to:

$$\mathrm{Avg}_9[9] = l[Ej] \tag{1}$$

This delay information $\mathrm{Avg}_9[9]$ is then propagated to all neighbors of node 9 at the 2nd clock cycle. Nodes 8, 10, 5 and 13 receive $\mathrm{Avg}_9[9]$ through their east (E), west (W), south (S) and north (N) ports respectively, as shown in Figure 1 (b). Each of these nodes estimates its delay to node 9 by adding $\mathrm{Avg}_9[9]$ to its locally measured delay on the port leading to node 9. For instance, at node 10 only the west port leads to node 9, and the average delay from node 10 to node 9 is given as:

$$\mathrm{Avg}_{10}[9] = l[W] + \mathrm{Avg}_9[9] \tag{2}$$

Once all one-hop routers have finished measuring their path delays, at the 3rd clock cycle all two-hop routers (12, 14, 11, 4, 6 and 1) receive updates for the delay to node 9. For instance, node 6 receives updates about the average delay to node 9 from nodes 10 and 5, which are connected to its north and west ports respectively. Node 6 can then estimate its average delay by computing a weighted mean of the delays through the north and west ports, with the weights given by the traffic split ratios along these ports at node 6:

$$A[N][9] = \mathrm{Avg}_{10}[9] + l[N] \tag{3}$$
$$A[W][9] = \mathrm{Avg}_{5}[9] + l[W] \tag{4}$$
$$\mathrm{Avg}_{6}[9] = W[N] \cdot A[N][9] + W[W] \cdot A[W][9] \tag{5}$$

Here, $A[N][9]$ and $A[W][9]$ represent the delays through the north and west ports respectively, and $W[N]$ and $W[W]$ stand for the traffic split ratios at node 6 for destination node 9.

Carrying on in this manner, after some clock cycles all nodes in the network are able to measure their delay to node 9 through the candidate output ports permitted by minimal routing. This process repeats periodically to ensure that the global congestion information stored in the nodes is always up to date.

Figure 1 Example of the distributed delay propagation: (a) 1st step, (b) 2nd step, (c) 3rd step, (d) 4th step

B. Adaption of traffic split ratio
The purpose of the traffic split ratio is to use the global congestion information, which is measured and propagated to each node, to balance the traffic load uniformly across the whole network. For each node in the network, the adaption of the per-destination traffic split ratios is triggered when the current node receives delay information from valid downstream routers. The same adaption algorithm is repeated for all nodes in the network.

Suppose that at node i there are two output ports $p_x$ and $p_y$ connected to destination j along paths permitted by minimal routing. As discussed in part A, the current node can estimate $A[x][j]$ and $A[y][j]$, the delays to node j through ports $p_x$ and $p_y$ respectively. Here we assume that the delay through port x is higher than that through port y, which means that

$$A[x][j] > A[y][j]$$

We then use this information to update the traffic split ratios with the following equations:

$$\Delta = \min\left(0.25 \cdot \frac{A[x][j] - A[y][j]}{A[x][j]},\; W[x][j]\right) \tag{6}$$
$$W[x][j]_{new} = W[x][j] - \Delta; \quad W[y][j]_{new} = W[y][j] + \Delta \tag{7}$$

The basic idea of these equations is to increase the traffic split ratio of the port with the lower downstream delay and decrease the ratio of the port with the higher delay. To keep ratios from becoming negative, we take the minimum of the scaled relative delay difference and the current ratio of the higher-delay port.

2.2 Runtime Fault tolerant mechanism
A mechanism that handles soft and permanent faults in the network at runtime is necessary for a modern routing algorithm to cope with hard errors that may occur during the chip's lifetime. In our project, we propose and implement a runtime mechanism to cope with potential permanent link failures.

Since a broken link always means a topology change, the original routing table may lead to an error state, and reconfiguration is necessary to ensure complete reachability for all surviving nodes. In general, reconfiguration methods fall into two families. The first deploys routing tables and logic that are updated at runtime upon each fault occurrence.
The second family relies on offline software to complete the reconfiguration once a faulty link is detected, and then communicates with the surviving topology through a central node. Our solution builds on the first family and combines it with the forwarding of global congestion information. We assume that when a link failure occurs, the node connected to that link detects the fault and stops injecting new packets and flits until the reconfiguration is finished. The routing table reconfiguration works as follows.

First, if a link error is detected, every node in the network acts as a root node and broadcasts a reconfiguration flag to all other nodes in the network, hop by hop and only through healthy links. Meanwhile, the delay measurement and propagation process discussed in 2.1 is also initiated at this node, so the delay information Avg[i] is transmitted as well.

Then, each node that receives the reconfiguration flag performs the following steps:
• Stall the router pipeline. On receiving a reconfiguration flag, the node stops its pipeline and freezes virtual channel allocation and switch allocation until the reconfiguration is complete for all nodes.
• Update the routing table. For ports that received the flag, calculate and store the new traffic split ratio W[x][i] based on the delay information propagated from downstream nodes. For ports that did not receive the flag, invalidate the current split ratio and set it to zero. Then calculate the average delay from the current node to the root node. This step provides the safe paths as well as the global congestion information for the current node. It is illustrated in Table 1.
• Flag forwarding. The node sends the reconfiguration flag to its neighbors only through ports that neither received a flag nor connect to a faulty link.

Nodes that detect a permanent link error repeat the above process to obtain an updated routing table with safe paths from other nodes to the faulty node, as well as the network congestion information used to select among these safe paths adaptively.

This reconfiguration algorithm borrows ideas from our global congestion propagation process: both transmit information from one destination to every possible source. Thus, if any link error occurs, the reconfiguration process works together with the distributed delay propagation to obtain full reachability to all surviving nodes as well as the global congestion state. Figure 2 illustrates an example in which one link breaks in a 4x4 mesh network.

Table 1 Traffic split ratio update based on the flag signal and delay information during reconfigurations

Destination (i)   West   North   East   North
Ratio (W)         0.6 0.55 0.4 0 0 0.45 0
Flag received     Yes    No      Yes    No

Figure 2 Example of the reconfiguration process: (a) 1st step, (b) 2nd step, (c) 3rd step, (d) 4th step

III. DEADLOCK RECOVERY MECHANISM
We use escape virtual channels to make GCA deadlock-free. The key idea is to provide an escape path (an escape virtual channel) for every deadlocked packet. The routing algorithm used on the escape path must itself be deadlock-free. Thus, when a packet is found to be stuck in a deadlock, we send it onto the escape path, and the packet then uses this deadlock-free path to reach its destination.

A. How escape virtual channel works:
The approach to dealing with deadlock is not to avoid it but rather to recover from it. There are two key phases in any deadlock recovery algorithm: detection and recovery [1]. In our algorithm, we separate the process into three stages: Detection, Filtering and Recovery.

1. Detection:
In the detection phase, the network must be able to detect whether it has reached a deadlock situation. Determining exactly whether the network is in deadlock requires finding a cycle in the resource wait-for graph, which is difficult and costly, so we use a conservative detection mechanism: timeout counters. Each input port of the router is equipped with a timeout counter. The counter is reset in only two cases: (1) when the input port receives a flit, and (2) when we detect a deadlock and allocate an escape virtual channel for that packet. In all other cases, the counter is incremented by 1 per step. When the counter reaches the specified deadlock upper bound, the filtering stage is triggered.

2. Filtering:
In this phase, the network needs to determine whether a recovery request corresponds to a real deadlock or is just a false positive. We do this by checking the state of the virtual channels. A virtual channel has four states: idle, routing, virtual channel allocation (vc_alloc) and active. If any virtual channel is in the idle state, or a packet is just ready for ejection, we consider the deadlock a false positive. Otherwise, we allocate escape virtual channels for the virtual channels in the vc_alloc state. (This means that if all the virtual channels are in their active states, we do not allocate any escape virtual channel for this input either.)

3. Recovery:
In this phase, we have selected those input virtual channels whose packets (head flits) have been waiting for an available virtual channel for a long time (longer than the deadlock timeout). We apply a priority selector here to help us determine which
virtual channel should be the first to obtain the escape virtual channel. After allocating the escape virtual channel, we clear the timeout counter on that input port. The process is described as a finite-state machine (FSM) in Figure 3.

Figure 3 Deadlock recovery mechanism, with three main phases: Detection (D), Filtering (F) and Recovery (R).

B. Up/down deadlock free routing algorithm:
We choose the up/down routing algorithm as the deadlock-free algorithm applied on the escape virtual channel. Since our 8x8 mesh network has several permanent faults, we cannot use a simple deadlock-free algorithm such as x-y dimension-order routing for the escape path. To take full advantage of the DAR table generated for the GCA algorithm, we chose the up/down algorithm.

The paper [2] introduces up/down routing, a deadlock-free algorithm that can operate on any irregular topology. Up/down requires each link to be assigned a direction: up or down. It then disallows any path that traverses a down link followed by an up link. In this way, all cyclic dependencies are broken. In this paper, we take full advantage of our GCA algorithm to build a pseudo-up/down algorithm, which works correctly but may lose a little performance. Instead of choosing the root node when a fault is encountered, we simply fix the root at a certain node at the very beginning. Based on this root node, we can then use the GCA algorithm directly to realize the deadlock-free algorithm. The pseudo-up/down routing algorithm works as shown in Figure 4.

Figure 4 Up/down routing algorithm based on GCA: 'Cur' represents the ID of the current router. After the packet has passed through the root node (the root arrived bit has been set to 1), we simply use the GCA table to find the head flit's next direction to its destination. If the packet is at the root node, we set the root arrived bit to 1 and use the GCA table to find an output port from the root to the destination. If the packet has not yet gone through the root node (the root arrived bit is 0), we set the packet's destination to the root node and use the GCA table to find the next output port.

To implement this algorithm, we first need to add a bit (named root_arrived) to each flit, which indicates whether the flit has passed through the root node.

This algorithm is deadlock-free because it relies on the GCA table, which always gives a direction that brings the flit closer to its destination, even when there are permanent faults in the NoC. So when we use GCA to send a flit from the source to the root and then from the root to the destination, we in effect disallow paths that traverse a down link followed by an up link. In this way, the implemented algorithm is deadlock-free.

IV. HARDWARE IMPLEMENTATION

Figure 5 Delay Measurement and Propagation Logic

We implemented the logic needed by the router in Verilog HDL to measure the storage overhead of our routing algorithm. DAR [1] reports a 4.5% storage overhead over the baseline router. In our design, we show that the cost of the fault-tolerant feature is also reasonable: the overall storage overhead is 6.1% compared to the baseline router. The major logic blocks we added to the router are described below.

A. Port Pre-Selection
Because our router is designed for a 2D mesh network, minimal adaptive routing is used to pre-select output ports. A packet arriving at an input port can choose between at most two output ports, which map to one of the four quadrants. Since a one-hot port representation is used, port pre-selection requires 10 bits of storage for each destination, including the current node itself.

B. Delay Measurement and Propagation Logic
As shown in Figure 5, delay measurement consists of two parts: the local queuing delay count and the average delay calculation. Since we are mainly interested in the relative delays to the destination node through the candidate output ports, the local queuing delay for output port p is approximated by the number of flits in the input buffers that have already acquired a VC at the next-hop router connected to port p.

Since the port pre-selection logic has selected at most two output ports for each destination node, the average delay from the corresponding downstream node is used to compute the delay to the destination node through each pre-selected port. The traffic split ratio is then used to compute the weighted average delay from the current node to the destination node, and the computed average delay is propagated to the upstream nodes. The local queuing delay and the average delay are both 6 bits wide, and every router has to store two local delays and one average delay for each destination node.

To reduce storage overhead, we store only one 5-bit traffic split ratio for each destination node, since a packet at an
input buffer can choose between at most two output ports, which map to one of the four quadrants, and the split ratios are normalized so that they always add up to one.

C. Adapt Split Ratio
The computations involved in the adaptation of split ratios are those given in equations (6) and (7) of Section 2.1.

To simplify the implementation of these computations in hardware, we always assume λ = 0.25 (the step factor in equation (6)), which reduces the multiplication to a shift operation. The division is also avoided by extracting only the most significant set bit of L[ph][j], the delay through the higher-delay port, and ignoring the remaining less significant bits. This reduces the division to a shift operation.

Figure 6 Logic for Adaption of Weights

D. Reconfiguration flag forwarding unit
For the proposed fault-tolerant algorithm, an additional hardware unit is needed to receive and forward the flag signal, while the calculation and update of the traffic split ratios can be done using the hardware resources introduced in IV (B) and (C).

The reconfiguration flag forwarding unit consists of two major parts: combinational arbiter logic and a buffer of size $N^2$ for an N x N mesh network. The area overhead of this unit is therefore quite small compared to a modern router architecture.

The arbiter identifies the IDs (indicating the destination routers) of routers that have link errors and sends signals to trigger the split-ratio updates of the corresponding routing table entries. In addition, the arbiter selects the ports through which the reconfiguration flag is forwarded and sends it to the output buffer.

The buffer indicates whether the reconfiguration has already been done for a specific root node. When the router receives a reconfiguration flag of a specific root node for the first time, the corresponding buffer entry is set high, and the reconfiguration is not triggered again if the router later receives that flag signal again. This mechanism avoids redundant reconfigurations and, to some degree, potential livelock.

V. EVALUATION
We evaluated our GCA algorithm with a cycle-accurate NoC simulator, BookSim. An 8x8 mesh network is used for the evaluation, with several different traffic patterns considered. To evaluate faults within the network, a random fault generator is added to the original BookSim; given a number of faults, it generates a random faulty network without isolating any node. In this part, the performance of the GCA algorithm in a non-faulty network is compared against some existing routing algorithms in BookSim, including a comparison of saturation throughput. To evaluate performance in a faulty network, an increasing number of faults is inserted into the network with the random fault generator at a fixed injection rate, which tests the fault tolerance of the proposed routing algorithm.

A. Evaluation of GCA algorithm in non-faulty network
Dimension-order, min-adaptive and xy_yx-adaptive routing are compared with the proposed routing algorithm in the non-faulty network, as they are typical deterministic and adaptive routing algorithms on mesh networks. For four different traffic patterns (uniform, shuffle, bitrev and transpose), the average packet latency is measured for the three existing algorithms as well as for the proposed GCA routing algorithm. The results are shown in Figure 7.

Figure 7 Average Latency vs. Injection Rate (panels: Uniform, Shuffle, Transpose, Bit Reverse; curves: GCA, Dimension Order, xy-yx Adaptive, Min Adaptive; x-axis: injection rate; y-axis: average latency)

As Figure 7 shows, the average latency increases dramatically at a certain point for each routing algorithm and each traffic pattern. The injection rate at this point is called the saturation throughput. The GCA algorithm performs best for the shuffle and transpose traffic patterns but worst for uniform traffic. The reason is that GCA aims to keep the traffic balanced in the mesh network: for shuffle and transpose traffic, GCA is always the most efficient of these algorithms, but for uniform traffic it loses some performance as a trade-off.

The saturation throughput can be estimated from Figure 7 as the injection rate at which the average latency is triple the zero-injection latency. A comparison of saturation throughput is shown in Figure 8.

As Figure 8 shows, the saturation throughput of GCA is the best among all routing algorithms except for the uniform traffic pattern, and in reality the uniform scenario is rare. Overall, for a network without faults, GCA is a suitable choice of routing algorithm.

B. Simulation of GCA algorithm in faulty network
We have several choices for the fault-tolerance solution, the simplest of which is random walk. After implementing and evaluating a random-walk routing algorithm on BookSim, its poor efficiency and deadlock problems kept us from pursuing that approach further.
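The random fault generator mentioned at the start of this section could be sketched in Python as follows. This is an illustrative sketch, not the code added to BookSim: the function names are hypothetical, and the sketch keeps the whole mesh connected after each link removal, which is a stronger condition than merely not isolating any node.

```python
import random
from collections import deque


def mesh_links(n):
    """All bidirectional links of an n x n mesh, as (node, node) pairs."""
    links = []
    for row in range(n):
        for col in range(n):
            node = row * n + col
            if col + 1 < n:
                links.append((node, node + 1))  # link to the next column
            if row + 1 < n:
                links.append((node, node + n))  # link to the next row
    return links


def is_connected(n, links):
    """Breadth-first search from node 0 over the given healthy links."""
    neighbors = {v: [] for v in range(n * n)}
    for a, b in links:
        neighbors[a].append(b)
        neighbors[b].append(a)
    seen = {0}
    queue = deque([0])
    while queue:
        v = queue.popleft()
        for w in neighbors[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n * n


def random_link_faults(n, num_faults, rng=random):
    """Pick num_faults random links to break while keeping the mesh connected."""
    healthy = mesh_links(n)
    candidates = list(healthy)
    rng.shuffle(candidates)
    faulty = []
    for link in candidates:
        if len(faulty) == num_faults:
            break
        remaining = [l for l in healthy if l != link]
        if is_connected(n, remaining):  # reject links whose loss splits the mesh
            healthy = remaining
            faulty.append(link)
    if len(faulty) < num_faults:
        raise ValueError("cannot place that many faults without disconnecting the mesh")
    return faulty
```

A call such as `random_link_faults(8, 10, random.Random(1))` would produce one faulty 8x8 configuration with ten broken links; repeating it with different seeds gives the repeated trials used for averaging.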
Figure 8 Comparison of saturation throughput (GCA, Dimension Order, xy_yx Adaptive and Minimal Adaptive for the Uniform, Shuffle, Transpose and Bit Reverse traffic patterns)

As for the Up/Down routing algorithm discussed in III.B, although it is deadlock-free, its comparatively long latency keeps us from using it as the main routing algorithm. Instead, owing to its deadlock-free property, it can be used in a complementary role as the routing algorithm for the escape virtual channel, since deadlock is rare but does occur in the network.

For the faulty network, the random fault generator is used in simulation. The injection rate for this part is fixed at 0.2. The number of faults in the 8x8 mesh network is increased from 1 to 10, and for each number of faults the average latency is measured 10 times to obtain the average latency for that scenario, as shown below. The four different traffic patterns are also considered in this part.

Figure 9 Average delay vs. number of faults (curves: Uniform, Shuffle, Transpose, Bitrev; x-axis: number of faults, 0 to 10; y-axis: average latency)

As Figure 9 shows, thanks to the effective reconfiguration stage that deals with faults, the average latency increases only slowly as the number of faults increases.

C. Comparison between proposed work and some published routing algorithms
Finally, Table 2 compares the proposed GCA routing algorithm with some published routing algorithms. As the table shows, the proposed routing algorithm performs well even in comparison with published work.

Table 2 Comparison between proposed work and published work

                   This work   [3]        [4]             [5]        [7]
Algorithm          Adaptive    Adaptive   Deterministic   Adaptive   Adaptive
Fault tolerant?    Yes         No         No              No         Yes
Saturation throughput for different traffic patterns
Uniform            0.36        0.35       0.36            0.32       0.34
Shuffle            0.42        -          -               -          -
Transpose          0.37        0.33       0.21            0.27       -
Bit-comp           0.22        0.21       0.22            0.16       -

VI. CONCLUSION
In this paper, the proposed routing algorithm, GCA (Global-Congestion Adaptive), is designed based on Destination-based Adaptive Routing (DAR), with a relatively low hardware overhead of 6.1%. In addition, it includes a deadlock recovery mechanism that uses escape virtual channels running a deadlock-free up/down algorithm. Compared with some other algorithms, such as improved random walk and the original up/down algorithm, our algorithm performs better across different traffic patterns. In the future, further research could explore improving the fault-tolerance performance by bringing in a software-based offline reconfiguration mechanism.

References

[1] R. Ramanujam and B. Lin, "Destination-based congestion awareness for adaptive routing in 2D mesh networks", ACM Transactions on Design Automation of Electronic Systems, vol. 18, no. 4, pp. 1-27, 2013.

[2] K. Aisopos, A. DeOrio, L. Peh and V. Bertacco, "ARIADNE: Agnostic Reconfiguration In A Disconnected Network Environment", International Conference on Parallel Architectures and Compilation Techniques (PACT), Galveston Island, TX, October 2011.

[3] P. Gratz, B. Grot and S. Keckler, "Regional Congestion Awareness for Load Balance in Networks-on-Chip", HPCA, 2008.

[4] D. Seo, A. Ali, W. Lim, N. Rafique and M. Thottethodi, "Near-Optimal Worst-Case Throughput Routing for Two-Dimensional Mesh Networks", ACM SIGARCH Computer Architecture News, vol. 33, no. 2, pp. 432-443, 2005.

[5] A. Singh, W. Dally, B. Towles and A. Gupta, "Globally Adaptive Load-Balanced Routing on Tori", IEEE Comput. Arch. Lett., vol. 3, no. 1, pp. 2-2, 2004.

[6] S. Jovanovic, C. Tanougast, S. Weber and C. Bobda, "A new deadlock-free fault-tolerant routing algorithm for NoC interconnections", in Proc. Int. Conf. Field Program. Logic Appl., Aug.-Sep. 2009, pp. 326-331.

[7] R. Parikh and V. Bertacco, "ForEVeR: A complementary formal and runtime verification approach to correct NoC functionality", ACM Trans. Embedded Comput. Syst., vol. 13, no. 3s, pp. 104:1-104:30, 2014.
