Performance Analysis of Congestion-Aware Q-Routing Algorithm For Network On Chip
Corresponding Author:
Smriti Srivastava
Department of Computer Science and Engineering, R. V. College of Engineering
Bengaluru, India
Email: [email protected]
1. INTRODUCTION
As technology continues to advance, communication networks require optimal performance to
enable faster data transmission. On-chip communication is an emerging technology in which different
modules are integrated on a single chip, known as a system-on-chip (SoC) [1]. The network on chip (NoC) [2]
technology has become a cost-effective approach for data transfer in SoCs [3]. It addresses the
reliability and scalability challenges of SoCs while providing modularity [4]. The NoC is a systematic
architecture designed specifically for communication among subsystems integrated onto a chip via a network.
The interconnection network has a topology in which many nodes are positioned in a specific pattern, and
every node consists of a functional component, the processing element (PE), and a router. The routers
are connected via bidirectional links and enable efficient communication within the network by
determining the appropriate path for each data packet [1], [5].
In an NoC architecture, the transmission of data packets to distant nodes raises concerns regarding
latency, mainly because of the massive hop counts in an overloaded network. To mitigate this issue, a
wireless-NoC (WiNoC) architecture has been introduced, which helps to minimize the hopping distance by
incorporating wireless capabilities in specific nodes and utilizing them for data transmission [6]. However,
congestion persists at busy nodes under dense traffic. To reduce congestion, the network must
dynamically learn the traffic information. Machine learning (ML) offers a way to do this, as it enables
the network to analyze its current state and improve its decision-making. Research on integrating ML
with network communication has been ongoing for several years. The ultimate goal is to transmit
data packets along the least congested route, ensuring fast delivery to their destination, and ML is a
valuable approach for achieving this objective. In this work, a “congestion-aware Q-routing” (CAQR)
strategy is developed that employs reinforcement learning (RL) concepts [7]. Q-learning is a value-based
RL strategy that uses a Q-table to manage Q-values. Q-routing is an adaptive, congestion-aware routing
algorithm that uses Q-learning to estimate Q-values from the observed outcomes of specific actions. The
algorithm monitors both local and global congestion status and uses Q-values from a generated Q-table to
direct a node towards the optimal route for forwarding data packets. Section 3.2 provides a detailed
discussion of this approach. The algorithm has been developed using the Gem5 simulator [8]. The
proposed framework aims to improve network performance by minimizing both the average packet latency
(APL) and energy utilization.
This research extends a previously published paper [9], in which the proposed Q-routing was
analyzed with synthetic traffic. This paper analyzes the same algorithm with benchmark traffic
from the SPEC CPU2006 suite. Section 2 presents the existing work. Section 3
describes the technique in detail, covering the ideas of RL and Q-learning and how they relate to the
proposed Q-routing. Section 4 presents the experimental results and their analysis. Section 5 concludes
the work and briefly discusses possible future enhancements.
2. RELATED WORK
DeepNR, an adaptive routing technique based on deep RL, was proposed in [10]. It
incorporates routing directions as actions, network information as state representations, and queueing delay
as the reward function. With the Gem5 simulator, DeepNR was tested under synthetic and real-time traffic
scenarios. DeepNR was additionally evaluated on benchmarks from SPEC CPU 2006.
Using information about traffic and congestion in the NoC, Reza and Le [11] propose a similar
routing technique that employs three RL algorithms at runtime. Power-saving methods such as power
gating and dynamic voltage and frequency scaling (DVFS) were used in NoCs by Zheng and
Louri [12]. At runtime, an artificial neural network (ANN) based RL approach is used to predict the
traffic status of the NoC. The use of deep RL (DRL) in routerless NoC architecture [13] has recently been
reported, as has the optimization of energy usage and power consumption [14].
Farahnakian et al. [15] proposed a method called “Clustered Quality” (CQ) routing, which clusters the
network and offers a potential solution to the overhead issue. Q-routing is used for inter-cluster
and XY routing for intra-cluster packet transmission. To enable this strategy, each cluster maintains a
CQ-table whose design is identical to that of a Q-table. The method assumes identical traffic scenarios
for every cluster, which can lead to unfavorable outcomes. Additionally, the energy usage and latency of
the proposed framework are comparatively high for WiNoC.
Hu and Marculescu [16] proposed a routing policy called DyAD, which combines deterministic
routing methods with adaptive routing methods. When the network is congested, the policy works
adaptively; otherwise it works deterministically. This improves performance
compared to a completely adaptive routing policy. Wu et al. [17] used a method called contention-aware
input selection (CAIS), which selects one input channel among several that are contending for the
same output channel. It makes its selection by observing the number of requests for each input channel
and choosing the one with the highest contention level, thereby removing possible future congestion.
This method showed improved routing efficiency.
Ebrahimi et al. [18] presented an agent-based NoC (ANoC) structure that estimates the congested
areas from global congestion information sent across the network. The network is partitioned into
clusters, and each cluster has a cluster agent that is responsible for communicating its local
congestion status to the neighboring clusters and circulating the information. A
method called congestion-aware selection (CAS) then uses the global and local information to route
packets efficiently.
Chen et al. [19] present a tutorial on ML algorithms that make use of ANN concepts and can be
applied to solve various wireless networking challenges. It gives an overview of the basic architecture
of each type of ANN and summarizes the specific wireless problems that can be addressed in
future work.
Nilsson et al. [20] proposed a memoryless switch design that decides how packets are
emitted. If congestion is observed, the packets are deflected onto a non-ideal path. A novel approach
called proximity congestion awareness (PCA) was designed so that each switch keeps information about its
neighboring switches. This information takes the form of stress values, which indicate the load level
of a switch. Hence, the neighboring switch with the lowest stress value lies on the least congested path.
Farahnakian et al. [21] employed an NoC simulator based on Omnet++ to evaluate a CAQR routing
approach. It works by estimating the current traffic state in the network to mitigate congestion.
However, in the absence of congestion, its latency is relatively high in comparison to conventional routing
strategies. Majer et al. [22] demonstrated high efficiency through the dynamic selection of packet
routing policies using an RL technique, in which an optimal routing technique is selected for each
network state. Reza [23] showed the effectiveness of
deep Q-learning (DQL) in enabling a single agent to maintain a comprehensive record of Q-value vectors for
every router action within the network, thereby eliminating the need for individual Q-tables at each node and
minimizing the associated overhead.
Deb et al. [24] presented adaptive routing techniques that are both cost-effective and capable of
forwarding packets to distant nodes through specialized channels built with transmission line (TL)
technology. The inclusion of additional TLs on the chip decreases the network
diameter, thereby reducing the APL. The objective of the two proposed architectures, SBTR and e-SBTR,
is to minimize the number of intermediate hops and thereby reduce packet
latency. The effectiveness of these techniques is evaluated using benchmark mixes from the Princeton
Application Repository for Shared-Memory Computers (PARSEC) and SPEC CPU 2006, and the findings reveal
that e-SBTR outperforms the current express virtual channel method, as it attains a lower hop count
and reduced packet latency.
Ahmad et al. [25] introduced a novel framework that transmits congestion data within the data
packet. The approach is implemented on a field-programmable gate array (FPGA)-based mesh NoC. Compared
to the current congestion-aware routing (CAR) strategy, the proposed approach reduces latency, increases
throughput, and requires less bandwidth for exchanging congestion data among routers. Rad et al. [26]
present a detailed summary of the congestion control (CC) strategies currently used in WiNoCs. The
identified strategies are grouped into six categories: CAR algorithms, media access
control (MAC) protocols, hardware-resource-based CC, rate-based CC, task migration using mapping, and
CA architectures. The objective of that study is to highlight the traits and limitations of CC strategies
from a fresh perspective that can help fellow researchers develop effective schemes. Arun et al. [27]
proposed an efficient multicasting model for 2D mesh NoCs. Experimental results show that this strategy
reduces the total link traversals necessary to achieve multicast communication and thereby improves the
average multicast transaction latency.
3.1. Q-learning
A reinforcement learning agent does not depend on prior training but learns by performing a set of
actions and observing their results. If the result is good, the agent receives a positive reward;
otherwise it receives a penalty. In this way an optimal action, i.e., one with the maximum reward, is
chosen. Q-learning employs a value-based RL strategy. The Q-table keeps track of the current state, the
available actions, the next state estimated from the previous two, and the Q-value. At the beginning of
learning, actions are selected randomly, which is called exploration. Towards the end of learning,
actions are selected based on the learned Q-table, which is called exploitation. Initially, the table
entries for each state s and action a are set to zero. An
action is selected using an epsilon-greedy strategy, where epsilon is the probability of exploring
rather than exploiting. An immediate reward r is received and the new state s’ is observed. The existing
Q-value Q(s,a) is updated to Q’(s,a) in the table based on the reward r and Q(s’,a), as
shown in (1), where γ is the discount rate and α denotes the learning rate. The process continues
iteratively, updating the Q-value at each step, until learning stops.
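As a minimal sketch of the procedure described above, the following Python snippet implements an epsilon-greedy action choice and a single tabular update. It assumes the standard Q-learning update of [7], Q'(s,a) = Q(s,a) + α[r + γ max over a' of Q(s',a') − Q(s,a)], which may differ in detail from (1). The function names, state labels, and numeric values are illustrative and are not taken from the authors' implementation.

```python
import random
from collections import defaultdict


def epsilon_greedy(q_table, state, actions, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])


def q_update(q_table, state, action, reward, next_state, actions, alpha, gamma):
    """Apply one standard Q-learning update (assumed form) and return the new Q(s, a)."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    old = q_table[(state, action)]
    q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q_table[(state, action)]


# All entries start at zero, as described in the text.
q = defaultdict(float)
actions = ["N", "E", "S", "W"]  # hypothetical action labels

# One illustrative step: action "E" in state "s0" yields reward 1.0 and leads to "s1".
new_q = q_update(q, "s0", "E", reward=1.0, next_state="s1",
                 actions=actions, alpha=0.5, gamma=0.7)
print(new_q)  # 0 + 0.5 * (1.0 + 0.7 * 0 - 0) = 0.5
```

With epsilon set to 0 the policy is purely exploitative, so after this single update `epsilon_greedy(q, "s0", actions, 0.0)` selects "E".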
Q-routing is based on the concept of Q-learning. The Q-table is built with a network of
nodes as the environment. The Q-values help a data packet find the optimal neighbour. Suppose a node
x needs to send a packet to destination node d via the neighbouring node y; then the Q-value Qx(y,d)
must be updated iteratively. This value depends on three factors: the queuing delay qy, i.e., the time
the packet spends in node y’s queue; the transmission delay (δ), i.e., the time taken to travel from x
to y; and the time taken for the packet to reach d from y. Thus, the Q-value Qx(y,d) is updated to
Q’x(y,d) using these factors, as shown in (2) [9]. The neighbouring node with the minimum Q-value is the
optimal node to send the packet through.
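The update built from these three factors can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes the commonly used Q-routing form Q'x(y,d) = Qx(y,d) + α[qy + δ + min over z of Qy(z,d) − Qx(y,d)], where the remaining time from y to d is taken as y's own best estimate. The dictionary layout, function names, and delay values are hypothetical.

```python
def q_routing_update(q_x, q_y, y, d, queue_delay, tx_delay, alpha=0.5):
    """Update node x's estimate Q_x(y, d) of the delivery time to d via neighbour y.

    q_x and q_y map (neighbour, destination) -> estimated delivery time.
    """
    # Remaining time from y to d: y's smallest estimate over its own neighbours.
    remaining = min((v for (z, dest), v in q_y.items() if dest == d), default=0.0)
    target = queue_delay + tx_delay + remaining
    old = q_x.get((y, d), 0.0)
    q_x[(y, d)] = old + alpha * (target - old)
    return q_x[(y, d)]


def best_neighbour(q_x, d):
    """Return the neighbour with the minimum Q-value for destination d."""
    candidates = {y: v for (y, dest), v in q_x.items() if dest == d}
    return min(candidates, key=candidates.get)


# Hypothetical estimates held by node x (neighbours 1 and 3) and by neighbour 1.
q_x = {(1, 8): 10.0, (3, 8): 6.0}
q_1 = {(2, 8): 4.0, (4, 8): 3.0}

updated = q_routing_update(q_x, q_1, y=1, d=8, queue_delay=2.0, tx_delay=1.0)
print(updated)                 # 10 + 0.5 * ((2 + 1 + 3) - 10) = 8.0
print(best_neighbour(q_x, 8))  # 3, since 6.0 < 8.0
```

Because lower values mean faster expected delivery, the selection step takes a minimum rather than the maximum used in reward-based Q-learning.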
3.2. Q-table
Consider the example of a 3x3 mesh network shown in Figure 1. The proposed approach uses a
novel Q-table to assign a Q-value to each network instance. In a 2D mesh topology, there are NxM nodes
with Q-tables, where ‘N’ and ‘M’ represent the total numbers of rows and columns, respectively.
Table 1 comprises four fields: the current node, the destination node, the output port, and the
Q-value corresponding to each Q-table entry. Assume a packet is travelling along the longest route
available in the network, from y = 0 to d = 8. Say the packet is at node 4 at time instance ti = t.
Although 4 output paths are available at node 4, only 2 of them can be used to route the
packet towards its destination d = 8: either node 5 or node 7 may receive the packet, so both the East
and North ports are permitted. The packet chooses between these two feasible routes by retrieving the
Q-values of the two nodes from the Q-table, and it is transferred directly to the adjacent node with
the minimal Q-value.
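The port selection at node 4 can be illustrated with a short Python sketch. It assumes row-major node numbering with node 0 in the bottom-left corner, so that East adds 1 and North adds one row; this is inferred from the statement that nodes 5 and 7 are reachable through the East and North ports. The Q-values and the table layout keyed by (current node, destination, port) are hypothetical.

```python
COLS, ROWS = 3, 3  # 3x3 mesh from the example


def coords(node):
    """Return (column, row) of a node under the assumed row-major numbering."""
    return node % COLS, node // COLS


def minimal_ports(current, dest):
    """Output ports that move a packet closer to dest, mapped to the next node."""
    cx, cy = coords(current)
    dx, dy = coords(dest)
    ports = {}
    if dx > cx:
        ports["East"] = current + 1
    if dx < cx:
        ports["West"] = current - 1
    if dy > cy:
        ports["North"] = current + COLS
    if dy < cy:
        ports["South"] = current - COLS
    return ports


def next_hop(q_table, current, dest):
    """Pick the permitted port with the lowest Q-value (assumes current != dest)."""
    ports = minimal_ports(current, dest)
    port = min(ports, key=lambda p: q_table[(current, dest, p)])
    return port, ports[port]


# Hypothetical Q-table entries for a packet at node 4 heading to node 8.
q_table = {(4, 8, "East"): 5.0, (4, 8, "North"): 3.5}

print(minimal_ports(4, 8))  # {'East': 5, 'North': 7}
port, node = next_hop(q_table, 4, 8)
print(port, node)           # North 7
```

Restricting the choice to ports that reduce the distance to the destination keeps every route minimal, and the Q-values only decide among equally short alternatives.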
potential output ports for node Y have been extracted. The packet will be sent further down the network
using this output port. Following the computation of the output port, (3) is used to determine the new Q-value
of x, the preceding node.
The Q-update approach in Figure 3 also describes the procedure in Figure 2 for
updating the data of the previous node. A learning rate of 0.5 is used, so that each update moves
the stored Q-value halfway towards the new estimate, and a discount rate of 0.7 is applied. The
RL-based model is trained for 50 steps, updating the Q-values each time. Once the training phase is
complete, the final Q-table is used to route every packet.
Figure 4. Comparison of XY and Q-routing in terms of (a) average packet latency (APL) and
(b) average packet network latency (APNL) using the SPEC CPU 2006 benchmark suite
Figure 5(a) shows the comparison between XY and Q-routing in terms of average flit latency. The
highest reduction, 31%, was obtained with leslie3d, while no reduction was observed with bzip2.
The average improvement across all the benchmarks is about 10%. The comparison in
terms of average flit network latency, shown in Figure 5(b), gives the highest reduction of 31% with
leslie3d and the lowest of 16% with bzip2. On average, the improvement is about 23% across all
the combinations.
Figure 6(a) shows the comparison in terms of average energy utilization in mJ. The highest
reduction, about 53%, is found for leslie3d and the lowest, 37%, for the combination network.
The average reduction across all readings is 47%. Figure 6(b) shows the average power
consumption in mW. The readings are almost the same for XY and Q-routing in all cases, which is
explained by the fact that Q-learning finds the optimal path through the reinforcement mechanism, which
requires traversing random paths at the beginning of the simulation. The results indicate that the
congestion-aware Q-routing algorithm from this study performed better than the XY routing algorithm,
particularly with regard to average packet latency, which is a significant factor in determining a
network's efficiency.
Figure 5. Comparison of XY and Q-routing in terms of (a) average flit latency (AFL) and (b) average flit
network latency (AFNL) using the SPEC CPU 2006 benchmark suite
Figure 6. Comparison of XY and Q-routing in terms of (a) average energy utilization and (b) average power
using the SPEC CPU 2006 benchmark suite
5. CONCLUSION
The research presents a congestion-aware Q-routing method that lowers average packet
latency and also reduces energy utilization in an NoC. The algorithm is tested using the SPEC CPU2006
benchmark suite. In terms of average packet latency, which is an important factor in determining a
network's efficiency, the proposed congestion-aware Q-routing (CAQR) algorithm clearly outperformed the
XY routing algorithm. This work can be extended to implement congestion-aware Q-routing for a
WiNoC and to analyze simulation parameters such as average packet latency, power, energy,
and area under various traffic patterns.
REFERENCES
[1] H. Cai and Y. Yang, “Congestion Prediction Algorithm for Network on Chip,” TELKOMNIKA Indonesian Journal of Electrical
Engineering, vol. 11, no. 12, pp. 7392–7398, Dec. 2013, doi: 10.11591/telkomnika.v11i12.3987.
[2] S. Kumar et al., “A network on chip architecture and design methodology,” in Proceedings IEEE Computer Society Annual
Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002, 2002, pp. 117–124, doi:
10.1109/ISVLSI.2002.1016885.
[3] D. C. Marinescu, “Cloud Access and Cloud Interconnection Networks,” in Cloud Computing, Amsterdam: Elsevier, 2018, pp.
153–194, doi: 10.1016/b978-0-12-812810-7.00007-8.
[4] W. J. Dally and B. Towles, “Route packets, not wires: on-chip interconnection networks,” in Proceedings of the 38th Design
Automation Conference (IEEE Cat. No.01CH37232), 2001, pp. 684–689, doi: 10.1109/dac.2001.935594.
[5] N. Agarwal, T. Krishna, L.-S. Peh, and N. K. Jha, “GARNET: A detailed on-chip network model inside a full-system simulator,”
in 2009 IEEE International Symposium on Performance Analysis of Systems and Software, Apr. 2009, pp. 33–42, doi:
10.1109/ispass.2009.4919636.
[6] S. Deb, A. Ganguly, P. P. Pande, B. Belzer, and D. Heo, “Wireless NoC as Interconnection Backbone for Multicore Chips:
Promises and Challenges,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 2, no. 2, pp. 228–239,
Jun. 2012, doi: 10.1109/jetcas.2012.2193835.
[7] C. J. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3–4, pp. 279–292, May 1992, doi:
10.1007/BF00992698.
[8] N. Binkert et al., “The gem5 simulator,” ACM SIGARCH Computer Architecture News, vol. 39, no. 2, pp. 1–7, May 2011, doi:
10.1145/2024716.2024718.
[9] S. Srivastava, M. A. Shaikh, S. G, and M. Moharir, “Intelligent congestion control for NoC architecture in Gem5 simulator,” in
2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), Dec. 2022, pp. 353–
360, doi: 10.1109/mcsoc57363.2022.00062.
[10] R. R. R.S. et al., “DeepNR: An adaptive deep reinforcement learning based NoC routing algorithm,” Microprocessors and
Microsystems, vol. 90, p. 104485, Apr. 2022, doi: 10.1016/j.micpro.2022.104485.
[11] M. F. Reza and T. T. Le, “Reinforcement Learning Enabled Routing for High-Performance Networks-on-Chip,” in 2021 IEEE
International Symposium on Circuits and Systems (ISCAS), May 2021, pp. 1–5, doi: 10.1109/iscas51556.2021.9401790.
[12] H. Zheng and A. Louri, “Agile: A Learning-Enabled Power and Performance-Efficient Network-on-Chip Design,” IEEE
Transactions on Emerging Topics in Computing, vol. 10, no. 1, pp. 223–236, Jan. 2022, doi: 10.1109/tetc.2020.3003496.
[13] T.-R. Lin, D. Penney, M. Pedram, and L. Chen, “A Deep Reinforcement Learning Framework for Architectural Exploration: A
Routerless NoC Case Study,” in 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), Feb.
2020, pp. 99–110, doi: 10.1109/hpca47549.2020.00018.
[14] H. Zheng and A. Louri, “An Energy-Efficient Network-on-Chip Design using Reinforcement Learning,” in Proceedings of the
56th Annual Design Automation Conference 2019, Jun. 2019, pp. 1–6, doi: 10.1145/3316781.3317768.
[15] F. Farahnakian, M. Ebrahimi, M. Daneshtalab, J. Plosila, and P. Liljeberg, “Optimized Q-learning model for distributing traffic in
on-Chip Networks,” in 2012 IEEE 3rd International Conference on Networked Embedded Systems for Every Application
(NESEA), Dec. 2012, pp. 1–8, doi: 10.1109/nesea.2012.6474016.
[16] J. Hu and R. Marculescu, “DyAD: smart routing for networks-on-chip,” in Proceedings of the 41st annual Design Automation
Conference, Jun. 2004, pp. 260–263, doi: 10.1145/996566.996638.
[17] D. Wu, B. M. Al-Hashimi, and M. T. Schmitz, “Improving routing efficiency for network-on-chip through contention-aware input
selection,” in Asia and South Pacific Conference on Design Automation, 2006, pp. 36–41, doi:
10.1109/aspdac.2006.1594642.
[18] M. Ebrahimi, M. Daneshtalab, P. Liljeberg, J. Plosila, and H. Tenhunen, “Agent-based on-chip network using efficient selection
method,” in 2011 IEEE/IFIP 19th International Conference on VLSI and System-on-Chip, Oct. 2011, pp. 284–289, doi:
10.1109/vlsisoc.2011.6081593.
[19] M. Chen, U. Challita, W. Saad, C. Yin, and M. Debbah, “Artificial Neural Networks-Based Machine Learning for Wireless
Networks: A Tutorial,” IEEE Communications Surveys & Tutorials, vol. 21, no. 4, pp. 3039–3071, 2019, doi:
10.1109/comst.2019.2926625.
[20] E. Nilsson, M. Millberg, J. Oberg, and A. Jantsch, “Load distribution with the proximity congestion awareness in a network on
chip,” in 2003 Design, Automation and Test in Europe Conference and Exhibition, 2003, pp. 1126–1127, doi:
10.1109/date.2003.1253765.
[21] F. Farahnakian, M. Ebrahimi, M. Daneshtalab, P. Liljeberg, and J. Plosila, “Q-learning based congestion-aware routing algorithm
for on-chip network,” in 2011 IEEE 2nd International Conference on Networked Embedded Systems for Enterprise Applications,
Dec. 2011, pp. 1126–1127, doi: 10.1109/nesea.2011.6144949.
[22] M. Majer, C. Bobda, A. Ahmadinia, and J. Teich, “Packet Routing in Dynamically Changing Networks on Chip,” 2005, doi:
10.1109/ipdps.2005.323.
[23] M. F. Reza, “Deep Reinforcement Learning for Self-Configurable NoC,” in 2020 IEEE 33rd International System-on-Chip
Conference (SOCC), Sep. 2020, pp. 185–190, doi: 10.1109/socc49529.2020.9524761.
[24] D. Deb, J. Jose, S. Das, and H. K. Kapoor, “Cost effective routing techniques in 2D mesh NoC using on-chip transmission lines,”
Journal of Parallel and Distributed Computing, vol. 123, pp. 118–129, Jan. 2019, doi: 10.1016/j.jpdc.2018.09.009.
[25] K. Ahmad et al., “Congestion-Aware Routing Algorithm for NoC Using Data Packets,” Wireless Communications and Mobile
Computing, vol. 2021, pp. 1–11, Aug. 2021, doi: 10.1155/2021/8588646.
[26] F. Rad, M. Reshadi, and A. Khademzadeh, “A survey and taxonomy of congestion control mechanisms in wireless network on
chip,” Journal of Systems Architecture, vol. 108, Sep. 2020, doi: 10.1016/j.sysarc.2020.101807.
[27] M. R. Arun, P. A. Jisha, and J. Jose, “A Novel Energy Efficient Multicasting Approach For Mesh NoCs,” Procedia Computer
Science, vol. 93, pp. 283–291, 2016, doi: 10.1016/j.procs.2016.07.212.
[28] J. L. Henning, “SPEC CPU2006 benchmark descriptions,” ACM SIGARCH Computer Architecture News, vol. 34, no. 4, pp. 1–17,
Sep. 2006, doi: 10.1145/1186736.1186737.