0% found this document useful (0 votes)

16 views

Advanced Reinforcement Learning Solution For Clock Skew Engineering Modified Q-Table Update Technique For Peak Current and IR Drop Minimization

ML Techiniques

Uploaded by

naresh

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

16 views

Advanced Reinforcement Learning Solution For Clock Skew Engineering Modified Q-Table Update Technique For Peak Current and IR Drop Minimization

ML Techiniques

Uploaded by

naresh

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 18

Received 7 June 2023, accepted 26 July 2023, date of publication 11 August 2023, date of current version 22 August 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3304534

Advanced Reinforcement Learning Solution

for Clock Skew Engineering: Modified
Q-Table Update Technique for Peak
Current and IR Drop Minimization
SAYED ARESH BEHESHTI-SHIRAZI 1 , NAJMEH NAZARI 2 , KEVIN IMMANUEL GUBBI2 ,
BANAFSHEH SABER LATIBARI 2 , (Graduate Student Member, IEEE), SETAREH RAFATIRAD 2,

HOUMAN HOMAYOUN 2 , AVESTA SASAN2 , AND P. D. SAI MANOJ 1

1 Department of ECE, George Mason University, Fairfax, VA 22030, USA
2 Department of ECE, University of California, Davis, CA 95616, USA
Corresponding author: Sayed Aresh Beheshti-Shirazi ([email protected])

ABSTRACT This paper discloses a Reinforcement Learning (RL) solution implemented to decrease the
peak current by alteration of the clock skews. Clock skews are elements of the clock network calculated
throughout the Clock Tree Synthesis (CTS) phase of physical design. Initially, the physical design tools
targeted obtaining a balanced clock tree and decreasing the clock skew as low as possible. The resulting
zero-skew clock tree caused a drastic increase in the current demand for the battery. The proposed solutions
in this paper comprise a Reinforcement Learning agent that maneuvers throughout the design and updates
the clock arrival time of each register by either adding, removing, or not changing it. The agent’s end game
is to maximize the clock arrival distribution of the design. The Reinforcement learning solution allows the
exploration and optimization of the clock tree synthesis process beyond the heuristic algorithms employed
by traditional Electronic Design Automation (EDA) tools. This paper contains two experiments using the
Reinforcement Learning algorithm. The first experiment results indicate a 35% reduction in peak current and
a significant reduction in IR decrease (from package to transistor) in the chosen benchmarks. The second
experiment modified the Q-table renewing technique, which resulted in another additional 10% improvement
compared to the first experiment. In both experiments, the agent traverses the environment and explores
different options despite creating timing violations and obtaining a substantial negative feedback reward for
the actions taken. However, the timing violation fixed later results in the agent obtaining a future reward
for modifying the clock arrival time of other registers. The overall process resulted in the broader spread of
clock arrival distribution.

INDEX TERMS Clock tree synthesis (CTS), computer aided design, machine learning, peak current
reduction, reinforcement learning, SARSA learning.

I. INTRODUCTION makes the designs much more complicated. The IC design

As time passes, the Electronic Design Automation (EDA) innovations allow chips to have pioneering designs that hold
tools used by Integrated Circuit (IC) designers become more up to 1.2 trillion transistors [29]. The sophistication of the
and more sophisticated and evolved; designers have the designs results in an increase in size, which in turn makes the
capability to integrate more transistors on a single die which Application Specific Integrated Circuit (ASIC) design flow
more time-consuming, resulting in various complications
The associate editor coordinating the review of this manuscript and and constraints at different phases, including complications
approving it for publication was Ludovico Minati . in clock distribution network design. The complications
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
VOLUME 11, 2023 For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/ 87869
S. A. Beheshti-Shirazi et al.: Advanced RL Solution for Clock Skew Engineering

in clock distribution network results mean more violations

being introduced to the clock network design, which calls
for additional guardrails to ensure no new violation is
introduced; the design needs to go through Static Timing
Analysis (STA) each time a netlist is modified to sure the
clock skews resulting from the clock distribution network
passes different timing analysis rules and guidelines. STA
is a monotonous, repetitive, and time-consuming process;
each time, the design goes through a sequence of multi-
hour iterations, wherein the iteration duration is directly
associated with the size of the designs. Larger designs
FIGURE 1. (Left): A sequential netlist with balanced clock tree,
tend to have more significant and longer iterations, which (right-top): Simultaneous Arrival time of the clock to all register,
consume more time to be completed. During the STA process, (right-bottom): surge of demanded current on active edge of the clock.
the end product required timing rules are analyzed and
verified to guarantee the design satisfies and passes the rules
needed for the end product to function as planned. The STA IR drop and inductive voltage drop result in difficulty and
tool receives the routed netlist, clock definitions, or clock complexion in the design of the package and also degradation
frequency as inputs to verify the design and ensure the in performance [33].
resulting design operates at the designated clock frequency One of the solutions traditionally used by chip designers
without causing any timing violations. The initial design has is to mitigate the effects of inductive drop and reduce the
multiple basic timing violations, such as setup and holds low-frequency oscillation caused by the surge in current
violations; these violations are resolved by the STA tool as the demand, which these solutions result in increases in the
design goes through sequential iterations by inserting buffers, design cost. As mentioned, another complication introduced
re-routing clock-tree, and layout modification. As a result is the increased IR drop that, in turn, leads to reduced voltage
of the rationale provided above, the Clock Tree Synthesis at the transistor level, performance degradation, and more
(CTS) phase is one of the essential steps in a physical considerable cycle-to-cycle voltage variation, a.k.a voltage
design that results in implementing the clock distribution noise. The higher clock jitter is the result of having a
network. The clock tree is the output of the CTS phase. more significant voltage variation of voltage notice that
Another side effect of designs exponentially growing while requires to be mitigated by the designers by modeling
the number of transistors increases and designs become more the uncertainty margin for each endpoint. The substantial
complicated is the power consumption of the extensive clock design uncertainty decreases available slack for each reg-
distribution network that drastically increases and causes to-reg timing path, resulting in an inferior design and
further performance degradation. All the above complications removing Power, Performance, and Area (PPA) optimization
push the chip designers to focus on decreasing and mitigating opportunities.
the impact of complex clock distribution networks on power, Some of the modern Electronic Design Automation tools
performance, and area degradation during the CTS phase. are taking advantage of heuristic algorithms, these tools
To sum up, the CTS process goal is to minimize the augmented with heuristic algorithms, skew the clock tree by
clock skew and concentrate on reducing the peak current, utilizing the unused or useful clock skew allowing for time
or in other words, the variation between the minimum and borrowing from preceding or proceeding reg-to-reg timing
maximum clock arrival time [14]. Regulation of the clock paths, increasing the available slack at the target timing
skews between various branches of the clock tree necessitates path. Time borrowing allows fixing timing violations without
accurate timing analysis [5], [28], [36], and buffer addition setting the IC’s clock frequency to the most extended reg-to-
or removal [32] is a fundamental component of obtaining reg path propagation delay. Useful skew has also been used
a clock tree that is practical without any timing violations. in designing heuristic solutions for spreading the clock arrival
Furthermore, the clock tree synthesis or CTS drifted toward times for peak current reduction in [47].
the zero skew concepts due to design complexity growth [2]. The complication with using heuristic solutions by EDA
Traditional EDA tools reduce the clock skew by outputting tools is that it is heavily restricted and dependent on
balanced clock trees and assigning each register-to-register the expert physical and EDA designers’ experience and
timing path equal time. The result in the zero-skew clock knowledge. In conclusion, as designs become exponentially
tree leads to the simultaneous launch of all registers resulting more significant in size and more complicated, the massive
in a drastic increase of demanded current from the battery optimization space results in the failure of heuristic solutions
on the active edge of the clock. The idea is disclosed in to reach Pareto-optimal frontiers despite improving the
Fig. 1. The Power Delivery Network (PDN) is composed optimization objective.
of a network of resistors and inductors, wherein the drastic This paper presents an Advanced Reinforcement Learning
surge in demanded current results in a significant resister IR Solution for Clock Skew Engineering, aiming to minimize
drop and an even greater inductive voltage drop [47]. The Peak Current and IR Drop through a modified Q-Table update

87870 VOLUME 11, 2023

S. A. Beheshti-Shirazi et al.: Advanced RL Solution for Clock Skew Engineering

technique. Section II offers a thorough background of the resulting in significant voltage drop and voltage noise. The
research and discusses the motivation behind developing more substantial voltage drop reduces the transistors’ speed,
this advanced reinforcement learning approach. A compre- and the voltage noise increases the clock tree’s jitter.
hensive review of related work in the field is provided The impact of the reduction in average voltage at the
in Section III. Section IV describes the problem, and the transistor level and the surge of the clock jitter is another
proposed reinforcement learning solutions are outlined in essential factor to be considered during the physical design.
detail. Experimental results and performance evaluation are Therefore, designers need to set a lower rail voltage during
presented in Section V. Finally, Section VI concludes the the static timing analysis to plan for the rise of IR drop and
paper by summarizing the achieved results and improvements increase of each register endpoint’s uncertainty to mitigate
resulting from applying the proposed solution. the voltage noise [30]. As designs move toward reducing peak
current, the amount of inductive IR drop and voltage noise is
II. BACKGROUND decreased. In other words, extensive additional timing slack
One of the vital phases of the physical design that results spares are realized due to a higher voltage on the transistor
in obtaining the clock distribution network is Clock Tree level and reduced clock jitter, which could be utilized to
Synthesis (CTS). As designs grow, the clock distribution improve the Power, Performance, and Area (PPA) of design
network becomes enormous. Its sophisticated structure under implementation. Hence, reducing the peak current
contains many elements, such as Phase-Locked Loop (PLL) can significantly impact the PPA optimality outcome of the
systems, clock dividers, buffers, multiplexers, gates, wires, physical design.
etc. The extensive clock distribution network is accountable Authors in [45] disclose the timing closure problem,
for the majority of the chip design power consumption. The explaining its difficulty’s root cause. In addition, providing
CTS and designs are susceptible to a plurality of issues, details regarding traditional techniques that address and
such as jitters, timing issues, power implications, signal mitigate the timing closure complications. Furthermore,
integrity, and area implications. When designers are targeting new challenges that appear at advanced process nodes are
to achieve a balanced clock tree, which in turn causes the highlighted, and solutions to these problems are discussed.
simultaneous switching and clock transition at the clock pin In another prior art by [46], the authors have presented
of all registers within a short timing window at every rising a machine learning (ML) model which is founded on
edge, in rising-edge triggered design, the launch registers bigrams of path stages to predict Path Base Analysis (PBA)
launch the data path in the rising edge of the clock. The outcomes that are expensive from Graph Base Analysis
capture registers to capture the propagated logic value on the (GBA) outcomes that are relatively inexpensive. Their
upcoming rising edge of the clock. The switching activities study also focuses on identifying electrical and structural
are caused by the simultaneous launch of registers (and characteristics of a circuit that affect PBA-GBA deviation
activation of data paths) [47]. Furthermore, the clock level concerning endpoint arrival times. To accomplish this, GBA
transition raises the current discharging and charging the and PBA analysis of a given test case is conducted, coupled
capacitive clock network in each clock switching, either rise with artificially generated timing paths.
or fall. As illustrated by Fig. 1, the demanded battery current The literary creators embarked on a research expedition
is instantly changing at each rising edge of the clock due to where they explored a method of data analysis based
the cumulative effect of all switching activities. The surge of on multivariate linear regression, as documented in [49];
the current is less at the clock’s falling edge when the design is the method helps predict the timing analysis outcomes
based on a rising edge triggered; at the clock’s falling edge, at observed corners. The authors employed a strategy of
the clock network only discharges and does not experience backward step-wise selection to simplify the process of
any additional switching activity in the datapath (unless there choosing which corners to observe and which to predict.
exist half-cycle timing paths) with the sudden reduction in
switching activity between clock edges, the current surge is
rapidly suppressed [7]. A. BACKGROUND ON REINFORCEMENT LEARNING
The Power Distribution Network (PDN) has both resistive As the design expands in scope, an increase in switching
and inductive nature, which results in the transistors will activity leads to greater complexity, which consequently
experience both resistive and inductive voltage drops when complicates the CTS process [15]. As the size of the design
demanded current increases. There is a close correlation grows, the process becomes more unwieldy, presenting
between the resistive drop and the current flowing through an opportunity to investigate the potential advantages of
the resistive portion of PDN (IR), while the inductive IR employing Machine Learning (ML), particularly Reinforce-
drop is proportionate to the amount of change in the current ment Learning (RL), for optimizing the design. Before
(L.di/dt). The resistive drop is proportional to the current delving into the RL-specific solutions proposed in this paper,
that flows through the resistive PDN (IR), and the inductive a comprehensive overview of various Machine Learning
IR drop is proportional to the rate of change in the current techniques that chip designers can utilize to harness the power
(L.di/dt). The instantaneous variation in the current within of Machine Learning for optimizing and improving designs
a short period drastically increases the inductive IR drop and shortening production time is provided.

VOLUME 11, 2023 87871

S. A. Beheshti-Shirazi et al.: Advanced RL Solution for Clock Skew Engineering

Machine Learning techniques can be categorized based on The expected reward signifies the cumulative reward
the primary data source that drives the algorithm, resulting anticipated when commencing from the current state st and
in four groups: (1) supervised learning, (2) unsupervised adhering to the policy π. To evaluate the value of being in
learning, (3) semi-supervised learning, and (4) reinforcement a particular state, the expected future rewards from that state
learning. In supervised learning, each training data item are taken into account using equation 2:
is labeled correctly, while unsupervised learning involves ∞
. X
training data items with no available labels, requiring the Vπ (st ) = Eπ [Gt |st ] = Eπ [Rt+1 + γ k (Rt+k+1 )|st ] (2)
algorithm to discern input samples. Semi-supervised learning k=1
falls in between, with some labeled training data items.
The following equation calculates the value of being in
Lastly, Reinforcement Learning (RL) approaches are distinct
a state S at time t and following a particular policy π,
in that they do not rely on training data items; instead, they
denoted as Vπ (st ), while Eπ [Gt |st ] represents the expected
depend on an agent’s actions and feedback received from its
reward from the current state St when following the same
environment. Together, these three paradigms offer a wide
policy π. There are several variants of reinforcement learning
range of approaches for training machine learning models to
(RL) algorithms available. The State-Action-Reward-State-
accomplish various tasks. In the field of artificial intelligence,
Action (SARSA) is one such variation of the Q-Learning
reinforcement learning is focused on developing algorithms
algorithm. In Q-Learning, the RL agent uses Q-values stored
and models that enable an intelligent AGENT to make a
in a Q-table, representing the values associated with various
sequence of decisions or actions that lead to the maximization
possible actions in a given state. Q-Learning is an off-policy
of cumulative REWARD in a given ENVIRONMENT. The
and model-free RL algorithm where the RL agent employs
agent interacts with its environment and receives feedback
an iterative approach to improve the quality of Q-table
in the form of rewards for each action it takes. The goal of
entries. On the other hand, SARSA is an on-policy version
reinforcement learning is to develop an agent that can learn
of Q-Learning that uses the value obtained by the current
from experience and improve its decision-making abilities
action taken to learn and improve the Q-table values. SARSA
over time, ultimately achieving optimal performance in the
calculates the Q-value for each action based on a formula that
given task or environment. The activity alters the condition
considers the expected reward for the next state-action pair.
of the environment, as noted by [4], and consequently leads
to a reward and an updated state. The cumulative reward and Q(st , at ) = Q(st , at )+α(Rt+1 + γ Q(st+1 , at+1 ) − Q(st , at ))
state transformation offer the agent with deferred feedback (3)
regarding the outcomes of its chosen actions. Through a
methodical process of trial and error, the agent acquires In this mathematical equation, Q(st , at ) represents the
knowledge on making optimal decisions to achieve the best function that calculates the action value at a given state
possible outcomes. The ultimate objective of the agent is to st and time at , while α represents the learning rate
maximize the overall reward, denoted as Gt in equation 1: and γ the discount factor. Prior works have employed
∞ reinforcement learning techniques to enhance the efficacy of
Computer-Aided Design (CAD) tools by acquiring practical
X
Gt = Rt+1 + γ k (Rt+k+1 ) = Rt+1 + γ k (Rt+2 )
k=1
and adaptive heuristics [11]. For instance, [11] employs
reinforcement learning to increase the efficiency of the placer
+γ (k+1)
(Rt+3 ) + . . . (1)
in exploring the solution space and dynamically adapting
In the given formula, the variable t represents the time step, to specific issues. Similarly, [12] and [16] leverages a
while R signifies the reward. The discount factor, denoted reinforcement learning-based routing approach to expedite
by γ , serves as a measure of the agent’s inclination towards FPGA routing solutions. Moreover, [13] presents a novel
future rewards. Additionally, k is an integer index employed placement and routing algorithm for 3D FPGAs, which
for adjusting time and the discount factor concerning future employs reinforcement learning to significantly reduce the
time steps. This equation illustrates the total reward as a Manhattan distance and wire length. Another instance of rein-
combination of the immediate reward Rt+1 , achieved upon forcement learning application is highlighted by [17], which
transitioning to the subsequent state St+1 , and the discounted leverages this technique to devise an energy-efficient I/O
future rewards commencing from the next step. management scheme between multi-core microprocessors
The discount factor ranges between 0 and 1. A value nearer and memory. In the subsequent section, the proposed solution
to zero indicates the agent prioritizes immediate rewards employing SARSA Q-learning Reinforcement Learning (RL)
and overlooks future rewards. Conversely, a value closer to is described, aiming to optimize the clock tree synthesis and
one reveals that the agent is more concerned with long-term maximize the distribution of clock arrival while exploring
rewards as opposed to short-term ones. Utilizing the total potential areas for further optimization.
reward formula, the agent can ascertain the anticipated return.
During the training phase, the agent acquires knowledge on III. RELATED WORK
maximizing the reward and formulates a policy, π, for its As it is well known, one of the essential phases in chip design
actions. is the construction of the on-chip PDN, as it has one to

87872 VOLUME 11, 2023

S. A. Beheshti-Shirazi et al.: Advanced RL Solution for Clock Skew Engineering

one relationship and affects the chip’s quality and reliability. hand, [20] puts forward a hybrid optimization method for
Historically, designers utilized a large array of different clock tree synthesis (CTS) that integrates a Generative
techniques to mitigate the impact of IR drop and address Adversarial Network (GAN) with Reinforcement Learning.
the complications arising from the increase in demand peak The conventional GAN model comprises a generator and a
current. These techniques can be categorized into two groups: discriminator, while the reinforcement learning aspect of this
heuristic solutions focused on reducing peak current and ML approach incorporates a pre-trained regression model as a
solutions aimed at achieving the same goal. The subsequent supervisor for the generator.
subsections provide more detailed information about each According to [18], Artificial Neural Networks (ANN)
category. can predict the number of clock tree components such as
clock buffers and wire loads. The disclosed technique utilizes
A. HEURISTIC SOLUTIONS FOR PEAK CURRENT
ANN to determine the number buffer insertions during Clock
REDUCTION
Tree Synthesis (CTS) to achieve the desired clock skew
and maximize input transition times for clock buffers and
The authors of [1] discuss an innovative technique to mitigate
clock sinks. Other techniques like those proposed by [21]
the simultaneous switching in clock networks. It achieves
and [37] employ Machine Learning (ML) techniques like
this by bifurcating network buffers into two distinct groups
Support Vector Machine (SVM) algorithm to estimate clock
- one operating on the rising edge, the other on the
buffer and wire sizing, focusing on reducing clock skew while
falling edge. The current solution does not incorporate this
maintaining power dissipation levels. Further, [20] and [22]
method, but there is potential for future implementation.
utilize the conditional generative adversarial network (GAN)
Nevertheless, integrating this strategy necessitates the need
augmented with reinforcement learning to anticipate and
for edge-triggered registers that function on both edges. This
optimize Clock Tree Synthesis (CTS) outcomes.
results in a considerable capacitive load increase, essentially
doubling that of regular rising-edge or falling-edge triggered Furthermore, [24] offers a solution based on machine
registers, thereby nullifying the benefits of halving the clock learning that enables quick analysis of potential routing
network frequency. patterns and the building of clock trees.
There have been noteworthy advancements in employing
The method outlined in [19] aims to minimize the
reinforcement learning to enhance Computer-Aided Design
switching activity of a Finite State Machine (FSM) and
(CAD) tools’ operation, such as placement and routing
while not directly related to the solutions proposed by this
solutions. Reference [11] uses RL to augment the effi-
paper, it has potential to lower peak current. It achieves this
ciency of the placer in exploring the solution space and
by introducing state replication and re-encoding techniques.
dynamically adapting to particular problems. Reinforcement
However, this approach is limited to the FSM part of a
Learning-based routing techniques are presented in [12]
netlist. The strategy disclosed by this paper aligns more
and [16] to expedite FPGA routing solutions. Reference [13]
with the previous works [3], [7], [9], [10], [23], [26], [27],
proposes a placement and routing algorithm for 3D FPGA
[39], [47], which aim to reduce peak current by dispersing
that leverages Reinforcement Learning to reduce Manhattan
Clock Arrival Times (CAT). Instead of employing heuristic
distance and wire length. Furthermore, [17] employs RL
approaches, the paper uses a reinforcement learning agent
to formulate energy-efficient I/O management between
that learns to distribute clock arrival times to avoid timing
many-core microprocessors and memory.
issues.
In [8], partitioning and superposition techniques are used
In the recent research [48], a novel pyramid structure has
to extract SOC floorplan and PDN features. The extracted
been suggested to optimize resource use and performance
information is then used by a Machine Learning model
in a High-Level Synthesis (HLS) design using Machine
to independently predict an updated static IR drop for
Learning (ML). The researchers built a comprehensive
each power node without requiring a golden IR drop
database of C-to-FPGA outcomes from diverse benchmarks.
tool. It demonstrates superior performance compared to an
They used an automatic hardware optimization tool, Minerva,
industry-leading tool with minimal error rates.
to identify the maximum achievable clock frequency. This
The paper [6] introduces a design flow to generate
tool utilizes Static Timing Analysis (STA) and a heuristic
a PDN with minimal overhead for standard cell routing
algorithm to target optimal throughput or throughput-to-
while meeting the IR drop and EM constraints for a given
area, leading to a more efficient High-Level Synthesis (HLS)
placement. The ML model predicts the total wire length of
design.
the global route linked with a particular PDN configuration
to accelerate the search process.
B. LEARNING SOLUTIONS FOR PEAK CURRENT Authors in [38] calculate the IR drop after each ECO by
REDUCTION using timing, power, and physical features extracted before
The research article [7] introduces a clock-skew optimization ECO to predict the IR drop of a design after ECO. To enhance
solution using a heuristic approach based on genetic algo- prediction accuracy and training time, they develop regional
rithms and clustering techniques to handle the constraints models for cell instances located near IR drop violations. The
of the maximum number of clock drivers. On the other study confirms that for a design with 100,000 cell instances,
VOLUME 11, 2023 87873
S. A. Beheshti-Shirazi et al.: Advanced RL Solution for Clock Skew Engineering

IR drop prediction can be completed in a span of two The ANN-based MSD model was further integrated into
minutes. existing timing libraries for MIS-aware timing analysis. Their
In a study by [56], a novel machine learning (ML) comprehensive work presents a practical solution to address
technique is proposed for predicting IR drop in circuits the influence of MIS on SIS delay and its subsequent effects
before ECO revision. This technique enables the creation on the timing performance of the system.
of prediction models that can be reused to predict the The scholarly work by [51] innovatively employs Deep
IR drop of the revised circuit after ECO is complete. Neural Networks (DNN) to create highly accurate approxi-
Reference [40] offers a comprehensive review of diverse mations of signal arrival time distributions, while maintaining
techniques leveraging ML algorithms for IR drop estimation. linear-time complexity. They leveraged various DNN archi-
Meanwhile, the authors of [41] present PowerNet - an tectures to execute the maximum and convolution operations
innovative dynamic IR drop estimation method, employing with utmost efficiency, which was made possible by the
convolutional neural networks (CNNs) that can handle both utilization of appropriate training datasets.
vector-based and vector-less IR analysis. The CNN model In the work by [52], a groundbreaking approach is
employed in PowerNet exhibits high generalizability, making introduced for automatic timing closure of relative-timed
it suitable for a wide range of design applications. circuits using machine learning techniques. This ML-guided
Additionally, [42] introduces a novel automated workflow strategy is designed to accelerate the process by learning
to mitigate IR drop violations instigated by ECO. This work- from the characteristics in each iteration, thereby reducing the
flow provides solutions like cell movement and downsize overall time needed for timing closure of a given design.
options, utilizing an ML algorithm for IR drop prediction The study by [53] presents a novel machine learning (ML)
to avoid overfixing. In an innovative approach, they also based approach to predict pin-to-pin delays of combinational
apply a multi-round bipartite matching technique to optimize circuits at the register transfer level (RTL). To achieve a high
resources during the ECO workflow. In another related study, degree of accuracy, this approach seamlessly integrates slew
[43] proposes a tool that harnesses ML techniques like and delay estimations. They generate a training set using
three-dimensional convolutions and regression-like layers. characteristics of components produced by a model-driven
This tool suggests a more extensive subset of worst-case hardware generator framework. Open-source logic synthesis
test patterns, thus enhancing test coverage and enabling and static timing analysis (STA) tools are employed to
accurate prediction of IR drop. Lastly, [40] employs an determine the ground truth labels for delays, slews, and their
XGBoost-based ML technique to predict dynamic IR drop for interdependencies.
both vector-based and vector-less IR drop analysis. In [44], In this study, the paper presents an innovative RL-based
the authors forecast the symmetry and correlation between approach for clock skew engineering, aiming to optimize the
the predicted data and the golden data by leveraging the clock tree synthesis and maximize the distribution of clock
correlation coefficient. arrival. The proposed RL-based approach overcomes one of
The study presented in [54] brings forth an innovative the significant limitations of supervised learning approaches;
approach to generate predicted PBA timing results from heavy reliance on labeled data for training. Acquiring labeled
pessimistic GBA timing reports. This approach harnesses data can be time-consuming and expensive, especially in
the power of a stage-based delay model integrated with a scenarios with limited samples. The RL-based approach
customized loss function rooted in Machine Learning. What overcomes this limitation as it learns from interactions with
sets this model apart is its consideration for the asymmetric the environment without the need for extensive labeled
loss that might occur while generating these predictions. data. Furthermore, supervised models require help to adapt
The effectiveness of this model extends beyond precise PBA to dynamic environments as they are trained on fixed
timing results. It also enhances designers’ capabilities in datasets. In contrast, RL agents continuously interact with the
swiftly identifying false violation paths within GBA reports, environment and adjust to changing situations, making the
reducing the time expenditure significantly compared to proposed approach in this work more suitable for dynamic
conventional methods. As a consequence, it curtails the scenarios. Supervised models also have limited capability
margin in the post-route optimization phase. The increased to handle unseen or novel situations. In contrast, the
efficiency in generating timing results proposed by this model proposed RL-based approach in this work has the capability
holds the promise of significantly refining the design process. for exploration and learning from novel, never-seen-before
In the study conducted by [50], the researchers aimed to scenarios, which is crucial for clock skew engineering in
capture the impacts of Multiple-Input Switching (MIS) by real-world applications. Moreover, RL inherently balances
deriving a corrective measure, referred to as the MIS-SIS exploration and exploitation to find the best strategies,
Difference (MSD), applicable to traditional Single-Input making it more adept at handling non-linear relationships and
Switching (SIS) delay across diverse scenarios. Several complex interactions than supervised learning techniques.
modeling methods including polynomial regressions, support However, it is also essential to acknowledge the limitations of
vector regression, and artificial neural networks (ANNs) the RL-based approach. RL often requires more interactions
were experimented with to create a precise model for MSD. with the environment to achieve good performance, making

87874 VOLUME 11, 2023

S. A. Beheshti-Shirazi et al.: Advanced RL Solution for Clock Skew Engineering

it computationally expensive and time-consuming. Policy

optimization can also be challenging due to the large state-
action spaces, especially in complex systems. Properly
designing reward functions is crucial for RL success,
as an improperly designed reward function can lead to
suboptimal or undesirable behavior. Balancing exploration
and exploitation is another challenge, as poorly balanced
RL agents might get stuck in suboptimal solutions or take
excessive time to explore new possibilities. By presenting
these comparison points and limitations, we can demonstrate
the advantages of the proposed RL-based approach in this
work over prior existing supervised learning-based studies for
clock skew engineering. The proposed approach overcomes
the limitations imposed by supervised models and provides
a more adaptive, exploratory, and efficient solution for clock FIGURE 2. Reinforcement Learning Solution for Peak Current Reduction,
skew optimization. and description of rewards in this work.

The next section of this paper discusses a proposed solution

that employs SARSA Q-learning RL to maximize the clock
arrival distribution and explore optimization opportunities in peak current reduction methodologies while overcoming the
Clock Tree Synthesis. limitations of supervised learning-based studies and heuristic
algorithms disclosed by the prior art.
IV. PROBLEM AND PROPOSED SOLUTION Fig. 2 captures the high-level setup of the proposed
Despite extensive research and promising outcomes on RL solution for peak current reduction based on SARSA
reducing peak current using heuristic and machine learn- Q-learning with a Decaying epsilon reedy policy to optimize
ing models in previous works like [7], [18], [20], [21], the distribution of clock skews. The RL agent moves from
and [37], these studies mainly relied on either partially register to register, and in each register, it decides on its
or fully supervised learning approaches. This study draws actions. One solution for the agent’s movement in the netlist
inspiration from the above literature to go beyond the is for the agent to move from one register to all registers
limitations imposed by supervised models’ datasets and connected to it (any launch register in its incoming or any
explore uncharted Pareto Frontier optimal spaces beyond the capture register in its outgoing timing paths). However,
scope of heuristic algorithms. This paper aims to gain a building such an environment carries many complications as
more comprehensive understanding of peak current reduction the adjacency matrix for netlist connectivity is very sparse.
by venturing beyond the well-trodden paths of supervised This complicates the design of the q-table, as the q-table
models and established heuristic algorithms. By expanding should be constructed to support the large in-degree and out-
the exploration of optimal solutions, this study aims to degree reg-to-reg connectivity. In this work, to simplify the
contribute new insights and perspectives to the ongoing problem, the agent movement has been decoupled from the
discourse on peak current reduction. Instead of relying netlist connectivity and arranged the registers in a grid world
on heuristic algorithms or partial reinforcement learning (Torus, to be exact!). In this arrangement, from any register,
approaches, the approach proposed by this paper ventures the agent can go to only four other registers on its left, right,
into a fully-reinforcement learning perspective to address the top, and bottom. The arrangement of registers in the grid
issue of IR Drop by reducing peak current. The reinforcement world is random. The experimental results indicated that the
learning agent is trained to leverage and exploit opportunities number of steps in each episodic task is large enough. It does
for widening the arrival time of clocks and discovering not matter how the grid world is arranged and how much
new Pareto-optimal spaces beyond the optimization areas superficial connectivity is established. The agent’s movement
heuristic algorithms cover. A two-phase solution is proposed in the grid world boundary has been handled by converting
for training the reinforcement learning agent, incorporating the grid world into a torus. For example, an agent on the left
SARSA Q-learning with decaying epsilon greedy to construct side of the grid can decide to move left, and it will be moved
the clock tree network and perform clock tree synthesis. to the same row but on the right side of the grid. In this work,
Although SARSA Q-learning has been used in other fields, different elements of the RL solution, the state, action, and
this proposed solution employs a modified version of SARSA reward, are defined as follows:
Q-learning, augmented with a decaying factor for epsilon Current-State (St ): The relevant factors that need to be
greedy, two-phase operation, two separate Q-tables, and considered in the context of timing analysis for an electronic
a specific environment implementation and deployment design are multiple, including the position of the agent within
for executing the required clock tree synthesis function. the grid, the count of clock buffers that connect to each
By adopting this novel and innovative approach, the proposed endpoint register (representing the clock access time (CAT)
solution by the paper seeks to contribute to advancing of said register), the distribution of clock arrival times (which

VOLUME 11, 2023 87875

S. A. Beheshti-Shirazi et al.: Advanced RL Solution for Clock Skew Engineering

forms the CAT Dictionary), as well as a collection of Timing timing paths originating from that same register, using it as
Violations (TV) in the form of a dictionary. the launch register. By doing so, the rt effectively captures
Next-State (St+1 ): Following the relocation of the agent, the net impact of buffer insertion or removal on the timing
a revised number of clock buffers is now being employed behavior of the circuit.
to direct towards each endpoint, subsequently recalibrating
the clock arrival time (CAT) of the register. Additionally,
the revised CAT dictionary and the time-varying (TV) A. PROBLEM SPECIFIC ENVIRONMENT
dictionary have been updated to reflect the current state of One of the distinctive aspects and challenges posed by the
the system. These modifications are fundamental to ensuring present study is the creation of a problem-specific environ-
the integrity, accuracy, and precision of the clock distribution ment, as detailed in 1. The development of this environment
network, mitigating any potential errors or discrepancies in demanded a meticulous approach, commencing from the very
the system’s operation. basics. While reinforcement learning algorithms have been
Action: The available activities an agent can undertake employed to tackle a wide range of problems for many years,
at a given state (register) constitute a critical aspect of the unique nature of the problem at hand necessitated an
the proposed design. Specifically, at any given state, the environment tailored specifically for this purpose. The envi-
agent is tasked with taking two distinct actions, namely: ronment was meticulously crafted to address the complexities
1) inserting or removing up to five buffer entities (yielding of clock tree synthesis, encompassing small as well as larger
a total of eleven possible actions, i.e., five insertions, five designs replete with intricate nuances. Algorithm 1 offers
removals, or no action), and 2) moving to an adjacent insights into the functioning of the environment, including
register (achievable via four actions, namely, Move Up the calculation of rewards rσ and rt , alongside the mechanism
(U), Down (D), Left (L), and Right (R)). It is worth for updating the state, which is subsequently conveyed to the
noting that the neighboring register is determined based Agent.
on the grid-world configuration rather than actual register The environment in which the agent operates is char-
connectivity. Importantly, the evaluation of each action or acterized by a hierarchical dictionary structure, where the
move is performed in the corresponding q-tables, utilizing keys correspond to registers utilized in the design. The
the same reward function. To facilitate efficient learning, values associated with these registers consist of three nested
the agent is trained to adopt separate policies for movement dictionaries: The first dictionary stores the Clock Arrival
and buffer insertion or removal, each tailored to the specific Time (CAT) value of the given register. The second dictionary
demands of the corresponding task. The policy learned for the contains all the registers that are connected to the key register,
buffer insertion or removal is independent of the grid-world. serving as inputs. The third dictionary contains all the
However, the policy learned for movement depends on the registers that receive inputs from the key register, downstream
grid-world connectivity. in the circuit.
Reward: The reward serves as the consequential feedback The environment interacts with the agent by receiving
provided to the agent upon transitioning into a new state. and incorporating the selected action, subsequently updating
The agent of the proposed design in this work, upon entering the CAT values in the dictionary structure. The agent is
a new state, is subject to receiving two distinct rewards, afforded three possible actions at each step: adding a buffer,
with the first involving a positive or negative reinforcement removing a buffer, or moving to an adjacent register. When
(rσ ) that is proportionate to the extent to which the spread the agent chooses to add or remove a buffer, the environment
of clock arrival times is amplified or attenuated as a result determines the resulting CAT values and computes the
of the variance of the new distribution relative to the prior number of violations that arise from the selected action.
distribution. The second reward (rt ) is a significantly large The identification of violations is accomplished via rigorous
negative or positive reinforcement that is proportionate to timing analysis, using specialized formulas.
the degree of timing violation created or remedied by the It is important to note that the register undergoing
agent. The reward (rσ ) is the difference between the standard manipulation by the agent is situated at a launchpad or
deviation of clock arrival distribution before and after the capture pad, depending on the timing paths involved. Thus,
agent’s buffer insertion/removal action and it is computed the assessment of violations resulting from the agent’s action
using: requires the utilization of two distinct formulas, tailored to the
specific timing characteristics of the given register. Overall,
rσ = σnew − σold (4)
the environment is characterized by a complex structure,
The variable rt denotes the cumulative sum of timing necessitating the deployment of sophisticated algorithms
violations, measured in picoseconds, which occur upon the and techniques to facilitate the agent’s decision-making and
insertion or removal of a buffer along all timing paths to enhance the overall performance of the system. To check
and from the resident register of the Agent. This value is violations caused by the agent’s selected action with regard
derived by performing setup timing checks twice: firstly, to registers that the current register is feeding their inputs, the
by examining all timing paths leading to the Agent’s register, function calculates the slack between the current register and
using it as the capture register; and secondly, by assessing all each of the registers receiving an input feed from the current

87876 VOLUME 11, 2023

S. A. Beheshti-Shirazi et al.: Advanced RL Solution for Clock Skew Engineering

register using the following formula:

Algorithm 1 Reinforcement Learning Environment Imple-
Slack = (T + CATcurrent ) − (CATnew + tplogic ) (5) mentation
ds {Added delay in current step t adjust the CAT}
FF {Target register for take the action}
Equition (7) incorporates various parameters relevant to DIC {Dictionary of connections, CAT, T, and tpd }
the design process. T denotes the clock period, which may TV {A list of current Timing Violation}
differ across benchmarks. Meanwhile, CATcurrent and CATnew 1 {Adjusted CAT based on proximity to clock edge}
rσ {reward for widening the CAT distribution}
represent the original and updated clock arrival times of the rt {Reward for timing violation}
current register under manipulation, respectively. In addition, DIC {Updated dictionary of connections, CAT, T, and tpd }
tplogic refers to the delay experienced by data traveling TV {Updated list of Timing Violation}
1 {Updated adjusted CAT based on proximity to clock edge}
between the current register and those whose inputs it feeds. procedure Env(ds , FF, DIC, TV , 1)
To ascertain the occurrence of any violations that may arise T ← Clock Period for design from table 1
from the agent’s selected action on registers that supply CAT ← DIC[0] {Extract dictionary of clock arrival times}
CONNECT ← DIC[1] {Extract dictionary of register connectivity}
inputs to the current register, the function evaluates the slack Slackold ← DIC[2] {Extract slacks of previous step}
between the current register and all such input registers, using CATFF ← CAT (FF) + ds
the following formula: rto ← 0
rfrom ← 0
In equation (8), T represents the variable clock period TVtmp ← TV {create a local copy}
for the design, which varies across benchmarks. The newly for all FFc capture registers in CONNECT where FF is a launch register do
derived clock arrival time, CATnew , represents the adjusted tcq ← clock to Q for register FF {Stored in DIC}
tpd ← propagation delay from FF to FFc {Stored in DIC}
clock timing of the present register as influenced by the tsu ← setup time for register FFc {Stored in DIC}
agent, while CATfeeding−register signifies the clock arrival time Slack = T + CAT (FFc ) − CATFF − tcq − tpd − tsu
associated with the register feeding the input of the current if (Slack > T ) then {action invalid –> Return 0 reward}
Return (0,0,DIC,TV ,1)
register. Moreover, tplogic in formula (8) denotes the data path else if (0 < Slack < T ) then
delay between the current register and other registers serving if (FF_to_FFc in TVtmp ) then {reward for removing violation}
as inputs to the current register. The environment function remove FF_to_FFc from TVtmp
rto += −Slackold
comprises two distinct phases. During the initial phase, the end if
environment is responsible for calculating the newly adjusted else{ slack < 0}
clock arrival time, which corresponds to the current register. if (FF_to_FFc in TVtmp ) then {rto = change in negative slack}
rto += Slack − Slackold
Subsequently, utilizing formulas (7) and (8), it determines the else
slack related to registers that serve as inputs to the current Append FF_to_FFc to TVtmp
register, as well as registers that receive inputs from the rto += Slack {rto proportional to negative slack}
end if
current register. In the event that any slack calculated by end if
means of these formulas (7) and (8) surpasses the designated end for
clock period of the design, the environment shall prohibit for all FFl launch registers in CONNECT where FF is a capture register do
tcq ← clock to Q for register FFl {Stored in DIC}
the occurrence of the associated action, returning the initial tpd ← propagation delay from FFl to FF {Stored in DIC}
received values back to the agent. tsu ← setup time for register FF {Stored in DIC}
Should all calculated slacks during phase one prove less Slack = T + CATFF − CAT (FFl ) − tcq − tpd − tsu
if (Slack > T ) then {action invalid –> Return 0 reward}
than a period and should no violation arise, the environment Return (0,0,DIC,TV ,1)
will proceed to phase two computations. During phase two, else if (0 < Slack < T ) then
the environment shall yet again evaluate slack, employing if FFl _to_FF in TVtmp then {reward for removing violation}
remove FFl _to_FF from TVtmp
formulas (7) and (8) in assessing both the input registers rfrom += −Slackold
feeding the current register and the registers being fed by the end if
current register. In the event that the calculated slack returns else{ slack < 0}
if (FFl _to_FF in TVtmp ) then {rfrom = change in negative slack}
a value greater than zero, the environment shall register rfrom += Slack − Slackold
no violation. Subsequently, the environment shall inspect else
whether the register analyzed concerning the current register Append FFl _to_FF to TVtmp
rfrom += Slack {rto proportional to negative slack}
is on the violation list. If such is the case, and if the slack is end if
greater than zero, the environment shall exclude said register end if
from the list. end for
CAT (FF) ← CATFF {update the DIC}
However, in the event that the calculated slack amounts to σold ← Standard Deviation of 1
less than zero during phase two computations, it indicates δ ← CAT (FF) - Avg(1)
violations specific to the particular register, which may 1(FF) ← min(δ, T − δ). {skew is computed from closest clock edge}
σnew ← Standard Deviation of 1
include registers feeding the input of the current register or rσ ← σnew − σold . {change in standard deviation as reward}
registers being fed by the current register. rt ← rto + rfrom {reward/penalty for reducing/increasing timing violation}
Return (rσ , rt , DIC, TVtmp , 1)
Slack = (T + CATnew ) − (CATfeeding-register + tplogic ) (6) end procedure

VOLUME 11, 2023 87877

S. A. Beheshti-Shirazi et al.: Advanced RL Solution for Clock Skew Engineering

A fundamental aspect of this algorithm revolves around

the concept of adjusted cycle-to-cycle arrival time (CAT), Algorithm 2 SARSA Learning Agent
represented by the variable 1. If a register’s skew, either α ← 0.9 {Learning rate}
negative or positive, becomes substantial enough such that γ ← 0.9 {Discount factor}
ϵ ← 0.9 {Control knob for exploration vs exploitation}
its launch time draws closer to the mean arrival of the dBUF ← 30ps {delay of buffer used in clock tree}
clock, across all registers, in the previous or next cycle, s = 100 {Scale factor for reward}
steps, episodes, delay-Start ← from table 1 {Benchmark specific variables}
then the skew’s expansion propels it towards another clock start-point ← choose a start point register randomly
edge. Consequently, while broadening the CAT distribution, DICCAT ← build a dictionary of current design clock arrival times
one must consider the clock arrival distribution in the DICConnectivity ← build a dictionary for connectivity of each register
DIC ← [DICCAT , DICConnectivity ]
previous and next cycles to ensure that the distribution’s TV ← [] {empty list for storage of Timing Violations}
tail is not widened to the extent that it coincides with and 1 ← CAT {Adjusted CAT based on proximity to Clk edge (initially equal
contributes to the peak current in the previous or next cycle. to CAT)}
.
To tackle this situation, the paper introduced a dictionary of {————–Phase 1 ————– Delay start to warm up the Q-tables}
adjusted CAT, represented by 1. Initially, the CAT dictionary Initialize QM and QA (arbitrarily) {Move and Action Q-tables}
FF ← start-point
copied to 1. However, with each update to the clock for (st=1, st <= delay-start, st++) do
arrival time of register FF, the 1 dictionary is updated as Randomly select Actions a { a ∈ [add, remove, none] }
follows: if (a = add(x) || a = remove(x) ) then { x ∈ [1,5] = number of buffers}
ds ← x × dBUF
(rσ , rt , DIC, TV , 1) ← Env(ds , FF, DIC, TV , 1) {Algorithm 1}
δ ← CAT (FF) − Avg(1) (7) r ← rσ + s × rt {weighted reward}
1(FF) ← min(δ, T − δ) (8) QA (s, a) ← QA (s, a) + α · (r + γ · (QA (s′ , a′ ) − QA (s, a))
end if
Randomly select Actions m { m ∈ [U , D, L, R]}
The reward signal, denoted by rσ , is determined by FF ← Move in direction m of current FF
evaluating the alteration in the standard deviation of 1. QM (s, m) ← QM (s, m) + α · (r + γ · (QM (s′ , m′ ) − QM (s, m))
end for
This approach ensures that the agent’s decision-making .
process considers not only the mitigation of peak current {————–Phase 2 ————— SARSA Reinforcement Learning}
in a single clock transition but also over a sequence ξ ← 0.995 {ϵ decay factor}
α ← 0.9 {Learning rate}
of clock cycles, culminating in a more substantial over- γ ← 0.35 {Discount factor}
all reduction. The incorporation of the adapted CAT for (epi=0, epi <= episodes, epi++) do {episodes count in table 1}
dictionary, 1, further ensures the agent’s comprehen- FF ← start-point
DIC ← [DICCAT , DICConnectivity ]
sive understanding and consideration of the underlying TV ← [] {empty list for storage of Timing Violations}
circuit design, leading to more informed and effective for (st=0, st <= steps, st++) do
rand ← random ∈ [0, 1]
decision-making. if rand > ϵ then {Agent does exploitation}
a ← argmax(QA ) {Insertion exploitation}
B. PROBLEM SPECIFIC REINFORCEMENT ALGORITHM if (a = add(x) || remove(x)) then
ds ← x × dBUF
The reinforcement learning algorithm employed in the (rσ , rt , DIC, TV , 1) ← Env(ds , FF, DIC, TV , 1)
approach is detailed in Algorithm 2, which incorporates r ← rσ + s × rt {weighted reward}
QA (s, a) ← QA (s, a) + α · (r + γ · (QA (s′ , a′ ) − QA (s, a))
novel components alongside the conventional reinforcement m ← argmax(QM ) {Movement exploitation}
algorithm used for various other problems. Algorithm 2 FF ← Move in direction m of current FF
outlines the agent’s behavior and interaction with the envi- QM (s, m) ← QM (s, m) + α · (r + γ · (QM (s′ , m′ ) − QM (s, m))
end if
ronment, including taking actions, observing the subsequent else{(rand < ϵ) Agent does exploration}
state and reward, and updating its Q-tables. The agent’s Randomly select Actions a { a ∈ [add, remove, none] }
if (a = add(x) || a = remove(x) ) then { x ∈ [1,5]}
learning process can be divided into two stages: 1) delayed ds ← x × dBUF
initiation and 2) decaying epsilon-greedy SARSA learning. (rσ , rt , DIC, TV , 1) ← Env(ds , FF, DIC, TV , 1)
During the delayed initiation phase (phase 1), the agent is r ← rσ + s × rt {weighted reward}
QA (s, a) ← QA (s, a) + α · (r + γ · (QA (s′ , a′ ) − QA (s, a))
placed in a fully exploratory mode. In each step, it selects end if
two random actions—one for movement and another for Randomly select Actions m { m ∈ [U , D, L, R]}
FF ← Move in direction m of current FF
buffer insertion—drawn from the available action set. This QM (s, m) ← QM (s, m) + α · (r + γ · (QM (s′ , m′ ) − QM (s, m))
approach enables the agent to stochastically explore the end if
environment and populate its Q-tables with the outcomes or if (epi = episodes-1) then
ϵ ← 0 . { only exploit in the last episode}
values associated with potential actions for each state. In the else if (ϵ z 0.10) then
SARSA learning phase (phase 2), the agent adheres to an ϵ = ϵ × ξ { decay the ϵ to reduce exploration}
epsilon-greedy policy. The initial value of epsilon (ϵ) is set to else{if (ϵ > 0.10)}
ϵ = 0.10 {keep small exploration}
0.9, which facilitates a gradual transition from exploration to end if
exploitation as the agent learns and refines its understanding end for
end for
of the environment.

87878 VOLUME 11, 2023

S. A. Beheshti-Shirazi et al.: Advanced RL Solution for Clock Skew Engineering

With each iteration, a random figure ranging from 0 to 1 Algorithm 3 Modified-Q SARSA Learning Agent
is produced. The agent opts for a random action if Same is Previous Algorithm
for (st=1, st <= delay-start, st++) do
the generated number is smaller than ϵ, whereas it Randomly select Actions a { a ∈ [add, remove, none] }
selects the most advantageous actions for insertion/removal if (a = add(x) || a = remove(x) ) then { x ∈ [1,5] = number of
and movement (based on the values in the QM and buffers}
ds ← x × dBUF
QA Q-tables) when the generated figure surpasses ϵ. Imple- (rσ , rt , DIC, TV , 1) ← Env(ds , FF, DIC, TV , 1) {Algorithm 1}
menting an epsilon-greedy strategy enables the agent’s likeli- r ← rσ + s × rt {weighted reward}
hood of choosing random actions to diminish over time. This New Q-table formula for Decision
QA (s, a) ← 2QA (s, a) + α · (r + γ · (QA (s′ , a′ ) − QA (s, a))
approach empowers the agent to explore extensively at the end if
beginning and gradually shift its focus towards exploiting the Randomly select Actions m { m ∈ [U , D, L, R]}
maximum cumulative reward as time progresses. In the final FF ← Move in direction m of current FF
New Q-table formula for Movement
episode, the agent solely concentrates on exploiting, thereby QM (s, m) ← 2QM (s, m) + α · (r + γ · (QM (s′ , m′ ) − QM (s, m))
generating the highest attainable reward. To determine a end for
single reward value (r) for the chosen action, the agent .
{————-Phase 2 —————— SARSA Reinforcement Learning}
employs a weighted assessment of both rewards, calculated ξ ← 0.995 {ϵ decay factor}
using the subsequent formula: α ← 0.9 {Learning rate}
γ ← 0.35 {Discount factor}
for (epi=0, epi <= episodes, epi++) do {episodes count in table 1}
r = rσ + s × rt (9) FF ← start-point
DIC ← [DICCAT , DICConnectivity ]
The variable s represents a scalar multiplier, it serves to TV ← [] {empty list for storage of Timing Violations}
adjust the importance of a timing discrepancy with regards for (st=0, st <= steps, st++) do
rand ← random ∈ [0, 1]
to the gains or losses stemming from the expansion or if rand > ϵ then {Agent does exploitation}
contraction of clock arrival time dispersion. This scaling a ← argmax(QA ) {Insertion exploitation}
factor ensures that the penalties for timing transgressions are if (a = add(x) || remove(x)) then
ds ← x × dBUF
substantially greater than the rewards gained from broadening (rσ , rt , DIC, TV , 1) ← Env(ds , FF, DIC, TV , 1)
the distribution. r ← rσ + s × rt {weighted reward}
The agent is compelled to reduce or prevent the occurrence New Q-table formula for Decision
QA (s, a) ← 2QA (s, a) + α · (r + γ · (QA (s′ , a′ ) − QA (s, a))
of timing violations, except during its exploration phase, m ← argmax(QM ) {Movement exploitation}
where it seeks to rectify them in other areas. Upon FF ← Move in direction m of current FF
successfully addressing the timing violations, the agent is New Q-table formula for Movement
QM (s, m) ← 2QM (s, m) + α · (r + γ · (QM (s′ , m′ ) −
subsequently granted a substantial reward. A potential avenue QM (s, m))
for future research involves progressively increasing the value end if
of ‘s’ in each episode. This would enable the agent to generate else{(rand < ϵ) Agent does exploration}
Randomly select Actions a { a ∈ [add, remove, none] }
timing violations in earlier episodes and address them in if (a = add(x) || a = remove(x) ) then { x ∈ [1,5]}
subsequent ones, thereby enhancing the exploration process. ds ← x × dBUF
The paper intends to examine this approach in the subsequent (rσ , rt , DIC, TV , 1) ← Env(ds , FF, DIC, TV , 1)
r ← rσ + s × rt {weighted reward}
upcoming studies thoroughly. QA (s, a) ← 2QA (s, a) + α · (r + γ · (QA (s′ , a′ ) − QA (s, a))
Utilizing the aforementioned equation, the computed end if
reward will be employed in formula 3 to modify the Q-values Randomly select Actions m { m ∈ [U , D, L, R]}
FF ← Move in direction m of current FF
correlated with state-actions (such as buffer insertion/removal QM (s, m) ← 2QM (s, m)+α ·(r +γ ·(QM (s′ , m′ )−QM (s, m))
and motion) for each of the two Q-tables (QM and QA ). end if
This derived reward will have a direct impact on the agent’s if (epi = episodes-1) then
ϵ ← 0 . { only exploit in the last episode}
conduct during subsequent encounters (in future episodes) else if (ϵ > 0.10) then
when the agent revisits this particular state. ϵ = ϵ × ξ { decay the ϵ to reduce exploration}
else{if (ϵ > 0.10)}
ϵ = 0.10 {keep small exploration}
C. PROBLEM SPECIFIC MODIFIED REINFORCEMENT end if
ALGORITHM end for
end for
In order to expedite the learning process of the agent in
disclosed problem-specific solution, The paper introduces a
minor modification in reinforcement learning while simulta-
neously elevating the agent’s learning saturation to a superior adjusted as described below:
level; the method includes not replacing the old Q-value QA (s, a) ← 2QA (s, a) + α · (r + γ · (QA (s′ , a′ ) − QA (s, a))
associated with the position and decision with a new one,
(10)
instead replacing the old Q-value with a new one obtained by
adding the old and new Q values associated by the position QM (s, m) ← 2QM (s, m)+α · (r +γ ·(QM (s′ , m′ ) − QM (s, m))
and decision, the method for updating both Q-tables will be (11)

VOLUME 11, 2023 87879

S. A. Beheshti-Shirazi et al.: Advanced RL Solution for Clock Skew Engineering

FIGURE 3. Increase and saturation of the standard deviation (σ ) of the register’s clock arrival time (CAT) distribution over the number of
reinforcement learning episodes. The saturation of σ indicates that the RL model can no longer increase the CAT distribution, and the RL solution
can be terminated.

FIGURE 4. Increase and saturation of the standard deviation (σ ) of the register’s clock arrival time (CAT) distribution over the number of
reinforcement learning episodes. The saturation of σ indicates that the RL model can no longer increase the CAT distribution, and the RL solution
can be terminated.

In other word the algorithm would be modified as follow, set to 20X the number of registers in the benchmark (e.g.,
in algorithm 3 200K steps for AES in each episode).
Fig. 4 illustrates how the agent learning behavior
V. EXPERIMENTS improves as the paper proposes the second approach,
In this work, the reinforcement learning approach has the problem-specific modified Reinforcement Learning
been employed based on the environment outlined in Algorithm, to increase the clock arrival time distribution over
algorithms 1 and 3, along with the agent as specified in time for each benchmark over time. Consider the learning
algorithm 2. The solutions were evaluated on a selection of curve for the AES benchmark: After around 500 episodes of
larger ISCAS89 benchmarks [35], as well as the AES and learning, the learning curve saturates, and the agent can no
Ethernet benchmarks. For each benchmark, the benchmark longer increase the CAT distribution. The number of steps in
implemented in Synopsys ICC2 [25] using two distinct each episode is roughly set to 20X the number of registers in
methods: the standard Place and Route (PnR) flow, and the benchmark (e.g., 200K steps for AES in each episode).
an alternative approach that utilized clock arrival times The aforementioned Fig. 4 clearly discloses at least 20%
generated through reinforcement learning. In the latter faster saturation across all benchmarks.
approach, the reinforcement learning-derived clock arrival
times were supplied to ICC2 as recommended arrival B. CHANGE IN CLOCK ARRIVAL TIMES
times via the ‘‘set_clock_balance_points’’ command. It is The histogram depicted in Fig. 5 exhibits the distribution
important to note that the final outcome of ICC2 might not of clock arrival times in two scenarios - one with the aid
exactly match the CAT list, since this input serves merely of the RL agent and the other without it. The clock arrival
as a suggestion and is implemented to the extent feasible. time (CAT) distribution is obtained post clock tree synthesis
Following this, each design underwent static timing analysis (CTS), wherein the baseline CAT is generated following the
and timing closure before being subject to power and IR standard ICC2 CTS flow. In contrast, the CAT for the RL
analysis using ANSYS Redhawk [57]. The remainder of this approach is derived by supplying the recommended clock
section will comprehensively discuss the results of performed arrival times (output of RL) to ICC2, executing the CTS, and
experiments. obtaining the resultant CAT distribution. The utilization of the
RL agent has considerably broadened the CAT distribution,
A. AGENT LEARNING BEHAVIOR as depicted in the graph. This broadening is anticipated to
Fig. 3 illustrates how the agent learns to increase the clock lead to a corresponding decrease in peak current and IR drop
arrival time distribution over time for each benchmark over of the design.
time. Consider the learning curve for the AES benchmark: As The histogram depicted in Fig. 6 exhibits the distribution
illustrated after around 700 episodes of learning, the learning of clock arrival times in three scenarios, without the
curve saturates, and the agent can no longer increase the CAT employment of RL solution, with the employment of RL
distribution. The number of steps in each episode is roughly solution, and the third is the use of the modified solution,

87880 VOLUME 11, 2023

S. A. Beheshti-Shirazi et al.: Advanced RL Solution for Clock Skew Engineering

FIGURE 5. Histogram of Clock Arrival Time to endpoint registers before and after running the RL solution for peak current reduction. The broader
distribution of clock arrival times (without causing any timing violation) reduces the extent of simultaneous switching and, in turn, would reduce
the peak current and inductive voltage drop.

FIGURE 6. Histogram of Clock Arrival Time to endpoint registers for peak current reduction before running RL, after running the RL solution, and
after running the modified RL using the new Q-table updating technique. The broader distribution of clock arrival times (without causing any timing
violation) reduces the extent of simultaneous switching and, in turn, would reduce the peak current and inductive voltage drop.

FIGURE 7. Current Waveform of dynamic vector-less simulation of benchmarks in 10 consecutive cycles before and after application of RL solution
for peak current reduction. As illustrated, the wider CAT distribution reduces the demanded battery current across all benchmarks.

wherein the Q-table update is being modified to increase the C. IMPACT ON PEAK CURRENT
learning speed of the RL while improving the results. The Fig. 7 captures the impact of RL solution in peak current
clock arrival time (CAT) distribution is obtained post clock reduction for selected benchmarks over ten cycles of
tree synthesis (CTS), wherein the baseline CAT is generated execution. The peak current Figs. are generated using Ansys
following the standard ICC2 CTS flow. Redhawk [57] where the switching activity of clock and

VOLUME 11, 2023 87881

S. A. Beheshti-Shirazi et al.: Advanced RL Solution for Clock Skew Engineering

FIGURE 8. Current Waveform of dynamic vector-less simulation of benchmarks in 10 consecutive cycles before and after application of RL solution
for peak current reduction. As illustrated, the wider CAT distribution reduces the demanded battery current across all benchmarks.

FIGURE 9. Current Waveform of dynamic vector-less simulation of benchmarks in 10 consecutive cycles before and after application of RL solution
for peak current reduction. As illustrated, the wider CAT distribution reduces the demanded battery current across all benchmarks.

data in each clock cycle is set to 200% (rise and fall) TABLE 1. Experiments parameters. The clock periods for each benchmark
are selected based on highest achievable performance when using
and 10% respectively. As illustrated, the peak current for standard Vt cells for physical design. Number of steps in each episode is
all benchmarks has reduced between 35% to 40%. This roughly 15X to 20X the number of registers.
is the direct result of spreading the clock arrival time of
the registers and the resulting reduction in simultaneous
switching activity.
Fig. 8 discloses the combined design peak current under
the three scenarios without RL, with RL solution, and with
modified RL solution for selected benchmarks generated
using Ansys Redhawk [57]. Furthermore, Fig. 9 is to disclose
the design peak current variations between two RL and employed the methodology described in [31] to compute the
modified RL scenarios for selected benchmark where in both delay equivalent voltage VDEV of each design (minimum
figures, the switching activity of clock and data in each clock voltage seen by transistors at launch) in both base and
cycle is set to 200% (rise and fall) and 10% respectively. RL assisted design. Furthermore, the paper computed the
As illustrated by these figures, as a consequence of the maximum cycle to cycle voltage noise (from two consecutive
additional spreading of the clock arrival time of the registers, cycles) using a vector-less IR simulation and reported the
the simultaneous switching activity is reduced further, and voltage noise. The result of this analysis is reflected in
the peak current for the modified RL solution provides Table 2. As illustrated, by using the RL solution for peak
an additional 10% to 15% reduction in comparison to the current reduction, the transistors, on average, see a higher rail
presented RL solution initially. voltage. At the same time, the extent of cycle-to-cycle voltage
variation is reduced. The decline in voltage noise relaxes the
D. IMPACT ON IR DROP requirement uncertainty margin (accounting for smaller clock
When the peak current decreases, it dramatically affects how jitter). Combining a higher overall rail voltage and smaller
much voltage drops between the package and the transistor uncertainty (voltage noise) increases the extent of available
level. This is mainly because there’s less of an inductive timing slack in each timing path. This, in turn, could be used
voltage drop caused by Ldi/dt. In simpler terms, when to improve the PPA optimality of the end design.
the current changes slower due to the clock signals being
spread out more, it reduces the inductive voltage drop. You E. SCALAR FACTOR ‘‘S’’
can see the results in Fig. 10, which shows how different One of the innovative aspects of the reinforcement learning
benchmarks were affected by this change when using the base solution applied to the problem disclosed by this work
and RL-assisted CTS flow. is determining the reward given to the agent after each
To measure how much such reduction in IR drop improves action is taken. One of the essential aspects of creating
the rail voltages seen at the transistor level, the paper the reinforcement learning solution and environment is

87882 VOLUME 11, 2023

S. A. Beheshti-Shirazi et al.: Advanced RL Solution for Clock Skew Engineering

FIGURE 10. The IR map of benchmarks before and after applicaiton of RL solution for peak current reduction. The reduction in simultaneous
switching reduces the di /dt , which in turns result in reduction in inductive IR drop (Ldi /dt ).

TABLE 2. Summary of improvement in Voltage drop and reduction in peak current for selected benchmarks. The Min Rail voltage is computed using the
methodology described in [31]. The Max C2C voltage noise is the maximum amount of change in the voltage (from launch to capture) within one Cycle
(voltage noise). The lower the Voltage noise, the smaller the clock jitter and the resulting requirement for endpoint uncertainty. As illustrated, the RL
solution resulted in a reduction in peak current, improvement in rail voltage, and a decline in the extent of voltage noise.

FIGURE 11. Increase and saturation of the standard deviation (σ ) of register’s clock arrival time (CAT) distribution over the number of
reinforcement learning episodes based on different s-values.

determining the reward mechanism, which is an integral part Fig. 11 discloses the increase of standard deviation of
of the agent learning process. As it was disclosed above, various selected designs register’s clock arrival time (CAT)
by algorithms 2 and 3 the two reward value returned by distribution over the 1000 episodes of reinforcement learning
the environment (rσ ) and (rt ) are combined using the scalar based on different s-values. For this experiment, three
factor ‘‘S’’. variable size benchmarks selected, s13207, s35932, and
AES128, each having 649, 1728, and 10015 registers. The
r = rσ + s × rt (12)
experiment used different s-values s=10, 100, 1000, 2000.
VOLUME 11, 2023 87883
S. A. Beheshti-Shirazi et al.: Advanced RL Solution for Clock Skew Engineering

As disclosed by Fig. 11, lowering the value of the scalar [11] K. E. Murray and V. Betz, ‘‘Adaptive FPGA placement
‘‘s’’ slower the rise of the learning curve, which means it optimization via reinforcement learning,’’ in Proc. ACM/IEEE 1st
Workshop Mach. Learn. CAD (MLCAD), Sep. 2019, pp. 1–6, doi:
takes more episodes to achieve the saturation of the CAT 10.1109/MLCAD48534.2019.9142079.
distribution, which is an indicator of the agent learning [12] U. Farooq, N. U. Hasan, I. Baig, and M. Zghaibeh, ‘‘Efficient
completion. As the value ‘‘S’’ increases the slope of the curve FPGA routing using reinforcement learning,’’ in Proc. 12th Int.
Conf. Inf. Commun. Syst. (ICICS), May 2021, pp. 106–111, doi:
become steeper the agent reaches the learning saturation 10.1109/ICICS52457.2021.9464626.
quicker. At the same time, there is a need to have a balance; [13] C. Brej and J. D. Garside, ‘‘A quasi-delay-insensitive method to
if the s-value goes too high, the agent will achieve saturation overcome transistor variation,’’ in Proc. 18th Int. Conf. VLSI Design Held
Jointly, 4th Int. Conf. Embedded Syst. Design, 2005, pp. 451–456, doi:
quicker; however, it would not discover new Pareto Frontier 10.1109/ICVD.2005.30.
optimal that it would if the learning processes is continued [14] X.-W. Shih and Y.-W. Chang, ‘‘Fast timing-model independent
for more number of episodes. buffered clock-tree synthesis,’’ IEEE Trans. Comput.-Aided Design
Integr. Circuits Syst., vol. 31, no. 9, pp. 1393–1404, Sep. 2012, doi:
Another thing to consider when choosing the s-value is 10.1109/TCAD.2012.2191554.
memory usage. Larger s-value increases the reward size for [15] M. Donno, E. Macii, and L. Mazzoni, ‘‘Power-aware clock tree planning,’’
each reward which, in return, there would a need for more in Proc. Int. Symp. Phys. Design, Apr. 2004, pp. 138–147.
[16] M. A. Elgamma, K. E. Murray, and V. Betz, ‘‘Learn to place: FPGA
significant memory when running the experiments. placement using reinforcement learning and directed moves,’’ in Proc.
Int. Conf. Field-Program. Technol. (ICFPT), Dec. 2020, pp. 85–93, doi:
VI. CONCLUSION 10.1109/ICFPT51103.2020.00021.
[17] C. Xu, P.-Y. Chen, D. Niu, Y. Zheng, S. Yu, and Y. Xie, ‘‘Architecting
The work disclosed by this paper conducted a study that 3D vertical resistive memory for next-generation storage systems,’’ in
aimed to reduce peak current during the Clock Tree Synthesis Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2014,
stage of a design using Reinforcement Learning. Specifically, pp. 224–229, doi: 10.1109/ICCAD.2014.7001329.
[18] Y. Kwon, J. Jung, I. Han, and Y. Shin, ‘‘Transient clock power estimation
the work used a decaying epsilon-greedy SARSA approach. pre-CTS netlist,’’ in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS),
The findings showed that when utilizing the proposed RL- May 2018, pp. 1–4, doi: 10.1109/ISCAS.2018.8351430.
based solution, peak current decreased by 35-40% compared [19] J. Gu, G. Qu, L. Yuan, and Q. Zhou, ‘‘Peak current reduction by
simultaneous state replication re-encoding,’’ in Proc. IEEE/ACM Int.
to a baseline design generated using a heuristic Clock-Tree Conf. Comput.-Aided Design (ICCAD), Nov. 2010, pp. 592–595, doi:
Synthesis solution of the physical design EDA tool. The paper 10.1109/ICCAD.2010.5654204.
also discovered that a second RL algorithm with a modified [20] Y.-C. Lu, J. Lee, A. Agnesina, K. Samadi, and S. K. Lim,
‘‘GAN-CTS: A generative adversarial framework for clock tree prediction
Q-table update produced a design that led to an extra 10-15% and optimization,’’ in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design
decrease in peak current compared to the original RL-based (ICCAD), Nov. 2019, pp. 1–8, doi: 10.1109/ICCAD45719.2019.8942063.
solution. By reducing peak current inductive voltage drop and [21] A. B. Kahng, B. Lin, and S. Nath, ‘‘High-dimensional metamodeling for
prediction of clock tree synthesis outcomes,’’ in Proc. ACM/IEEE Int.
voltage noise decreased across selected benchmarks. Workshop Syst. Level Interconnect Predict. (SLIP), Jun. 2013, pp. 1–7, doi:
10.1109/SLIP.2013.6681685.
REFERENCES [22] M. Ancona, E. Ceolini, C. Öztireli, and M. Gross, ‘‘Towards better
understanding of gradient-based attribution methods for deep neural
[1] Y.-T. Nieh, S.-H. Huang, and S.-Y. Hsu, ‘‘Minimizing peak current via networks,’’ 2017, arXiv:1711.06104.
opposite-phase clock tree,’’ in Proc. 42nd Design Autom. Conf., 2005, [23] W. Li, M. E. Dehkordi, S. Yang, and D. Z. Pan, ‘‘Simultaneous
pp. 182–185, doi: 10.1109/DAC.2005.193797. placement and clock tree construction for modern FPGAs,’’ in Proc.
[2] M. Edahiro, ‘‘A clustering-based optimization algorithm in zero-skew rout- ACM/SIGDA Int. Symp. Field-Program. Gate Arrays, 2019, pp. 132–141,
ings,’’ in Proc. 30th Int. Design Autom. Conf. (DAC), 1993, pp. 612–616. doi: 10.1145/3289600.3289631.
[3] A. Mukherjee and R. Sankaranarayan, ‘‘Retiming and clock scheduling [24] M. Liu, Z. Zhang, J. Wen, and Y. Jia, ‘‘An approximate symmetry
to minimize simultaneous switching,’’ in Proc. IEEE Int. SOC Conf., clock tree design with routing topology prediction,’’ in Proc. IEEE Int.
Sep. 2004, pp. 259–262, doi: 10.1109/SOCC.2004.1362427. Midwest Symp. Circuits Syst. (MWSCAS), Aug. 2021, pp. 92–96, doi:
[4] L. P. Kaelbling, M. L. Littman, and A. W. Moore, ‘‘Reinforcement 10.1109/MWSCAS47672.2021.9531772.
learning: A survey,’’ J. Artif. Intell. Res., vol. 4, no. 1, pp. 237–285, [25] Synopsys. Synopsys EDA Tools. Accessed: Jul. 10, 2020. [Online].
Jan. 1996. Available: http://synopsys.com/
[5] A. Vittal and M. Marek-Sadowska, ‘‘Low-power buffered clock tree [26] Y. Kaplan and S. Wimer, ‘‘Post optimization of a clock tree for power
design,’’ IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 16, supply noise reduction,’’ in Proc. IEEE 27th Conv. Electr. Electron. Eng.
no. 9, pp. 965–975, Sep. 1997. Isr., Nov. 2012, pp. 1–5, doi: 10.1109/EEEI.2012.6377136.
[6] W.-H. Chang, C.-H. Lin, S.-P. Mu, L.-D. Chen, C.-H. Tsai, Y.-C. Chiu, and [27] R. Chaturvedi and J. Hu, ‘‘Buffered clock tree for high-quality IC
M. C.-T. Chao, ‘‘Generating routing-driven power distribution networks design,’’ in Proc. Int. Symp. Signals, Circuits Syst., 2004, pp. 381–386, doi:
with machine-learning technique,’’ IEEE Trans. Comput.-Aided Design 10.1109/ISSCS.2003.1223769.
Integr. Circuits Syst., vol. 36, no. 8, pp. 1237–1250, Aug. 2017. [28] Y.-Y. Chen, C. Dong, and D. Chen, ‘‘Clock tree synthesis under aggressive
[7] P. Vuillod, L. Benini, A. Bogliolo, and G. D. Micheli, ‘‘Clock-skew buffer insertion,’’ in Proc. 47th Design Autom. Conf., Jun. 2010, pp. 86–89.
optimization for peak current reduction,’’ in Proc. Int. Symp. Low Power [29] K. I. Gubbi, S. A. Beheshti-Shirazi, T. Sheaves, S. Salehi, P. D. S. Manoj,
Electron. Design, 1996, pp. 265–270. S. Rafatirad, A. Sasan, and H. Homayoun, ‘‘Survey of machine learning for
[8] C.-T. Ho and A. B. Kahng, ‘‘IncPIRD: Fast learning-based predict of incre- electronic design automation,’’ in Proc. Great Lakes Symp. (VLSI), 2022,
mental IR drop,’’ in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design pp. 513–518, doi: 10.1145/3571502.3571510.
(ICCAD), Nov. 2019, pp. 1–8, doi: 10.1109/ICCAD45719.2019.8942110. [30] S. A. Beheshti-Shirazi, A. Vakil, S. Manoj, I. Savidis, H. Homayoun, and
[9] A. Vijayakumar, V. C. Patil, and S. Kundu, ‘‘An efficient method for clock A. Sasan, ‘‘A reinforced learning solution for clock skew engineering to
skew scheduling to reduce peak current,’’ in Proc. 29th Int. Conf. VLSI reduce peak current and IR drop,’’ in Proc. Great Lakes Symp. (VLSI),
Design 15th Int. Conf. Embedded Syst. (VLSID), Jan. 2016, pp. 505–510, 2021, pp. 181–187, doi: 10.1145/3458776.3460066.
doi: 10.1109/VLSID.2016.24. [31] A. Vakil, H. Homayoun, and A. Sasan, ‘‘IR-ATA: IR annotated timing
[10] W.-C.-D. Lam, C.-K. Koh, and C.-W.-A. Tsao, ‘‘Power supply noise analysis, a flow for closing the loop between PDN design, IR analysis &
suppression via clock skew scheduling,’’ in Proc. Int. Symp. Quality timing closure,’’ in Proc. 24th Asia South Pacific Design Autom. Conf.,
Electron. Design, 2002, pp. 355–360, doi: 10.1109/ISQED.2002.996772. Jan. 2019, pp. 152–159.

87884 VOLUME 11, 2023

S. A. Beheshti-Shirazi et al.: Advanced RL Solution for Clock Skew Engineering

[32] Y. P. Chen and D. F. Wong, ‘‘An algorithm for zero-skew clock tree routing [52] T. Sharma, S. Kolluru, and K. S. Stevens, ‘‘Learning based tim-
with buffer insertion,’’ in Proc. ED & TC Eur. Design Test Conf., 1996, ing closure on relative timed design,’’ in Proc. IFIP/IEEE Int.
pp. 230–236. Conf. Very Large Scale Integr.-Syst. Chip, Jul. 2020, pp. 133–148,
[33] B. Gunna, L. Bhamidipati, H. Homayoun, and A. Sasan, ‘‘Spatial and doi: 10.1007/978-3-030-59850-1_10.
temporal scheduling of clock arrival times for IR hot-spot mitigation, [53] D. S. Lopera, L. Servadei, V. P. Kasi, S. Prebeck, and W. Ecker,
reformulation of peak current reduction,’’ in Proc. IEEE/ACM Int. ‘‘RTL delay prediction using neural networks,’’ in Proc. IEEE
Symp. Low Power Electron. Design (ISLPED), Jul. 2017, pp. 1–6, doi: Nordic Circuits Syst. Conf. (NorCAS), Oct. 2021, pp. 1–7,
10.1109/ISLPED.2017.8009179. doi: 10.1109/NorCAS53631.2021.9599868.
[34] C.-T. Ho and A. B. Kahng, ‘‘IncPIRD: Fast learning-based [54] A. Han, Z. Zhao, C. Feng, and S. Zhang, ‘‘Stage-based path delay predic-
prediction of incremental IR drop,’’ in Proc. IEEE/ACM Int. tion with customized machine learning technique,’’ in Proc. IEEE/ACM
Conf. Comput.-Aided Design (ICCAD), Nov. 2019, pp. 1–8, doi: Int. Conf. Comput.-Aided Design (ICCAD), Aug. 2021, pp. 1–8, doi:
10.1109/ICCAD45719.2019.8942110. 10.1109/ICCAD53799.2021.9598330.
[35] (2020). ISCAS’89 Benchmark Circuits. Accessed: Sep. 30, 2020. [Online]. [55] S. Zhang and S. Zhang, ‘‘Time and power constrained chip multiprocessor
Available: http://www.pld.ttu.ee/~maksim/benchmarks/iscas89/verilog/ energy optimization using machine learning techniques,’’ in Proc. 5th
Int. Conf. Electron. Inf. Technol. Comput. Eng., 2021, pp. 926–933, doi:
[36] I.-M. Liu, T.-L. Chou, A. Aziz, and D. F. Wong, ‘‘Zero-skew clock tree
10.1145/3466186.3466517.
construction by simultaneous routing, wire sizing buffer insertion,’’ in
[56] S.-Y. Lin, Y.-C. Fang, Y.-C. Li, Y.-C. Liu, T.-S. Yang, S.-C. Lin, C.-M. Li,
Proc. Int. Symp. Phys. Design, 2000, pp. 33–38.
and E. J. Fang, ‘‘IR drop prediction of ECO-revised circuits using machine
[37] R. Samanta, J. Hu, and P. Li, ‘‘Discrete buffer and wire sizing for link-based learning,’’ in Proc. IEEE 36th VLSI Test Symp. (VTS), Apr. 2018, pp. 1–6,
non-tree clock networks,’’ IEEE Trans. Very Large Scale Integr. (VLSI) doi: 10.1109/VTS.2018.8368657.
Syst., vol. 18, no. 7, pp. 1025–1035, Jul. 2010. [57] ANSYS Apache. Redhawk. Accessed: Nov. 17, 2020. [Online]. Available:
[38] Y.-C. Fang, H.-Y. Lin, M.-Y. Sui, C.-M. Li, and E. J. Fang, ‘‘Machine- https://www.apache-da.com/products/redhawk
learning-based dynamic IR drop prediction for ECO,’’ in Proc. IEEE/ACM
Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2018, pp. 1–7.
[39] S.-H. Huang, C.-M. Chang, and Y.-T. Nieh, ‘‘Fast multi-domain clock skew SAYED ARESH BEHESHTI-SHIRAZI received
scheduling for peak current reduction,’’ in Proc. Asia South Pacific Conf. the B.Sc. degree in electrical engineering from
Design Autom., 2006, doi: 10.1109/ASPDAC.2006.1594691. Qazvin Islamic Azad University (QIAU), Iran,
[40] Z. Xie, H. Li, X. Xu, J. Hu, and Y. Chen, ‘‘Fast IR drop estimation in 2006, and the M.Sc. degree in electrical
with machine learning,’’ in Proc. 39th Int. Conf. Comput.-Aided Design, and computer engineering from George Mason
Nov. 2020, pp. 1–8, doi: 10.1145/3400302.3415763. University (GMU), USA, in 2010, where
[41] Z. Xie, H. Ren, B. Khailany, Y. Sheng, S. Santosh, J. Hu, and he is currently pursuing the Ph.D. degree.
Y. Chen, ‘‘PowerNet: Transferable dynamic IR drop estimation via From 2019 to 2021, he was with the GATE
maximum convolutional neural network,’’ in Proc. 25th Asia South Research Laboratory, GMU, under the supervision
Pacific Design Autom. Conf. (ASP-DAC), Jan. 2020, pp. 13–18,
of Dr. Avesta Sasan, focusing on applied machine
doi: 10.1109/ASP-DAC47756.2020.9045574.
learning computer-aided design (CAD) and reinforcement learning.
[42] H.-Y. Lin, Y.-C. Fang, S.-T. Liu, J.-X. Chen, C.-M. Li, and
Currently, he is the Assistant Director of the Innovation Laboratory,
E. J. Fang, ‘‘Automatic IR-drop ECO using machine learning,’’ in
Government Accountability Office (GAO). He oversees cloud analytic and
Proc. IEEE Int. Test Conf. Asia (ITC-Asia), Sep. 2020, pp. 7–12,
doi: 10.1109/ITC-Asia51099.2020.00013. machine learning programs and products in this capacity. In addition, he has
[43] V. A. Chhabria, Y. Zhang, H. Ren, B. Keller, B. Khailany, and a career spanning over a decade in various roles at the United States
S. S. Sapatnekar, ‘‘MAVIREC: ML-aided vectored IR-drop estimation and Patent and Trademark Office (USPTO), excelling as a master’s level Patent
classification,’’ in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE), Examiner and leading the Cybersecurity Division’s Continues Diagnostics
Feb. 2021, pp. 1825–1828, doi: 10.23919/DATE51398.2021.9473914. and Mitigation (CDM) Program. He also played a pivotal role in revitalizing
[44] P. Huang, C. Ma, and Z. Wu, ‘‘Fast dynamic IR-drop prediction using globally accessible tools, such as the Global Dossier (GD) and One Patent
machine learning in bulk FinFET technologies,’’ Symmetry, vol. 13, no. 10, Dossier Programs (OPD) at the Office of International Patent Cooperation
p. 1807, Sep. 2021, doi: 10.3390/sym13101807. (OIPC).
[45] S. Saurabh, H. Shah, and S. Singh, ‘‘Timing closure problem: Review of
challenges at advanced process nodes and solutions,’’ IETE Tech. Rev.,
NAJMEH NAZARI received the B.Sc. degree
vol. 35, no. 4, pp. 349–358, 2018.
in computer engineering from Shiraz University,
[46] A. B. Kahng, U. Mallappa, and L. Saul, ‘‘Using machine learning to
in 2010, and the M.Sc. degree in computer engi-
predict path-based slack from graph-based timing analysis,’’ in Proc. IEEE
neering from the Isfahan University of Technol-
36th Int. Conf. Comput. Design (ICCD), Oct. 2018, pp. 603–612, doi:
10.1109/ICCD.2018.00096. ogy, in 2013. She is currently pursuing the Ph.D.
[47] L. Bhamidipati, B. Gunna, H. Homayoun, and A. Sasan, ‘‘A power delivery degree with the ECE Department, University of
network and cell placement aware IR-drop mitigation technique: Harvest- California, Davis. From 2013 to 2015, she was a
ing unused timing slacks to schedule useful skews,’’ in Proc. IEEE Comput. Lecturer with the Shahid Chamran University of
Soc. Annu. Symp. VLSI (ISVLSI), Jul. 2017, pp. 272–277. Ahwaz. Her research interests include deep learn-
[48] H. M. Makrani, F. Farahmand, H. Sayadi, S. Bondi, S. M. P. Dinakarrao, H. ing, computer architecture, embedded systems,
Homayoun, and S. Rafatirad, ‘‘Pyramid: Machine learning framework to applied machine learning, and hardware security.
estimate the optimal timing and resource usage of a high-level synthesis
design,’’ in Proc. 29th Int. Conf. Field Program. Log. Appl. (FPL),
KEVIN IMMANUEL GUBBI received the B.Sc.
Sep. 2019, pp. 397–403, doi: 10.1109/FPL.2019.00069.
degree in electrical and electronics engineering
[49] A. B. Kahng, U. Mallappa, L. Saul, and S. Tong, ‘‘‘Unobserved corner’
from Anna University, in 2018, and the M.Sc.
prediction: Reducing timing analysis effort for faster design convergence
in advanced-node design,’’ in Proc. Design, Autom. Test Eur. Conf. Exhib. degree in computer engineering from San Fran-
(DATE), 2019, pp. 168–173, doi: 10.23919/DATE.2019.8715219. cisco State University, in 2021. He is currently
[50] O. V. S. S. Ram and S. Saurabh, ‘‘Modeling multiple-input switching pursuing the Ph.D. degree with the Electrical and
in timing analysis using machine learning,’’ IEEE Trans. Comput.-Aided Computer Engineering Department, University of
Design Integr. Circuits Syst., vol. 40, no. 4, pp. 723–734, Apr. 2021, doi: California, Davis. From 2021 to 2023, he was
10.1109/TCAD.2020.3009624. a Graduate Research Assistant with the ASEEC
[51] M. A. Savari and H. Jahanirad, ‘‘NN-SSTA: A deep neural network Laboratory and the GATE Laboratory, University
approach for statistical static timing analysis,’’ Expert Syst. Appl., vol. 149, of California, Davis. His research interests include electronic design
Jul. 2020, Art. no. 113309, doi: 10.1016/j.eswa.2020.113309. automation, hardware security, VLSI, and applied machine learning.

VOLUME 11, 2023 87885

S. A. Beheshti-Shirazi et al.: Advanced RL Solution for Clock Skew Engineering

BANAFSHEH SABER LATIBARI (Graduate Stu- AVESTA SASAN received the B.Sc. degree
dent Member, IEEE) received the B.Sc. degree (summa cum laude) (Hons.) in computer engi-
in computer engineering from the K. N. Toosi neering and the M.Sc. and Ph.D. degrees in
University of Technology, in 2014, and the M.Sc. electrical and computer engineering from the
degree in computer architecture from the Sharif University of California, Irvine, in 2005, 2006,
University of Technology, in 2017. She is currently and 2010, respectively. In 2010, he joined the
pursuing the Ph.D. degree with the Electrical and Office of CTO in Broadcom Company, working
Computer Engineering Department, University of on the physical design and implementation of
California, Davis. From 2019 to 2021, she was ARM processors, as a Physical Designer, a Timing
a Graduate Research Assistant with the GATE Signoff Specialist, and the lead of signal and
Laboratory, George Mason University. Her research interests include applied power integrity signoff. In 2014, he was recruited by Qualcomm Office
machine learning and computer architecture. of VLSI Technology, where he has developed different methodologies
and in-house EDAs for accurate signoff and analysis of hardened ASIC
solutions. He joined George Mason University, in 2016, as an Associate
Professor with the Department of Electrical and Computer Engineering,
while simultaneously as the Associate Chair of research. In 2021, he joined
SETAREH RAFATIRAD received the M.Sc. and the Faculty at the Electrical and Computer Engineering Department,
Ph.D. degrees in computer science from the University of California, Davis. His research interests include hardware
University of California, Irvine, in 2009 and 2012, security, machine learning, neuromorphic computing, low power design and
respectively. She is an Associate Professor with methodology, approximate computing, and the Internet of Things (IoT).
the Department of Computer Science, University
of California, Davis. Prior to that, she was an
Associate Term Professor with the Department
of Information Sciences and Technology, George
Mason University. Her research interests include
applied machine learning, IoT security, and natural
language processing.

P. D. SAI MANOJ received the Ph.D. degree

in electrical and electronics engineering from
Nanyang Technological University, Singapore.
HOUMAN HOMAYOUN is currently a Professor He is an Assistant Professor with George Mason
with the Department of Electrical and Computer University. His research interests include on-chip
Engineering, University of California, Davis. hardware security, neuromorphic computing,
He is also the Director of the National Science adversarial machine learning, self-aware SoC
Foundation Center for Hardware and Embedded design, image processing and time-series analysis,
Systems Security and Trust (CHEST). He con- emerging memory devices, and heterogeneous
ducts research in hardware security and trust, integration techniques. He won the Best Paper
applied machine learning and AI, data-intensive Award in International Conference on Data Mining, in 2019. His works
computing, and heterogeneous computing, where were nominated for the Best Paper Award in prestigious conferences, such as
he has published more than 200 technical papers Design Automation and Test in Europe (DATE), in 2018, and International
in prestigious conferences and journals on the subject and directed over 10M Conference on Consumer Electronics, in 2020. He has won Xilinx Open
dollars in research funding from NSF, DARPA, AFRL, NIST, US Congress, Hardware Contest, in 2017 (student category).
and various industrial sponsors.

87886 VOLUME 11, 2023

Low-Current Systems Engineer’S Technical Handbook: A Guide to Design and Supervision
From Everand
Low-Current Systems Engineer’S Technical Handbook: A Guide to Design and Supervision
Habbieb T. Mansour
5/5 (2)
Maintenance Training of IP Camera V2.0
No ratings yet
Maintenance Training of IP Camera V2.0
25 pages
Clock Tree Synthesis Presentation by Sudhir Kumar Madhi
100% (2)
Clock Tree Synthesis Presentation by Sudhir Kumar Madhi
69 pages
213 Overhead Projector Proyector
No ratings yet
213 Overhead Projector Proyector
29 pages
Systematic Approach in Building Clock Tree For SOC's
No ratings yet
Systematic Approach in Building Clock Tree For SOC's
3 pages
An Efficient Clock Tree Synthesis Method in Physical Design: December 2009
No ratings yet
An Efficient Clock Tree Synthesis Method in Physical Design: December 2009
5 pages
ClockGating Cts
No ratings yet
ClockGating Cts
8 pages
Clock Gating
No ratings yet
Clock Gating
7 pages
PAGE9
No ratings yet
PAGE9
6 pages
Clock Skew: Hold Time Requirements
100% (1)
Clock Skew: Hold Time Requirements
4 pages
Clock-Tree-Synthesis
No ratings yet
Clock-Tree-Synthesis
68 pages
Clock Tree Synthesis: Presentation by Sudhir Kumar Madhi
100% (2)
Clock Tree Synthesis: Presentation by Sudhir Kumar Madhi
69 pages
Shift_Power_Reduction_in_High-Performance_Clock_Network_Designs
No ratings yet
Shift_Power_Reduction_in_High-Performance_Clock_Network_Designs
6 pages
Techniques To Reduce Timing Violations Using Clock Tree Optimizations in Synopsys ICC2
100% (2)
Techniques To Reduce Timing Violations Using Clock Tree Optimizations in Synopsys ICC2
10 pages
Clock Routing For High-Performance Ics: CP 1 DL +
No ratings yet
Clock Routing For High-Performance Ics: CP 1 DL +
7 pages
Designing A Robust Clock Tree Structure
100% (1)
Designing A Robust Clock Tree Structure
9 pages
Clock Balance Ieee Seminar04
100% (1)
Clock Balance Ieee Seminar04
49 pages
Clock Network Synthesis For Synchronous Circuits
No ratings yet
Clock Network Synthesis For Synchronous Circuits
59 pages
Lecture 8 CTS PDF
No ratings yet
Lecture 8 CTS PDF
40 pages
Low Power Clock Tree Synthesis Using Clustering Algorithm
No ratings yet
Low Power Clock Tree Synthesis Using Clustering Algorithm
5 pages
An Efficient Clock Tree Synthesis PDF
No ratings yet
An Efficient Clock Tree Synthesis PDF
4 pages
CTS Signoffsemiconductors
No ratings yet
CTS Signoffsemiconductors
7 pages
CTS CLOCK TREE SYNTHESIS
No ratings yet
CTS CLOCK TREE SYNTHESIS
15 pages
High-Dimensional Metamodeling For Prediction of Clock Tree Synthesis Outcomes
No ratings yet
High-Dimensional Metamodeling For Prediction of Clock Tree Synthesis Outcomes
7 pages
Cts
No ratings yet
Cts
79 pages
Timing Optimization Through Clock Skew Scheduling - Ivan S.kourtev, Baris Taskin, Eby G. Friedman
100% (1)
Timing Optimization Through Clock Skew Scheduling - Ivan S.kourtev, Baris Taskin, Eby G. Friedman
274 pages
Timing Issues in Digital ASIC Design
No ratings yet
Timing Issues in Digital ASIC Design
101 pages
Clock Tree Synthesis (CTS) Interview Questions - Vlsi4freshers
100% (1)
Clock Tree Synthesis (CTS) Interview Questions - Vlsi4freshers
11 pages
Constraints Sta PDF
No ratings yet
Constraints Sta PDF
92 pages
CTS Basics
No ratings yet
CTS Basics
13 pages
Lecture24 Clock Power Routing
100% (1)
Lecture24 Clock Power Routing
30 pages
Clock Tree Synthesis Techniques For Timing Convergence & Power Reduction of Soc Partitions
No ratings yet
Clock Tree Synthesis Techniques For Timing Convergence & Power Reduction of Soc Partitions
2 pages
Clock Tree Synthesis Techniques For Timing Convergence & Power Reduction of Complex Soc Partitions
No ratings yet
Clock Tree Synthesis Techniques For Timing Convergence & Power Reduction of Complex Soc Partitions
2 pages
Clock Tree Synthesis and Optimization of Socs Under Low Voltage
No ratings yet
Clock Tree Synthesis and Optimization of Socs Under Low Voltage
5 pages
A_Clock_Tree_Synthesis_Scheme_Based_On_Flexible_H-tree
No ratings yet
A_Clock_Tree_Synthesis_Scheme_Based_On_Flexible_H-tree
4 pages
Clock Distribution
100% (2)
Clock Distribution
52 pages
Clock Tree Divergence Ti
No ratings yet
Clock Tree Divergence Ti
5 pages
Clock Buffers
100% (2)
Clock Buffers
29 pages
Lecture24 Clockpower Routing
No ratings yet
Lecture24 Clockpower Routing
30 pages
Clock Tree Synthesis (CTS) Interview Questions - Vlsi4freshers
100% (1)
Clock Tree Synthesis (CTS) Interview Questions - Vlsi4freshers
5 pages
Gan Cts
No ratings yet
Gan Cts
8 pages
Ultimate Guide - Clock Tree Synthesis - AnySilicon
No ratings yet
Ultimate Guide - Clock Tree Synthesis - AnySilicon
12 pages
PHYSICAL DESIGN - Clock Tree Synthesis
No ratings yet
PHYSICAL DESIGN - Clock Tree Synthesis
20 pages
TKT-9627 Digital and Computer Systems Seminar: Closing The Gap Between ASIC & Custom
No ratings yet
TKT-9627 Digital and Computer Systems Seminar: Closing The Gap Between ASIC & Custom
28 pages
VLSI Physical Design Automation: Clock and Power Routing
No ratings yet
VLSI Physical Design Automation: Clock and Power Routing
30 pages
Cts 1
No ratings yet
Cts 1
16 pages
10_CTS
No ratings yet
10_CTS
21 pages
An OCV-Aware Clock Tree Synthesis Methodology: Necati Uysal and Rickard Ewetz
No ratings yet
An OCV-Aware Clock Tree Synthesis Methodology: Necati Uysal and Rickard Ewetz
9 pages
Sta Notes
No ratings yet
Sta Notes
33 pages
PD - Training Topic: CTS Author: Nilesh Ingale & P. Ravikumar Date:08-11-2012
100% (1)
PD - Training Topic: CTS Author: Nilesh Ingale & P. Ravikumar Date:08-11-2012
90 pages
Dynamic Timing Analysis Static Timing Analysis
No ratings yet
Dynamic Timing Analysis Static Timing Analysis
30 pages
EE 459/500 - HDL Based Digital Design With Programmable Logic Timing and Clock Issues
No ratings yet
EE 459/500 - HDL Based Digital Design With Programmable Logic Timing and Clock Issues
23 pages
CTS stage in Physical Design
No ratings yet
CTS stage in Physical Design
26 pages
Generalised H-Tree
No ratings yet
Generalised H-Tree
14 pages
Placement: Key Features
100% (1)
Placement: Key Features
7 pages
Vlsi Digital Design Issues
No ratings yet
Vlsi Digital Design Issues
94 pages
Clock Tree Synthesis
100% (2)
Clock Tree Synthesis
21 pages
PnR-II-CTS Routing Chip Finishing
No ratings yet
PnR-II-CTS Routing Chip Finishing
88 pages
Harnessing_Hybrid_Clock_Tree_Topology_to_Boost_PPA_in_Highly_Utilized_Designs
No ratings yet
Harnessing_Hybrid_Clock_Tree_Topology_to_Boost_PPA_in_Highly_Utilized_Designs
5 pages
Cts
No ratings yet
Cts
6 pages
Physical Design PD Interview Preparation
No ratings yet
Physical Design PD Interview Preparation
5 pages
Analog Dialogue, Volume 47, Number 1: Analog Dialogue, #9
From Everand
Analog Dialogue, Volume 47, Number 1: Analog Dialogue, #9
Analog Dialogue
No ratings yet
Sensor 61-12n Tacometro
No ratings yet
Sensor 61-12n Tacometro
21 pages
Batch Controller BCT-775: Process Control Instrumentation
No ratings yet
Batch Controller BCT-775: Process Control Instrumentation
2 pages
Excel Ereview - ELEC Module 2
No ratings yet
Excel Ereview - ELEC Module 2
4 pages
Delta Ia-Plc All Om en 20151113
No ratings yet
Delta Ia-Plc All Om en 20151113
173 pages
A ActualTests 220-701
100% (2)
A ActualTests 220-701
253 pages
RT9 Safety Relay Would You Like A Small Safety Relay For All Your Safety Applications?
No ratings yet
RT9 Safety Relay Would You Like A Small Safety Relay For All Your Safety Applications?
4 pages
MCB LS PDF
100% (1)
MCB LS PDF
56 pages
Aretha The Super Famicom FAQ-Walkthrough For Super Nintendo by Ritchie - GameFAQs
No ratings yet
Aretha The Super Famicom FAQ-Walkthrough For Super Nintendo by Ritchie - GameFAQs
27 pages
EE16104 Electrical Systems in Power Plant
No ratings yet
EE16104 Electrical Systems in Power Plant
43 pages
IPC UNIT 4
No ratings yet
IPC UNIT 4
14 pages
Po54G00A, Po74G00A: 54, 74 Series Noise Cancellation GHZ Logic
No ratings yet
Po54G00A, Po74G00A: 54, 74 Series Noise Cancellation GHZ Logic
6 pages
OD116824293935341000
No ratings yet
OD116824293935341000
1 page
TYPE 874-LBA/-LBB Slotted Lines: Operating Instructions
No ratings yet
TYPE 874-LBA/-LBB Slotted Lines: Operating Instructions
37 pages
EEPC108 Autotransformer
No ratings yet
EEPC108 Autotransformer
8 pages
129 2024 A
No ratings yet
129 2024 A
14 pages
AquaTemp - Submersible Temperature Sensor
No ratings yet
AquaTemp - Submersible Temperature Sensor
10 pages
9113 Safety Manualv8r0
No ratings yet
9113 Safety Manualv8r0
20 pages
EAF
No ratings yet
EAF
27 pages
STK404 070S
No ratings yet
STK404 070S
5 pages
Instruction Sheet: DR/2400 Lamp Module Replacement
No ratings yet
Instruction Sheet: DR/2400 Lamp Module Replacement
2 pages
3030 - S36W SMD LED Datasheet
No ratings yet
3030 - S36W SMD LED Datasheet
22 pages
Features Description: D D D D D D D D D D D D D D
No ratings yet
Features Description: D D D D D D D D D D D D D D
34 pages
HUAWEI DBS3900 2100Mhz CABINET: Configuration: 4 / 4 / 4
No ratings yet
HUAWEI DBS3900 2100Mhz CABINET: Configuration: 4 / 4 / 4
2 pages
Important MCQ - Electrical Machine Design - WWW - Allexamreview.com - PDF
No ratings yet
Important MCQ - Electrical Machine Design - WWW - Allexamreview.com - PDF
14 pages
X20PS3300-ENG_V3.18
No ratings yet
X20PS3300-ENG_V3.18
7 pages
Technical Sheet EL2270
No ratings yet
Technical Sheet EL2270
8 pages
Subject
No ratings yet
Subject
6 pages
SM Dual LFP 5.37K
No ratings yet
SM Dual LFP 5.37K
2 pages

Advanced Reinforcement Learning Solution For Clock Skew Engineering Modified Q-Table Update Technique For Peak Current and IR Drop Minimization

Uploaded by

Advanced Reinforcement Learning Solution For Clock Skew Engineering Modified Q-Table Update Technique For Peak Current and IR Drop Minimization

Uploaded by

Received 7 June 2023, accepted 26 July 2023, date of publication 11 August 2023, date of current version 22 August 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3304534

Advanced Reinforcement Learning Solution

HOUMAN HOMAYOUN 2 , AVESTA SASAN2 , AND P. D. SAI MANOJ 1

I. INTRODUCTION makes the designs much more complicated. The IC design

in clock distribution network results mean more violations

87870 VOLUME 11, 2023

VOLUME 11, 2023 87871

87872 VOLUME 11, 2023

87874 VOLUME 11, 2023

it computationally expensive and time-consuming. Policy

The next section of this paper discusses a proposed solution

VOLUME 11, 2023 87875

87876 VOLUME 11, 2023

register using the following formula:

VOLUME 11, 2023 87877

A fundamental aspect of this algorithm revolves around

87878 VOLUME 11, 2023

VOLUME 11, 2023 87879

87880 VOLUME 11, 2023

VOLUME 11, 2023 87881

87882 VOLUME 11, 2023

87884 VOLUME 11, 2023

VOLUME 11, 2023 87885

P. D. SAI MANOJ received the Ph.D. degree

87886 VOLUME 11, 2023

You might also like