Advanced Reinforcement Learning Solution For Clock Skew Engineering: Modified Q-Table Update Technique For Peak Current and IR Drop Minimization
ABSTRACT This paper discloses a Reinforcement Learning (RL) solution implemented to decrease the peak current by altering the clock skews. Clock skews are elements of the clock network calculated during the Clock Tree Synthesis (CTS) phase of physical design. Initially, physical design tools targeted obtaining a balanced clock tree and decreasing the clock skew as far as possible. The resulting zero-skew clock tree caused a drastic increase in the current demanded from the battery. The solution proposed in this paper comprises a Reinforcement Learning agent that maneuvers throughout the design and updates the clock arrival time of each register by adding buffers, removing them, or leaving them unchanged. The agent's objective is to maximize the spread of the design's clock arrival distribution. The Reinforcement Learning solution allows the exploration and optimization of the clock tree synthesis process beyond the heuristic algorithms employed by traditional Electronic Design Automation (EDA) tools. This paper contains two experiments using the Reinforcement Learning algorithm. The first experiment's results indicate a 35% reduction in peak current and a significant reduction in IR drop (from package to transistor) in the chosen benchmarks. The second experiment modified the Q-table update technique, which resulted in an additional 10% improvement compared to the first experiment. In both experiments, the agent traverses the environment and explores different options despite creating timing violations and obtaining a substantial negative feedback reward for the actions taken. However, a timing violation fixed later results in the agent obtaining a future reward for modifying the clock arrival times of other registers. The overall process results in a broader spread of the clock arrival distribution.
INDEX TERMS Clock tree synthesis (CTS), computer aided design, machine learning, peak current
reduction, reinforcement learning, SARSA learning.
technique. Section II offers a thorough background of the research and discusses the motivation behind developing this advanced reinforcement learning approach. A comprehensive review of related work in the field is provided in Section III. Section IV describes the problem, and the proposed reinforcement learning solutions are outlined in detail. Experimental results and performance evaluation are presented in Section V. Finally, Section VI concludes the paper by summarizing the achieved results and improvements resulting from applying the proposed solution.

II. BACKGROUND
One of the vital phases of the physical design that results in obtaining the clock distribution network is Clock Tree Synthesis (CTS). As designs grow, the clock distribution network becomes enormous. Its sophisticated structure contains many elements, such as Phase-Locked Loop (PLL) systems, clock dividers, buffers, multiplexers, gates, wires, etc. The extensive clock distribution network is accountable for the majority of the chip design power consumption. CTS and designs are susceptible to a plurality of issues, such as jitter, timing issues, power implications, signal integrity, and area implications. When designers target a balanced clock tree, the result is simultaneous switching and clock transitions at the clock pins of all registers within a short timing window at every rising edge. In a rising-edge-triggered design, the launch registers launch the data path on the rising edge of the clock, and the capture registers capture the propagated logic value on the upcoming rising edge of the clock. The switching activities are caused by the simultaneous launch of registers (and activation of data paths) [47]. Furthermore, each clock transition raises the current discharging and charging the capacitive clock network at every clock switching, either rise or fall. As illustrated by Fig. 1, the demanded battery current changes instantly at each rising edge of the clock due to the cumulative effect of all switching activities. The current surge is smaller at the clock's falling edge when the design is rising-edge triggered; at the falling edge, the clock network only discharges and does not experience any additional switching activity in the datapath (unless there exist half-cycle timing paths). With the sudden reduction in switching activity between clock edges, the current surge is rapidly suppressed [7].

The Power Distribution Network (PDN) has both a resistive and an inductive nature, so the transistors experience both resistive and inductive voltage drops when the demanded current increases. The resistive drop is proportional to the current that flows through the resistive portion of the PDN (IR), while the inductive IR drop is proportional to the rate of change in the current (L.di/dt). The instantaneous variation in the current within a short period drastically increases the inductive IR drop, resulting in significant voltage drop and voltage noise. The more substantial voltage drop reduces the transistors' speed, and the voltage noise increases the clock tree's jitter.
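As a numeric illustration of the two drop components, a minimal sketch with hypothetical PDN component values (all numbers invented for illustration, not taken from the paper):

```python
# Illustrative IR-drop arithmetic (hypothetical values, not from the paper).
# Resistive drop: V_R = I * R; inductive drop: V_L = L * di/dt.

def pdn_voltage_drop(i_amps, r_ohms, l_henry, di_amps, dt_seconds):
    """Return (resistive, inductive, total) voltage drop across the PDN."""
    v_resistive = i_amps * r_ohms
    v_inductive = l_henry * di_amps / dt_seconds
    return v_resistive, v_inductive, v_resistive + v_inductive

# A 2 A draw through 10 mOhm, with a 1 A current swing over 100 ps
# through 1 pH of package/grid inductance:
v_r, v_l, v_total = pdn_voltage_drop(2.0, 0.010, 1e-12, 1.0, 100e-12)
print(v_r, v_l, v_total)  # resistive ~0.02 V, inductive ~0.01 V
```

The sketch shows why spreading clock arrivals helps: reducing the instantaneous di/dt directly shrinks the inductive term even when the average current is unchanged.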
The impact of the reduction in average voltage at the transistor level and the surge of the clock jitter is another essential factor to be considered during the physical design. Therefore, designers need to set a lower rail voltage during the static timing analysis to plan for the rise of IR drop, and increase each register endpoint's uncertainty to mitigate the voltage noise [30]. As designs move toward reducing peak current, the amount of inductive IR drop and voltage noise is decreased. In other words, extensive additional timing slack spares are realized due to a higher voltage at the transistor level and reduced clock jitter, which could be utilized to improve the Power, Performance, and Area (PPA) of a design under implementation. Hence, reducing the peak current can significantly impact the PPA optimality outcome of the physical design.

Authors in [45] disclose the timing closure problem, explaining the root cause of its difficulty. In addition, they provide details regarding traditional techniques that address and mitigate timing closure complications. Furthermore, new challenges that appear at advanced process nodes are highlighted, and solutions to these problems are discussed. In another prior art [46], the authors present a machine learning (ML) model founded on bigrams of path stages to predict Path Base Analysis (PBA) outcomes, which are expensive, from Graph Base Analysis (GBA) outcomes, which are relatively inexpensive. Their study also focuses on identifying electrical and structural characteristics of a circuit that affect PBA-GBA deviation concerning endpoint arrival times. To accomplish this, GBA and PBA analysis of a given test case is conducted, coupled with artificially generated timing paths. The authors of [49] explored a method of data analysis based on multivariate linear regression that helps predict the timing analysis outcomes at observed corners. They employed backward step-wise selection to simplify the process of choosing which corners to observe and which to predict.

A. BACKGROUND ON REINFORCEMENT LEARNING
As the design expands in scope, an increase in switching activity leads to greater complexity, which consequently complicates the CTS process [15]. As the size of the design grows, the process becomes more unwieldy, presenting an opportunity to investigate the potential advantages of employing Machine Learning (ML), particularly Reinforcement Learning (RL), for optimizing the design. Before delving into the RL-specific solutions proposed in this paper, a comprehensive overview is provided of various Machine Learning techniques that chip designers can utilize to optimize and improve designs and shorten production time.
Machine Learning techniques can be categorized based on the primary data source that drives the algorithm, resulting in four groups: (1) supervised learning, (2) unsupervised learning, (3) semi-supervised learning, and (4) reinforcement learning. In supervised learning, each training data item is labeled correctly, while unsupervised learning involves training data items with no available labels, requiring the algorithm to discern input samples. Semi-supervised learning falls in between, with some labeled training data items. Lastly, Reinforcement Learning (RL) approaches are distinct in that they do not rely on training data items; instead, they depend on an agent's actions and the feedback received from its environment. Together, these paradigms offer a wide range of approaches for training machine learning models to accomplish various tasks. In the field of artificial intelligence, reinforcement learning is focused on developing algorithms and models that enable an intelligent AGENT to make a sequence of decisions or actions that lead to the maximization of cumulative REWARD in a given ENVIRONMENT. The agent interacts with its environment and receives feedback in the form of rewards for each action it takes. The goal of reinforcement learning is to develop an agent that can learn from experience and improve its decision-making abilities over time, ultimately achieving optimal performance in the given task or environment. The action alters the condition of the environment, as noted by [4], and consequently leads to a reward and an updated state. The cumulative reward and state transformation offer the agent deferred feedback regarding the outcomes of its chosen actions. Through a methodical process of trial and error, the agent acquires knowledge on making optimal decisions to achieve the best possible outcomes. The ultimate objective of the agent is to maximize the overall reward, denoted as Gt in equation 1:

Gt = Rt+1 + \sum_{k=1}^{∞} γ^k R_{t+k+1} = Rt+1 + γ Rt+2 + γ^2 Rt+3 + . . .  (1)

In the given formula, the variable t represents the time step, while R signifies the reward. The discount factor, denoted by γ, serves as a measure of the agent's inclination towards future rewards. Additionally, k is an integer index employed for adjusting time and the discount factor concerning future time steps. This equation illustrates the total reward as a combination of the immediate reward Rt+1, achieved upon transitioning to the subsequent state St+1, and the discounted future rewards commencing from the next step.

The discount factor ranges between 0 and 1. A value nearer to zero indicates the agent prioritizes immediate rewards and overlooks future rewards. Conversely, a value closer to one reveals that the agent is more concerned with long-term rewards as opposed to short-term ones. Utilizing the total reward formula, the agent can ascertain the anticipated return. During the training phase, the agent acquires knowledge on maximizing the reward and formulates a policy, π, for its actions.
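Equation 1 can be evaluated directly for a finite horizon. A minimal sketch, using an invented toy reward sequence rather than any data from the paper:

```python
# Discounted return G_t = R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ...
# computed over a finite toy reward sequence (illustrative values only).

def discounted_return(rewards, gamma):
    """Sum rewards with exponentially decaying weights gamma**k."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

rewards = [1.0, 0.0, 2.0, -1.0]          # R_{t+1}, R_{t+2}, R_{t+3}, R_{t+4}
print(discounted_return(rewards, 0.9))   # 1 + 0.9**2 * 2 - 0.9**3, about 1.891
```

With γ near zero the later terms vanish and the agent is myopic; with γ near one they retain weight, matching the discount-factor discussion above.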
The expected reward signifies the cumulative reward anticipated when commencing from the current state st and adhering to the policy π. To evaluate the value of being in a particular state, the expected future rewards from that state are taken into account using equation 2:

Vπ(st) = Eπ[Gt | st] = Eπ[Rt+1 + \sum_{k=1}^{∞} γ^k R_{t+k+1} | st]  (2)

This equation calculates the value of being in a state st at time t while following a particular policy π, denoted as Vπ(st); Eπ[Gt | st] represents the expected reward from the current state st when following the same policy π. There are several variants of reinforcement learning (RL) algorithms available. The State-Action-Reward-State-Action (SARSA) algorithm is one such variation of Q-Learning. In Q-Learning, the RL agent uses Q-values stored in a Q-table, representing the values associated with the various possible actions in a given state. Q-Learning is an off-policy and model-free RL algorithm in which the RL agent employs an iterative approach to improve the quality of the Q-table entries. SARSA, on the other hand, is an on-policy version of Q-Learning that uses the value of the action actually taken to learn and improve the Q-table values. SARSA calculates the Q-value for each action based on a formula that considers the expected reward for the next state-action pair:

Q(st, at) ← Q(st, at) + α (Rt+1 + γ Q(st+1, at+1) − Q(st, at))  (3)

In this equation, Q(st, at) represents the action-value function for taking action at in state st, while α represents the learning rate and γ the discount factor.
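The update of equation 3 translates to a few lines of code. A minimal sketch, with illustrative integer states, a toy reward, and arbitrarily chosen α and γ (none of these values come from the paper):

```python
# One SARSA update: Q(s,a) += alpha * (r + gamma*Q(s',a') - Q(s,a)).
# States and actions are plain integers; the Q-table is a nested dict.
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply the on-policy SARSA update for one observed transition."""
    td_target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q[s][a]

Q = defaultdict(lambda: defaultdict(float))   # Q-values default to 0.0
new_q = sarsa_update(Q, s=0, a=1, r=5.0, s_next=2, a_next=0)
print(new_q)  # 0.1 * (5.0 + 0.9*0.0 - 0.0) = 0.5
```

Replacing `Q[s_next][a_next]` with the maximum over all actions in `s_next` would give the off-policy Q-Learning update mentioned above; SARSA instead uses the action the agent actually takes next.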
Prior works have employed reinforcement learning techniques to enhance the efficacy of Computer-Aided Design (CAD) tools by acquiring practical and adaptive heuristics [11]. For instance, [11] employs reinforcement learning to increase the efficiency of the placer in exploring the solution space and dynamically adapting to specific issues. Similarly, [12] and [16] leverage a reinforcement learning-based routing approach to expedite FPGA routing solutions. Moreover, [13] presents a novel placement and routing algorithm for 3D FPGAs, which employs reinforcement learning to significantly reduce the Manhattan distance and wire length. Another instance of reinforcement learning application is highlighted by [17], which leverages this technique to devise an energy-efficient I/O management scheme between multi-core microprocessors and memory. In the subsequent section, the proposed solution employing SARSA Q-learning Reinforcement Learning (RL) is described, aiming to optimize the clock tree synthesis and maximize the distribution of clock arrival while exploring potential areas for further optimization.

III. RELATED WORK
As it is well known, one of the essential phases in chip design is the construction of the on-chip PDN, as it has a one-to-
one relationship with, and directly affects, the chip's quality and reliability. Historically, designers utilized a large array of different techniques to mitigate the impact of IR drop and address the complications arising from the increase in demanded peak current. These techniques can be categorized into two groups: heuristic solutions focused on reducing peak current and learning-based solutions aimed at achieving the same goal. The subsequent subsections provide more detailed information about each category.

A. HEURISTIC SOLUTIONS FOR PEAK CURRENT REDUCTION
The authors of [1] discuss an innovative technique to mitigate the simultaneous switching in clock networks. It achieves this by bifurcating network buffers into two distinct groups - one operating on the rising edge, the other on the falling edge. The current solution does not incorporate this method, but there is potential for future implementation. Nevertheless, integrating this strategy necessitates registers that trigger on both clock edges. This results in a considerable capacitive load increase, essentially doubling that of regular rising-edge or falling-edge triggered registers, thereby nullifying the benefits of halving the clock network frequency.

The method outlined in [19] aims to minimize the switching activity of a Finite State Machine (FSM); while not directly related to the solutions proposed by this paper, it has the potential to lower peak current. It achieves this by introducing state replication and re-encoding techniques. However, this approach is limited to the FSM part of a netlist. The strategy disclosed by this paper aligns more with the previous works [3], [7], [9], [10], [23], [26], [27], [39], [47], which aim to reduce peak current by dispersing Clock Arrival Times (CAT). Instead of employing heuristic approaches, the paper uses a reinforcement learning agent that learns to distribute clock arrival times while avoiding timing issues.

In [8], partitioning and superposition techniques are used to extract SOC floorplan and PDN features. The extracted information is then used by a Machine Learning model to independently predict an updated static IR drop for each power node without requiring a golden IR drop tool. It demonstrates superior performance compared to an industry-leading tool with minimal error rates.

The paper [6] introduces a design flow to generate a PDN with minimal overhead for standard cell routing while meeting the IR drop and EM constraints for a given placement. The ML model predicts the total wire length of the global route linked with a particular PDN configuration to accelerate the search process.

B. LEARNING SOLUTIONS FOR PEAK CURRENT REDUCTION
The research article [7] introduces a clock-skew optimization solution using a heuristic approach based on genetic algorithms and clustering techniques to handle the constraints of the maximum number of clock drivers. On the other hand, [20] puts forward a hybrid optimization method for clock tree synthesis (CTS) that integrates a Generative Adversarial Network (GAN) with Reinforcement Learning. The conventional GAN model comprises a generator and a discriminator, while the reinforcement learning aspect of this ML approach incorporates a pre-trained regression model as a supervisor for the generator.

According to [18], Artificial Neural Networks (ANN) can predict the number of clock tree components, such as clock buffers and wire loads. The disclosed technique utilizes an ANN to determine the number of buffer insertions during Clock Tree Synthesis (CTS) to achieve the desired clock skew and maximize input transition times for clock buffers and clock sinks. Other techniques, like those proposed by [21] and [37], employ Machine Learning (ML) techniques such as the Support Vector Machine (SVM) algorithm to estimate clock buffer and wire sizing, focusing on reducing clock skew while maintaining power dissipation levels. Further, [20] and [22] utilize the conditional generative adversarial network (GAN) augmented with reinforcement learning to anticipate and optimize Clock Tree Synthesis (CTS) outcomes. Furthermore, [24] offers a solution based on machine learning that enables quick analysis of potential routing patterns and the building of clock trees.

There have been noteworthy advancements in employing reinforcement learning to enhance Computer-Aided Design (CAD) tools' operation, such as placement and routing solutions. Reference [11] uses RL to augment the efficiency of the placer in exploring the solution space and dynamically adapting to particular problems. Reinforcement Learning-based routing techniques are presented in [12] and [16] to expedite FPGA routing solutions. Reference [13] proposes a placement and routing algorithm for 3D FPGAs that leverages Reinforcement Learning to reduce Manhattan distance and wire length. Furthermore, [17] employs RL to formulate energy-efficient I/O management between many-core microprocessors and memory.

In the recent research [48], a novel pyramid structure has been suggested to optimize resource use and performance in a High-Level Synthesis (HLS) design using Machine Learning (ML). The researchers built a comprehensive database of C-to-FPGA outcomes from diverse benchmarks. They used an automatic hardware optimization tool, Minerva, to identify the maximum achievable clock frequency. This tool utilizes Static Timing Analysis (STA) and a heuristic algorithm to target optimal throughput or throughput-to-area, leading to a more efficient High-Level Synthesis (HLS) design.

Authors in [38] calculate the IR drop after each ECO by using timing, power, and physical features extracted before the ECO to predict the IR drop of a design after the ECO. To enhance prediction accuracy and training time, they develop regional models for cell instances located near IR drop violations. The study confirms that, for a design with 100,000 cell instances,
VOLUME 11, 2023 87873
S. A. Beheshti-Shirazi et al.: Advanced RL Solution for Clock Skew Engineering
IR drop prediction can be completed in a span of two minutes.

In a study by [56], a novel machine learning (ML) technique is proposed for predicting IR drop in circuits before ECO revision. This technique enables the creation of prediction models that can be reused to predict the IR drop of the revised circuit after the ECO is complete. Reference [40] offers a comprehensive review of diverse techniques leveraging ML algorithms for IR drop estimation. Meanwhile, the authors of [41] present PowerNet - an innovative dynamic IR drop estimation method employing convolutional neural networks (CNNs) that can handle both vector-based and vector-less IR analysis. The CNN model employed in PowerNet exhibits high generalizability, making it suitable for a wide range of design applications.

Additionally, [42] introduces a novel automated workflow to mitigate IR drop violations instigated by ECO. This workflow provides solutions like cell movement and downsize options, utilizing an ML algorithm for IR drop prediction to avoid overfixing. In an innovative approach, they also apply a multi-round bipartite matching technique to optimize resources during the ECO workflow. In another related study, [43] proposes a tool that harnesses ML techniques like three-dimensional convolutions and regression-like layers. This tool suggests a more extensive subset of worst-case test patterns, thus enhancing test coverage and enabling accurate prediction of IR drop. Lastly, [40] employs an XGBoost-based ML technique to predict dynamic IR drop for both vector-based and vector-less IR drop analysis. In [44], the authors forecast the symmetry and correlation between the predicted data and the golden data by leveraging the correlation coefficient.

The study presented in [54] brings forth an innovative approach to generate predicted PBA timing results from pessimistic GBA timing reports. This approach harnesses the power of a stage-based delay model integrated with a customized loss function rooted in Machine Learning. What sets this model apart is its consideration of the asymmetric loss that might occur while generating these predictions. The effectiveness of this model extends beyond precise PBA timing results. It also enhances designers' capabilities in swiftly identifying false violation paths within GBA reports, reducing the time expenditure significantly compared to conventional methods. As a consequence, it curtails the margin in the post-route optimization phase. The increased efficiency in generating timing results proposed by this model holds the promise of significantly refining the design process.

In the study conducted by [50], the researchers aimed to capture the impacts of Multiple-Input Switching (MIS) by deriving a corrective measure, referred to as the MIS-SIS Difference (MSD), applicable to traditional Single-Input Switching (SIS) delay across diverse scenarios. Several modeling methods, including polynomial regressions, support vector regression, and artificial neural networks (ANNs), were experimented with to create a precise model for MSD. The ANN-based MSD model was further integrated into existing timing libraries for MIS-aware timing analysis. Their comprehensive work presents a practical solution to address the influence of MIS on SIS delay and its subsequent effects on the timing performance of the system.

The scholarly work by [51] innovatively employs Deep Neural Networks (DNN) to create highly accurate approximations of signal arrival time distributions while maintaining linear-time complexity. They leveraged various DNN architectures to execute the maximum and convolution operations with utmost efficiency, which was made possible by the utilization of appropriate training datasets.

In the work by [52], a groundbreaking approach is introduced for automatic timing closure of relative-timed circuits using machine learning techniques. This ML-guided strategy is designed to accelerate the process by learning from the characteristics in each iteration, thereby reducing the overall time needed for timing closure of a given design.

The study by [53] presents a novel machine learning (ML) based approach to predict pin-to-pin delays of combinational circuits at the register transfer level (RTL). To achieve a high degree of accuracy, this approach seamlessly integrates slew and delay estimations. They generate a training set using characteristics of components produced by a model-driven hardware generator framework. Open-source logic synthesis and static timing analysis (STA) tools are employed to determine the ground truth labels for delays, slews, and their interdependencies.

In this study, the paper presents an innovative RL-based approach for clock skew engineering, aiming to optimize the clock tree synthesis and maximize the distribution of clock arrival. The proposed RL-based approach overcomes one of the significant limitations of supervised learning approaches: heavy reliance on labeled data for training. Acquiring labeled data can be time-consuming and expensive, especially in scenarios with limited samples. The RL-based approach overcomes this limitation as it learns from interactions with the environment without the need for extensive labeled data. Furthermore, supervised models struggle to adapt to dynamic environments, as they are trained on fixed datasets. In contrast, RL agents continuously interact with the environment and adjust to changing situations, making the proposed approach in this work more suitable for dynamic scenarios. Supervised models also have limited capability to handle unseen or novel situations. In contrast, the proposed RL-based approach in this work has the capability for exploration and learning from novel, never-seen-before scenarios, which is crucial for clock skew engineering in real-world applications. Moreover, RL inherently balances exploration and exploitation to find the best strategies, making it more adept at handling non-linear relationships and complex interactions than supervised learning techniques. However, it is also essential to acknowledge the limitations of the RL-based approach. RL often requires more interactions with the environment to achieve good performance, making
forms the CAT Dictionary), as well as a collection of Timing Violations (TV) in the form of a dictionary.

Next-State (St+1): Following the relocation of the agent, a revised number of clock buffers is employed to drive each endpoint, subsequently recalibrating the clock arrival time (CAT) of the register. Additionally, the revised CAT dictionary and the Timing Violations (TV) dictionary are updated to reflect the current state of the system. These modifications are fundamental to ensuring the integrity, accuracy, and precision of the clock distribution network, mitigating any potential errors or discrepancies in the system's operation.

Action: The available actions an agent can undertake at a given state (register) constitute a critical aspect of the proposed design. Specifically, at any given state, the agent is tasked with taking two distinct actions, namely: 1) inserting or removing up to five buffer entities (yielding a total of eleven possible actions, i.e., five insertions, five removals, or no action), and 2) moving to an adjacent register (achievable via four actions, namely, Move Up (U), Down (D), Left (L), and Right (R)). It is worth noting that the neighboring register is determined based on the grid-world configuration rather than actual register connectivity. Importantly, the evaluation of each action or move is performed in the corresponding Q-tables, utilizing the same reward function. To facilitate efficient learning, the agent is trained to adopt separate policies for movement and buffer insertion or removal, each tailored to the specific demands of the corresponding task. The policy learned for buffer insertion or removal is independent of the grid-world; however, the policy learned for movement depends on the grid-world connectivity.
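The two action sets described above can be enumerated in a short sketch; the encodings and the grid-boundary handling are illustrative assumptions, not the paper's implementation:

```python
# Eleven buffer actions (insert 1-5, remove 1-5, or none) plus four
# grid-world moves; neighbors come from grid position, not netlist
# connectivity, as noted in the text.

BUFFER_ACTIONS = [("insert", n) for n in range(1, 6)] \
               + [("remove", n) for n in range(1, 6)] + [("none", 0)]
MOVES = {"U": (0, -1), "D": (0, 1), "L": (-1, 0), "R": (1, 0)}

def neighbor(col, row, move, width, height):
    """Grid-world neighbor of (col, row); None if the move leaves the grid."""
    dc, dr = MOVES[move]
    c, r = col + dc, row + dr
    return (c, r) if 0 <= c < width and 0 <= r < height else None

print(len(BUFFER_ACTIONS))        # 11
print(neighbor(0, 0, "R", 4, 4))  # (1, 0)
print(neighbor(0, 0, "U", 4, 4))  # None (off the grid)
```

Keeping the two action sets separate mirrors the paper's use of separate movement and insertion/removal policies (the QM and QA tables).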
Reward: The reward serves as the consequential feedback provided to the agent upon transitioning into a new state. The agent of the proposed design in this work, upon entering a new state, receives two distinct rewards. The first is a positive or negative reinforcement (rσ) that is proportionate to the extent to which the spread of clock arrival times is amplified or attenuated, as measured by the variance of the new distribution relative to the prior distribution. The second reward (rt) is a significantly large negative or positive reinforcement that is proportionate to the degree of timing violation created or remedied by the agent. The reward rσ is the difference between the standard deviation of the clock arrival distribution before and after the agent's buffer insertion/removal action, and it is computed using:

rσ = σnew − σold  (4)

The variable rt denotes the cumulative sum of timing violations, measured in picoseconds, which occur upon the insertion or removal of a buffer along all timing paths to and from the resident register of the Agent. This value is derived by performing setup timing checks twice: firstly, by examining all timing paths leading to the Agent's register, using it as the capture register; and secondly, by assessing all timing paths originating from that same register, using it as the launch register. By doing so, rt effectively captures the net impact of buffer insertion or removal on the timing behavior of the circuit.
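The rσ reward of equation 4 reduces to a difference of standard deviations. A minimal sketch with invented CAT values (the rt timing checks are omitted, as they require the full timing model):

```python
# r_sigma = sigma_new - sigma_old: change in the standard deviation of
# the clock-arrival-time (CAT) distribution caused by a buffer action.
from statistics import pstdev

def r_sigma(cats_before, cats_after):
    """Positive when the agent's action widened the CAT spread."""
    return pstdev(cats_after) - pstdev(cats_before)

before = [100.0, 100.0, 100.0, 100.0]   # ps, perfectly balanced tree
after  = [100.0, 120.0, 100.0, 80.0]    # ps, after buffer insertion/removal
print(r_sigma(before, after) > 0)  # True: spread increased, positive reward
```

A zero-skew (balanced) tree has σ = 0, so any action that staggers arrivals yields a positive rσ, which is exactly the behavior the agent is rewarded for.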
A. PROBLEM SPECIFIC ENVIRONMENT
One of the distinctive aspects and challenges posed by the present study is the creation of a problem-specific environment, as detailed in Algorithm 1. The development of this environment demanded a meticulous approach, commencing from the very basics. While reinforcement learning algorithms have been employed to tackle a wide range of problems for many years, the unique nature of the problem at hand necessitated an environment tailored specifically for this purpose. The environment was meticulously crafted to address the complexities of clock tree synthesis, encompassing small as well as larger designs replete with intricate nuances. Algorithm 1 offers insights into the functioning of the environment, including the calculation of the rewards rσ and rt, alongside the mechanism for updating the state, which is subsequently conveyed to the Agent.

The environment in which the agent operates is characterized by a hierarchical dictionary structure, where the keys correspond to registers utilized in the design. The values associated with these registers consist of three nested dictionaries: the first dictionary stores the Clock Arrival Time (CAT) value of the given register; the second dictionary contains all the registers that feed inputs to the key register; and the third dictionary contains all the registers that receive inputs from the key register, downstream in the circuit.
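The hierarchical dictionary just described might look like the following sketch; the register names, CAT values, and the exact key names are invented for illustration:

```python
# Environment state: one entry per register, holding its CAT plus its
# fan-in (upstream) and fan-out (downstream) register connections.
environment = {
    "reg_a": {
        "cat": 120.0,                           # clock arrival time, ps
        "inputs":  {"reg_b": {}, "reg_c": {}},  # registers feeding reg_a
        "outputs": {"reg_d": {}},               # registers reg_a feeds
    },
    "reg_d": {
        "cat": 95.0,
        "inputs":  {"reg_a": {}},
        "outputs": {},
    },
}

# Walking downstream from reg_a reaches reg_d:
print(list(environment["reg_a"]["outputs"]))  # ['reg_d']
```

Storing both fan-in and fan-out per register lets the environment run the two setup checks described above (register as capture, then as launch) without re-traversing the netlist.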
agent’s buffer insertion/removal action and it is computed the assessment of violations resulting from the agent’s action
using: requires the utilization of two distinct formulas, tailored to the
specific timing characteristics of the given register. Overall,
rσ = σnew − σold (4)
the environment is characterized by a complex structure,
The variable rt denotes the cumulative sum of timing necessitating the deployment of sophisticated algorithms
violations, measured in picoseconds, which occur upon the and techniques to facilitate the agent’s decision-making and
insertion or removal of a buffer along all timing paths to enhance the overall performance of the system. To check
and from the resident register of the Agent. This value is violations caused by the agent’s selected action with regard
derived by performing setup timing checks twice: firstly, to registers that the current register is feeding their inputs, the
by examining all timing paths leading to the Agent’s register, function calculates the slack between the current register and
using it as the capture register; and secondly, by assessing all each of the registers receiving an input feed from the current
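The spread reward of Eq. (4) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the CAT dictionary layout and register names below are hypothetical:

```python
import statistics

def spread_reward(cat_before, cat_after):
    """r_sigma (Eq. 4): change in the standard deviation of the
    clock-arrival-time (CAT) distribution caused by the agent's
    buffer insertion/removal. Positive when the spread widens."""
    sigma_old = statistics.pstdev(cat_before.values())
    sigma_new = statistics.pstdev(cat_after.values())
    return sigma_new - sigma_old

# Toy CAT dictionaries (register name -> clock arrival time in ps).
before = {"ff1": 100.0, "ff2": 100.0, "ff3": 100.0}
after  = {"ff1":  80.0, "ff2": 100.0, "ff3": 120.0}

print(spread_reward(before, after))  # spread widened -> positive reward
```

A zero-skew tree yields σ = 0, so any action that staggers arrival times produces a positive rσ, while an action that re-balances the tree produces a negative one.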
With each iteration, a random figure ranging from 0 to 1 is produced. The agent opts for a random action if the generated number is smaller than ϵ, whereas it selects the most advantageous actions for insertion/removal and movement (based on the values in the QM and QA Q-tables) when the generated figure surpasses ϵ. Implementing an epsilon-greedy strategy enables the agent's likelihood of choosing random actions to diminish over time. This approach empowers the agent to explore extensively at the beginning and gradually shift its focus towards exploiting the maximum cumulative reward as time progresses. In the final episode, the agent solely concentrates on exploiting, thereby generating the highest attainable reward. To determine a single reward value (r) for the chosen action, the agent employs a weighted assessment of both rewards, calculated using the subsequent formula:

r = rσ + s × rt (9)

The variable s represents a scalar multiplier; it serves to adjust the importance of a timing discrepancy relative to the gains or losses stemming from the expansion or contraction of the clock arrival time dispersion. This scaling factor ensures that the penalties for timing transgressions are substantially greater than the rewards gained from broadening the distribution.

The agent is compelled to reduce or prevent the occurrence of timing violations, except during its exploration phase, where it seeks to rectify them in other areas. Upon successfully addressing the timing violations, the agent is subsequently granted a substantial reward. A potential avenue for future research involves progressively increasing the value of ‘‘s’’ in each episode. This would enable the agent to generate timing violations in earlier episodes and address them in subsequent ones, thereby enhancing the exploration process. The paper intends to examine this approach thoroughly in subsequent studies.

Utilizing the aforementioned equation, the computed reward is employed in formula 3 to modify the Q-values correlated with state-actions (such as buffer insertion/removal and motion) for each of the two Q-tables (QM and QA). This derived reward has a direct impact on the agent's conduct during subsequent encounters (in future episodes) when the agent revisits this particular state.

C. PROBLEM-SPECIFIC MODIFIED REINFORCEMENT ALGORITHM
In order to expedite the learning process of the agent in the disclosed problem-specific solution, the paper introduces a minor modification in reinforcement learning while simultaneously elevating the agent's learning saturation to a superior level. Instead of replacing the old Q-value associated with the position and decision with a new one, the method replaces the old Q-value with the sum of the old and new Q-values associated with the position and decision. The method for updating both Q-tables is adjusted as described below:

QA(s, a) ← 2QA(s, a) + α · (r + γ · (QA(s′, a′) − QA(s, a))) (10)

QM(s, m) ← 2QM(s, m) + α · (r + γ · (QM(s′, m′) − QM(s, m))) (11)

Algorithm 3 Modified-Q SARSA Learning Agent
  {Phase 1 — same as the previous algorithm}
  for (st = 1; st <= delay-start; st++) do
    Randomly select action a { a ∈ [add, remove, none] }
    if (a = add(x) || a = remove(x)) then { x ∈ [1, 5] = number of buffers }
      ds ← x × dBUF
      (rσ, rt, DIC, TV, 1) ← Env(ds, FF, DIC, TV, 1) {Algorithm 1}
      r ← rσ + s × rt {weighted reward}
      {new Q-table formula for decision}
      QA(s, a) ← 2QA(s, a) + α · (r + γ · (QA(s′, a′) − QA(s, a)))
    end if
    Randomly select action m { m ∈ [U, D, L, R] }
    FF ← Move in direction m of current FF
    {new Q-table formula for movement}
    QM(s, m) ← 2QM(s, m) + α · (r + γ · (QM(s′, m′) − QM(s, m)))
  end for
  {Phase 2 — SARSA reinforcement learning}
  ξ ← 0.995 {ϵ decay factor}
  α ← 0.9 {learning rate}
  γ ← 0.35 {discount factor}
  for (epi = 0; epi <= episodes; epi++) do {episode count in Table 1}
    FF ← start-point
    DIC ← [DICCAT, DICConnectivity]
    TV ← [] {empty list for storage of timing violations}
    for (st = 0; st <= steps; st++) do
      rand ← random ∈ [0, 1]
      if rand > ϵ then {agent does exploitation}
        a ← argmax(QA) {insertion exploitation}
        if (a = add(x) || a = remove(x)) then
          ds ← x × dBUF
          (rσ, rt, DIC, TV, 1) ← Env(ds, FF, DIC, TV, 1)
          r ← rσ + s × rt {weighted reward}
          {new Q-table formula for decision}
          QA(s, a) ← 2QA(s, a) + α · (r + γ · (QA(s′, a′) − QA(s, a)))
        end if
        m ← argmax(QM) {movement exploitation}
        FF ← Move in direction m of current FF
        {new Q-table formula for movement}
        QM(s, m) ← 2QM(s, m) + α · (r + γ · (QM(s′, m′) − QM(s, m)))
      else {(rand < ϵ) agent does exploration}
        Randomly select action a { a ∈ [add, remove, none] }
        if (a = add(x) || a = remove(x)) then { x ∈ [1, 5] }
          ds ← x × dBUF
          (rσ, rt, DIC, TV, 1) ← Env(ds, FF, DIC, TV, 1)
          r ← rσ + s × rt {weighted reward}
          QA(s, a) ← 2QA(s, a) + α · (r + γ · (QA(s′, a′) − QA(s, a)))
        end if
        Randomly select action m { m ∈ [U, D, L, R] }
        FF ← Move in direction m of current FF
        QM(s, m) ← 2QM(s, m) + α · (r + γ · (QM(s′, m′) − QM(s, m)))
      end if
      if (epi = episodes − 1) then
        ϵ ← 0 {only exploit in the last episode}
      else if (ϵ > 0.10) then
        ϵ ← ϵ × ξ {decay ϵ to reduce exploration}
      else
        ϵ ← 0.10 {keep a small amount of exploration}
      end if
    end for
  end for
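A minimal sketch may make the modified update rule (Eqs. 10 and 11) and the decaying epsilon-greedy selection concrete. The state/action encodings and the flat dictionary Q-table below are illustrative assumptions, not the paper's implementation:

```python
import random
from collections import defaultdict

ALPHA, GAMMA = 0.9, 0.35   # learning rate / discount factor (Algorithm 3)
XI = 0.995                 # epsilon decay factor

def modified_q_update(q, s, a, r, s_next, a_next):
    """Modified update (Eqs. 10/11): the old Q-value is doubled (kept)
    rather than replaced, so repeated visits accumulate evidence:
        Q(s,a) <- 2*Q(s,a) + alpha*(r + gamma*(Q(s',a') - Q(s,a)))"""
    q[(s, a)] = 2 * q[(s, a)] + ALPHA * (
        r + GAMMA * (q[(s_next, a_next)] - q[(s, a)]))

def select_action(q, s, actions, epsilon):
    """Decaying epsilon-greedy: explore with probability epsilon,
    otherwise exploit the best-known action for this state."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda act: q[(s, act)])

q_a = defaultdict(float)                 # insertion/removal Q-table (QA)
actions = ["add", "remove", "none"]
epsilon = 1.0

# One update step with a hypothetical weighted reward r = 5.0:
modified_q_update(q_a, "ff1", "add", r=5.0, s_next="ff2", a_next="none")
epsilon = max(0.10, epsilon * XI)        # decay, floored at 0.10
```

Unlike the standard SARSA update, the `2 * q[(s, a)]` term accumulates rather than replaces the old Q-value, which is exactly the modification the paper credits for faster learning saturation.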
FIGURE 3. Increase and saturation of the standard deviation (σ ) of the register’s clock arrival time (CAT) distribution over the number of
reinforcement learning episodes. The saturation of σ indicates that the RL model can no longer increase the CAT distribution, and the RL solution
can be terminated.
FIGURE 4. Increase and saturation of the standard deviation (σ ) of the register’s clock arrival time (CAT) distribution over the number of
reinforcement learning episodes. The saturation of σ indicates that the RL model can no longer increase the CAT distribution, and the RL solution
can be terminated.
In other words, the algorithm is modified as shown in Algorithm 3.

V. EXPERIMENTS
In this work, the reinforcement learning approach has been employed based on the environment outlined in Algorithms 1 and 3, along with the agent as specified in Algorithm 2. The solutions were evaluated on a selection of larger ISCAS89 benchmarks [35], as well as the AES and Ethernet benchmarks. Each benchmark was implemented in Synopsys ICC2 [25] using two distinct methods: the standard Place and Route (PnR) flow, and an alternative approach that utilized clock arrival times generated through reinforcement learning. In the latter approach, the reinforcement learning-derived clock arrival times were supplied to ICC2 as recommended arrival times via the ‘‘set_clock_balance_points’’ command. It is important to note that the final outcome of ICC2 might not exactly match the CAT list, since this input serves merely as a suggestion and is implemented to the extent feasible. Following this, each design underwent static timing analysis and timing closure before being subject to power and IR analysis using ANSYS Redhawk [57]. The remainder of this section comprehensively discusses the results of the performed experiments.

A. AGENT LEARNING BEHAVIOR
Fig. 3 illustrates how the agent learns to increase the clock arrival time distribution of each benchmark over time. Consider the learning curve for the AES benchmark: as illustrated, after around 700 episodes of learning, the learning curve saturates, and the agent can no longer increase the CAT distribution. The number of steps in each episode is roughly set to 20X the number of registers in the benchmark (e.g., 200K steps for AES in each episode).

Fig. 4 illustrates how the agent's learning behavior improves under the second approach the paper proposes, the problem-specific modified reinforcement learning algorithm, in increasing the clock arrival time distribution of each benchmark over time. Consider the learning curve for the AES benchmark: after around 500 episodes of learning, the learning curve saturates, and the agent can no longer increase the CAT distribution. The number of steps in each episode is roughly set to 20X the number of registers in the benchmark (e.g., 200K steps for AES in each episode). As Fig. 4 clearly discloses, saturation is at least 20% faster across all benchmarks.

B. CHANGE IN CLOCK ARRIVAL TIMES
The histogram depicted in Fig. 5 exhibits the distribution of clock arrival times in two scenarios: one with the aid of the RL agent and the other without it. The clock arrival time (CAT) distribution is obtained post clock tree synthesis (CTS), wherein the baseline CAT is generated following the standard ICC2 CTS flow. In contrast, the CAT for the RL approach is derived by supplying the recommended clock arrival times (output of RL) to ICC2, executing the CTS, and obtaining the resultant CAT distribution. The utilization of the RL agent has considerably broadened the CAT distribution, as depicted in the graph. This broadening is anticipated to lead to a corresponding decrease in the peak current and IR drop of the design.

The histogram depicted in Fig. 6 exhibits the distribution of clock arrival times in three scenarios: without the employment of the RL solution, with the employment of the RL solution, and with the modified solution,
FIGURE 5. Histogram of Clock Arrival Time to endpoint registers before and after running the RL solution for peak current reduction. The broader
distribution of clock arrival times (without causing any timing violation) reduces the extent of simultaneous switching and, in turn, would reduce
the peak current and inductive voltage drop.
FIGURE 6. Histogram of Clock Arrival Time to endpoint registers for peak current reduction before running RL, after running the RL solution, and
after running the modified RL using the new Q-table updating technique. The broader distribution of clock arrival times (without causing any timing
violation) reduces the extent of simultaneous switching and, in turn, would reduce the peak current and inductive voltage drop.
FIGURE 7. Current Waveform of dynamic vector-less simulation of benchmarks in 10 consecutive cycles before and after application of RL solution
for peak current reduction. As illustrated, the wider CAT distribution reduces the demanded battery current across all benchmarks.
wherein the Q-table update is modified to increase the learning speed of the RL while improving the results. The clock arrival time (CAT) distribution is obtained post clock tree synthesis (CTS), wherein the baseline CAT is generated following the standard ICC2 CTS flow.

C. IMPACT ON PEAK CURRENT
Fig. 7 captures the impact of the RL solution on peak current reduction for selected benchmarks over ten cycles of execution. The peak current figures are generated using Ansys Redhawk [57], where the switching activity of clock and data in each clock cycle is set to 200% (rise and fall) and 10%, respectively. As illustrated, the peak current for all benchmarks is reduced by 35% to 40%. This is the direct result of spreading the clock arrival times of the registers and the resulting reduction in simultaneous switching activity.

FIGURE 8. Current Waveform of dynamic vector-less simulation of benchmarks in 10 consecutive cycles before and after application of RL solution for peak current reduction. As illustrated, the wider CAT distribution reduces the demanded battery current across all benchmarks.

FIGURE 9. Current Waveform of dynamic vector-less simulation of benchmarks in 10 consecutive cycles before and after application of RL solution for peak current reduction. As illustrated, the wider CAT distribution reduces the demanded battery current across all benchmarks.

TABLE 1. Experiments parameters. The clock periods for each benchmark are selected based on the highest achievable performance when using standard Vt cells for physical design. The number of steps in each episode is roughly 15X to 20X the number of registers.
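The mechanism behind this reduction, namely that staggering clock edges de-stacks the per-register switching currents, can be illustrated with a toy superposition model. The triangular pulse shape and the arrival times below are illustrative assumptions, not the paper's Redhawk simulation:

```python
# Each register draws a brief triangular current pulse when its clock
# edge arrives; the battery current is the superposition of all pulses.

def peak_current(arrival_times_ps, pulse=(0.5, 1.0, 0.5), step_ps=1):
    """Sum the per-register pulses into time bins; return the maximum."""
    total = {}
    for t in arrival_times_ps:
        for k, amp in enumerate(pulse):
            bin_ = t + k * step_ps
            total[bin_] = total.get(bin_, 0.0) + amp
    return max(total.values())

balanced = [100, 100, 100, 100]   # zero-skew: all edges aligned
spread   = [ 94,  98, 102, 106]   # RL-style widened CAT distribution

print(peak_current(balanced))     # all pulses stack -> high peak
print(peak_current(spread))       # staggered pulses -> lower peak
```

With the aligned arrivals, every pulse peaks in the same bin; spreading the arrivals by a few picoseconds each leaves the total charge unchanged but cuts the worst-case instantaneous current.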
Fig. 8 discloses the combined design peak current under the three scenarios (without RL, with the RL solution, and with the modified RL solution) for selected benchmarks, generated using Ansys Redhawk [57]. Furthermore, Fig. 9 discloses the design peak current variation between the RL and modified RL scenarios for selected benchmarks, where in both figures the switching activity of clock and data in each clock cycle is set to 200% (rise and fall) and 10%, respectively. As illustrated by these figures, as a consequence of the additional spreading of the clock arrival times of the registers, the simultaneous switching activity is reduced further, and the modified RL solution provides an additional 10% to 15% reduction in peak current in comparison to the initially presented RL solution.

D. IMPACT ON IR DROP
When the peak current decreases, it dramatically affects how much voltage drops between the package and the transistor level. This is mainly because there is less of an inductive voltage drop caused by Ldi/dt. In simpler terms, when the current changes more slowly because the clock signals are spread out more, the inductive voltage drop is reduced. The results are shown in Fig. 10, which illustrates how different benchmarks were affected by this change when using the base and RL-assisted CTS flows.

To measure how much such a reduction in IR drop improves the rail voltages seen at the transistor level, the paper employed the methodology described in [31] to compute the delay equivalent voltage VDEV of each design (the minimum voltage seen by transistors at launch) in both the base and RL-assisted designs. Furthermore, the paper computed the maximum cycle-to-cycle voltage noise (from two consecutive cycles) using a vector-less IR simulation and reported the voltage noise. The result of this analysis is reflected in Table 2. As illustrated, by using the RL solution for peak current reduction, the transistors, on average, see a higher rail voltage. At the same time, the extent of cycle-to-cycle voltage variation is reduced. The decline in voltage noise relaxes the required uncertainty margin (accounting for smaller clock jitter). Combining a higher overall rail voltage and smaller uncertainty (voltage noise) increases the extent of available timing slack in each timing path. This, in turn, could be used to improve the PPA optimality of the end design.

E. SCALAR FACTOR ‘‘S’’
One of the innovative aspects of the reinforcement learning solution applied to the problem disclosed by this work is determining the reward given to the agent after each action is taken. One of the essential aspects of creating the reinforcement learning solution and environment is
FIGURE 10. The IR map of benchmarks before and after application of RL solution for peak current reduction. The reduction in simultaneous switching reduces the di/dt, which in turn results in a reduction in inductive IR drop (Ldi/dt).
TABLE 2. Summary of improvement in Voltage drop and reduction in peak current for selected benchmarks. The Min Rail voltage is computed using the
methodology described in [31]. The Max C2C voltage noise is the maximum amount of change in the voltage (from launch to capture) within one Cycle
(voltage noise). The lower the Voltage noise, the smaller the clock jitter and the resulting requirement for endpoint uncertainty. As illustrated, the RL
solution resulted in a reduction in peak current, improvement in rail voltage, and a decline in the extent of voltage noise.
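The Ldi/dt mechanism discussed above can be illustrated with a back-of-the-envelope calculation; the inductance, current swing, and ramp times below are assumed values for illustration, not figures from the paper's benchmarks:

```python
# Illustrative L*di/dt calculation; all values are assumptions.

L_PKG = 1e-10          # assumed package inductance: 0.1 nH

def inductive_drop(delta_i_amps, delta_t_s, inductance=L_PKG):
    """V = L * di/dt for a linear current ramp."""
    return inductance * delta_i_amps / delta_t_s

# Zero-skew baseline: a 1 A current swing within ~1 ns of the clock edge.
v_base = inductive_drop(1.0, 1e-9)
# Widened CAT spread: a ~35% smaller swing spread over a longer window.
v_rl = inductive_drop(0.65, 2.5e-9)

print(v_base, v_rl)    # the spread case sees a much smaller L*di/dt drop
```

The point of the sketch is that the RL solution helps twice: the current swing shrinks and the ramp window stretches, and both factors reduce di/dt and hence the inductive component of the IR drop.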
FIGURE 11. Increase and saturation of the standard deviation (σ ) of register’s clock arrival time (CAT) distribution over the number of
reinforcement learning episodes based on different s-values.
determining the reward mechanism, which is an integral part of the agent's learning process. As disclosed above in Algorithms 2 and 3, the two reward values returned by the environment, (rσ) and (rt), are combined using the scalar factor ‘‘S’’:

r = rσ + s × rt (12)

Fig. 11 discloses the increase in the standard deviation of the register clock arrival time (CAT) distribution of various selected designs over the 1000 episodes of reinforcement learning, based on different s-values. For this experiment, three variable-size benchmarks were selected, s13207, s35932, and AES128, having 649, 1728, and 10015 registers, respectively. The experiment used different s-values: s = 10, 100, 1000, 2000.
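The weighting effect of ‘‘s’’ in Eq. (12) can be seen with the swept s-values; the reward magnitudes below are hypothetical:

```python
def combined_reward(r_sigma, r_t, s):
    """Eq. (12): r = r_sigma + s * r_t. The scalar s amplifies the
    timing-violation term r_t (negative for violations) so that it
    dominates any gain from widening the CAT spread (r_sigma)."""
    return r_sigma + s * r_t

# A step that widens the CAT spread (r_sigma = +3) while creating a
# timing violation (r_t = -1), evaluated at the swept s-values:
for s in (10, 100, 1000, 2000):
    print(s, combined_reward(3.0, -1.0, s))
```

Even the smallest swept s-value makes the net reward negative here, steering the agent away from spread gains bought with timing violations, while larger s-values penalize them ever more steeply.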
VOLUME 11, 2023 87883
S. A. Beheshti-Shirazi et al.: Advanced RL Solution for Clock Skew Engineering
As disclosed by Fig. 11, lowering the value of the scalar ‘‘s’’ slows the rise of the learning curve, meaning it takes more episodes to achieve the saturation of the CAT distribution, which is an indicator of the completion of the agent's learning. As the value of ‘‘S’’ increases, the slope of the curve becomes steeper, and the agent reaches learning saturation more quickly. At the same time, a balance is needed; if the s-value goes too high, the agent will achieve saturation more quickly; however, it would not discover the new Pareto-frontier optima that it would if the learning process were continued for more episodes.

Another consideration when choosing the s-value is memory usage. A larger s-value increases the magnitude of each reward which, in return, requires significantly more memory when running the experiments.

VI. CONCLUSION
The work disclosed by this paper conducted a study that aimed to reduce peak current during the Clock Tree Synthesis stage of a design using Reinforcement Learning. Specifically, the work used a decaying epsilon-greedy SARSA approach. The findings showed that when utilizing the proposed RL-based solution, peak current decreased by 35-40% compared to a baseline design generated using the heuristic Clock-Tree Synthesis solution of the physical design EDA tool. The paper also discovered that a second RL algorithm with a modified Q-table update produced a design that led to an extra 10-15% decrease in peak current compared to the original RL-based solution. By reducing the peak current, the inductive voltage drop and voltage noise decreased across the selected benchmarks.

REFERENCES
[1] Y.-T. Nieh, S.-H. Huang, and S.-Y. Hsu, ‘‘Minimizing peak current via opposite-phase clock tree,’’ in Proc. 42nd Design Autom. Conf., 2005, pp. 182–185, doi: 10.1109/DAC.2005.193797.
[2] M. Edahiro, ‘‘A clustering-based optimization algorithm in zero-skew routings,’’ in Proc. 30th Int. Design Autom. Conf. (DAC), 1993, pp. 612–616.
[3] A. Mukherjee and R. Sankaranarayan, ‘‘Retiming and clock scheduling to minimize simultaneous switching,’’ in Proc. IEEE Int. SOC Conf., Sep. 2004, pp. 259–262, doi: 10.1109/SOCC.2004.1362427.
[4] L. P. Kaelbling, M. L. Littman, and A. W. Moore, ‘‘Reinforcement learning: A survey,’’ J. Artif. Intell. Res., vol. 4, no. 1, pp. 237–285, Jan. 1996.
[5] A. Vittal and M. Marek-Sadowska, ‘‘Low-power buffered clock tree design,’’ IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 16, no. 9, pp. 965–975, Sep. 1997.
[6] W.-H. Chang, C.-H. Lin, S.-P. Mu, L.-D. Chen, C.-H. Tsai, Y.-C. Chiu, and M. C.-T. Chao, ‘‘Generating routing-driven power distribution networks with machine-learning technique,’’ IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 36, no. 8, pp. 1237–1250, Aug. 2017.
[7] P. Vuillod, L. Benini, A. Bogliolo, and G. D. Micheli, ‘‘Clock-skew optimization for peak current reduction,’’ in Proc. Int. Symp. Low Power Electron. Design, 1996, pp. 265–270.
[8] C.-T. Ho and A. B. Kahng, ‘‘IncPIRD: Fast learning-based prediction of incremental IR drop,’’ in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2019, pp. 1–8, doi: 10.1109/ICCAD45719.2019.8942110.
[9] A. Vijayakumar, V. C. Patil, and S. Kundu, ‘‘An efficient method for clock skew scheduling to reduce peak current,’’ in Proc. 29th Int. Conf. VLSI Design 15th Int. Conf. Embedded Syst. (VLSID), Jan. 2016, pp. 505–510, doi: 10.1109/VLSID.2016.24.
[10] W.-C.-D. Lam, C.-K. Koh, and C.-W.-A. Tsao, ‘‘Power supply noise suppression via clock skew scheduling,’’ in Proc. Int. Symp. Quality Electron. Design, 2002, pp. 355–360, doi: 10.1109/ISQED.2002.996772.
[11] K. E. Murray and V. Betz, ‘‘Adaptive FPGA placement optimization via reinforcement learning,’’ in Proc. ACM/IEEE 1st Workshop Mach. Learn. CAD (MLCAD), Sep. 2019, pp. 1–6, doi: 10.1109/MLCAD48534.2019.9142079.
[12] U. Farooq, N. U. Hasan, I. Baig, and M. Zghaibeh, ‘‘Efficient FPGA routing using reinforcement learning,’’ in Proc. 12th Int. Conf. Inf. Commun. Syst. (ICICS), May 2021, pp. 106–111, doi: 10.1109/ICICS52457.2021.9464626.
[13] C. Brej and J. D. Garside, ‘‘A quasi-delay-insensitive method to overcome transistor variation,’’ in Proc. 18th Int. Conf. VLSI Design Held Jointly, 4th Int. Conf. Embedded Syst. Design, 2005, pp. 451–456, doi: 10.1109/ICVD.2005.30.
[14] X.-W. Shih and Y.-W. Chang, ‘‘Fast timing-model independent buffered clock-tree synthesis,’’ IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 31, no. 9, pp. 1393–1404, Sep. 2012, doi: 10.1109/TCAD.2012.2191554.
[15] M. Donno, E. Macii, and L. Mazzoni, ‘‘Power-aware clock tree planning,’’ in Proc. Int. Symp. Phys. Design, Apr. 2004, pp. 138–147.
[16] M. A. Elgamma, K. E. Murray, and V. Betz, ‘‘Learn to place: FPGA placement using reinforcement learning and directed moves,’’ in Proc. Int. Conf. Field-Program. Technol. (ICFPT), Dec. 2020, pp. 85–93, doi: 10.1109/ICFPT51103.2020.00021.
[17] C. Xu, P.-Y. Chen, D. Niu, Y. Zheng, S. Yu, and Y. Xie, ‘‘Architecting 3D vertical resistive memory for next-generation storage systems,’’ in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2014, pp. 224–229, doi: 10.1109/ICCAD.2014.7001329.
[18] Y. Kwon, J. Jung, I. Han, and Y. Shin, ‘‘Transient clock power estimation of pre-CTS netlist,’’ in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2018, pp. 1–4, doi: 10.1109/ISCAS.2018.8351430.
[19] J. Gu, G. Qu, L. Yuan, and Q. Zhou, ‘‘Peak current reduction by simultaneous state replication re-encoding,’’ in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2010, pp. 592–595, doi: 10.1109/ICCAD.2010.5654204.
[20] Y.-C. Lu, J. Lee, A. Agnesina, K. Samadi, and S. K. Lim, ‘‘GAN-CTS: A generative adversarial framework for clock tree prediction and optimization,’’ in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2019, pp. 1–8, doi: 10.1109/ICCAD45719.2019.8942063.
[21] A. B. Kahng, B. Lin, and S. Nath, ‘‘High-dimensional metamodeling for prediction of clock tree synthesis outcomes,’’ in Proc. ACM/IEEE Int. Workshop Syst. Level Interconnect Predict. (SLIP), Jun. 2013, pp. 1–7, doi: 10.1109/SLIP.2013.6681685.
[22] M. Ancona, E. Ceolini, C. Öztireli, and M. Gross, ‘‘Towards better understanding of gradient-based attribution methods for deep neural networks,’’ 2017, arXiv:1711.06104.
[23] W. Li, M. E. Dehkordi, S. Yang, and D. Z. Pan, ‘‘Simultaneous placement and clock tree construction for modern FPGAs,’’ in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays, 2019, pp. 132–141, doi: 10.1145/3289600.3289631.
[24] M. Liu, Z. Zhang, J. Wen, and Y. Jia, ‘‘An approximate symmetry clock tree design with routing topology prediction,’’ in Proc. IEEE Int. Midwest Symp. Circuits Syst. (MWSCAS), Aug. 2021, pp. 92–96, doi: 10.1109/MWSCAS47672.2021.9531772.
[25] Synopsys. Synopsys EDA Tools. Accessed: Jul. 10, 2020. [Online]. Available: http://synopsys.com/
[26] Y. Kaplan and S. Wimer, ‘‘Post optimization of a clock tree for power supply noise reduction,’’ in Proc. IEEE 27th Conv. Electr. Electron. Eng. Isr., Nov. 2012, pp. 1–5, doi: 10.1109/EEEI.2012.6377136.
[27] R. Chaturvedi and J. Hu, ‘‘Buffered clock tree for high-quality IC design,’’ in Proc. Int. Symp. Signals, Circuits Syst., 2004, pp. 381–386, doi: 10.1109/ISSCS.2003.1223769.
[28] Y.-Y. Chen, C. Dong, and D. Chen, ‘‘Clock tree synthesis under aggressive buffer insertion,’’ in Proc. 47th Design Autom. Conf., Jun. 2010, pp. 86–89.
[29] K. I. Gubbi, S. A. Beheshti-Shirazi, T. Sheaves, S. Salehi, P. D. S. Manoj, S. Rafatirad, A. Sasan, and H. Homayoun, ‘‘Survey of machine learning for electronic design automation,’’ in Proc. Great Lakes Symp. (VLSI), 2022, pp. 513–518, doi: 10.1145/3571502.3571510.
[30] S. A. Beheshti-Shirazi, A. Vakil, S. Manoj, I. Savidis, H. Homayoun, and A. Sasan, ‘‘A reinforced learning solution for clock skew engineering to reduce peak current and IR drop,’’ in Proc. Great Lakes Symp. (VLSI), 2021, pp. 181–187, doi: 10.1145/3458776.3460066.
[31] A. Vakil, H. Homayoun, and A. Sasan, ‘‘IR-ATA: IR annotated timing analysis, a flow for closing the loop between PDN design, IR analysis & timing closure,’’ in Proc. 24th Asia South Pacific Design Autom. Conf., Jan. 2019, pp. 152–159.
[32] Y. P. Chen and D. F. Wong, ‘‘An algorithm for zero-skew clock tree routing [52] T. Sharma, S. Kolluru, and K. S. Stevens, ‘‘Learning based tim-
with buffer insertion,’’ in Proc. ED & TC Eur. Design Test Conf., 1996, ing closure on relative timed design,’’ in Proc. IFIP/IEEE Int.
pp. 230–236. Conf. Very Large Scale Integr.-Syst. Chip, Jul. 2020, pp. 133–148,
[33] B. Gunna, L. Bhamidipati, H. Homayoun, and A. Sasan, ‘‘Spatial and doi: 10.1007/978-3-030-59850-1_10.
temporal scheduling of clock arrival times for IR hot-spot mitigation, [53] D. S. Lopera, L. Servadei, V. P. Kasi, S. Prebeck, and W. Ecker,
reformulation of peak current reduction,’’ in Proc. IEEE/ACM Int. ‘‘RTL delay prediction using neural networks,’’ in Proc. IEEE
Symp. Low Power Electron. Design (ISLPED), Jul. 2017, pp. 1–6, doi: Nordic Circuits Syst. Conf. (NorCAS), Oct. 2021, pp. 1–7,
10.1109/ISLPED.2017.8009179. doi: 10.1109/NorCAS53631.2021.9599868.
[34] C.-T. Ho and A. B. Kahng, ‘‘IncPIRD: Fast learning-based [54] A. Han, Z. Zhao, C. Feng, and S. Zhang, ‘‘Stage-based path delay predic-
prediction of incremental IR drop,’’ in Proc. IEEE/ACM Int. tion with customized machine learning technique,’’ in Proc. IEEE/ACM
Conf. Comput.-Aided Design (ICCAD), Nov. 2019, pp. 1–8, doi: Int. Conf. Comput.-Aided Design (ICCAD), Aug. 2021, pp. 1–8, doi:
10.1109/ICCAD45719.2019.8942110. 10.1109/ICCAD53799.2021.9598330.
[35] (2020). ISCAS’89 Benchmark Circuits. Accessed: Sep. 30, 2020. [Online]. [55] S. Zhang and S. Zhang, ‘‘Time and power constrained chip multiprocessor
Available: http://www.pld.ttu.ee/~maksim/benchmarks/iscas89/verilog/ energy optimization using machine learning techniques,’’ in Proc. 5th
Int. Conf. Electron. Inf. Technol. Comput. Eng., 2021, pp. 926–933, doi:
[36] I.-M. Liu, T.-L. Chou, A. Aziz, and D. F. Wong, ‘‘Zero-skew clock tree
10.1145/3466186.3466517.
construction by simultaneous routing, wire sizing buffer insertion,’’ in
[56] S.-Y. Lin, Y.-C. Fang, Y.-C. Li, Y.-C. Liu, T.-S. Yang, S.-C. Lin, C.-M. Li,
Proc. Int. Symp. Phys. Design, 2000, pp. 33–38.
and E. J. Fang, ‘‘IR drop prediction of ECO-revised circuits using machine
[37] R. Samanta, J. Hu, and P. Li, ‘‘Discrete buffer and wire sizing for link-based learning,’’ in Proc. IEEE 36th VLSI Test Symp. (VTS), Apr. 2018, pp. 1–6,
non-tree clock networks,’’ IEEE Trans. Very Large Scale Integr. (VLSI) doi: 10.1109/VTS.2018.8368657.
Syst., vol. 18, no. 7, pp. 1025–1035, Jul. 2010. [57] ANSYS Apache. Redhawk. Accessed: Nov. 17, 2020. [Online]. Available:
[38] Y.-C. Fang, H.-Y. Lin, M.-Y. Sui, C.-M. Li, and E. J. Fang, ‘‘Machine- https://www.apache-da.com/products/redhawk
learning-based dynamic IR drop prediction for ECO,’’ in Proc. IEEE/ACM
Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2018, pp. 1–7.
[39] S.-H. Huang, C.-M. Chang, and Y.-T. Nieh, ‘‘Fast multi-domain clock skew SAYED ARESH BEHESHTI-SHIRAZI received
scheduling for peak current reduction,’’ in Proc. Asia South Pacific Conf. the B.Sc. degree in electrical engineering from
Design Autom., 2006, doi: 10.1109/ASPDAC.2006.1594691. Qazvin Islamic Azad University (QIAU), Iran,
[40] Z. Xie, H. Li, X. Xu, J. Hu, and Y. Chen, ‘‘Fast IR drop estimation in 2006, and the M.Sc. degree in electrical
with machine learning,’’ in Proc. 39th Int. Conf. Comput.-Aided Design, and computer engineering from George Mason
Nov. 2020, pp. 1–8, doi: 10.1145/3400302.3415763. University (GMU), USA, in 2010, where
[41] Z. Xie, H. Ren, B. Khailany, Y. Sheng, S. Santosh, J. Hu, and he is currently pursuing the Ph.D. degree.
Y. Chen, ‘‘PowerNet: Transferable dynamic IR drop estimation via From 2019 to 2021, he was with the GATE
maximum convolutional neural network,’’ in Proc. 25th Asia South Research Laboratory, GMU, under the supervision
Pacific Design Autom. Conf. (ASP-DAC), Jan. 2020, pp. 13–18,
of Dr. Avesta Sasan, focusing on applied machine
doi: 10.1109/ASP-DAC47756.2020.9045574.
learning computer-aided design (CAD) and reinforcement learning.
[42] H.-Y. Lin, Y.-C. Fang, S.-T. Liu, J.-X. Chen, C.-M. Li, and
Currently, he is the Assistant Director of the Innovation Laboratory,
E. J. Fang, ‘‘Automatic IR-drop ECO using machine learning,’’ in
Government Accountability Office (GAO). He oversees cloud analytic and
Proc. IEEE Int. Test Conf. Asia (ITC-Asia), Sep. 2020, pp. 7–12,
doi: 10.1109/ITC-Asia51099.2020.00013. machine learning programs and products in this capacity. In addition, he has
[43] V. A. Chhabria, Y. Zhang, H. Ren, B. Keller, B. Khailany, and a career spanning over a decade in various roles at the United States
S. S. Sapatnekar, ‘‘MAVIREC: ML-aided vectored IR-drop estimation and Patent and Trademark Office (USPTO), excelling as a master’s level Patent
classification,’’ in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE), Examiner and leading the Cybersecurity Division’s Continues Diagnostics
Feb. 2021, pp. 1825–1828, doi: 10.23919/DATE51398.2021.9473914. and Mitigation (CDM) Program. He also played a pivotal role in revitalizing
[44] P. Huang, C. Ma, and Z. Wu, ‘‘Fast dynamic IR-drop prediction using globally accessible tools, such as the Global Dossier (GD) and One Patent
machine learning in bulk FinFET technologies,’’ Symmetry, vol. 13, no. 10, Dossier Programs (OPD) at the Office of International Patent Cooperation
p. 1807, Sep. 2021, doi: 10.3390/sym13101807. (OIPC).
[45] S. Saurabh, H. Shah, and S. Singh, ‘‘Timing closure problem: Review of challenges at advanced process nodes and solutions,’’ IETE Tech. Rev., vol. 35, no. 4, pp. 349–358, 2018.
[46] A. B. Kahng, U. Mallappa, and L. Saul, ‘‘Using machine learning to predict path-based slack from graph-based timing analysis,’’ in Proc. IEEE 36th Int. Conf. Comput. Design (ICCD), Oct. 2018, pp. 603–612, doi: 10.1109/ICCD.2018.00096.
[47] L. Bhamidipati, B. Gunna, H. Homayoun, and A. Sasan, ‘‘A power delivery network and cell placement aware IR-drop mitigation technique: Harvesting unused timing slacks to schedule useful skews,’’ in Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI), Jul. 2017, pp. 272–277.
[48] H. M. Makrani, F. Farahmand, H. Sayadi, S. Bondi, S. M. P. Dinakarrao, H. Homayoun, and S. Rafatirad, ‘‘Pyramid: Machine learning framework to estimate the optimal timing and resource usage of a high-level synthesis design,’’ in Proc. 29th Int. Conf. Field Program. Log. Appl. (FPL), Sep. 2019, pp. 397–403, doi: 10.1109/FPL.2019.00069.
[49] A. B. Kahng, U. Mallappa, L. Saul, and S. Tong, ‘‘‘Unobserved corner’ prediction: Reducing timing analysis effort for faster design convergence in advanced-node design,’’ in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE), 2019, pp. 168–173, doi: 10.23919/DATE.2019.8715219.
[50] O. V. S. S. Ram and S. Saurabh, ‘‘Modeling multiple-input switching in timing analysis using machine learning,’’ IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 40, no. 4, pp. 723–734, Apr. 2021, doi: 10.1109/TCAD.2020.3009624.
[51] M. A. Savari and H. Jahanirad, ‘‘NN-SSTA: A deep neural network approach for statistical static timing analysis,’’ Expert Syst. Appl., vol. 149, Jul. 2020, Art. no. 113309, doi: 10.1016/j.eswa.2020.113309.

NAJMEH NAZARI received the B.Sc. degree in computer engineering from Shiraz University, in 2010, and the M.Sc. degree in computer engineering from the Isfahan University of Technology, in 2013. She is currently pursuing the Ph.D. degree with the ECE Department, University of California, Davis. From 2013 to 2015, she was a Lecturer with the Shahid Chamran University of Ahwaz. Her research interests include deep learning, computer architecture, embedded systems, applied machine learning, and hardware security.

KEVIN IMMANUEL GUBBI received the B.Sc. degree in electrical and electronics engineering from Anna University, in 2018, and the M.Sc. degree in computer engineering from San Francisco State University, in 2021. He is currently pursuing the Ph.D. degree with the Electrical and Computer Engineering Department, University of California, Davis. From 2021 to 2023, he was a Graduate Research Assistant with the ASEEC Laboratory and the GATE Laboratory, University of California, Davis. His research interests include electronic design automation, hardware security, VLSI, and applied machine learning.
BANAFSHEH SABER LATIBARI (Graduate Student Member, IEEE) received the B.Sc. degree in computer engineering from the K. N. Toosi University of Technology, in 2014, and the M.Sc. degree in computer architecture from the Sharif University of Technology, in 2017. She is currently pursuing the Ph.D. degree with the Electrical and Computer Engineering Department, University of California, Davis. From 2019 to 2021, she was a Graduate Research Assistant with the GATE Laboratory, George Mason University. Her research interests include applied machine learning and computer architecture.

SETAREH RAFATIRAD received the M.Sc. and Ph.D. degrees in computer science from the University of California, Irvine, in 2009 and 2012, respectively. She is an Associate Professor with the Department of Computer Science, University of California, Davis. Prior to that, she was an Associate Term Professor with the Department of Information Sciences and Technology, George Mason University. Her research interests include applied machine learning, IoT security, and natural language processing.

AVESTA SASAN received the B.Sc. degree (summa cum laude) (Hons.) in computer engineering and the M.Sc. and Ph.D. degrees in electrical and computer engineering from the University of California, Irvine, in 2005, 2006, and 2010, respectively. In 2010, he joined the Office of the CTO at Broadcom, working on the physical design and implementation of ARM processors as a Physical Designer, a Timing Signoff Specialist, and the Lead of Signal and Power Integrity Signoff. In 2014, he was recruited by the Qualcomm Office of VLSI Technology, where he developed various methodologies and in-house EDA tools for accurate signoff and analysis of hardened ASIC solutions. He joined George Mason University, in 2016, as an Associate Professor with the Department of Electrical and Computer Engineering, while simultaneously serving as the Associate Chair of Research. In 2021, he joined the faculty of the Electrical and Computer Engineering Department, University of California, Davis. His research interests include hardware security, machine learning, neuromorphic computing, low-power design and methodology, approximate computing, and the Internet of Things (IoT).