Zhou et al. - 2023 - GridNetOpt Fast Full-Chip EM-Aware Power Grid Opt
This is the author's version which has not been fully edit
content may change prior to final publication. Citation information: DOI 10.1109/TCAD.2022.3206397
Abstract: This article presents a fast full-chip electromigration (EM) aware IR drop constrained optimization framework, named GridNetOpt, for on-chip power grid networks accelerated by deep neural networks (DNN). Compared to the existing linear programming-based methods, the new method employs more flexible conjugate gradient-based optimization to size the wire width of the power grids. To mitigate the high cost of sensitivity calculation of the adjoint network using full-chip IR drop analysis at every iteration step, the sensitivity is computed via a trained conditional generative adversarial network (CGAN). The new method exploits the differentiable characteristics of DNNs for fast sensitivity computation. The sensitivity, which is the derivative of the node voltage with respect to wire resistance, guides the search direction during the optimization process. In order to consider more accurate EM failure effects, the training data is obtained from the power grids under different wire widths and current loads analyzed by a state-of-the-art full-chip multi-physics-based coupled EM-IR drop analysis tool. This is in contrast with the existing linear programming-based methods, in which only immortal wires or wires with non-zero resistance can be dealt with. Numerical results on a number of synthesized power grid benchmarks from ARM Cortex-M0 processor designs show that the proposed GridNetOpt can lead to at least an order of magnitude speedup over the conjugate gradient-based method using the traditional adjoint network method. Compared to the previous localized power grid fixing work with GridNet, GridNetOpt leads to smaller area overhead for all the benchmarks we tested. It can also reduce IR drops for power grid circuits with immortal wires, which is not possible with the localized GridNet method.

Han Zhou ([email protected]) is with Synopsys Inc. The work was performed at University of California, Riverside.
Yibo Liu, Wentian Jin, and Sheldon X.-D. Tan ([email protected]) are with the Department of Electrical and Computer Engineering, University of California, Riverside.
This work is supported in part by NSF grants under No. CCF-1816361, in part by NSF grant under No. CCF-2007135 and No. OISE-1854276.
The preliminary results of this work have been published in [1].

I. INTRODUCTION

On-chip power distribution networks (PDNs) are a crucial backbone for feeding power to all transistors from top metals on a chip, because they directly affect chip performance and reliability. At the same time, electromigration (EM) remains the top failure mechanism for copper-based interconnects in all the subnanometer technologies. The International Roadmap for Devices and Systems (IRDS) [2] predicts that the allowable current density will continue to decrease due to EM while the required current density to drive the gates will continue to increase. As a result, the EM-related aging and reliability will become worse for current 5nm and below technologies.

EM is a physical phenomenon of the migration of metal atoms along the direction of the applied electrical field. Atoms migrate along the trajectory of conducting electrons. For practical VLSI chips, the on-chip power supply networks are most susceptible to EM failures because of large and unidirectional current densities [3], [4]. Due to EM aging effects, voids may be formed in the interconnects of the power grid networks, which can lead to a resistance increase of the wire segment, or even an open circuit, making the IR drop of the power grids increase. Therefore, EM-induced aging and IR drop changes at the target lifetime have to be taken into consideration to make the PDN more robust. We notice that EM effects may also lead to hillocks or extrusion at the anode nodes of the wires, which may bring about short circuits. However, the majority of the EM failures are due to void nucleation [5] and hence we focus on the void-induced EM failure in this work.

To design robust PDNs in the physical synthesis flow, the wires have to be properly sized after the topology of the PDN has been determined, to minimize the area and meet the IR drop requirement at the target lifetime. Many research efforts have been made in the past based on nonlinear or linear optimization methods [6], [7], [8], [9], [10], [11]. Early works were mainly based on Black's EM model. This is also the requirement widely adopted in industry today: the EM constraint is simply represented as the maximum allowed current density of individual wire segments to avoid nucleation. Recent studies indicate that we have to analyze all the wire segments of the entire interconnect wire simultaneously [12], [13], [14], [15], [16], [17], [18].

To alleviate the above drawback, some works have emerged that use new multi-segment EM models to size the power grids and fix the EM failures and IR drop violations. Zhou et al. [19] proposed a power grid network sizing method based on a multi-segment EM immortality check criterion. It automatically considers all the wire segments and their interactions within an interconnect tree. However, the EM immortality constrained optimization is still too conservative as it requires all the interconnect trees to be immortal, i.e., void nucleations are not allowed. To further mitigate this issue, Moudallal et al. [11] proposed to directly consider EM-induced IR drops instead of EM constraints on the time-varying power grid networks. Their method can consider the post-voiding resistance change of wires based on finite difference analysis of EM-induced stress in multi-segment wires. The resulting nonlinear problem is then solved by applying successive linear programming. This method, however, may suffer high computational costs if the number of violating nodes is large, as the sensitivities of those violating nodes need to be computed by solving the circuit matrices. Furthermore, this method has the limitation that wires can only be sized up, which restricts its application in many practical problems.

On one hand, Chang et al. [20] introduced a learning-based EM violation waiver system, which investigates every EM violation and takes an expert decision to either ignore the violation (waive-off) or resolve it (must-fix) in the design. However, the proposed method cannot directly perform the EM violation fixing. On the other hand, deep neural networks (DNN) have propelled an evolution in machine learning fields and redefined many existing applications with new human-level AI capabilities. DNNs such as convolutional neural networks (CNN) have been applied to many cognitive
© 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Southeast University. Downloaded on October 25,2022 at 12:38:39 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
applications such as visual object recognition, object detection, speech recognition, natural language understanding, etc. due to dramatic accuracy improvements in those tasks [21].

Recently, generative adversarial networks (GAN) [22] gained much traction as they can learn features (latent representation) without extensively annotated training data. GAN-based methods have been applied for solving several EDA problems such as layout lithography analysis [23], sub-resolution assist feature generation [24], analog layout well generation [25], high-level thermal analysis [26], and electromigration analysis [27].

Inspired by the modeling power of the DNN/GAN for 2D images, in this article, we try to mitigate the limitations of the existing EM-aware power grid optimizations. We develop a new fast EM-aware optimization framework, called GridNetOpt, for full-chip power grid network sizing and fixing. It capitalizes on the power of a fast GAN-based full-chip IR drop estimation method, which not only provides fast EM-induced IR drop estimation, but also enables fast and scalable sensitivity computation for optimization via the inherent differentiability of trained GAN models. The key contributions of this paper are as follows:

• First, the new method applies a more general and flexible conjugate gradient based optimization framework instead of the existing sequence of linear programming method. To be more specific, it only requires sensitivity information to size any given power grids, with or without mortal wires. Compared to the successive linear programming method [11], the proposed method does not have the limitation of reducing the IR voltage drop by only widening the wires of the given power grid, and there is no need to solve matrices to get sensitivities.
• Instead of using the traditional adjoint network-based sensitivity computation method, which requires full-chip IR drop analysis at every iteration step, we propose to use a deep learning based model for sensitivity computation. Once the model is trained, obtaining sensitivity will be much simpler and faster. The trained GAN model not only provides the IR drop information at the target aging time but also provides the critical sensitivity information of node voltage with respect to the wire resistance or width. The sensitivity computation cost is marginal for any given power grid designs with the same topology by taking advantage of the auto-differentiation function of the DNN model.
• We leverage the previously proposed GAN-based full-chip IR drop analysis tool GridNet [1] for fast IR drop estimation. GridNet is trained using 2D EM-induced IR drop maps of power grid designs at different aging times under different wire widths and current workloads. The EM-induced IR drops of those power grids are simulated with a coupled EM-IR analysis tool, EMspice [28], which computes time-varying EM-induced IR drop and can handle both early failure (open circuit) and late failure (non-zero resistance) cases.
• Numerical results on a number of synthesized power grid benchmarks from ARM Cortex-M0 processor designs show that the proposed GridNetOpt can lead to an order of magnitude or more speedup over the conjugate gradient-based method using the traditional adjoint network method. Compared to the previously proposed localized power grid fixing method with GridNet, GridNetOpt can lead to smaller area overhead for all the benchmarks we tested due to its global optimization nature. It can also reduce IR drops for power grid circuits with immortal wires, which is not possible with the previous method.

This paper is organized as follows: Section II reviews the related preliminary works. Section III presents the details of the GAN-based EM-aware IR drop prediction approach. Section IV shows the formulation of the new EM-induced voltage constrained optimization and its solution method. Section V introduces the optimization strategies, including the fast gradient calculation via deep neural networks. Experimental results and discussions are summarized in Section VI. Section VII concludes the paper.

II. RELATED WORKS

In this section, we summarize some related literature on physics-based EM-induced IR analysis and machine learning-based IR drop analysis methods.

A. Full-chip EM-induced IR drop analysis

The EM aging process typically leads to resistance increase or even open-wire segments. For on-chip mesh-structured power grid networks, due to their inherent design redundancy, a few wire failures may not immediately result in a significant IR drop increase. But as more wires nucleate, the IR drop will eventually lead to timing violations. As a result, the power grid networks become time-varying networks with time-varying IR drops due to the EM-induced aging process [14], [15], [28], [29]. On the other hand, the failed wire segments alter the current distributions of all the interconnect wires, which may further accelerate the failure process. Hence, one has to consider the interplay between the two physics: electrical characteristics and hydrostatic stress in the interconnect wires.

EMspice [28], [30] is a full-chip coupled EM-IR drop co-simulation tool that considers the dynamic interplay between the hydrostatic stress and electrical characteristics in a power grid network. The tool consists of a finite difference time domain (FDTD) solver for EM stress and a linear network DC solver for IR drop, which can be described as

$$C\dot{\sigma}(t) = A\sigma(t) + PI(t), \tag{1}$$
$$V_v(t) = \int_{\Omega_L} \frac{\sigma(t)}{B}\, dV, \tag{2}$$
$$M(t) \times u(t) = PI(t), \tag{3}$$
$$\sigma(0) = [\sigma_1(0), \sigma_2(0), \ldots, \sigma_n(0)], \quad \text{at } t = 0. \tag{4}$$

Specifically, in the nucleation phase, hydrostatic stress is modeled by Korhonen's equation with a zero-flux boundary condition at the terminals and an initial stress condition. After the FDTD process [31], the partial differential equation is converted to the linear time invariant (LTI) system shown in Eq. (1). Suppose we have $n$ nodes, then $C$ is an $n \times n$ identity matrix and $A$ is an $n \times n$ coefficient matrix. Note that $\sigma(0)$ denotes the initial stress at $t = 0$. In the incubation phase [17], a void starts to form, and the void volume and stress distribution of the remaining wire are correlated by the atom conservation equation shown in Eq. (2), where $V_v(t)$ is the void volume, $\Omega_L$ is the volume of the remaining interconnect wire and $V$ is the volume of the wire.

In the growth phase, the void continues to grow and thus the wire resistance starts to increase. Modified nodal analysis (MNA) is applied to calculate IR drops as shown in Eq. (3). $M(t)$ is the conductance matrix of the power grid network. It is time-varying because wire resistance changes with EM
failure process. $P$ is a $b \times p$ input matrix, where $p$ is the number of inputs. $u(t)$ represents the node voltages of the network and $I(t)$ contains the current sources from the function blocks of the chips. The above equations are solved together, and finally, the resulting IR drops and EM failure hotspots at the target aging time are reported. In this work, we use data simulated from the open-source tool EMspice to train the DNN models.

B. Machine learning accelerated IR drop estimation

In general, IR drop analysis is concerned with voltage drop estimation from given current or power sources, which can be time-varying for dynamic analysis. Numerical techniques are well developed and perform IR drop analysis well on power grids, such as hierarchical methods, random walk methods, Krylov-subspace methods, multi-grid techniques, and vectorless verification methods.

Several machine learning-based IR drop analysis methods have been proposed based on various deep neural networks [32], [33], [34], [35], [36]. Those methods typically aim to replace the standard full-chip IR drop analysis tool, such as ANSYS RedHawk, via data-driven learning and feature selection. For instance, Lin et al. [32] proposed a full-chip dynamic IR drop analysis based on some power and physical features extracted from cells and layouts. Fang et al. [33] tried to improve the scalability by training the models for the localized region of the layout. Xie et al. [35] proposed a CNN-based model transferable across different designs that is able to incorporate design-dependent features during preprocessing. Ho et al. [34] focused on incremental IR drop prediction and mitigation. The gradient boosting framework uses more electrical and physical features for training. Chhabria et al. [36] proposed a CNN-based generative network method, called IREDGe, to predict on-chip temperature and IR drop contours. Temperature and power grid analyses are mapped to image-to-image and sequence-to-sequence translation tasks. A good summary of recent work on machine learning-based IR drop analysis can be found in [37]. These machine learning methods have indeed achieved significant progress in IR drop estimation, but none of them takes EM aging effects into consideration.

III. DNN-BASED FAST EM-INDUCED IR DROP PREDICTION

A. The overall workflow of the GridNetOpt framework

Fig. 1 shows the overall workflow of the proposed GridNetOpt framework. The workflow consists of three phases: training, inference and optimization. The first two phases are also called GridNet. The training phase is shown in Fig. 1(a), where the yellow block shows how the power grids are generated. Then in the red block, we use EMspice [28], the coupled EM-IR analysis tool, to simulate the EM-induced IR drop for the synthesized power grid network. In the blue block, GridNet receives the EM-induced voltage from 0 to $T_{target}$ aging years as well as the initial power grid. Electrical and geometrical information are extracted afterwards. The training process is shown with dashed arrows. Fig. 1(b) illustrates the inference phase and the sensitivity-based full-chip power grid optimization flow. GridNet has two outputs. The default output is the EM-induced voltage of all nodes at a specific aging year. The optional output is the sensitivity information: the sensitivity of node voltages with respect to the input resistances. These sensitivities can be obtained as a by-product from the differentiable DNN model as we will show later. The sensitivity information will then be utilized for power grid optimization in the chip design flow. After the power grid is incrementally updated, the GridNet model predicts the new EM-induced voltage. If the IR drop violations remain unaddressed, GridNetOpt will perform the next round of fixing and prediction iteratively until all the IR drop violations are eliminated.

B. Feature selection for GridNet

Given a mesh-structured power network, we can look at the node voltages $u(t)$ and the input current sources $I(t)$ in Eq. (3). For the DNN-based modeling, the input features should include both $I(t)$ and $M(t)$. $M(t)$ is represented by the resistance vectors of wire segments in the power grid networks. The resistance of a wire segment depends on its length and its cross-sectional area, which is proportional to wire width. Since we deal with mesh-structured power grids, the topology of wire connections is implicitly represented if all the wire resistances or features are pre-ordered (as a vector) based on the counting order. As a result, the GridNet model is able to deal with different workloads, i.e., $I(t)$, and initial wire resistances (different $M$ at $t = 0$) under the same power grid structure.

C. Training data preprocessing and representation

The preprocessing step extracts the electrical features and geometries from raw layouts. After preprocessing, the workload samples will be represented in a customized scheme.

1) Data preprocessing: Given a specific design, Synopsys IC Compiler II (ICC II) takes a synthesized gate-level netlist and a standard cell library as input, and then automatically creates the circuit layout. In the preroute (design planning) step, one important procedure is performing power network synthesis. As shown in Fig. 2(a), the power and ground networks are generated based on the constraints that the user defines. They consist of VDD power nets, VSS ground nets, and external power supplies. The results are later used to examine the voltage drop, resistance, and EM effect. Fig. 2(b) shows the voltage drop from the same power grid, in units of mV. Since our goal is to obtain EM-induced IR drops which contain the aging effect, we dumped the power grid information including layout geometry, layer, via, as well as branch currents for later simulation.

Having a sufficient amount of training data is a crucial requirement for machine learning approaches. The DNN-based EM-induced IR drop prediction requires a lot of power grid samples and their corresponding ground truth EM-induced IR drop along with the aging time. However, synthesizing a large number of designs and dumping their power grid information is not realistic. Instead, we first synthesized three power grid designs, and then for each design, we randomly generated 12k different workloads. The network samples have the same topology as the synthesized designs. Although they have the same number of power strips, they differ in the branch width and length. Note that different workloads can have different EM impacts, thus the wires can be sized properly later on.

2) Data representation: Representation of data has a tremendous impact on the behavior of deep neural networks. To preserve the geometric and spatial relationship, we first encode the EM-induced voltage at each node into a matrix and then convert the matrix to a color image, as illustrated in Fig. 3. Either the Python API matplotlib.pyplot.imshow() or the MATLAB API image() can display the scalar data as an image. Each pixel stands for one voltage value of one node, the
Fig. 1. Proposed GridNetOpt framework: (a) GridNet training flow; (b) GridNet prediction flow and GridNetOpt optimization flow.
Fig. 2. (a) Power and ground networks of Cortex-M0 DesignStart; (b) Voltage drop map of the power network of (a).
Fig. 3. Compact IR drop image of power grid networks: (a) Design 2: 4k nodes; (b) Design 3: 16k nodes.
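As a minimal sketch of the encoding step behind Fig. 3, the following Python snippet reshapes a flat list of node voltages into a matrix and scales each entry to an 8-bit pixel intensity. The row-major node ordering, the function names `voltages_to_matrix` and `matrix_to_pixels`, and the use of a single intensity channel in place of a color map are assumptions made for illustration; they are not details taken from GridNet.

```python
from typing import List, Sequence


def voltages_to_matrix(voltages: Sequence[float], rows: int, cols: int) -> List[List[float]]:
    """Arrange a flat node-voltage vector as a rows x cols matrix.

    Assumes (hypothetically) that nodes are ordered row by row on the mesh.
    """
    if len(voltages) != rows * cols:
        raise ValueError("number of voltages must equal rows * cols")
    return [list(voltages[r * cols:(r + 1) * cols]) for r in range(rows)]


def matrix_to_pixels(matrix: List[List[float]], v_min: float, v_max: float) -> List[List[int]]:
    """Map each voltage to an integer intensity in [0, 255].

    Values outside [v_min, v_max] are clamped to the nearest bound.
    """
    span = v_max - v_min
    if span <= 0:
        raise ValueError("v_max must be greater than v_min")
    pixels = []
    for row in matrix:
        scaled = [min(max((v - v_min) / span, 0.0), 1.0) for v in row]
        pixels.append([round(255 * s) for s in scaled])
    return pixels
```

A plotting call such as matplotlib.pyplot.imshow() could then render the resulting matrix with a color map of choice, with one pixel per node.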
between the generator's output and the ground truth, while the loss function of the CNN preserves only the L2 difference part and discards the discriminator-related part, as there is no discriminator in the CNN-alone architecture. The results show that the CGAN model can produce higher accuracy and smoother node voltage images.

Fig. 4 shows the full model structure in the training process. Once the model is trained, only the generator G is preserved for inference. To make the GAN model learn the temporal dynamics of EM-induced IR drops, we propose to use the time variable as the continuous condition for both generator and discriminator, which was demonstrated to be effective for financial market risk analysis [38].

induced voltage image y is fed into the discriminator D alternately, together with its corresponding workloads and aging time x as the condition input. The output of the discriminator is denoted as D(G(x), x) or D(y, x) depending on whether the generated or the real EM-induced voltage image was the input. In the training process, we use the Wasserstein distance [39] as the measurement of the difference between the real and the generated EM-induced voltage image distributions, to take advantage of its higher stability and better chance of convergence.

Note that image-based DNN IR drop analysis methods like GridNet and IREDGe [36] are very scalable. In general, the power grid meshes on a chip are very sparse. When the chip gets bigger, we can select a different pixel resolution for the layout images. We can also leverage existing highly efficient GPU-based computation frameworks to train GAN or CNN models for large images.
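The Wasserstein-based training objective described above can be illustrated with a small sketch. It computes a critic (discriminator) loss from the scores D(y, x) and D(G(x), x), and a generator loss that combines the adversarial term with an L2 difference to the ground truth. The function names, the `l2_weight` parameter, and the way the two terms are combined are assumptions for illustration only; the sketch also omits the Lipschitz constraint (for example weight clipping or a gradient penalty) that a Wasserstein critic needs in practice.

```python
from statistics import fmean
from typing import Sequence


def critic_loss(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """Wasserstein critic loss.

    Minimizing it raises the scores D(y, x) of real images and lowers the
    scores D(G(x), x) of generated images.
    """
    return fmean(fake_scores) - fmean(real_scores)


def generator_loss(
    fake_scores: Sequence[float],
    generated: Sequence[float],
    target: Sequence[float],
    l2_weight: float = 1.0,  # hypothetical weighting, not from the paper
) -> float:
    """Adversarial term plus a weighted mean squared (L2) difference."""
    adversarial = -fmean(fake_scores)
    l2 = fmean((g - t) ** 2 for g, t in zip(generated, target))
    return adversarial + l2_weight * l2
```

Dropping the adversarial term from `generator_loss` leaves only the L2 part, which corresponds to the CNN-alone baseline mentioned earlier in this section.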
1) Objective function: We can express the total routing area of the power grid network in terms of sheet resistance, branch length, width, and conductance as follows

$$a = \sum_{i \in B} l_i w_i = \sum_{i \in B} \rho l_i^2 g_i \tag{6}$$

The objective is to minimize the area of the power grid network. Assuming that the topology and physical locations of the network are fixed, $\rho l_i^2$ becomes a constant and can be expressed as $\alpha_i$, so the objective function is simplified as

$$a = \sum_{i \in B} \alpha_i g_i \tag{7}$$

2) Constraints: The constraints that need to be satisfied for a reliable power grid network are shown as follows.

1. EM-induced voltage drop constraints: When a void is nucleated and the interconnect enters the growth phase, the branch resistance increases over time, which may lead to time-varying node voltages. Suppose $v_{j,t}$ is the node voltage of the leaf node $j$ at aging time $t$, which is a nonlinear function of conductances. The voltage drop is limited by a constant

$$v_{dd} - v_{j,t} \le u \tag{8}$$

where $v_{dd}$ is the supply voltage and $u$ is the bound of the IR drop. In real designs, a voltage drop of less than 10% of $v_{dd}$ is normally acceptable.

2. Minimum width constraints: Usually, different layers have different requirements for the width of the metal wires

$$w_i \ge w_{i,\min} \tag{9}$$

where $w_{i,\min}$ is the minimum metal line width. According to Eq. (5), the above equation can be rewritten as

$$g_i \ge \frac{w_{i,\min}}{\rho l_i} \tag{10}$$

3. Kirchhoff's current law (KCL): We express Kirchhoff's current law in terms of node voltages

$$\sum_{(j,k)} (v_k - v_j)\, g_{jk} = i_j \tag{11}$$

where $i_j$ is the current demand at node $j$ and each $k$ indicates a neighboring node of node $j$. In our approach, we view node voltages as functions of conductance, so it is implicitly satisfied.

B. Penalty method

The power grid optimization aims to minimize objective function (7) subject to constraints (8) and (9). It will be referred to as problem P. Problem P is a constrained nonlinear optimization problem.

The penalty method is adopted to solve problem P. By adding a penalty term to the objective function that prescribes a high cost for the constraint violations, the original constrained problem is approximated with a sequence of unconstrained problems.

1) Penalty function formulation: We adopt a penalty function as follows

$$f = a + p_t = a + \beta \cdot \sum_{j} c_{j,t}^2 \tag{12}$$

where $a$ is the network area of function (7), $p_t$ is the penalty term and $\beta$ is the penalty parameter. For the voltage drop constraint violation

$$c_{j,t} = \begin{cases} 0, & \text{if } v_{j,t} \ge v_{dd} - u \\ v_{j,t} - (v_{dd} - u), & \text{else} \end{cases} \tag{13}$$

Eq. (13) is further simplified as

$$c_{j,t} = v_{j,t} - (v_{dd} - u), \quad \text{for all } j \in E_{vdrop} \tag{14}$$

where $E_{vdrop}$ represents the set of indexes of the nodes that violate the voltage drop constraint in the power grid network. Minimum width constraints are not added into penalty function (12), because the proposed algorithm simply sets the branches that do not satisfy minimum width constraints to the minimum metal line width. The original constrained problem P is transformed to the problem of minimizing the penalty function (12) with minimum width constraints (9).

Moudallal et al. [11] observed that the IR drop $v_{dd} - v_{j,t}$ is a monotonically increasing function with respect to time, in other words, $v_{j,t_1} \ge v_{j,t_2}$ for $0 \le t_1 \le t_2$. Although a branch resistance increase does not necessarily lead to an IR drop increase, this assumption holds in most cases. With this, we restrict our attention to the target aging time $T$, and Eq. (14) becomes

$$c_{j,T} = v_{j,T} - (v_{dd} - u), \quad \text{for all } j \in E_{vdrop} \tag{15}$$

2) Optimization scheme: We first analyze the network for node voltages and branch currents while considering its aging time $t$ and then identify the constraint violations. Generally, the penalty method transforms the original constrained optimization problem into a sequence of unconstrained minimization problems. In our problem, the conjugate gradient method is adopted to update branch widths during each iteration, and the process stops when all the constraints are satisfied. The solution procedure can be described as follows.

1. Obtain the initial conductance vector $G^{(0)}$, set an initial value of penalty parameter $\beta$ and error bound $\varepsilon_b > 0$.
2. Solve the unconstrained minimization problem (12), obtain the current conductance vector $G^{(k)}$.
3. If $p_t < \varepsilon_b$, then stop; else, update penalty parameter $\beta$, set $k = k + 1$, and go to step 2.

Note that penalty parameter $\beta$ cannot be a constant because different power grids need different $\beta$. In addition, a small $\beta$ may result in overconsideration of the objective function while a large $\beta$ may lead to an ill-conditioned problem. If we set the ratio of penalty terms to objective function as a constant $r$, then we get the initial $\beta_0$ and can start minimizing the penalty function. $\beta$ is updated automatically in the next minimization iteration, i.e., $\beta_{k+1} = \beta_k \cdot r \cdot a / p_t$. The process continues until all the constraints are satisfied.

V. OPTIMIZATION STRATEGIES

A. Conjugate gradient method

In the penalty method, the efficiency of solving unconstrained minimization dominates the execution time. The conjugate gradient method, which is a method between the steepest descent method and the Newton method, deflects the direction of the steepest descent method by adding to it a positive multiple of the direction used in the last step. This method only requires the first-order derivatives but overcomes the steepest descent method's shortcoming of slow convergence. At the same time, the method does not need to save
and compute the second-order derivatives that are needed by the Newton method.

We notice that the conjugate gradient method has been used for the IR drop and current density constrained optimization [41] and for on-chip decap optimization as well [42]. The work in [41] shows that the gradient-based optimization method is more scalable than linear programming-based methods [8]. However, this method is still based on Black's EM model, which adds current density constraints for each wire segment. It cannot optimize the power grids with nucleated wires for a target lifetime. In our approach, a more complicated physics-based EM model is applied to solve the EM-induced IR drop optimization problem over the target lifetime. Such a problem involves extensive computation-intensive simulations of full-chip PDNs.

In this work, we utilize the Fletcher-Reeves (F-R) conjugate gradient method. The algorithm is shown as Algorithm 1.

Algorithm 1 Unconstrained power grid area minimization algorithm
Input: Current conductance vector $G$.
Output: New conductance vector $G$.
1: $k := 0$.
2: Set initial descent direction to negative direction of the gradient $P^{(k)} = -\nabla f(G^{(k)})$.
3: /*F-R conjugate gradient method*/
4: repeat

flow. From Eq. (12), the partial derivative of the penalty function with respect to conductance can be expressed as

$$\frac{\partial f}{\partial g_i} = \frac{\partial a}{\partial g_i} + \frac{\partial p_t}{\partial g_i} \tag{16}$$

The first term of Eq. (16) is equal to the constant $\alpha_i$ and the second term can be expanded easily

$$\frac{\partial f}{\partial g_i} = \alpha_i + \beta \cdot \sum_{j} \frac{\partial v_{j,t}}{\partial g_i} \cdot 2 \cdot c_{j,t}, \quad \text{for all } j \in E_{vdrop} \tag{17}$$

Since our main focus is to ensure that the EM-induced voltages at target time $T$ do not have violations, it is enough to search for a solution that decreases voltage drops at time $T$.

$$\frac{\partial f}{\partial g_i} = \alpha_i + \beta \cdot \sum_{j} \frac{\partial v_{j,T}}{\partial g_i} \cdot 2 \cdot c_{j,T}, \quad \text{for all } j \in E_{vdrop} \tag{18}$$

Thus, the gradient of penalty function $f$ with respect to conductance vector $G$ is

$$\nabla f(G) = \left[ \frac{\partial f}{\partial g_1}, \frac{\partial f}{\partial g_2}, \ldots, \frac{\partial f}{\partial g_i}, \ldots, \frac{\partial f}{\partial g_b} \right]^T \tag{19}$$

D. Gradient calculation via merged adjoint network

Traditionally, the adjoint network method has been proposed to calculate the partial derivative of the node voltages with respect to branch conductance [40]. The adjoint network method can compute the sensitivity of one node voltage with respect
(k) to all resistance or conductance, but the cost of computing the
5: Line search to determine a nonnegative scalar λopt that
sensitivities for all the node voltages can be very high. Instead
minimizes f . of solving all adjoint networks separately, the merged adjoint
6: Update conductance vector G(k+1) = G(k) + λk P (k) . network method only needs to solve circuit equations twice
7: Choose new descent direction 2 P (k+1) = to calculate the final gradient of the objective function [42]:
(k+1)
∇f G one is for the original network and the other is for the merged
−∇f G(k+1) + 2 P (k) . adjoint network. In this work, we implement merged adjoint
∇f G(k)
8: k := k + 1. networks for performance comparison.
Let N and N ′ (j) be the original network and the adjoint
9: until ∇f G(k) < εF R
network, respectively. The two networks have the same topol-
ogy and conductance values. By running EMspice simulator,
we can easily obtain conductance matrices of N and N ′ (j) at
time T . The only difference between the two networks is that
B. DNN-based fast EM-induced IR drop estimation all the absorbing current of N ′ (j) is set to zero except node
The conjugate gradient optimization framework requires j. Since EMspice also tells the node voltages for N at time
the sensitivity of penalized objective with respect to wire T , we only have to build B (j) to solve the branch voltages
conductance or width. It actually requires intensive full-chip for N ′ (j).
coupled EM and voltage (IR drop) analysis using EMspice B (j) = [0, 0, . . . , −1, 0, . . . , 0]
T
(20)
as we will show later. Such circuit-level multi-physics-based
′ ′
full-chip power grid simulations are very expensive and even Let vi,T and vi,T
denote branch i’s voltage of N and N (j),
prohibitive for large problem sizes. the partial differential of node voltage vj with respect to the
In this work, we build machine learning-based models based conductance of branch i is computed by
on the physics-based simulation to accelerate the sensitivity ∂vj,T ′ ′ ′
calculation. Since we are seeking the task as an image trans- = vi,T × vi,T = (vp,T − vq,T ) × vp,T − vq,T (21)
forming problem and GAN has already been proved to be ∂gi
successful in all kinds of image applications among different Then Eq. (18) becomes
DNN candidates, we select to employ conditional GAN (Grid- ∂f
Net) to estimate EM-induced voltage maps via a supervised =αi + 2 · β · (vp,T − vq,T )
learning process based on the physics-based simulation data ∂gi
from EMspice. Details of this CGAN architecture have already X X (22)
′ ′
been introduced in Section III-D. × vp,T (j) cj,T − vq,T (j) cj,T
j j
′
C. Gradient calculation for the objective function Suppose V (j) is a vector formed by the node voltages of
N ′ (j), we have
In the first step of the F-R conjugate gradient method, we
analyze the network and derive the node voltage and current vp′ (j) = Cp V ′ (j) , vq′ (j) = Cq V ′ (j) (23)
where $C_p = [0, 0, \ldots, 0, 1, 0, \ldots, 0]$ with the 1 appearing at index $p$, and $C_q = [0, 0, \ldots, 0, 1, 0, \ldots, 0]$ with the 1 appearing at index $q$.
Therefore, Eq. (22) can be rewritten as
$$\frac{\partial f}{\partial g_i} = \alpha_i + 2 \cdot \beta \cdot (v_{p,T} - v_{q,T}) (C_p - C_q) \times \sum_{j} c_{j,T} V'(j) \qquad (24)$$

E. Fast gradient calculation via deep neural networks
As mentioned earlier, sensitivity computation by adjoint network methods based on the detailed multi-physics EMspice simulation is computationally very expensive. To mitigate this issue, we propose to use the DNN-based model for sensitivity computation.
The objective of problem P is to minimize the power grid area while ensuring that the functional modules work properly at the target EM aging time $T$. Note that Eq. (5) holds only before the interconnect enters the growth phase. Once the growth phase starts, the resistance starts increasing as the current starts to flow through the more resistive barriers of the copper wire. In other words, the decrease in conductance $g_i$ does not have an impact on the wire width $w_i$.
Let us add a time subscript $t$ to illustrate. Back to our EM-induced voltage drop constrained problem, the sensitivity value $s$ we expect is $\partial v_{j,T} / \partial w_i$, that is, the partial derivative of the node voltages at aging time $T$ with respect to the branch width. According to Eq. (18), what we need to calculate is $\partial v_{j,T} / \partial g_i$. Since the width does not change during the EM process, i.e., $w_{i,T} = w_{i,0}$, $g_i$ here should be $g_{i,0}$. The rationale is that we have to update the conductance matrix $G^{(k)}$ for the next iteration, and updating $G^{(k)}$ implies updating the width $W^{(k)}$; however, only the initial $W^{(k)}$ can be modified.
In EMspice, the coupled EM and IR simulation undergoes complex stress evolution, and the change of EM-induced voltage drop with respect to time is nonlinear. From initial time 0 to target time $T$, the resistance of branch $i$ may increase or remain unchanged, while the width of branch $i$ always remains unchanged. It is impossible to express those partial derivatives with equations. Therefore, by applying the above merged adjoint network method, we can easily get $\partial v_{j,t} / \partial g_{i,t}$, but cannot obtain $\partial v_{j,T} / \partial g_{i,0}$.
As presented in Section III-E, we leverage the automatic differentiation scheme in GridNet to compute the sensitivity information for GridNetOpt. Specifically, we assume that we have $m$ violation nodes at time $T$ whose node voltages are represented by $v_j$, $j \in \{1, \ldots, m\}$. The CGAN model is able to give the estimated sensitivity values in milliseconds. Then we can easily compute the following partial sensitivity matrix $S_{m \times b}$:
$$S_{m \times b} = \begin{bmatrix} \frac{\partial v_{1,T}}{\partial g_{1,0}} & \frac{\partial v_{1,T}}{\partial g_{2,0}} & \cdots & \frac{\partial v_{1,T}}{\partial g_{b,0}} \\ \frac{\partial v_{2,T}}{\partial g_{1,0}} & \frac{\partial v_{2,T}}{\partial g_{2,0}} & \cdots & \frac{\partial v_{2,T}}{\partial g_{b,0}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial v_{m,T}}{\partial g_{1,0}} & \frac{\partial v_{m,T}}{\partial g_{2,0}} & \cdots & \frac{\partial v_{m,T}}{\partial g_{b,0}} \end{bmatrix} \qquad (25)$$
More importantly, this automatic differentiation scheme is able to give $\partial v_{j,T} / \partial r_{i,0}$ ($\partial v_{j,T} / \partial g_{i,0}$) directly, which is more reasonable to employ in our problem. With this, Eq. (18) via deep neural networks becomes
$$\frac{\partial f}{\partial g_i} = \alpha_i + \beta \cdot \sum_{j} \frac{\partial v_{j,T}}{\partial g_{i,0}} \cdot 2 \cdot c_{j,T}, \quad \text{for all } j \in E_{vdrop} \qquad (26)$$

VI. EXPERIMENTAL RESULTS AND DISCUSSION
A. Experiment setup
The proposed EM-aware IR drop constrained power grid optimization is implemented in Python with the TensorFlow library. The experiments are carried out on a Linux server with 2 Xeon E5-2698v2 2.3GHz processors and Nvidia Titan X RTX GPU with 24 GB memory.
In order to validate our work, we start from the power grid of the Cortex-M0 DesignStart processor, which is a 32-bit processor that implements the ARMv6-M architecture and is placed and routed using ICC II with the Synopsys 32/28nm Generic Library. The Cortex-M0 power grid has two layers, and there are 1k nodes in total.
Power grid information obtained from ICC II is then fed into the power grid parser. The information includes but is not limited to structure, node location, wire layer, wire length, current source, voltage source, and resistance values. The netlist format extracted from the grids is consistent with the IBM power grid benchmarks [43]. In order to obtain enough power grids with different EM conditions, we generate a large number of IBM-format power grid networks so that different workloads with different EM conditions can be tested and verified.
We train our CGAN model using three different designs/topologies, and the size of the trained model varies with the grid size. Each design has a dataset containing 12k samples (workloads and aging time, EM-induced IR drop). Design 1 comes from Cortex-M0; Design 2 and Design 3 are self-synthesized power grids with a format similar to Design 1. As shown in Fig. 3(a), Design 2 has 4k nodes, 128 interconnect trees and 4 external power supplies. Design 3 is shown in Fig. 3(b), and it has 16k nodes, 256 interconnect trees and 9 external power supplies. The maximum allowable IR drop is set to 10% Vdd and the target EM lifetime $T$ is 10 years. For each workload, we collect the EM-induced IR drop results obtained by EMspice at 11 discrete aging time instants (0 to 10 years).
We randomly select 15% of the workloads for testing, and the remaining 85% are assigned to the training set. Our training and test data are separated on a design basis, which means that the designs in the test dataset were never seen by the model during the training process. This ensures that the test results reflect the generalizability of the model. We emphasize that the designs in the test dataset are to some extent similar to those in the training dataset; otherwise, the model could not generalize to these unseen designs. During the training phase, all samples are randomly permuted at the beginning of every epoch.

B. EM-induced IR drop prediction results
1) Accuracy: Once the GridNet model is trained, the generator is preserved and serves as the model for inference. The model can take any power grid workload for a certain topology as input and give the predicted EM-induced voltage at a specified aging year. The predicted results from GridNet are compared with the baseline, which is the set of simulation results from EMspice. To evaluate the estimation error, we employ the root-mean-square error (RMSE) as the metric.
TABLE I
PREDICTION RESULTS OF DIFFERENT DESIGNS
Circuit    # nodes   # voltage sources   VDD (V)   RMSE (mV)
Design 1   1024      2                   1.05      5.697
Design 2   4096      4                   1.05      6.100
Design 3   16384     9                   1.05      3.922

Fig. 5. Predicted IR drop versus the baseline of (a) PG-a; (b) PG-b.
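The RMSE column of Table I can be illustrated with a minimal sketch of the metric. It assumes that predicted and reference node voltages are given in volts and reported in millivolts; the function name `rmse_mv` and the sample values are hypothetical and are not taken from the GridNet code or datasets.

```python
import math


def rmse_mv(predicted_v, reference_v):
    """Root-mean-square error between two voltage lists.

    Inputs are node voltages in volts (hypothetical convention);
    the result is returned in millivolts.
    """
    if len(predicted_v) != len(reference_v) or not predicted_v:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted_v, reference_v)]
    return 1000.0 * math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Two made-up nodes, each mispredicted by 3 mV in opposite directions.
    predicted = [1.000, 1.000]
    reference = [1.003, 0.997]
    print(rmse_mv(predicted, reference))  # about 3.0 mV
```

Because the error is squared before averaging, a few nodes with large errors raise the RMSE more than many nodes with small errors, which suits a check on worst-case IR drop prediction.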
TABLE II
GLOBAL OPTIMIZATION COMPARISON: COMPARISON BETWEEN PLAIN CG METHOD USING MERGED ADJOINT NETWORK [41] AND GridNetOpt

and the area reduced ratio is similar. This demonstrates that our work achieves optimization results comparable to other conjugate gradient-based optimization works. Column 1 shows the optimization results from the saturation volume-based EM immortality constrained SLP method. Among the 9 test examples, only D1-PG1 and D1-PG2 are initially EM immortal and thus can be optimized through the SLP optimization within 2 iterations. In contrast, the other 7 examples contain mortal wires and cannot be optimized successfully with this method. We notice that the reduced area ratio of the power grids from Design 3 is not large. The reason is that the test cases we used are already well designed, so little optimization space is left.
With GridNetOpt, we are able to meet the power grid lifetime target much faster than using the adjoint network method with EMspice. This is a great advantage, especially when the optimization space is not that large, because designers do not want to wait a long time only to explore a potential reduction. For example, in the D1-PG4 case, the lifetime of the whole power grid is predicted to be greater than 10 years and the maximum voltage at $T$ does not exceed 10% Vdd. There is 1 mortal wire and 44 violation nodes in total. Note that the lifetime definitions of an individual interconnect wire and of a power grid network are different. The power grid lifetime refers to the earliest time $t$ at which EM-induced voltage violations of a power grid occur. Here, we do not care about the earliest time $t$ but about whether IR drop violations exist at target time $T$. By utilizing the by-product sensitivity information, we are able to get the optimization direction much more easily, as no complex numerical calculation is required. By iteratively solving the unconstrained minimization problem and updating the conductance vector and penalty parameter, the power grid meets the lifetime target after 7 iterations. GridNetOpt not only achieves better area reduction than the adjoint network approach for this case, but also takes less optimization time. There is no obvious relationship between reduced area and the number of iterations of the two methods, e.g., GridNetOpt went through more iterations for the D1-PG1 case.
We remark that the comparison with SLP is not an apples-to-apples comparison, as the two methods actually have different constraints, as we explained earlier. Here we just show that SLP cannot optimize many of the PDN circuits, which can however be optimized by the proposed method. Furthermore, note that we did not directly compare with [11]. The reason is that this method depends on properties in which wire width can only be sized up (increased). With the proposed method, we can start with any power grid network and optimize the wire widths to their best possible values (size up or size down). Further, this method essentially is an SLP-based method, and existing
work has shown that the conjugate gradient optimization method is much more scalable than linear programming-based methods [41]. In addition, we can extend our approach to statistical optimization using Monte Carlo or other fast variational methods [45], which we leave as future work.
Table III presents the detailed comparison on the D1-PG1 case. The number of violation nodes comes from the GridNet CGAN model. In this circuit, the original area is 1.040 µm² and it is an EM immortal case. At the beginning of conjugate gradient-based optimization, all wire widths are set to their minimum, so the starting area for optimization is 0.5592 µm². GridNet predicts that this circuit will have 1014 voltage violation nodes at the 10th aging year, while EMspice simulates that it has 1003 voltage violation nodes. The conjugate gradient optimization with GridNet and with the merged adjoint network method undergoes 6 and 4 iterations, respectively, to eliminate IR drop violations. Finally, GridNetOpt achieves 33.95% area reduction while CG with merged adjoint network achieves 34.79% area reduction. However, the overall time of the latter is more than 17 times longer than that of the former. In contrast, EM immortality constrained SLP-based optimization only goes through 2 iterations. Since the immortality constraint is stricter than the 10-year target lifetime and this method requires that all the branches within an interconnect tree have the same wire width, it only achieves 17.38% area reduction.
Table IV shows the comparison on the D3-PG8 case, which is an EM mortal power grid with an original area of 0.1858 µm². With minimum width, the area becomes 0.1672 µm² and GridNet predicts that the number of nodes that violate the threshold voltage is 751. The first optimization iteration takes a relatively long time. After 3 iterations, all the voltage violations are eliminated. In contrast, due to the long simulation time of EMspice, the CG with merged adjoint network method takes 3.5 hours to finish the optimization process. As a result, we achieve about 80x speedup over the existing CG-based approach.
Considering both area reduction and computation time, GridNetOpt obtains similar area reduction but a much better speedup (about 10x or more) for all the cases. We can therefore conclude that it outperforms the plain CG method using adjoint networks for the EM-induced IR drop constrained power grid optimization problem.

D. Comparison of localized fixing with GridNet and global optimization GridNetOpt
Last but not least, we compare the proposed global optimization GridNetOpt with our previous work: localized fixing with GridNet [1].
For a fair comparison, we set all branches to their minimum width and only allow a few failed nodes at target aging time $T$. The comparison results are shown in Table V. There are 6 test cases in total, and each design topology has 2 cases. Design D1-PGL2 is a power grid with 6 mortal wires, and its predicted lifetime is 7 years. At target aging time $T$, there are 13 voltage drop violations. GridNetOpt completes the optimization process in 1 iteration, whereas the localized fixing method undergoes 2 iterations.

TABLE V
COMPARISON OF LOCALIZED FIXING WITH GridNet [1] AND GLOBAL OPTIMIZATION GridNetOpt

As we can see, GridNetOpt achieves better results in terms of area overhead than the localized fixing method for all the benchmarks, because the former performs global optimization whereas the method in [1] performs localized fixing. As for computation time, the two methods are similar. The D3-PGL6 case has only 2 mortal wires, so the localized method is very efficient, while the global method becomes more expensive as the chip size gets larger.
We note that design D3-PGL5 has zero mortal wires but has voltage violations at design time. After 10 years, the violation number is still 45. GridNetOpt is able to optimize the power grid in 1 iteration. However, the localized fixing method cannot perform the fixing, as it needs to know the vulnerable branches (mortal branches) to start from. Of course, one can find some local branches of the violating nodes to size, but this is not relevant to the EM-induced IR drop optimization.

VII. CONCLUSION
In this paper, we proposed a novel optimization framework, called GridNetOpt, for on-chip power distribution networks considering EM-induced IR drop constraints at the target aging time. GridNetOpt employs a conjugate gradient-based approach to size the wire segments, which is capable of considering all the EM failure situations, including immortal wires and mortal wires, for EM-aware power grid area optimization with a target lifetime. The optimization framework is further empowered by data-driven learning-based time-varying IR drop modeling using deep neural networks. The new method can naturally leverage the differentiable feature of deep neural networks for fast sensitivity computation of node voltage with respect to wire resistance or width. Numerical results on a number of synthesized power grid benchmarks from ARM core CPU designs show that the proposed GridNetOpt can lead to an order of magnitude or more speedup over the conjugate gradient-based method using the traditional adjoint network approach. Compared to the localized power grid fixing with GridNet, GridNetOpt can lead to smaller area overhead for all the benchmarks we tested. It can also reduce IR drops for power grid circuits with immortal wires, which is not possible with the localized GridNet method.
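As a compact recap of the optimization flow (the penalty function of Eq. (12), a gradient of the form of Eq. (18), and the Fletcher-Reeves update of Algorithm 1), the following is a toy sketch rather than the authors' implementation. Several things are assumptions made for illustration only: the voltage drop model `drop_j = sum_i k[j][i] / g[i]` stands in for EMspice or GridNet, the violation is taken as `c_j = max(0, drop_j - limit)`, a backtracking (Armijo) line search replaces the line search of step 5, and all function names and numbers are hypothetical.

```python
import math


def penalty_and_grad(g, alpha, k, limit, beta):
    """Return f = area + beta * sum(c_j^2) and its gradient w.r.t. g.

    Toy voltage model (assumption): drop_j = sum_i k[j][i] / g[i].
    c_j = max(0, drop_j - limit) is the violation at node j.
    """
    area = sum(a * gi for a, gi in zip(alpha, g))
    drops = [sum(kji / gi for kji, gi in zip(row, g)) for row in k]
    c = [max(0.0, d - limit) for d in drops]
    f = area + beta * sum(cj * cj for cj in c)
    grad = []
    for i, gi in enumerate(g):
        # Sensitivity of each node's drop to g_i: -k[j][i] / g_i^2.
        sens = [-row[i] / (gi * gi) for row in k]
        grad.append(alpha[i] + 2.0 * beta * sum(s * cj for s, cj in zip(sens, c)))
    return f, grad


def fletcher_reeves(g, alpha, k, limit, beta, eps=1e-6, max_iter=200):
    """Minimize the penalty function with F-R conjugate gradient directions."""
    f, grad = penalty_and_grad(g, alpha, k, limit, beta)
    p = [-x for x in grad]  # initial direction: steepest descent
    for _ in range(max_iter):
        gnorm2 = sum(x * x for x in grad)
        if math.sqrt(gnorm2) < eps:
            break
        slope = sum(x * y for x, y in zip(grad, p))
        if slope >= 0.0:  # not a descent direction: restart
            p = [-x for x in grad]
            slope = -gnorm2
        lam = 1.0
        while True:  # backtracking line search, keeps every g_i positive
            trial = [gi + lam * pi for gi, pi in zip(g, p)]
            if min(trial) > 0.0:
                f_new, grad_new = penalty_and_grad(trial, alpha, k, limit, beta)
                if f_new <= f + 1e-4 * lam * slope:
                    break
            lam *= 0.5
            if lam < 1e-12:
                return g, f  # no acceptable step found
        # Fletcher-Reeves ratio: ||grad_new||^2 / ||grad||^2.
        ratio = sum(x * x for x in grad_new) / gnorm2
        p = [-x + ratio * y for x, y in zip(grad_new, p)]
        g, f, grad = trial, f_new, grad_new
    return g, f


if __name__ == "__main__":
    alpha = [1.0, 1.0]            # area cost per unit conductance
    k = [[1.0, 0.5], [0.5, 1.0]]  # made-up coupling coefficients
    g0 = [1.0, 1.0]
    f0, _ = penalty_and_grad(g0, alpha, k, limit=1.0, beta=10.0)
    g_opt, f_opt = fletcher_reeves(g0, alpha, k, limit=1.0, beta=10.0)
    print(f0, f_opt, g_opt)
```

In the penalty method described in the paper, such an unconstrained solve would be repeated while the penalty parameter is updated; the sketch shows a single solve for a fixed `beta`.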