Chip Placement With Deep Reinforcement Learning
Azalia Mirhoseini*, Anna Goldie*, Mustafa Yazgan, Joe Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee,
{azalia, agoldie, mustafay, wenjiej, esonghori, shenwang, youngjoonlee}@google.com
Eric Johnson, Omkar Pathak, Sungmin Bae, Azade Nazi, Jiwoo Pak, Andy Tong, Kavya Srinivasa,
William Hang, Emre Tuncer, Anand Babu, Quoc Le, James Laudon, Richard Ho, Roger Carpenter, Jeff Dean
…training is guided by a fast-but-approximate reward signal for each of the agent's chip placements.

To our knowledge, the proposed method is the first placement approach with the ability to generalize, meaning that it can leverage what it has learned from placing previous netlists to generate placements for new unseen netlists. In particular, we show that, as our agent is exposed to a greater volume and variety of chips, it becomes both faster and better at generating optimized placements for new chip blocks, bringing us closer to a future in which chip designers are assisted by artificial agents with vast chip placement experience.

We believe that the ability of our approach to learn from experience and improve over time unlocks new possibilities for chip designers. We show that we can achieve superior PPA on real AI accelerator chips (Google TPUs), as compared to state-of-the-art baselines. Furthermore, our methods generate placements that are superior or comparable to those of human expert chip designers in under 6 hours, whereas the highest-performing alternatives require human experts in the loop and take several weeks for each of the dozens of blocks in a modern chip. Although we evaluate primarily on AI accelerator chips, our proposed method is broadly applicable to any chip placement optimization.

2. Related Work

Global placement is a longstanding challenge in chip design, requiring multi-objective optimization over circuits of ever-growing complexity. Since the 1960s, many approaches have been proposed, so far falling into three broad categories: 1) partitioning-based methods, 2) stochastic/hill-climbing methods, and 3) analytic solvers.

Starting in the 1960s, industry and academic labs took a partitioning-based approach to the global placement problem, proposing a variety of partitioning-based methods (Breuer, 1977; Kernighan, 1985; Fiduccia & Mattheyses, 1982), as well as resistive-network based methods (Chung-Kuan Cheng & Kuh, 1984; Ren-Song Tsay et al., 1988). These methods are characterized by a divide-and-conquer approach: the netlist and the chip canvas are recursively partitioned until sufficiently small sub-problems emerge, at which point the sub-netlists are placed onto the sub-regions using optimal solvers. Such approaches are quite fast to execute, and their hierarchical nature allows them to scale to arbitrarily large netlists. However, by optimizing each sub-problem in isolation, partitioning-based methods sacrifice quality of the global solution, especially routing congestion. Furthermore, a poor early partition may result in an unsalvageable end placement.

In the 1980s, analytic approaches emerged, but were quickly overtaken by stochastic / hill-climbing algorithms, particularly simulated annealing (Kirkpatrick et al., 1983; Sechen & Sangiovanni-Vincentelli, 1986; Sarrafzadeh et al., 2003). Simulated annealing (SA) is named for its analogy to metallurgy, in which metals are first heated and then gradually cooled to induce, or anneal, energy-optimal crystalline surfaces. SA applies random perturbations to a given placement (e.g., shifts, swaps, or rotations of macros), and then measures their effect on the objective function (e.g., the half-perimeter wirelength described in Section 3.3.1). If the perturbation is an improvement, it is applied; if not, it may still be applied with some probability governed by a parameter referred to as temperature. Temperature is initialized to a particular value and is then gradually annealed to a lower value. Although SA generates high-quality solutions, it is very slow and difficult to parallelize, thereby failing to scale to the increasingly large and complex circuits of the 1990s and beyond.

The 1990s-2000s were characterized by multi-level partitioning methods (Agnihotri et al., 2005; Roy et al., 2007), as well as the resurgence of analytic techniques, such as force-directed methods (Tao Luo & Pan, 2008; Bo Hu & Marek-Sadowska, 2005; Obermeier et al., 2005; Spindler et al., 2008; Viswanathan et al., 2007b;a) and non-linear optimizers (Kahng et al., 2005; Chen et al., 2006). The renewed success of quadratic methods was due in part to algorithmic advances, but also to the large size of modern circuits (10-100 million nodes), which justified approximating the placement problem as that of placing nodes with zero area. However, despite the computational efficiency of quadratic methods, they are generally less reliable and produce lower quality solutions than their non-linear counterparts.

Non-linear optimization approximates cost using smooth mathematical functions, such as log-sum-exp (William et al., 2001) and weighted-average (Hsu et al., 2011) models for wirelength, as well as Gaussian (Chen et al., 2008) and Helmholtz models for density. These functions are then combined into a single objective function using a Lagrange penalty or relaxation. Due to the higher complexity of these models, it is necessary to take a hierarchical approach, placing clusters rather than individual nodes, an approximation which degrades the quality of the placement.

The last decade has seen the rise of modern analytic techniques, including more advanced quadratic methods (Kim et al., 2010; 2012b; Kim & Markov, 2012; Brenner et al., 2008; Lin et al., 2013), and more recently, electrostatics-based methods like ePlace (Lu et al., 2015) and RePlAce (Cheng et al., 2019). Modeling netlist placement as an electrostatic system, ePlace (Lu et al., 2015) proposed a new formulation of the density penalty in which each node (macro or standard cell) of the netlist is analogous to a positively charged particle whose area corresponds to its electric charge.
In this setting, nodes repel each other with a force proportional to their charge (area), and the density function and gradient correspond to the system's potential energy. Variations of this electrostatics-based approach have been proposed to address standard-cell placement (Lu et al., 2015) and mixed-size placement (Lu et al., 2015; Lu et al., 2016). RePlAce (Cheng et al., 2019) is a recent state-of-the-art mixed-size placement technique that further optimizes ePlace's density function by introducing a local density function, which tailors the penalty factor for each individual bin size. Section 5 compares the performance of the state-of-the-art RePlAce algorithm against our approach.

Recent work (Huang et al., 2019) proposes training a model to predict the number of Design Rule Check (DRC) violations for a given macro placement. DRCs are rules that ensure that the placed and routed netlist adheres to tape-out requirements. To generate macro placements with fewer DRCs, Huang et al. (2019) use the predictions from this trained model as the evaluation function in simulated annealing. While this work represents an interesting direction, it reports results on netlists with no more than 6 macros, far fewer than any modern block, and the approach does not include any optimization during the place and route steps. Because such optimization can change the placement and routing dramatically, the actual DRC count will change accordingly, invalidating the model's prediction. In addition, although adhering to the DRC criteria is a necessary condition, the primary objective of macro placement is to optimize for wirelength, timing (e.g., Worst Negative Slack (WNS) and Total Negative Slack (TNS)), power, and area, and this work does not consider these metrics.

To address this classic problem, we propose a new category of approach: end-to-end learning-based methods. This type of approach is most closely related to analytic solvers, particularly non-linear ones, in that all of these methods optimize an objective function via gradient updates. However, our approach differs from prior approaches in its ability to learn from past experience to generate higher-quality placements on new chips. Unlike existing methods that optimize the placement for each new chip from scratch, our work leverages knowledge gained from placing prior chips to become better over time. In addition, our method enables direct optimization of the target metrics, such as wirelength, density, and congestion, without having to define convex approximations of those functions as is done in other approaches (Cheng et al., 2019; Lu et al., 2015). Not only does our formulation make it easy to incorporate new cost functions as they become available, but it also allows us to weight their relative importance according to the needs of a given chip block (e.g., timing-critical or power-constrained).

Domain adaptation is the problem of training policies that can learn across multiple experiences and transfer the acquired knowledge to perform better on new unseen examples. In the context of chip placement, domain adaptation involves training a policy across a set of chip netlists and applying that trained policy to a new unseen netlist. Recently, domain adaptation for combinatorial optimization has emerged as a trend (Zhou et al., 2019; Paliwal et al., 2019; Addanki et al., 2019). While the focus in prior work has been on using domain knowledge learned from previous examples of an optimization problem to speed up policy training on new problems, we propose an approach that, for the first time, enables the generation of higher quality results by leveraging past experience. Not only does our novel domain adaptation produce better results, it also reduces the training time 8-fold compared to training the policy from scratch.

3. Methods

3.1. Problem Statement

In this work, we target the chip placement optimization problem, in which the objective is to map the nodes of a netlist (the graph describing the chip) onto a chip canvas (a bounded 2D space), such that final power, performance, and area (PPA) are optimized. In this section, we give an overview of how we formulate the problem as a reinforcement learning (RL) problem, followed by a detailed description of the reward function, action and state representations, policy architecture, and policy updates.

3.2. Overview of Our Approach

We take a deep reinforcement learning approach to the placement problem, where an RL agent (policy network) sequentially places the macros; once all macros are placed, a force-directed method is used to produce a rough placement of the standard cells, as shown in Figure 1. RL problems can be formulated as Markov Decision Processes (MDPs), consisting of four key elements (a concrete sketch of this formulation follows the list):

• states: the set of possible states of the world (e.g., in our case, every possible partial placement of the netlist onto the chip canvas).

• actions: the set of actions that can be taken by the agent (e.g., given the current macro to place, the available actions are the set of all the locations in the discrete canvas space (grid cells) onto which that macro can be placed without violating any hard constraints on density or blockages).

• state transition: given a state and an action, the probability distribution over next states.

• reward: the reward for taking an action in a state (e.g., in our case, the reward is 0 for all actions except the last action, where the reward is a negative weighted sum of proxy wirelength and congestion, subject to density constraints as described in Section 3.3).
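The following is a minimal environment-loop sketch of this formulation. It is illustrative only: the netlist/policy interfaces and the helper functions (feasibility_mask, place_macro, place_standard_cell_clusters, proxy_cost) are assumptions standing in for the components described in Sections 3.3.4 through 3.3.6, not our implementation.

```python
def run_episode(netlist, policy, grid=(128, 128)):
    """Sketch of one placement episode: the policy places macros one at a time,
    r_t = 0 at every intermediate step, and a negative proxy cost is returned
    only after the final macro is placed."""
    state = netlist.initial_state(grid)              # empty canvas, no macros placed
    for macro in netlist.macro_order():              # T = number of macros
        mask = feasibility_mask(state, macro)        # hard density/blockage constraint
        action = policy.sample(state, mask)          # grid cell for the current macro
        state = place_macro(state, macro, action)    # deterministic state transition
    place_standard_cell_clusters(state)              # force-directed step (Section 3.3.4)
    return -proxy_cost(state)                        # weighted wirelength + congestion
```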
Figure 1. The RL agent (i.e., the policy network) places macros one at a time. Once all macros are placed, the standard cells are placed using a force-directed method. The reward, a linear combination of the approximate wirelength and congestion, is calculated and passed to the agent to optimize its parameters for the next iteration.
In our setting, at the initial state, s0, we have an empty chip canvas and an unplaced netlist. The final state sT corresponds to a completely placed netlist. At each step, one macro is placed; thus, T is equal to the total number of macros in the netlist. At each time step t, the agent begins in state (st), takes an action (at), arrives at a new state (st+1), and receives a reward (rt) from the environment (0 for t < T and negative proxy cost for t = T).

We define st to be a concatenation of features representing the state at time t, including a graph embedding of the netlist (including both placed and unplaced nodes), a node embedding of the current macro to place, metadata about the netlist (Section 4), and a mask representing the feasibility of placing the current node onto each cell of the grid (Section 3.3.6).

The action space is all valid placements of the tth macro, which is a function of the density mask described in Section 3.3.6. Action at is the cell placement of the tth macro that was chosen by the RL policy network.

st+1 is the next state, which includes an updated representation containing information about the newly placed macro, an updated density mask, and an embedding for the next node to be placed.

In our formulation, rt is 0 for every time step except for the final rT, where it is a weighted sum of approximate wirelength and congestion as described in Section 3.3. Through repeated episodes (sequences of states, actions, and rewards), the policy network learns to take actions that will maximize cumulative reward. We use Proximal Policy Optimization (PPO) (Schulman et al., 2017) to update the parameters of the policy network, given the cumulative reward for each placement.

In this section, we define the reward r, state s, actions a, the policy network architecture πθ(a|s) parameterized by θ, and finally the optimization method we use to train those parameters.

3.3. Reward

Our goal in this work is to optimize power, performance, and area (PPA), subject to constraints on routing congestion and density. Our true reward is the output of a commercial EDA tool, including wirelength, routing congestion, density, power, timing, and area. However, RL policies require 100,000s of examples to learn effectively, so it is critical that the reward function be fast to evaluate, ideally running in a few milliseconds. In order to be effective, these approximate reward functions must also be positively correlated with the true reward. Therefore, a component of our cost is wirelength, because it is not only much cheaper to evaluate, but also correlates with power and performance (timing). We define approximate cost functions for both wirelength and congestion, as described in Section 3.3.1 and Section 3.3.5, respectively.

To combine multiple objectives into a single reward function, we take a weighted sum of proxy wirelength and congestion, where the weight can be used to explore the trade-off between the two metrics.

While we treat congestion as a soft constraint (i.e., lower congestion improves the reward function), we treat density as a hard constraint, masking out actions (grid cells to place nodes onto) whose density exceeds the target density, as described further in Section 3.3.6.
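As an illustration of how these pieces fit together, here is a minimal sketch of such a proxy cost. The HPWL helper follows Section 3.3.1, while the congestion estimate, the weight value, and the placement data structures are stand-in assumptions rather than our implementation.

```python
def hpwl(net_coords):
    """Half-perimeter wirelength of one net: half the perimeter of its bounding
    box (Section 3.3.1), given (x, y) coordinates of its pins/cluster centers."""
    xs = [x for x, _ in net_coords]
    ys = [y for _, y in net_coords]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def proxy_cost(placement, congestion_weight=0.1):
    """Weighted sum of proxy wirelength and congestion (Section 3.3).
    Density is not penalized here: it is enforced as a hard constraint by
    masking infeasible grid cells before each action is taken (Section 3.3.6)."""
    wirelength = sum(hpwl(placement.net_coords(net)) for net in placement.nets)
    congestion = placement.congestion_estimate()   # e.g., mean of most-congested cells
    return wirelength + congestion_weight * congestion

# The reward passed to the agent at the final step is the negative proxy cost:
#   r_T = -proxy_cost(final_placement)
```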
To keep the runtime per iteration small, we apply several approximations to the calculation of the reward function:

1. We group millions of standard cells into a few thousand clusters using hMETIS (Karypis & Kumar, 1998), a partitioning technique based on the normalized minimum cut objective. Once all the macros are placed, we use force-directed methods to place the standard cell clusters, as described in Section 3.3.4. Doing so enables us to achieve an approximate but fast standard cell placement that facilitates policy network optimization.

2. We discretize the grid into a few thousand grid cells and place the centers of macros and standard cell clusters onto the centers of the grid cells.

3. When calculating wirelength, we make the simplifying assumption that all wires leaving a standard cell cluster originate at the center of the cluster.

4. To calculate routing congestion cost, we only consider the average congestion of the top 10% most congested grid cells, as described in Section 3.3.5.

3.3.1. Wirelength

Following the literature (Shahookar & Mazumder, 1991), we employ half-perimeter wirelength (HPWL), the most commonly used approximation for wirelength. HPWL is defined as the half-perimeter of the bounding boxes for all nodes in the netlist. The HPWL for a given net (edge) i is shown in the equation below:

HPWL(i) = (max_{b∈i} x_b − min_{b∈i} x_b) + (max_{b∈i} y_b − min_{b∈i} y_b)

where x_b and y_b are the coordinates of the endpoints of net i.

3.3.2. Selection of Grid Rows and Columns

The choice of grid granularity affects both the difficulty of the optimization and the quality of the final placement. We limit the maximum number of rows and columns to 128. We treat choosing the optimal number of rows and columns as a bin-packing problem and rank different combinations of rows and columns by the amount of wasted space they incur. We use an average of 30 rows and columns in the experiments described in Section 5.

3.3.3. Selection of Macro Order

To select the order in which the macros are placed, we sort macros by descending size and break ties using a topological sort. By placing larger macros first, we reduce the chance of there being no feasible placement for a later macro. The topological sort can help the policy network learn to place connected nodes close to one another. Another potential approach would be to learn to jointly optimize the ordering of macros and their placement, making the choice of which node to place next part of the action space. However, this enlarged action space would significantly increase the complexity of the problem, and we found that this heuristic worked in practice.
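The two heuristics above (grid selection by wasted space and macro ordering) are simple enough to sketch directly. The snippet below is illustrative only: the Macro fields, the wasted-space measure, and the dependency graph used for the topological tie-break are assumptions rather than our exact implementation.

```python
from graphlib import TopologicalSorter   # Python 3.9+

def rank_grids(canvas_w, canvas_h, macros, max_dim=128):
    """Rank (rows, cols) choices by wasted space: the gap between each macro's
    area and the area of the grid cells it would occupy, summed over macros."""
    def wasted_space(rows, cols):
        cell_w, cell_h = canvas_w / cols, canvas_h / rows
        waste = 0.0
        for m in macros:
            cells_x = -(-m.width // cell_w)    # ceiling division
            cells_y = -(-m.height // cell_h)
            waste += cells_x * cell_w * cells_y * cell_h - m.width * m.height
        return waste
    candidates = [(r, c) for r in range(1, max_dim + 1) for c in range(1, max_dim + 1)]
    return sorted(candidates, key=lambda rc: wasted_space(*rc))

def macro_order(macros, deps):
    """Sort macros by descending area, breaking ties with a topological order
    over a (hypothetical) connectivity/dependency graph `deps`."""
    topo_rank = {m: i for i, m in enumerate(TopologicalSorter(deps).static_order())}
    return sorted(macros, key=lambda m: (-m.width * m.height, topo_rank.get(m, 0)))
```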
Figure 2. Policy and value network architecture. An embedding layer encodes information about the netlist adjacency, node features, and
the current macro to be placed. The policy and value networks then output a probability distribution over available placement locations
and an estimate of the expected reward for the current placement, respectively.
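To make the architecture in Figure 2 more concrete, the sketch below shows one way the deconvolution-based policy head and the feedforward value head could be wired up in PyTorch. The 3x3 kernels, stride 2, and 16/8/4/2/1 filter channels follow Section 4.2 and its footnote; the state-embedding size, the 4x4 spatial seed, the hidden sizes, and the masking value are assumptions, not our code.

```python
import torch
import torch.nn as nn

class PolicyValueHead(nn.Module):
    """Illustrative policy/value heads over a 128x128 placement grid."""
    def __init__(self, state_dim: int = 96):
        super().__init__()
        self.fc = nn.Linear(state_dim, 32 * 4 * 4)   # project state embedding to a 4x4 seed
        chans = [32, 16, 8, 4, 2, 1]                 # five deconvolutions: 4x4 -> 128x128
        layers = []
        for i, (cin, cout) in enumerate(zip(chans[:-1], chans[1:])):
            layers.append(nn.ConvTranspose2d(cin, cout, kernel_size=3, stride=2,
                                             padding=1, output_padding=1))
            if i < len(chans) - 2:                   # BatchNorm + ReLU between layers
                layers += [nn.BatchNorm2d(cout), nn.ReLU()]
        self.deconv = nn.Sequential(*layers)
        self.value = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, state_emb, feasibility_mask):
        # feasibility_mask: boolean (batch, 128*128) tensor of allowed grid cells
        x = self.fc(state_emb).view(-1, 32, 4, 4)
        logits = self.deconv(x).flatten(1)                      # one logit per grid cell
        logits = logits.masked_fill(~feasibility_mask, -1e9)    # density/blockage mask
        return torch.distributions.Categorical(logits=logits), self.value(state_emb)
```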
To train this supervised model, we needed a large dataset of chip placements and their corresponding reward labels. We therefore created a dataset of 10,000 chip placements, where the input is the state associated with a given placement and the label is the reward for that placement (wirelength and congestion). We built this dataset by first picking 5 different accelerator netlists and then generating 2,000 placements for each netlist. To create diverse placements for each netlist, we trained a vanilla policy network at various congestion weights (ranging from 0 to 1) and random seeds, and collected snapshots of each placement during the course of policy training. An untrained policy network starts off with random weights and the generated placements are of low quality, but as the policy network trains, the quality of generated placements improves, allowing us to collect a diverse dataset with placements of varying quality.

To train a supervised model that can accurately predict wirelength and congestion labels and generalize to unseen data, we developed a novel graph neural network architecture that embeds information about the netlist. The role of graph neural networks is to distill information about the type and connectivity of a node within a large graph into low-dimensional vector representations which can be used in downstream tasks. Some examples of such downstream tasks are node classification (Nazi et al., 2019), device placement (Zhou et al., 2019), link prediction (Zhang & Chen, 2018), and Design Rule Violation (DRC) prediction (Zhiyao Xie, Duke University, 2018).

We create a vector representation of each node by concatenating the node features. The node features include node type, width, height, and x and y coordinates. We also pass node adjacency information as input to our algorithm. We then repeatedly perform the following updates: 1) each edge updates its representation by applying a fully connected network to an aggregated representation of intermediate node embeddings, and 2) each node updates its representation by taking the mean of adjacent edge embeddings. The node and edge updates are shown in Equation 5:

e_ij = fc_1(concat(fc_0(v_i) | fc_0(v_j) | w^e_ij))        (5)
v_i = mean_{j ∈ N(v_i)}(e_ij)

Node embeddings are denoted by v_i for 1 <= i <= N, where N is the total number of macros and standard cell clusters. Vectorized edges connecting nodes v_i and v_j are represented as e_ij. Both edge (e_ij) and node (v_i) embeddings are randomly initialized and are 32-dimensional. fc_0 is a 32 × 32 feedforward network, fc_1 is a 65 × 32 feedforward network, and the w^e_ij are learnable 1 × 1 weights corresponding to edges. N(v_i) denotes the neighbors of v_i. The outputs of the algorithm are the node and edge embeddings.

Our supervised model consists of: (1) the graph neural network described above, which embeds information about node types and the netlist adjacency matrix; (2) a fully connected feedforward network that embeds the metadata, including information about the underlying semiconductor technology (horizontal and vertical routing capacity), the total number of nets (edges), macros, and standard cell clusters, the canvas size, and the number of rows and columns in the grid; and (3) a fully connected feedforward network (the prediction layer) whose input is a concatenation of the netlist graph and metadata embeddings and whose output is the reward prediction. The netlist graph embedding is created by applying a reduce-mean function on the edge embeddings. The supervised model is trained via regression to minimize the weighted sum of the mean squared losses of wirelength and congestion.
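As a concrete reading of Equation 5, the following is a minimal numpy sketch of one round of the edge and node updates. The random weights, the bias-free linear layers, and the toy netlist are assumptions for illustration, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def gnn_round(v, edges, w_e, W0, W1):
    """One round of Equation 5.
    v:     (N, 32) node embeddings
    edges: list of (i, j) index pairs for the netlist edges
    w_e:   (E,) learnable scalar weight per edge
    W0:    (32, 32) weights of fc_0;  W1: (65, 32) weights of fc_1
    """
    h = v @ W0                                            # fc_0 applied to every node
    e = np.stack([np.concatenate([h[i], h[j], [w_e[k]]]) @ W1
                  for k, (i, j) in enumerate(edges)])     # fc_1 over concatenated features
    v_new = v.copy()
    for i in range(len(v)):                               # node update: mean of incident edges
        adj = [k for k, (a, b) in enumerate(edges) if i in (a, b)]
        if adj:
            v_new[i] = e[adj].mean(axis=0)
    return v_new, e

# Toy 3-node netlist with random weights.
N, E = 3, 2
v, w_e = rng.normal(size=(N, 32)), rng.normal(size=E)
W0, W1 = rng.normal(size=(32, 32)), rng.normal(size=(65, 32))
v, e = gnn_round(v, [(0, 1), (1, 2)], w_e, W0, W1)
```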
This supervised task allowed us to find the features and architecture necessary to generalize reward prediction across netlists. To incorporate this architecture into our policy network, we removed the prediction layer and then used it as the encoder component of the policy network, as shown in Figure 2.

To handle different grid sizes corresponding to different choices of rows and columns, we set the grid size to 128 × 128 and mask the unused L-shaped section for grid sizes smaller than 128 rows and columns.

To place a new test netlist at inference time, we load the pre-trained weights of the policy network and apply it to the new netlist. We refer to placements generated by a pre-trained policy network with no finetuning as zero-shot placements. Such a placement can be generated in less than a second, because it only requires a single inference step of the pre-trained policy network. We can further optimize placement quality by finetuning the policy network. Doing so gives us the flexibility to either use the pre-trained weights (that have learned a rich representation of the input state) or further finetune these weights to optimize for the properties of a particular chip netlist.

4.2. Policy Network Architecture

Figure 2 depicts an overview of the policy network (modeled by πθ in Equation 3) and the value network architecture that we developed for chip placement. The inputs to these networks are the netlist graph (graph adjacency matrix and node features), the id of the current node to be placed, and the metadata of the netlist and the semiconductor technology. The netlist graph is passed through our proposed graph neural network architecture as described earlier. This graph neural network generates embeddings of (1) the partially placed graph and (2) the current node. We use a simple feedforward network to embed (3) the metadata. These three embedding vectors are then concatenated to form the state embedding, which is passed to a feedforward neural network. The output of the feedforward network is then fed into the policy network (composed of 5 deconvolution¹ and Batch Normalization layers) to generate a probability distribution over actions, and is also passed to a value network (composed of a feedforward network) to predict the value of the input state.

¹ The deconvolution layers have a 3x3 kernel size with stride 2 and 16, 8, 4, 2, and 1 filter channels, respectively.

4.3. Policy Network Update: Training Parameters θ

In Equation 3, the objective is to train a policy network πθ that maximizes the expected value (E) of the reward (Rp,g) over the policy network's placement distribution. To optimize the parameters of the policy network, we use Proximal Policy Optimization (PPO) (Schulman et al., 2017) with a clipped objective as shown below:

L^CLIP(θ) = Ê_t[min(r_t(θ) Â_t, clip(r_t(θ), 1 − ε, 1 + ε) Â_t)]

where Ê_t represents the expected value at timestep t, r_t is the ratio of the new policy and the old policy, and Â_t is the estimated advantage at timestep t.

5. Results

In this section, we evaluate our method and answer the following questions: Does our method enable domain transfer and learning from experience? What is the impact of using pre-trained policies on the quality of result? How does the quality of the generated placements compare to state-of-the-art baselines? We also inspect the visual appearance of the generated placements and provide some insights into why our policy network made those decisions.

5.1. Transfer Learning Results

Figure 3 compares the quality of placements generated using pre-trained policies to those generated by training the policy network from scratch. Zero-shot means that we applied a pre-trained policy network to a new netlist with no finetuning, yielding a placement in less than one second. We also show results where we finetune the pre-trained policy network on the details of a particular design for 2 and 12 hours. The policy network trained from scratch takes much longer to converge, and even after 24 hours, the results are worse than what the finetuned policy network achieves after 12 hours, demonstrating that the learned weights and exposure to many different designs are helping us to achieve higher quality placements for new designs in less time.

Figure 4 shows the convergence plots for training from scratch vs. training from a pre-trained policy network for the Ariane RISC-V CPU. The pre-trained policy network starts with a lower placement cost at the beginning of the finetuning process. Furthermore, the pre-trained policy network converges to a lower placement cost and does so more than 30 hours faster than the policy network that was trained from scratch.
Figure 3. Domain adaptation results. For each block, the zero-shot results, as well as the finetuned results after 2 and 12 hours of training, are shown. We also include results for policies trained from scratch. As can be seen in the table, the pre-trained policy network consistently outperforms the policy network that was trained from scratch, demonstrating the effectiveness of learning from training data offline.
Figure 4. Convergence plots for training a policy network from scratch vs. finetuning a pre-trained policy network for a block of Ariane.
5.2. Learning from Larger Datasets

As we train on more chip blocks, we are able to speed up the training process and generate higher quality results faster. Figure 5 (left) shows the impact of a larger training set on performance. The training dataset is created from internal TPU blocks. The training data consists of a variety of blocks, including memory subsystems, compute units, and control logic. As we increase the training set from 2 blocks to 5 blocks and finally to 20 blocks, the policy network generates better placements both at zero-shot and after being finetuned for the same number of hours. Figure 5 (right) shows the placement cost on the test data as the policy network is being (pre-)trained. We can see that for the small training dataset, the policy network quickly overfits to the training data and performance on the test data degrades, whereas it takes longer for the policy network to overfit on the largest dataset, and the policy network pre-trained on this larger dataset yields better results on the test data. This plot suggests that as we expose the policy network to a greater variety of distinct blocks, while it might take longer for the policy network to pre-train, the policy network becomes less prone to overfitting and better at finding optimized placements for new unseen blocks.

5.3. Visualization Insights

Figure 6 shows the placement results for the Ariane RISC-V CPU. On the left, placements from the zero-shot policy network are shown, and on the right, placements from the finetuned policy network. The zero-shot placements are generated at inference time on a previously unseen chip. The zero-shot policy network places the standard cells in the center of the canvas surrounded by macros, which is already quite close to the optimal arrangement. After finetuning, the placements of macros become more regularized and the standard cell area in the center becomes less congested.

Figure 7 shows the visualized placements: on the left, results from a manual placement, and on the right, results from our approach.
Figure 5. We pre-train the policy network on three different training datasets (the small dataset is a subset of the medium one, and the
medium dataset is a subset of the large one). We then finetune this pre-trained policy network on the same test block and report cost at
various training durations (shown on the left of the figure). As the dataset size increases, both the quality of generated placements and
time to convergence on the test block improve. The right figure shows evaluation curves for policies trained on each dataset (each dot in
the right figure shows the cost of the placement generated by the policy under training).
Figure 6. Visualization of placements. On the left, zero-shot placements from the pre-trained policy and on the right, placements from
the finetuned policy are shown. The zero-shot policy placements are generated at inference time on a previously unseen chip. The
pre-trained policy network (with no fine-tuning) places the standard cells in the center of the canvas surrounded by macros, which is
already quite close to the optimal arrangement and in line with the intuitions of physical design experts.
The white area shows the macro placements and the green area shows the standard cell placements. Our method creates donut-shaped placements of macros surrounding standard cells, which results in a reduction in the total wirelength.

5.4. Comparing with Baseline Methods

In this section, we compare our method with three baseline methods: Simulated Annealing, RePlAce, and human expert baselines. For our method, we use a policy pre-trained on the largest dataset (of 20 TPU blocks) and then finetune it on 5 target unseen blocks, denoted by Blocks 1 to 5. Our dataset consists of a variety of blocks, including memory subsystems, compute units, and control logic. Due to confidentiality, we cannot disclose the details of these blocks, but to give an idea of the scale, each block contains up to a few hundred macros and millions of standard cells.

Comparisons with Simulated Annealing: Simulated Annealing (SA) is known to be a powerful, but slow, optimization method. However, like RL, simulated annealing is capable of optimizing arbitrary non-differentiable cost functions. To show the relative sample efficiency of RL, we ran experiments in which we replaced it with a simulated annealing based optimizer. In these experiments, we use the same inputs and cost function as before, but in each episode the simulated annealing optimizer places all macros, followed by an FD step to place the standard cell clusters. Each macro placement is accepted according to the SA update rule using an exponential decay annealing schedule (Kirkpatrick et al., 1983). SA takes 18 hours to converge, whereas our method takes no more than 6 hours.
Figure 7. Human-expert placements are shown on the left and results from our approach are shown on the right. The white area represents
macros and the green area represents standard cells. The figures are intentionally blurred as the designs are proprietary.
Table 1. Experiments to evaluate the sample efficiency of Deep RL compared to Simulated Annealing (SA). We replaced our RL policy network with SA and ran 128 different SA experiments for each block, sweeping different hyper-parameters, including min and max temperature, seed, and max step size. The results from the run with minimum cost are reported. The results show proxy wirelength and congestion values for each block. Note that because these proxy metrics are relative, comparisons are only valid for different placements of the same block.
           Replacing Deep RL with SA in our framework       Ours
           Wirelength        Congestion                     Wirelength        Congestion
Block 1    0.048             1.21                           0.047             0.87
Block 2    0.045             1.11                           0.041             0.93
Block 3    0.044             1.14                           0.034             0.96
Block 4    0.030             0.87                           0.024             0.78
Block 5    0.045             1.29                           0.038             0.88
To make comparisons fair, we ran multiple SA experiments that sweep different hyper-parameters, including min and max temperature, seed, and max SA episodes, such that SA and RL spend the same amount of CPU-hours in simulation and search a similar number of states. The results from the experiment with minimum cost are reported in Table 1. As shown in the table, even with additional time, SA struggles to produce high-quality placements compared to our approach, and produces placements with 14.4% higher wirelength and 24.1% higher congestion on average.

Comparisons with RePlAce (Cheng et al., 2019) and manual baselines: Table 2 compares our results with the state-of-the-art method RePlAce (Cheng et al., 2019) and manual baselines. The manual baseline is generated by a production chip design team, and involved many iterations of placement optimization, guided by feedback from a commercial EDA tool, over a period of several weeks.

With respect to RePlAce, we share the same optimization goals, namely to optimize global placement in chip design, but we use different objective functions. Thus, rather than comparing results from different cost functions, we treat the output of a commercial EDA tool as ground truth. To perform this comparison, we fix the macro placements generated by our method and by RePlAce and allow a commercial EDA tool to further optimize the standard cell placements, using the tool's default settings. We then report total wirelength, timing (worst (WNS) and total (TNS) negative slack), area, and power metrics. As shown in Table 2, our method outperforms RePlAce in generating placements that meet the design requirements. Given constraints imposed by the underlying semiconductor technology, placements of these blocks will not be able to meet timing constraints in the later stages of the design flow if the WNS is significantly above 100 ps or if the horizontal or vertical congestion is over 1%, rendering some RePlAce placements (Blocks 1, 2, 3) unusable. These results demonstrate that our congestion-aware approach is effective in generating high-quality placements that meet design criteria.

RePlAce is faster than our method, as it converges in 1 to 3.5 hours, whereas our results were achieved in 3 to 6 hours. However, some of the fundamental advantages of our approach are: 1) our method can readily optimize for various non-differentiable cost functions, without the need to formulate closed-form or differentiable equivalents of those cost functions (for example, while it is straightforward to model wirelength as a convex function, this is not true for routing congestion or timing); 2) our method has the ability to improve over time as the policy is exposed to more chip blocks; and 3) our method is able to adhere to various design constraints, such as blockages of differing shapes.
Table 2. Comparing our method with the state-of-the-art (RePlAce (Cheng et al., 2019)) method and manual expert placements using an
industry standard electronic design automation (EDA) tool. For all metrics in this table, lower is better. For placements which violate
constraints on timing (WNS significantly greater than 100 ps) or congestion (horizontal or vertical congestion greater than 1%), we
render their metrics in gray to indicate that these placements are infeasible.
Name       Method    WNS (ps)   TNS (ns)   Area (µm²)   Power (W)   Wirelength (m)   Congestion H (%)   Congestion V (%)
Block 1 RePlAce 374 233.7 1693139 3.70 52.14 1.82 0.06
Manual 136 47.6 1680790 3.74 51.12 0.13 0.03
Ours 84 23.3 1681767 3.59 51.29 0.34 0.03
Block 2 RePlAce 97 6.6 785655 3.52 61.07 1.58 0.06
Manual 75 98.1 830470 3.56 62.92 0.23 0.04
Ours 59 170 694757 3.13 59.11 0.45 0.03
Block 3 RePlAce 193 3.9 867390 1.36 18.84 0.19 0.05
Manual 18 0.2 869779 1.42 20.74 0.22 0.07
Ours 11 2.2 868101 1.38 20.80 0.04 0.04
Block 4 RePlAce 58 11.2 944211 2.21 27.37 0.03 0.03
Manual 58 17.9 947766 2.17 29.16 0.00 0.01
Ours 52 0.7 942867 2.21 28.50 0.03 0.02
Block 5 RePlAce 156 254.6 1477283 3.24 31.83 0.04 0.03
Manual 107 97.2 1480881 3.23 37.99 0.00 0.01
Ours 68 141.0 1472302 3.28 36.59 0.01 0.03
Table 2 also shows the results generated by human expert chip designers. Both our method and human experts consistently generate viable placements, meaning that they meet the timing and congestion design criteria. We also outperform or match manual placements in WNS, area, power, and wirelength. Furthermore, our end-to-end learning-based approach takes less than 6 hours, whereas the manual baseline involves a slow iterative optimization process with experts in the loop and can take multiple weeks.

5.5. Discussions

Opportunities for further optimization of our approach: There are multiple opportunities to further improve the quality of our method. For example, the process of standard cell partitioning, row and column selection, and selecting the order in which the macros are placed can all be further optimized. In addition, we would also benefit from a more optimized approach to standard cell placement. Currently, we use a force-directed method to place standard cells due to its fast runtime. However, we believe that more advanced techniques for standard cell placement, such as RePlAce (Cheng et al., 2019) and DREAMPlace (Lin et al., 2019), can yield more accurate standard cell placements to guide the policy network training. This is helpful because, if the policy network has a clearer signal on how its macro placements affect standard cell placement and final metrics, it can learn to make more optimal macro placement decisions.

Implications for a broader class of problems: This work is just one example of domain-adaptive policies for optimization and can be extended to other stages of the chip design process, such as architecture and logic design, synthesis, and design verification, with the goal of training ML models that improve as they encounter more instances of the problem. A learning-based method also enables further design space exploration and co-optimization within the cascade of tasks that compose the chip design process.

6. Conclusion

In this work, we target the complex and impactful problem of chip placement. We propose an RL-based approach that enables transfer learning, meaning that the RL agent becomes faster and better at chip placement as it gains experience on a greater number of chip netlists. We show that our method outperforms state-of-the-art baselines and can generate placements that are superior or comparable to those of human experts on modern accelerators. Our method is end-to-end and generates placements in under 6 hours, whereas the strongest baselines require human experts in the loop and take several weeks.

7. Acknowledgments

This project was a collaboration between Google Research and the Google Chip Implementation and Infrastructure (CI2) Team. We would like to thank Cliff Young, Ed Chi, Chip Stratakos, Sudip Roy, Amir Yazdanbakhsh, Nathan Myung-Chul Kim, Sachin Agarwal, Bin Li, Martin Abadi, Amir Salek, Samy Bengio, and David Patterson for their help and support.
References

Addanki, R., Venkatakrishnan, S. B., Gupta, S., Mao, H., and Alizadeh, M. Placeto: Learning generalizable device placement algorithms for distributed machine learning. CoRR, abs/1906.08879, 2019. URL http://arxiv.org/abs/1906.08879.

Agnihotri, A., Ono, S., and Madden, P. Recursive bisection placement: Feng shui 5.0 implementation details. In Proceedings of the International Symposium on Physical Design, pp. 230–232, 2005. doi: 10.1145/1055137.1055186.

Bo Hu and Marek-Sadowska, M. Multilevel fixed-point-addition-based VLSI placement. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 24(8):1188–1203, Aug 2005. ISSN 1937-4151. doi: 10.1109/TCAD.2005.850802.

Brenner, U., Struzyna, M., and Vygen, J. BonnPlace: Placement of leading-edge chips by advanced combinatorial algorithms. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27(9):1607–1620, September 2008. ISSN 0278-0070. doi: 10.1109/TCAD.2008.927674. URL https://doi.org/10.1109/TCAD.2008.927674.

Breuer, M. A. A class of min-cut placement algorithms. In Proceedings of the 14th Design Automation Conference, DAC '77, pp. 284–290. IEEE Press, 1977.

Chen, T., Jiang, Z., Hsu, T., Chen, H., and Chang, Y. NTUplace3: An analytical placer for large-scale mixed-size designs with preplaced blocks and density constraints. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27(7):1228–1240, July 2008. ISSN 1937-4151. doi: 10.1109/TCAD.2008.923063.

Chen, T.-C., Jiang, Z.-W., Hsu, T.-C., Chen, H.-C., and Chang, Y.-W. A high-quality mixed-size analytical placer considering preplaced blocks and density constraints. In Proceedings of the 2006 IEEE/ACM International Conference on Computer-Aided Design, ICCAD '06, pp. 187–192, New York, NY, USA, 2006. Association for Computing Machinery. ISBN 1595933891.

Cheng, C., Kahng, A. B., Kang, I., and Wang, L. RePlAce: Advancing solution quality and routability validation in global placement. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 38(9):1717–1730, 2019.

Chung-Kuan Cheng and Kuh, E. S. Module placement based on resistive network optimization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 3(3):218–225, July 1984. ISSN 1937-4151. doi: 10.1109/TCAD.1984.1270078.

Fiduccia, C. M. and Mattheyses, R. M. A linear-time heuristic for improving network partitions. In 19th Design Automation Conference, pp. 175–181, June 1982. doi: 10.1109/DAC.1982.1585498.

Gilbert, E. N. and Pollak, H. O. Steiner minimal trees. SIAM Journal on Applied Mathematics, 16(1):1–29, 1968.

Hanan, M. and Kurtzberg, J. Placement techniques. In Design Automation of Digital Systems, 1972.

Hsu, M., Chang, Y., and Balabanov, V. TSV-aware analytical placement for 3D IC designs. In 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 664–669, June 2011.

Huang, Y., Xie, Z., Fang, G., Yu, T., Ren, H., Fang, S., Chen, Y., and Hu, J. Routability-driven macro placement with embedded CNN-based prediction model. In Teich, J. and Fummi, F. (eds.), Design, Automation & Test in Europe Conference & Exhibition, DATE 2019, Florence, Italy, March 25-29, 2019, pp. 180–185. IEEE, 2019.

Kahng, A. B., Reda, S., and Qinke Wang. Architecture and details of a high quality, large-scale analytical placer. In ICCAD-2005, IEEE/ACM International Conference on Computer-Aided Design, pp. 891–898, Nov 2005. doi: 10.1109/ICCAD.2005.1560188.

Karypis, G. and Kumar, V. A hypergraph partitioning package. In hMETIS, 1998.

Kernighan, D. A procedure for placement of standard-cell VLSI circuits. In IEEE TCAD, 1985.

Kim, M. and Markov, I. L. ComPLx: A competitive primal-dual Lagrange optimization for global placement. In DAC Design Automation Conference 2012, pp. 747–755, June 2012.

Kim, M.-C., Lee, D.-J., and Markov, I. L. SimPL: An effective placement algorithm. In Proceedings of the International Conference on Computer-Aided Design, ICCAD '10, pp. 649–656. IEEE Press, 2010. ISBN 9781424481927.

Kim, M.-C., Viswanathan, N., Alpert, C. J., Markov, I. L., and Ramji, S. MAPLE: Multilevel adaptive placement for mixed-size designs. In Proceedings of the 2012 ACM International Symposium on Physical Design, ISPD '12, pp. 193–200, New York, NY, USA, 2012a. Association for Computing Machinery.
Kim, M.-C., Viswanathan, N., Alpert, C. J., Markov, I. L., and Ramji, S. MAPLE: Multilevel adaptive placement for mixed-size designs. In Proceedings of the 2012 ACM International Symposium on Physical Design, ISPD '12, pp. 193–200, New York, NY, USA, 2012b. Association for Computing Machinery. ISBN 9781450311670. doi: 10.1145/2160916.2160958. URL https://doi.org/10.1145/2160916.2160958.

Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. Optimization by simulated annealing. Science, 220(4598):671–680, 1983. ISSN 0036-8075. doi: 10.1126/science.220.4598.671. URL https://science.sciencemag.org/content/220/4598/671.

Lin, T., Chu, C., Shinnerl, J. R., Bustany, I., and Nedelchev, I. POLAR: Placement based on novel rough legalization and refinement. In Proceedings of the International Conference on Computer-Aided Design, ICCAD '13, pp. 357–362. IEEE Press, 2013. ISBN 9781479910694.

Lin, Y., Dhar, S., Li, W., Ren, H., Khailany, B., and Pan, D. Z. DREAMPlace: Deep learning toolkit-enabled GPU acceleration for modern VLSI placement. In Proceedings of the 56th Annual Design Automation Conference 2019, DAC '19, 2019.

Lu, J., Chen, P., Chang, C.-C., Sha, L., Huang, D. J.-H., Teng, C.-C., and Cheng, C.-K. ePlace: Electrostatics-based placement using fast Fourier transform and Nesterov's method. ACM Trans. Des. Autom. Electron. Syst., 20(2), 2015. ISSN 1084-4309.

Lu, J., Zhuang, H., Chen, P., Chang, H., Chang, C., Wong, Y., Sha, L., Huang, D., Luo, Y., Teng, C., and Cheng, C. ePlace-MS: Electrostatics-based placement for mixed-size circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 34(5):685–698, 2015.

Lu, J., Zhuang, H., Kang, I., Chen, P., and Cheng, C.-K. ePlace-3D: Electrostatics based placement for 3D-ICs. In Proceedings of the 2016 International Symposium on Physical Design, ISPD '16, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450340397. doi: 10.1145/2872334.2872361. URL https://doi.org/10.1145/2872334.2872361.

Nazi, A., Hang, W., Goldie, A., Ravi, S., and Mirhoseini, A. GAP: Generalizable approximate graph partitioning framework, 2019.

Obermeier, B., Ranke, H., and Johannes, F. Kraftwerk: A versatile placement approach. In ISPD, pp. 242–244, 2005. doi: 10.1145/1055137.1055190.

Paliwal, A. S., Gimeno, F., Nair, V., Li, Y., Lubin, M., Kohli, P., and Vinyals, O. REGAL: Transfer learning for fast optimization of computation graphs. ArXiv, abs/1905.02494, 2019.

Ren-Song Tsay, Kuh, E. S., and Chi-Ping Hsu. Proud: A sea-of-gates placement algorithm. IEEE Design & Test of Computers, 5(6):44–56, Dec 1988. ISSN 1558-1918. doi: 10.1109/54.9271.

Roy, J. A., Papa, D. A., and Markov, I. L. Capo: Congestion-Driven Placement for Standard-cell and RTL Netlists with Incremental Capability, pp. 97–133. Springer US, Boston, MA, 2007.

Sarrafzadeh, M., Wang, M., and Yang, X. Dragon: A Placement Framework, pp. 57–89. Springer, 2003. ISBN 978-1-4419-5309-4. doi: 10.1007/978-1-4757-3781-3_3.

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. Proximal policy optimization algorithms, 2017.

Sechen, C. and Sangiovanni-Vincentelli, A. L. TimberWolf3.2: A new standard cell placement and global routing package. In DAC, pp. 432–439. IEEE Computer Society Press, 1986. doi: 10.1145/318013.318083.

Shahookar, K. and Mazumder, P. VLSI cell placement techniques. ACM Comput. Surv., 23(2):143–220, June 1991. ISSN 0360-0300. doi: 10.1145/103724.103725. URL https://doi.org/10.1145/103724.103725.

Spindler, P., Schlichtmann, U., and Johannes, F. M. Kraftwerk2: A fast force-directed quadratic placement approach using an accurate net model. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27(8):1398–1411, Aug 2008. ISSN 1937-4151. doi: 10.1109/TCAD.2008.925783.

Tao Luo and Pan, D. Z. DPlace2.0: A stable and efficient analytical placement based on diffusion. In 2008 Asia and South Pacific Design Automation Conference, pp. 346–351, March 2008. doi: 10.1109/ASPDAC.2008.4483972.

Viswanathan, N., Nam, G.-J., Alpert, C., Villarrubia, P., Ren, H., and Chu, C. RQL: Global placement via relaxed quadratic spreading and linearization. In Proceedings of the Design Automation Conference, pp. 453–458, 2007a. ISBN 978-1-59593-627-1. doi: 10.1145/1278480.1278599.

Viswanathan, N., Pan, M., and Chu, C. FastPlace: An Efficient Multilevel Force-Directed Placement Algorithm, pp. 193–228. Springer, 2007b. doi: 10.1007/978-0-387-68739-1_8.
Zhou, Y., Roy, S., Abdolrashidi, A., Wong, D., Ma, P. C., Xu, Q., Zhong, M., Liu, H., Goldie, A., Mirhoseini, A., and Laudon, J. GDP: Generalized device placement for dataflow graphs, 2019.