Retrieval-Guided Reinforcement Learning For Boolean Circuit Minimization
* The work was done when the first author was a graduate student at New York University.
ABSTRACT
Logic synthesis, a pivotal stage in chip design, entails optimizing chip specifica-
tions encoded in hardware description languages like Verilog into highly efficient
implementations using Boolean logic gates. The process involves a sequential
application of logic minimization heuristics (“synthesis recipe”), with their arrange-
ment significantly impacting crucial metrics such as area and delay. Addressing the
challenge posed by the broad spectrum of design complexities — from variations
of past designs (e.g., adders and multipliers) to entirely novel configurations (e.g.,
innovative processor instructions) — requires a nuanced ‘synthesis recipe’ guided
by human expertise and intuition. This study conducts a thorough examination of
learning and search techniques for logic synthesis, unearthing a surprising revela-
tion: pre-trained agents, when confronted with entirely novel designs, may veer off
course, detrimentally affecting the search trajectory. We present ABC-RL, a metic-
ulously tuned α parameter that adeptly adjusts recommendations from pre-trained
agents during the search process. Computed based on similarity scores through
nearest neighbor retrieval from the training dataset, ABC-RL yields superior syn-
thesis recipes tailored for a wide array of hardware designs. Our findings showcase
substantial enhancements in the Quality-of-result (QoR) of synthesized circuits,
boasting improvements of up to 24.8% compared to state-of-the-art techniques. Fur-
thermore, ABC-RL achieves an impressive up to 9x reduction in runtime (iso-QoR)
when compared to current state-of-the-art methodologies.
1 INTRODUCTION
Modern chips are designed using sophisticated electronic design automation (EDA) algorithms that
automatically convert logic functions expressed in a hardware description language (HDL) like
Verilog to a physical layout that can be manufactured at a semiconductor foundry. EDA involves a
sequence of steps, the first of which is logic synthesis. Logic synthesis converts HDL into a low-level
“netlist” of Boolean logic gates that implement the desired function. A netlist is a graph whose
nodes are logic gates (e.g., ANDs, NOTs, ORs) and whose edges represent connections between
gates. Subsequent EDA steps like physical design then place gates on an x-y surface and route wires
between them. Because logic synthesis is the first step in the EDA flow, any inefficiencies it introduces, e.g., redundant logic
gates, propagate through all subsequent steps. Thus, the quality of logic synthesis—the area, power and
delay of the synthesized netlist—is crucial to the quality of the final design (Amarú et al., 2017).
As shown in Fig. 1, state-of-art logic synthesis algorithms perform a sequence of functionality-
preserving transformations, e.g., eliminating redundant nodes, reordering Boolean formulas, and
streamlining node representations, to arrive at a final optimized netlist (Yang et al., 2012; Mishchenko
et al., 2006; Riener et al., 2019; Yu, 2020; Neto et al., 2022). A specific sequence of transformations
is called a “synthesis recipe.” Typically, designers use experience and intuition to pick a “good”
synthesis recipe from the solution space of all recipes and iterate if the quality of result is poor.
This manual process is costly and time-consuming, especially for modern, complex chips. Further,
the design space of synthesis recipes is large. ABC (Brayton & Mishchenko, 2010), a state-of-art
open-source synthesis tool, provides a toolkit of 7 transformations, yielding a design space of 10^7
recipes (assuming 10-step recipes). A growing body of work has sought to leverage machine learning
and reinforcement learning (RL) to automatically identify high-quality synthesis recipes (Yu et al.,
2018; Hosny et al., 2020; Zhu et al., 2020; Yu, 2020; Neto et al., 2022; Chowdhury et al., 2022),
showing promising results.
[Figure 1: Logic synthesis flow. An RTL Verilog description (here, a full adder) is converted into an
And-Inverter Graph (AIG); a synthesis recipe of length L = 10, drawn from M = 7 transformation choices
(e.g., rewrite (rw), refactor (rf), balance (b)), reduces AIG nodes and depth before technology mapping
and QoR measurement.]

Prior work in this area falls within one of two categories. The first line of work (Yu, 2020; Neto
et al., 2022) proposes efficient search heuristics, Monte-Carlo tree search (MCTS) in particular,
to explore the solution space of synthesis recipes for a given netlist. These methods train a policy
agent during MCTS iterations, but the agent is initialized from scratch for a given netlist and
does not learn from past data, e.g., repositories of previously synthesized designs that are abundant
in practice. The second line of work leverages such past data, replacing expensive synthesis runs
during search, the dominant runtime cost in our experiments, with fast but approximate QoR estimates
from a pre-trained deep network (Chowdhury et al., 2022).

Figure 2: Reduction in area-delay product (greater reduction is better) over search iterations for a pure search
strategy (MCTS) and search augmented with learning (MCTS+Learning) on (a) square and (b) cavlc. Learning
an offline policy does not help in either case.

Motivational Observation. Given the tree-structured search space (see Fig. 2), we begin by
building and evaluating a baseline solution that: 1) pre-trains an offline RL agent on a dataset
of past designs; and then 2) performs RL-agent guided MCTS search over synthesis recipe space
for new designs. Although details vary, this strategy has been successful in other tree-search
problems, most prominently in AlphaGo (Silver et al., 2016) and AlphaZero (Silver et al., 2017).
Interestingly, we found that although the agent learned on past data helps slightly on average, on
11 out of 20 designs, the baseline strategy underperformed simple MCTS search (see Table 2 for
detailed results). Fig. 2 shows two examples: in both, pure MCTS achieves better solutions faster
than learning-augmented MCTS. Ideally, we seek solutions that provide consistent improvements
over search-only methods by leveraging past data.
One reason for poor performance is the large diversity of netlists in logic synthesis benchmarks.
Netlists vary significantly in size (100–46K nodes) and function. The EPFL benchmarks, for example,
partition netlists into those that perform arithmetic functions (e.g., adders, dividers, square-roots) and
control functions (e.g., finite state machines, pattern detectors, etc.) because of large differences in
their graph structures. In practice, while designers often reuse components from past designs, they
frequently come up with new designs and novel functions. For netlists that differ significantly from
those in the training dataset, pre-trained agents hurt performance by diverting search towards suboptimal recipes.
Overview of Approach. We propose ABC-RL, a new retrieval guided RL approach that adaptively
tunes the contribution of the pre-trained policy agent in the online search stage depending on the input
netlist. ABC-RL computes a tuning factor α ∈ [0, 1] from a similarity score between the input
netlist and its nearest neighbor retrieved from the training dataset. Similarity is computed on graph
neural network (GNN) features learned during training. If the new netlist is identical to one in the
training dataset, we set α = 0, and only the pre-trained agent is used to output the synthesis recipe.
Conversely, when α = 1 (i.e., the netlist is novel), the pre-trained agent is ignored and ABC-RL
defaults to a search strategy. Real-world netlists lie in between these extremes; accordingly, ABC-RL
modulates the relative contributions of the pre-trained agent and pure MCTS search to the final result.
We make careful architectural choices in our implementation of ABC-RL, including the choice of
netlist and synthesis recipe encoders and state-space representation. They are described in Section 2.3.
Although our main contribution is ABC-RL, the MCTS+Learning baseline (i.e., without retrieval)
has not been evaluated for logic synthesis. Our evaluations highlight its benefits and drawbacks.
Snapshot of Results and Key Contributions. Across three common logic synthesis benchmark
suites ABC-RL consistently outperforms prior SOTA ML-based logic synthesis solutions, our own
MCTS+Learning baseline, and an MCTS+Learning solution for chip placement Mirhoseini et al.
(2021), a different EDA problem, adapted to logic synthesis. ABC-RL achieves up to 24.8% geo.
mean improvements in QoR (here, area-delay product) over SOTA. At iso-QoR, ABC-RL reduces
synthesis runtime by up to 9×.
In summary, our key contributions are:
• We propose ABC-RL, a new retrieval-guided RL approach for logic synthesis that learns from
past historical data, i.e., previously seen netlists, to optimize the QoR for a new netlist at test
time. In doing so, we identify distribution shift between training and test data as a key problem
in this domain, and show that a baseline strategy that augments MCTS with a pre-trained policy
agent Silver et al. (2016) fails to improve upon pure MCTS search.
• To address these concerns, we introduce a new lightweight retrieval mechanism in ABC-RL that uses
the similarity score between the new test netlist and its nearest neighbor in the training set. This
score modulates the relative contribution of the pre-trained RL agent using a modulation parameter α,
down-weighting the learned policy depending on the novelty of the test netlist.
• We make careful architectural choices for ABC-RL's policy agent, including a new transformer-
based synthesis recipe encoder, and evaluate across three common logic synthesis benchmark suites. ABC-
RL consistently outperforms prior SOTA ML-for-logic-synthesis methods on QoR and runtime,
as well as a recent ML-for-chip-placement method (Mirhoseini et al., 2021) adapted to our problem
setting. Ablation studies establish the importance of α-tuning to ABC-RL.
• While our focus is on logic synthesis, our lightweight retrieval-guided approach might find use in
RL problems where online runtime and training-test distribution shift are key concerns.
We now describe ABC-RL, starting with a precise problem formulation and background.
2 PROPOSED APPROACH
2.1 PROBLEM STATEMENT AND BACKGROUND
We begin by formally defining the logic synthesis problem using ABC (Brayton & Mishchenko,
2010), the leading open-source synthesis tool, as an exemplar. As shown in Figure 1, ABC first
converts the Verilog description of a chip into an unoptimized And-Inverter Graph (AIG) G_0 ∈ G,
where G is the set of all directed acyclic graphs. The AIG represents AND gates as nodes and wires/NOT
gates as solid/dashed edges, and implements the same Boolean function as the input Verilog. Next,
ABC performs a series of functionality-preserving transformations on G_0. Transformations are
picked from a finite set of M actions, A = {rf, rw, . . . , b}. For ABC, M = 7. Applying an action
on an AIG yields a new AIG as determined by the synthesis function S : G × A → G. A synthesis
recipe P ∈ A^L is a sequence of L actions that are applied to G_0 in order. Given a synthesis recipe
P = {a_0, a_1, . . . , a_{L-1}} (a_i ∈ A), we obtain G_{i+1} = S(G_i, a_i) for all i ∈ [0, L − 1], where G_L is
the final optimized AIG. Finally, let QoR : G → R measure the quality of graph G, for instance, its
inverse area-delay product (so larger is better). Then, we seek to solve this optimization problem:
$$\underset{P \in \mathcal{A}^L}{\arg\max}\;\mathrm{QoR}(G_L), \quad \text{s.t.}\;\; G_{i+1} = S(G_i, a_i)\;\;\forall i \in [0, L-1]. \tag{1}$$
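To make the notation concrete, the following minimal Python sketch applies a recipe step by step and scores it against the objective in Eq. 1. Here `apply_action` (standing in for the synthesis function S) and `qor` are hypothetical placeholders rather than the paper's code, and the brute-force enumeration is shown only for illustration; the recipe space is far too large for this in practice, which is why the paper turns to MCTS.

```python
# Sketch of Eq. 1 with placeholder callables; not the paper's implementation.
from itertools import product
from typing import Callable, Sequence

ACTIONS = ["rw", "rw -z", "rf", "rf -z", "rs", "rs -z", "b"]  # the M = 7 transformations

def apply_recipe(g0, recipe: Sequence[str], apply_action: Callable):
    """Apply a length-L recipe a_0..a_{L-1}: G_{i+1} = S(G_i, a_i)."""
    g = g0
    for action in recipe:
        g = apply_action(g, action)
    return g

def best_recipe_bruteforce(g0, length: int, apply_action: Callable, qor: Callable):
    """Solve Eq. 1 by exhaustive enumeration (only feasible for very short recipes)."""
    best_recipe, best_score = None, float("-inf")
    for recipe in product(ACTIONS, repeat=length):
        score = qor(apply_recipe(g0, recipe, apply_action))
        if score > best_score:
            best_recipe, best_score = recipe, score
    return best_recipe, best_score
```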
We now discuss ABC-RL, our proposed approach to solve this optimization problem. In addition to
G0 , the AIG to be synthesized, we will assume access to a training set of AIGs to aid optimization.
The tree-structured solution space for logic synthesis motivated prior work (Yu, 2020; Neto et al.,
2022) to adopt an MCTS-based search strategy that we briefly review here. A state s here is the
current AIG after l transformations. In a given state, any action a ∈ A can be picked as described
above. Finally, the reward QoR(G_L) is delayed to the final synthesis step. In iteration k of the
search, MCTS keeps track of two functions: Q^k_MCTS(s, a), which measures the "goodness" of a
state-action pair, and U^k_MCTS(s, a), the upper confidence tree (UCT) factor that encourages
exploration of less visited states and actions. The policy
$$\pi^k_{MCTS}(s) = \underset{a \in \mathcal{A}}{\arg\max}\;\big[\, Q^k_{MCTS}(s, a) + U^k_{MCTS}(s, a) \,\big] \tag{2}$$
balances exploitation against exploration by combining the two terms. Further details are in §A.3.
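As a concrete illustration of the selection rule in Eq. 2 (with the UCT bonus of Eq. 5 in Appendix A.3), the sketch below keeps per-(state, action) statistics in plain dictionaries; the function and variable names are illustrative and not taken from the paper's implementation.

```python
# Sketch of the MCTS tree policy: argmax_a Q^k(s, a) + U^k(s, a).
import math

def uct_bonus(n_visits: dict, s, a, actions, c_uct: float = 1.0) -> float:
    """U^k(s, a): favors actions whose resulting states have been rarely visited."""
    if n_visits.get((s, a), 0) == 0:
        return float("inf")            # unvisited actions are always tried first
    total = sum(n_visits.get((s, b), 0) for b in actions)
    return c_uct * math.sqrt(math.log(total) / n_visits[(s, a)])

def tree_policy(q: dict, n_visits: dict, s, actions):
    """pi^k_MCTS(s): balance exploitation (Q) against exploration (U)."""
    return max(actions,
               key=lambda a: q.get((s, a), 0.0) + uct_bonus(n_visits, s, a, actions))
```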
We describe our proposed solution in two steps. First, we describe MCTS+Learning, which builds
on similar principles as Silver et al. (2016) by training a reinforcement learning (RL) policy agent
on prior netlists to guide Monte Carlo tree search (MCTS), highlighting how prior work is
adapted to the logic synthesis problem. Then, we describe our full solution, ABC-RL, which uses novel
retrieval-guided augmentation to significantly improve MCTS+Learning.
Figure 4: ABC-RL flow: training the agent (left), setting temperature T and threshold δ_th (mid), and recipe
generation at inference time (right).
The policy agent π_θ is trained by minimizing the cross-entropy loss between the learned policy and the
MCTS policy over samples picked from a replay buffer. The learned policy is used during inference to
bias the upper confidence tree (UCT in Eq. 2) of MCTS towards favorable paths by computing a new
U^{*k}_MCTS(s, a) as:
$$U^{*k}_{MCTS}(s, a) = \pi_\theta(s, a) \cdot U^{k}_{MCTS}(s, a). \tag{3}$$
For completeness, we outline the pseudocode for RL-training in the appendix (Algorithm 1).
As we noted, hardware designs frequently contain both familiar and entirely new components. In
the latter case, our results indicate that the learned RL-agents can sometimes hurt performance on
novel inputs by biasing search towards sub-optimal synthesis recipes. In ABC-RL, we introduce a
new term α ∈ [0, 1] in Equation 3 that strategically weights the contribution from the pre-trained agent,
completely turning it off when α = 1 (novel circuit) and defaulting to the baseline approach when
α = 0.
(1) Similarity score computation: To quantify the novelty of a new netlist G_0 at test time, we compute
a similarity score with respect to its nearest neighbor in the training dataset D_tr = {G^tr_0, . . . , G^tr_{N_tr}}.
To avoid expensive nearest neighbor queries in the graph space, for instance, via sub-graph iso-
morphisms, we leverage the graph encodings h_G already learned by the policy agent. Specifically,
we output the smallest cosine distance, ∆_cos(h_{G_1}, h_{G_2}) = 1 − (h_{G_1} · h_{G_2}) / (|h_{G_1}||h_{G_2}|), between
the test AIG embedding and all graphs in the training set: δ_{G_0} = min_i ∆_cos(h_{G_0}, h_{G^tr_i}).
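A minimal sketch of this retrieval step, assuming the GNN graph embeddings h_G are available as NumPy vectors; the function names are illustrative.

```python
# Sketch of the similarity score: cosine distance to the nearest training embedding.
import numpy as np

def cosine_distance(h1: np.ndarray, h2: np.ndarray) -> float:
    """Delta_cos(h1, h2) = 1 - (h1 . h2) / (|h1| |h2|)."""
    return 1.0 - float(h1 @ h2) / (np.linalg.norm(h1) * np.linalg.norm(h2))

def similarity_score(h_test: np.ndarray, train_embeddings) -> float:
    """delta_{G_0}: smallest cosine distance to any training-set AIG embedding."""
    return min(cosine_distance(h_test, h_tr) for h_tr in train_embeddings)
```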
(2) Tuning agent's recommendation during MCTS: To modulate the balance between the prior
learned policy and pure search, we update the prior UCT with α ∈ [0, 1] as follows:
$$U^{*k}_{MCTS}(s, a) = \pi_\theta(s, a)^{(1-\alpha)} \cdot U^{k}_{MCTS}(s, a), \tag{4}$$
where α is computed by passing the similarity score δ_{G_0} through a sigmoid function, α = σ_{δ_th,T}(δ_{G_0}),
defined as σ_{δ_th,T}(z) = 1 / (1 + e^{−(z − δ_th)/T}), with threshold (δ_th) and temperature (T) hyperparameters.
Eq. 4 allows α to smoothly vary in [0, 1] as intended, while the threshold and temperature hyperparameters
control the shape of the sigmoid. Threshold δ_th controls how close new netlists have to be to the
training data to be considered "novel." In general, small thresholds bias ABC-RL towards more
search and less learning from past data. Temperature T controls the transition from "previously
seen" to novel: small temperatures cause ABC-RL to create a harder threshold between previously
seen and novel designs. Both hyperparameters are chosen using validation data.
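The sketch below puts Eq. 4 and the sigmoid together; the names are illustrative, and the default arguments simply echo the hyperparameters reported in Section 3.1 (T = 100, δ_th = 0.007).

```python
# Sketch of alpha-tuning: similarity score -> alpha -> down-weighted policy bias.
import math

def alpha_from_score(delta_g0: float, delta_th: float = 0.007,
                     temperature: float = 100.0) -> float:
    """alpha = sigma_{delta_th, T}(delta_G0) in [0, 1]; larger for more novel netlists."""
    return 1.0 / (1.0 + math.exp(-(delta_g0 - delta_th) / temperature))

def weighted_uct(pi_theta_sa: float, uct_sa: float, alpha: float) -> float:
    """U*^k(s, a) = pi_theta(s, a)^(1 - alpha) * U^k(s, a)  (Eq. 4)."""
    return (pi_theta_sa ** (1.0 - alpha)) * uct_sa
```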
(3) Putting it all together: In Fig. 4, we present an overview of ABC-RL. We begin by training
an RL-agent on training dataset Dtr . Then, we use a separate held-out validation dataset to tune
threshold, δth , and temperature, T , by comparing wins/losses of baseline MCTS+Learning vs. ABC-
RL and performing a grid-search. During inference on a new netlist G0 , ABC-RL retrieves the nearest
neighbor from the training data, computes α and performs online α-guided search using weighted
recommendations from the pre-trained RL agent.
3 EMPIRICAL EVALUATION
3.1 EXPERIMENTAL SETUP
Datasets: We consider three datasets used by the logic synthesis community: MCNC Yang (1991), and
the EPFL arithmetic and EPFL random control benchmarks Amarú et al. (2015). MCNC benchmarks
have 38 netlists ranging from 100–8K nodes. EPFL arithmetic benchmarks have operations like
additions, multiplications, etc. and have 1000–44K nodes. EPFL random control benchmarks have
finite-state machines, routing logic and other functions with 100–46K nodes.

Table 1: Training, validation and test splits in our experiments. Netlists from each benchmark are represented
in each split. In the test set, MCNC netlists are relabeled [C1-C12], EPFL-arith to [A1-A4] and EPFL-control
to [R1-R4].

Split | Circuits
Train | alu2, apex3, apex5, b2, c1355, c5315, c2670, c6288, prom2, frg1, i7, i8, m3, max512, table5, adder, log2, max, multiplier, arbiter, ctrl, int2float, priority
Valid | apex7, c1908, c3540, frg2, max128, apex6, c432, c499, seq, table3, i10, sin, i2c
Test  | alu4, apex1, apex2, apex4, i9, m4, prom1, b9, c880, c7552, pair, max1024 {C1-C12}, bar, div, square, sqrt {A1-A4}, cavlc, mem_ctrl, router, voter {R1-R4}
Train-test split: We randomly split the 56 total netlists obtained from all three benchmarks into 23
netlists for training, 13 for validation (11 MCNC, 1 EPFL-arith, 1 EPFL-rand), and the remaining 20 for
test (see Table 1). We ensure that netlists from each benchmark are represented proportionally in the
training, validation and test data.
Optimization objective and metrics: We seek to identify the best synthesis recipes of length L = 10.
Consistent with prior works Hosny et al. (2020); Zhu et al. (2020); Neto et al. (2022), we use area-
delay product (ADP) as our QoR metric. Area and delay values are obtained using a 7nm technology
library post-technology mapping of the synthesized AIG. As a baseline, we compare against ADP of
the resyn2 synthesis recipe as is also done in prior work Neto et al. (2022); Chowdhury et al. (2022).
In addition to ADP reduction, we report runtime reduction of ABC-RL at iso-QoR, i.e., how much
faster ABC-RL is in reaching the best ADP achieved by competing methods. During evaluations on
test circuits, we give each technique a total budget of 100 synthesis runs.
Training details and hyper-parameters: To train our RL-agents, we use He initialization He et al.
(2015) for weights and, following Andrychowicz et al. (2020), multiply the weights of the final layer by
0.01 to prevent bias towards any one action. Agents are trained for 50 epochs using Adam with an
initial learning rate of 0.01. In each training epoch, we perform MCTS on all netlists with an MCTS
search budget K = 512 per synthesis level. After MCTS simulations, we sample L × N_tr (N_tr is
the number of training circuits) experiences from the replay buffer (of size 2 × L × N_tr) for training.
To stabilize training, we normalize QoR rewards (Appendix C.1) and clip them to [−1, +1] Mnih et al.
(2015). We set T = 100 and δ_th = 0.007 based on our validation data.
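As an illustration of the initialization above, here is a hedged PyTorch sketch; the layer sizes and the `PolicyHead` name are assumptions and do not reproduce the paper's exact agent architecture.

```python
# Sketch of He initialization with a 0.01-scaled final layer (near-uniform initial policy).
import torch
import torch.nn as nn

class PolicyHead(nn.Module):
    def __init__(self, in_dim: int, n_actions: int = 7):
        super().__init__()
        self.hidden = nn.Linear(in_dim, 256)
        self.out = nn.Linear(256, n_actions)
        for layer in (self.hidden, self.out):
            nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")  # He init
            nn.init.zeros_(layer.bias)
        with torch.no_grad():
            self.out.weight.mul_(0.01)  # avoid biasing towards any one action

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.out(torch.relu(self.hidden(x))), dim=-1)
```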
We performed the training on a server machine with one NVIDIA RTX A4000 with 16GB VRAM.
The major bottleneck during training is the synthesis time for running ABC; actual gradient updates
are relatively inexpensive. Fully training the RL-agent took around 9 days.
Baselines for comparison: We compare ABC-RL with five main methods: (1) MCTS: Search-only
MCTS Neto et al. (2022); (2) DRiLLS: RL agent trained via A2C using hand-crafted AIG features
(not on past training data) Hosny et al. (2020); (3) Online-RL: RL agent trained online via PPO
using Graph Convolutional Networks for AIG feature extraction (but not on past training data) (Zhu
et al., 2020); (4) SA+Pred: simulated annealing (SA) with QoR predictor learned from training
data Chowdhury et al. (2022); and (5) MCTS+L(earning): our own baseline MCTS+Learning
solution using a pre-trained RL-agent (Section 2.3.1). MCTS and SA+Pred are the current state-of-
the-art (SOTA) methods. DRiLLS and Online-RL are similar and under-perform MCTS. We report
results versus existing methods for completeness. MCTS+L is new and has not been evaluated on
logic synthesis. A final and sixth baseline for comparison is MCTS+L+FT (Mirhoseini et al., 2021),
which was proposed for chip placement, a different EDA problem, but which we adapt for logic synthesis.
MCTS+L+FT is similar to MCTS+L but continues to fine-tune its pre-trained RL-agent on test inputs.
In Table 2, we compare ABC-RL against SOTA methods Hosny et al. (2020); Zhu et al. (2020); Neto
et al. (2022); Chowdhury et al. (2022) in terms of percentage area-delay product (ADP) reduction
(relative to the baseline resyn2 recipe). We also report speed-ups of ABC-RL to reach the same
QoR as the SOTA methods. Given the long synthesis runtimes, speed-up at iso-QoR is an equally
important metric. Overall, ABC-RL achieves the largest geo. mean reduction in ADP, reducing
ADP by 25% (smaller ADP is better) over resyn2. The next best method is our own MCTS+L
implementation, which achieves 20.7% ADP reduction, although it is only slightly better than MCTS.
ABC-RL is also the winner on all but four netlists, and in each of these cases ABC-RL
finishes second only marginally behind the winner. Interestingly, the winner in these cases is
Online-RL, a method that overall has the poorest performance. Importantly, ABC-RL is consistently
better than MCTS+L and MCTS. That is, we show that both learning on past data and using
retrieval are key to good performance. Finally, ABC-RL is also faster than competing methods,
with a geo. mean speed-up of 3.8× over SOTA. We dive deeper into specific benchmark suites to
understand where ABC-RL's improvements stem from.

Figure 5: Area-delay product reduction (in %) compared to resyn2 on MCNC circuits: (a) alu4, (b) apex1,
(c) c880 and (d) apex4 (curves: MCTS, SA+Pred., ABC-RL).
MCNC benchmarks: ABC-RL provides substantial improvements on benchmarks such as
C1 (apex1), C7 (prom1), C9 (c880), and C12 (max1024). Fig. 5 plots the ADP reductions over
search iterations for MCTS, SA+Pred, and ABC-RL on four netlists from MCNC. Note from Fig. 5
that ABC-RL in most cases achieves higher ADP reductions earlier than competing methods. Thus,
designers can terminate search when a desired ADP is achieved. This results in run-time speedups of
up to 5.9× at iso-QoR compared to standard MCTS (see Appendix D.1.1 for complete results).
Table 2: Area-delay reduction (%) compared to resyn2 for DRiLLS Hosny et al. (2020), Online-RL Zhu et al.
(2020), SA+Pred. Chowdhury et al. (2022), MCTS Neto et al. (2022), MCTS+Learning (MCTS+L) and ABC-RL.
The final row reports ABC-RL's iso-QoR speed-up vs. MCTS.

                               ADP reduction (in %)
Method     |  C1   C2   C3   C4   C5   C6   C7   C8   C9  C10  C11  C12 |  A1   A2   A3   A4 |  R1   R2   R3   R4 | Geo-mean
           |                         MCNC                               |    EPFL arith      |    EPFL random     |
DRiLLS     | 18.9  6.7  8.0 13.0 38.4 19.1  5.4 18.0 14.3 18.6  6.6 11.0 | 28.8 34.7 11.1 22.7 | 15.4 23.0 12.9 10.1 | 16.1
Online-RL  | 20.6  6.6  8.1 13.5 39.4 21.0  5.0 17.9 16.2 20.2  4.7 11.4 | 36.9 34.8 10.4 24.1 | 16.3 22.5 10.7  8.3 | 16.1
SA+Pred.   | 17.6 17.0 15.6 13.0 46.5 18.2  8.5 23.6 19.9 17.6 10.0 20.3 | 36.9 25.2  8.2 21.1 | 16.8 21.5 25.7 26.2 | 19.7
MCTS       | 17.1 15.9 13.1 13.0 46.9 14.9  6.5 23.2 17.7 20.5 13.1 19.7 | 25.4 46.0 10.7 18.7 | 15.9 21.6 21.6 27.1 | 19.8
MCTS+L     | 17.0 19.6 16.9 12.5 46.9 13.9 10.1 24.1 17.1 16.8  8.1 19.5 | 36.9 55.9 10.3 22.7 | 15.8 24.1 38.9 26.9 | 20.7
ABC-RL     | 19.9 19.6 16.8 15.0 46.9 19.1 12.1 24.3 21.3 21.1 13.6 21.6 | 36.9 56.2 14.0 23.8 | 19.8 30.2 38.9 30.0 | 25.3
Iso-QoR speed-up | 1.9x 5.9x 1.8x 1.6x 1.2x 1.3x 3.2x 4.1x 2.2x 1.2x 0.9x 1.8x | 3.7x 9.0x 4.2x 8.3x | 5.0x 6.4x 3.1x 5.7x | 3.8x
[Figure: Area-delay product reduction (in %) compared to resyn2 over search iterations on (a) square,
(b) cavlc, (c) voter and (d) c7552, for MCTS, ABC-RL (w/o tune) and ABC-RL.]
To further examine the benefits of ABC-RL in terms of netlist diversity, we train three benchmark-
specific ABC-RL agents on each benchmark suite using, as before, the train-validation-test splits in
Table 1. Although trained on each benchmark individually, evaluation of benchmark-specific agents
is on the full test dataset. This has two objectives: 1) Assess how benchmark-specific agents compare
against the benchmark-wide agents on their own test inputs, and 2) Study how ABC-RL’s benchmark-
specific agents adapt when deployed on test inputs from other benchmarks. Benchmark-specific
agents are referred to as ABC-RL+X, where X is the benchmark name (MCNC, ARITH, or RC).
Table 3: Area-delay reduction (in %) obtained using benchmark specific agents (MCNC, ARITH and RC).
MCTS represents tree search adopted in Neto et al. (2022)
In Table 3, we present the performance of ABC-RL using benchmark-specific agents. Notably, ABC-
RL+X agents often outperform the general ABC-RL agent on test inputs from their own benchmark
suites. For example, ABC-RL+MCNC outperforms ABC-RL on 7 of 12 MCNC benchmarks. In return, the
performance of benchmark-specific agents drops on test inputs from other benchmarks because these
new netlists are novel for the agent. Nonetheless, our benchmark-specific agents still outperform
the SOTA MCTS approach in geo. mean ADP reduction. In fact, setting ABC-RL itself aside, each of our
benchmark-specific agents would still outperform the other SOTA methods, including MCTS+L. These
results emphasize ABC-RL's ability to tune α effectively, even in the presence of a substantial
distribution gap between training and test data.
In recent work, Mirhoseini et al. (2021) proposed a pre-trained PPO agent for chip placement. This
problem seeks to place blocks on the chip surface so as to reduce total chip area, wire-length and
congestion. Although the input to chip placement is also a graph, the graph only encodes connectivity
and not functionality. Importantly, an action in this setting, e.g. moving or swapping blocks, is quick,
allowing for millions of actions to be explored. In contrast, for logic synthesis, actions (synthesis
steps) involve expensive functionality-preserving graph-level transformations on the entire design
taking up to 5 minutes for larger designs. To adapt to new inputs, Mirhoseini et al. (2021) adopt a
different strategy: they continue to fine-tune (FT) their agents as they perform search on test inputs.
Here we ask if the FT strategy could work for ABC-RL instead of our retrieval-guided solution.
To test this, we fine-tune ABC-RL’s benchmark-wide agent during online MCTS within our evaluation
budget of 100 synthesis runs. Table 4 compares ABC-RL vs. the new MCTS+L+FT approach. ABC-
RL outperforms MCTS+L+FT on all but one netlist, and reduces ADP by a further 2.66%, 2.40% and 0.33%
on the MCNC, EPFL arithmetic, and EPFL random control benchmarks, respectively, with a 9.0% decline on C5 (i9).
Table 4: Area-delay reduction (in %). ABC-RL−BERT is ABC-RL trained with a naive synthesis encoder instead
of BERT. MCTS+L+FT indicates MCTS+Learning with online fine-tuning.
We inspect the role of the BERT-based recipe encoder in ABC-RL by replacing it with a fixed-length
(L = 10) encoder where, using the approach from Chowdhury et al. (2022), we directly encode the
synthesis commands in numerical form and apply zero-padding for recipes shorter than L. The
results are shown in Table 4. ABC-RL reduces ADP by 2.51%, 4.04% and 4.11% on the MCNC, EPFL
arithmetic and random control benchmarks, respectively, compared to the version without BERT, with
up to a 10.90% gap on R3 (router). This shows the importance of the transformer-based encoder in
extracting meaningful features from synthesis recipe sub-sequences for state representation.
4 RELATED WORK
Learning-based approaches for logic synthesis: These can be classified into two sub-categories: 1)
synthesis recipe classification (Yu et al., 2018; Neto et al., 2019) and prediction (Chowdhury et al.,
2021; 2022) based approaches, and 2) RL-based approaches (Haaswijk et al., 2018; Hosny et al.,
2020; Zhu et al., 2020). Neto et al. (2019) partition the original graph into smaller sub-networks
and perform binary classification on sub-networks to pick which recipes work best. On the other
hand, RL-based solutions Haaswijk et al. (2018); Hosny et al. (2020); Zhu et al. (2020) use online
RL algorithms to craft synthesis recipes, but do not leverage prior data. We show that ABC-RL
outperforms them.
ML for EDA: ML has been used for a range of EDA problems Mirhoseini et al. (2021); Kurin et al.
(2020); Lai et al. (2022; 2023); Schmitt et al. (2021); Yolcu & Póczos (2019); Vasudevan et al. (2021);
Yang et al. (2022). Closer to this work, Mirhoseini et al. (2021) used a deep-RL agent to optimize
chip placement, a different problem, and used the pre-trained agent (with online fine-tuning) to place
the new design. This leaves limited scope for online exploration. Additionally, each move or action in
placement, i.e., moving the x-y co-ordinates of modules in the design, is cheap, unlike the time-consuming
actions in logic synthesis. Thus placement agents can be fine-tuned with larger amounts of test-time
data relative to ABC-RL, which has a constrained online search budget. Our ablation study shows that
ABC-RL outperforms search combined with a fine-tuned agent for a given synthesis budget. A related body
of work develops general representations of Boolean circuits, for instance, DeepGate Li et al. (2022);
Shi et al. (2023), ConVERTS Chowdhury et al. (2023) and "functionality matters" Wang et al. (2022),
learned on signal probability estimation and functionality prediction. These embeddings
could enhance the quality of our GCN embeddings and are interesting avenues for future work.
RL and search for combinatorial optimization: Fusing learning and search finds applications
across diverse domains such as branching heuristics (He et al., 2014), Go and chess playing (Silver
et al., 2016; Schrittwieser et al., 2020), traveling salesman (TSP) (Xing & Tu, 2020), and common
subgraph detection (Bai et al., 2021). Each of these problems has unique structure. TSP and common
subgraph detection both have graph inputs like logic synthesis but do not perform transformations on
graphs. Branching problems have tree-structure, but do not operate on graphs. Go and Chess involve
self-play during training and must anticipate opponents. Thus these works have each developed
specialized solutions tailored to their problem domains, as we do with ABC-RL. Further, these previous
works have not identified distribution shift as a problem and operate, at least implicitly, under the
assumption that train and test state distributions align closely.
Retrieval guided Reinforcement learning: Recent works (Goyal et al., 2022; Humphreys et al.,
2022) have explored the benefits of retrieval in game-playing RL-agents. However, they implement
retrieval differently: trajectories from prior episodes are retrieved and the entire trajectory is an
additional input to the policy agent. This also requires the policy agent to be aware of retrieval during
training. In contrast, our retrieval strategy is lightweight; instead of an entire graph/netlist, we only
retrieve the similarity score of the nearest neighbour in the training dataset and then set α. In addition,
we do not need to incorporate the retrieval strategy during training, enabling off-the-shelf use of pre-trained RL agents.
ABC-RL already significantly outperforms SOTA methods with this strategy, but the approach might
be beneficial in other settings where online costs are severely constrained.
5 CONCLUSION
We introduce ABC-RL, a novel methodology that combines learning and search through a retrieval-
guided mechanism, significantly enhancing the identification of high-quality synthesis recipes for new
hardware designs. Specifically, tuning the α parameter of the RL agent during MCTS search within
the synthesis recipe space effectively mitigates misguided searches toward poorly rewarding
trajectories, particularly when encountering sufficiently novel designs. These core concepts, substan-
tiated by empirical results, underscore the potential of ABC-RL in generating high-quality synthesis
recipes, thereby streamlining modern complex chip design for enhanced efficiency.
Reproducibility Statement: For reproducibility, we provide detailed information regarding method-
ologies, architectures, and settings in Section 3.1. We attach our codebase for review. Post acceptance of
our work, we will publicly release it with detailed user instructions.
REFERENCES
Luca Amarú, Pierre-Emmanuel Gaillardon, and Giovanni De Micheli. The EPFL combinational
benchmark suite. In Proceedings of the 24th International Workshop on Logic & Synthesis (IWLS),
2015.
Luca Amarú, Patrick Vuillod, Jiong Luo, and Janet Olson. Logic optimization and synthesis: Trends
and directions in industry. In Design, Automation & Test in Europe Conference & Exhibition
(DATE), 2017, pp. 1303–1305. IEEE, 2017.
Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini, Sertan Girgin, Raphael Marinier,
Léonard Hussenot, Matthieu Geist, Olivier Pietquin, Marcin Michalski, et al. What matters in
on-policy reinforcement learning? a large-scale empirical study. arXiv preprint arXiv:2006.05990,
2020.
Yunsheng Bai, Derek Xu, Yizhou Sun, and Wei Wang. Glsearch: Maximum common subgraph
detection via learning to search. In International Conference on Machine Learning, pp. 588–598.
PMLR, 2021.
Robert Brayton and Alan Mishchenko. ABC: An Academic Industrial-Strength Verification Tool.
In Tayssir Touili, Byron Cook, and Paul Jackson (eds.), Computer Aided Verification, pp. 24–40,
2010.
Robert K Brayton, Gary D Hachtel, Curt McMullen, and Alberto Sangiovanni-Vincentelli. Logic
minimization algorithms for VLSI synthesis, volume 2. Springer Science & Business Media, 1984.
Animesh B Chowdhury, Jitendra Bhandari, Luca Collini, Ramesh Karri, Benjamin Tan, and Siddharth
Garg. ConVERTS: Contrastively learning structurally invariant netlist representations. In 2023
ACM/IEEE 5th Workshop on Machine Learning for CAD (MLCAD), pp. 1–6. IEEE, 2023.
Animesh Basak Chowdhury, Benjamin Tan, Ramesh Karri, and Siddharth Garg. OpenABC-D:
A large-scale dataset for machine learning guided integrated circuit synthesis. arXiv preprint
arXiv:2110.11292, 2021.
Animesh Basak Chowdhury, Benjamin Tan, Ryan Carey, Tushit Jain, Ramesh Karri, and Siddharth
Garg. Bulls-eye: Active few-shot learning guided logic synthesis. IEEE Transactions on Computer-
Aided Design of Integrated Circuits and Systems, 2022.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep
bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
Anirudh Goyal, Abram Friesen, Andrea Banino, Theophane Weber, Nan Rosemary Ke, Adria Puig-
domenech Badia, Arthur Guez, Mehdi Mirza, Peter C Humphreys, Ksenia Konyushova, et al.
Retrieval-augmented reinforcement learning. In International Conference on Machine Learning,
pp. 7740–7765. PMLR, 2022.
Winston Haaswijk, Edo Collins, Benoit Seguin, Mathias Soeken, Frédéric Kaplan, Sabine Süsstrunk,
and Giovanni De Micheli. Deep learning for logic optimization algorithms. In International
Symposium on Circuits and Systems (ISCAS), pp. 1–4, 2018.
He He, Hal Daume III, and Jason M Eisner. Learning to search in branch and bound algorithms.
Advances in neural information processing systems, 27, 2014.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing
human-level performance on imagenet classification. In Proceedings of the IEEE international
conference on computer vision, pp. 1026–1034, 2015.
Abdelrahman Hosny, Soheil Hashemi, Mohamed Shalan, and Sherief Reda. DRiLLS: Deep rein-
forcement learning for logic synthesis. In Asia and South Pacific Design Automation Conference
(ASP-DAC), pp. 581–586, 2020.
Peter Humphreys, Arthur Guez, Olivier Tieleman, Laurent Sifre, Théophane Weber, and Timothy
Lillicrap. Large-scale retrieval for reinforcement learning. Advances in Neural Information
Processing Systems, 35:20092–20104, 2022.
Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks.
arXiv preprint arXiv:1609.02907, 2016.
Levente Kocsis and Csaba Szepesvári. Bandit based monte-carlo planning. In Machine Learning:
ECML 2006: 17th European Conference on Machine Learning Berlin, Germany, September 18-22,
2006 Proceedings 17, pp. 282–293. Springer, 2006.
Vitaly Kurin, Saad Godil, Shimon Whiteson, and Bryan Catanzaro. Can Q-learning with graph
networks learn a generalizable branching heuristic for a SAT solver? Advances in Neural Information
Processing Systems, 33:9608–9621, 2020.
Yao Lai, Yao Mu, and Ping Luo. Maskplace: Fast chip placement via reinforced visual representation
learning. Advances in Neural Information Processing Systems, 35:24019–24030, 2022.
Yao Lai, Jinxin Liu, Zhentao Tang, Bin Wang, Jianye Hao, and Ping Luo. Chipformer: Transferable
chip placement via offline decision transformer. arXiv preprint arXiv:2306.14744, 2023.
Min Li, Sadaf Khan, Zhengyuan Shi, Naixing Wang, Huang Yu, and Qiang Xu. DeepGate: Learning
neural representations of logic gates. In Proceedings of the 59th ACM/IEEE Design Automation
Conference, pp. 667–672, 2022.
Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, Joe Wenjie Jiang, Ebrahim Songhori, Shen Wang,
Young-Joon Lee, Eric Johnson, Omkar Pathak, Azade Nazi, et al. A graph placement methodology
for fast chip design. Nature, 594(7862):207–212, 2021.
Alan Mishchenko, Satrajit Chatterjee, and Robert Brayton. DAG-aware aig rewriting: A fresh look at
combinational logic synthesis. In Design Automation Conference (DAC), pp. 532–535, 2006.
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare,
Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control
through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
Walter Lau Neto, Max Austin, Scott Temple, Luca Amaru, Xifan Tang, and Pierre-Emmanuel
Gaillardon. LSOracle: a logic synthesis framework driven by artificial intelligence: Invited paper.
In International Conference on Computer-Aided Design (ICCAD), pp. 1–6, 2019.
Walter Lau Neto, Yingjie Li, Pierre-Emmanuel Gaillardon, and Cunxi Yu. Flowtune: End-to-
end automatic logic optimization exploration via domain-specific multi-armed bandit. IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022.
Heinz Riener, Eleonora Testa, Winston Haaswijk, Alan Mishchenko, Luca Amarù, Giovanni
De Micheli, and Mathias Soeken. Scalable generic logic synthesis: One approach to rule them all.
In Proceedings of the 56th Annual Design Automation Conference 2019, pp. 1–6, 2019.
Frederik Schmitt, Christopher Hahn, Markus N Rabe, and Bernd Finkbeiner. Neural circuit synthesis
from specification patterns. Advances in Neural Information Processing Systems, 34:15408–15420,
2021.
Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon
Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, et al. Mastering atari,
go, chess and shogi by planning with a learned model. Nature, 588(7839):604–609, 2020.
Zhengyuan Shi, Hongyang Pan, Sadaf Khan, Min Li, Yi Liu, Junhua Huang, Hui-Ling Zhen,
Mingxuan Yuan, Zhufei Chu, and Qiang Xu. DeepGate2: Functionality-aware circuit representation
learning. arXiv preprint arXiv:2305.16373, 2023.
David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche,
Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering
the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez,
Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, et al. Mastering chess and shogi
by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815,
2017.
Shailja Thakur, Baleegh Ahmad, Hammond Pearce, Benjamin Tan, Brendan Dolan-Gavitt, Ramesh
Karri, and Siddharth Garg. Verigen: A large language model for verilog code generation. arXiv
preprint arXiv:2308.00708, 2023.
Shobha Vasudevan, Wenjie Joe Jiang, David Bieber, Rishabh Singh, C Richard Ho, Charles Sut-
ton, et al. Learning semantic representations to verify hardware designs. Advances in Neural
Information Processing Systems, 34:23491–23504, 2021.
Ziyi Wang, Chen Bai, Zhuolun He, Guangliang Zhang, Qiang Xu, Tsung-Yi Ho, Bei Yu, and
Yu Huang. Functionality matters in netlist representation learning. In Proceedings of the 59th
ACM/IEEE Design Automation Conference, pp. 61–66, 2022.
Zhihao Xing and Shikui Tu. A graph neural network assisted monte carlo tree search approach to
traveling salesman problem. IEEE Access, 8:108418–108428, 2020.
Saeyang Yang. Logic synthesis and optimization benchmarks user guide: version 3.0. Citeseer, 1991.
Wenlong Yang, Lingli Wang, and Alan Mishchenko. Lazy man’s logic synthesis. In Proceedings of
the International Conference on Computer-Aided Design, pp. 597–604, 2012.
Zhihao Yang, Dong Li, Yingxueff Zhang, Zhanguang Zhang, Guojie Song, Jianye Hao, et al. Versatile
multi-stage graph neural network for circuit representation. Advances in Neural Information
Processing Systems, 35:20313–20324, 2022.
Emre Yolcu and Barnabás Póczos. Learning local search heuristics for boolean satisfiability. Advances
in Neural Information Processing Systems, 32, 2019.
Cunxi Yu. Flowtune: Practical multi-armed bandits in boolean optimization. In International
Conference On Computer Aided Design (ICCAD), pp. 1–9, 2020.
Cunxi Yu, Houping Xiao, and Giovanni De Micheli. Developing synthesis flows without human
knowledge. In Design Automation Conference (DAC), pp. 1–6, 2018.
Keren Zhu, Mingjie Liu, Hao Chen, Zheng Zhao, and David Z. Pan. Exploring logic optimizations
with reinforcement learning and graph convolutional network. In Workshop on Machine Learning
for CAD (MLCAD), pp. 145–150, 2020.
A APPENDIX
A.1 LOGIC SYNTHESIS
Logic synthesis transforms a hardware design in register transfer level (RTL) to a Boolean gate-level
network, optimizes the number of gates/depth, and then maps it to standard cells in a technology
library Brayton et al. (1984). Well-known representations of Boolean networks include sum-of-
products form, product-of-sums form, binary decision diagrams, and AIGs, a widely accepted
format using only AND gates (nodes) and NOT gates (dotted edges). Several logic minimization heuristics
(discussed in Section A.2) have been developed to perform optimization on AIG graphs because of their
compact circuit representation and directed acyclic graph (DAG)-based structuring. These heuristics
are applied sequentially (“synthesis recipe”) to perform one-pass logic optimization, reducing the
number of nodes and depth of the AIG. The optimized network is then mapped using cells from a
technology library to finally report area, delay and power consumption.
We now describe the optimization heuristics provided by the industrial-strength academic tool ABC Brayton
& Mishchenko (2010):
1. Balance (b) applies associative and commutative tree-balancing transformations to reduce AIG depth
and thus optimize for delay.
2. Rewrite (rw, rw -z) is a directed acyclic graph (DAG)-aware logic rewriting technique that
performs template pattern matching on sub-trees and encodes them with equivalent logic functions.
3. Refactor (rf, rf -z) performs aggressive changes to the netlist without caring about logic sharing.
It iteratively examines all nodes in the AIG, lists out the maximum fan-out-free cones, and replaces
them with equivalent functions when doing so improves the cost (e.g., reduces the number of nodes).
4. Re-substitution (rs, rs -z) creates new nodes in the circuit representing intermediate functionalities
using existing nodes, and removes redundant nodes. Re-substitution improves logic sharing.
The zero-cost (-z) variants of these transformation heuristics perform structural changes to the
netlist without reducing the nodes or depth of the AIG. However, previous empirical results show that such
transformations help future passes of other logic minimization heuristics reduce the nodes/depth and
achieve the minimization objective.
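For concreteness, the sketch below shows one way a recipe built from these heuristics could be evaluated by driving ABC from Python. The transformation abbreviations follow the list above, but the surrounding commands (`read`, `read_lib`, `strash`, `map`, `print_stats`), the statistics parsing, and the resyn2 spelling are assumptions that should be checked against the specific ABC version and technology library in use.

```python
# Hedged sketch: run a synthesis recipe in ABC and estimate area-delay product.
import re
import subprocess

def run_recipe(aig_path: str, recipe, lib_path: str) -> str:
    """Run a recipe in ABC and return the last line of its (post-mapping) output."""
    script = "; ".join(
        [f"read {aig_path}", f"read_lib {lib_path}", "strash"]
        + list(recipe)
        + ["map", "print_stats"]
    )
    out = subprocess.run(["abc", "-c", script], capture_output=True, text=True)
    return out.stdout.strip().splitlines()[-1]

def area_delay_product(stats_line: str) -> float:
    """Parse 'area' and 'delay' fields from the stats line (output format assumed)."""
    area = float(re.search(r"area\s*=\s*([\d.]+)", stats_line).group(1))
    delay = float(re.search(r"delay\s*=\s*([\d.]+)", stats_line).group(1))
    return area * delay

# Example recipe in the style of resyn2 (verify against your ABC distribution's abc.rc).
resyn2_like = ["b", "rw", "rf", "b", "rw", "rw -z", "b", "rf -z", "rw -z", "b"]
```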
We now discuss the MCTS algorithm in detail. During selection, a search tree is built from the current
state by following the search policy in Eq. 2, with the aim of identifying promising states for exploration.
Here, Q^k_MCTS(s, a) denotes the estimated Q-value (discussed next) obtained after taking action a from
state s during the k-th iteration of the MCTS simulation, and U^k_MCTS(s, a) represents the upper
confidence tree (UCT) exploration factor of the MCTS search:
$$U^k_{MCTS}(s, a) = c_{UCT}\sqrt{\frac{\log \sum_{a'} N^k_{MCTS}(s, a')}{N^k_{MCTS}(s, a)}}, \tag{5}$$
where N^k_MCTS(s, a) denotes the visit count of the resulting state after taking action a from state s, and
c_UCT denotes a constant exploration factor Kocsis & Szepesvári (2006).
The selection phase repeats until a leaf node is reached in the search tree. A leaf node in the MCTS tree
denotes either a node for which no child nodes have been created or a terminal state of the environment.
Once a leaf node is reached, the expansion phase begins, where an action is picked randomly and its
rollout value is returned, or R(s_L) is returned for the terminal state s_L. Next, backpropagation happens,
where the Q^k_MCTS(s, a) values of all parent nodes are updated according to the following equation:
$$Q^k_{MCTS}(s, a) = \sum_{i=1}^{N^k_{MCTS}(s, a)} R^i_{MCTS}(s, a)\,\big/\,N^k_{MCTS}(s, a). \tag{6}$$
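A minimal sketch of this backpropagation step, written as the incremental-mean form of Eq. 6; the dictionary bookkeeping is illustrative.

```python
# Sketch of MCTS backpropagation: Q^k(s, a) is the running mean of observed rewards.
def backpropagate(path, reward: float, q: dict, n_visits: dict) -> None:
    """Update every (state, action) pair on the root-to-leaf path with `reward`."""
    for s, a in path:
        n = n_visits.get((s, a), 0) + 1
        n_visits[(s, a)] = n
        # Incremental form of Eq. 6: mean of all rewards observed through (s, a).
        q[(s, a)] = q.get((s, a), 0.0) + (reward - q.get((s, a), 0.0)) / n
```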
As discussed in Section 2.3, we pre-train an agent using available past data to help with choosing
which logic minimization heuristic to add to the synthesis recipe. The process is shown as Algorithm 1.
B NETWORK ARCHITECTURE
B.1 AIG NETWORK ARCHITECTURE
AIG encoding in ABC: An AIG graph is a directed acyclic graph representing the circuit's Boolean
functionality. We read in the same AIG format introduced in Mishchenko et al. (2006) and commonly
used in the literature: nodes in the AIG represent AND gates, Primary Inputs (PIs) or Primary Outputs
(POs). On the other hand, NOT gates are represented by edges: dashed edges represent NOT gates
(i.e., the output of the edge is a logical negation of its input) and solid edges represent a simple wire
whose output equals its input.
GCN-based AIG embedding: Starting with a graph G = (V, E) that has vertices V and edges E, the
GCN aggregates feature information of a node with its neighbors’ node information. The output is then
normalized using Batchnorm and passed through a non-linear LeakyReLU activation function.
This process is repeated for k layers to obtain information for each node based on information from
its neighbours up to a distance of k-hops. A graph-level READOUT operation produces a graph-level
embedding. Formally:
$$h^k_u = \sigma\left(W_k \sum_{i \in u \cup N(u)} \frac{h^{k-1}_i}{\sqrt{|N(u)|\,|N(i)|}} + b_k\right), \quad k \in [1..K] \tag{7}$$
$$h_G = \mathrm{READOUT}(\{h^k_u \,;\, u \in V\})$$
Here, the embedding for node u generated by the k-th layer of the GCN is represented by h^k_u. The
parameters W_k and b_k are trainable, and σ is a non-linear ReLU activation function. N(·) denotes
the 1-hop neighbors of a node. The READOUT function combines the activations from the k-th layer
of all nodes to produce the final output by performing a pooling operation.
Each node in the AIG read in from ABC is translated to a node in our GCN. For the initial embeddings
h^0_u, we use a two-dimensional vector to encode node-level features: (1) node type (AND, PI, or PO)
and (2) the number of negated fan-in edges Chowdhury et al. (2021; 2022). We choose k = 3 and use
global average and max pooling, concatenated, as the READOUT operation.
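A minimal sketch of this encoder using PyTorch Geometric, following the description above (two-dimensional node features, k = 3 GCN layers with BatchNorm and LeakyReLU, and concatenated global average/max pooling as the READOUT); the hidden width and class name are assumptions.

```python
# Sketch of the GCN-based AIG encoder producing a graph-level embedding h_G.
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, global_max_pool, global_mean_pool

class AIGEncoder(nn.Module):
    def __init__(self, in_dim: int = 2, hidden: int = 64, k: int = 3):
        super().__init__()
        dims = [in_dim] + [hidden] * k
        self.convs = nn.ModuleList([GCNConv(dims[i], dims[i + 1]) for i in range(k)])
        self.norms = nn.ModuleList([nn.BatchNorm1d(hidden) for _ in range(k)])
        self.act = nn.LeakyReLU()

    def forward(self, x, edge_index, batch):
        # x: [num_nodes, 2] features (node type, number of negated fan-in edges).
        for conv, norm in zip(self.convs, self.norms):
            x = self.act(norm(conv(x, edge_index)))
        # READOUT: concatenate global average and max pooling.
        return torch.cat([global_mean_pool(x, batch), global_max_pool(x, batch)], dim=-1)
```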
Architectural choice of GNN: We articulate our rationale for utilizing a simple Graph Convolutional
Network (GCN) architecture to encode AIGs for the generation of synthesis recipes aimed at
optimizing the area-delay product. We elucidate why this approach is effective and support our
argument with an experiment that validates its efficacy.
Several recent GNN-based architectures Li et al. (2022); Shi et al. (2023) have been proposed to
capture the functionality of AIG-based hardware representations. This remains an active direction
for further enhancing ABC-RL's ability to distinguish netlists that are structurally similar yet vary
substantially in functionality.
C EXPERIMENTAL DETAILS
C.1 REWARD NORMALIZATION
In our work, maximizing QoR entails finding a recipe P that minimizes the area-delay product (ADP)
of the transformed AIG graph. As a baseline recipe, we use the expert-crafted synthesis recipe
resyn2 Mishchenko et al. (2006), on top of which we improve the ADP:
$$R = \begin{cases} 1 - \dfrac{\mathrm{ADP}(S(G, P))}{\mathrm{ADP}(S(G, \texttt{resyn2}))} & \text{if } \mathrm{ADP}(S(G, P)) < 2 \times \mathrm{ADP}(S(G, \texttt{resyn2})), \\ -1 & \text{otherwise.} \end{cases}$$
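A minimal sketch of this normalized reward, with `adp_recipe` and `adp_resyn2` standing in for the measured area-delay products of the candidate recipe and the resyn2 baseline.

```python
# Sketch of the normalized, clipped QoR reward used during training.
def normalized_reward(adp_recipe: float, adp_resyn2: float) -> float:
    """Positive for improvements over resyn2; -1 if the recipe is 2x worse or more."""
    if adp_recipe < 2.0 * adp_resyn2:
        r = 1.0 - adp_recipe / adp_resyn2
        return max(-1.0, min(1.0, r))   # clipped to [-1, +1] as in Section 3.1
    return -1.0
```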
We present the characterization of circuits used in our dataset. This data provides a clear picture of
the size and level variation across all the AIGs.
Table 5: Benchmark characterization: Primary inputs, outputs, number of nodes and level of AIGs
Figure 7: Area-delay product reduction (in %) compared to resyn2 on MCNC circuits, including (a) b9,
(b) apex2, (c) prom1 and (d) i9. GREEN: SA+Pred. Chowdhury et al. (2022), BLUE: MCTS Neto et al. (2022),
RED: ABC-RL.
D RESULTS
D.1 PERFORMANCE OF ABC-RL AGAINST PRIOR WORKS AND BASELINE MCTS+LEARNING
Figure 7 plots the ADP reductions over search iterations for MCTS, SA+Pred, and ABC-RL. In m4,
ABC-RL’s agent explores paths with higher rewards whereas standard MCTS continues searching
without further improvement. A similar trend is observed for prom1 demonstrating that a pre-trained
agent helps bias search towards better parts of the search space. SA+Pred. Chowdhury et al. (2022)
also leverages past history, but is unable to compete (on average) with MCTS and ABC-RL in part
because SA typically underperforms MCTS on tree-based search spaces. Also note from Figure 5
that ABC-RL in most cases achieves higher ADP reductions earlier than competing methods (except
pair). This results in significant geo. mean run-time speedups of 2.5× at iso-QoR compared to
standard MCTS on MCNC benchmarks.
Figure 8: Area-delay product reduction (in %) compared to resyn2 on EPFL arithmetic benchmarks (a) bar,
(b) div, (c) square and (d) sqrt. GREEN: SA+Pred. Chowdhury et al. (2022), BLUE: MCTS Neto et al. (2022),
RED: ABC-RL.
Figure 9: Area-delay product reduction (in %) compared to resyn2 on EPFL random control benchmarks
(a) cavlc, (b) mem_ctrl, (c) router and (d) voter. On cavlc and router, ABC-RL performs better than MCTS
where baseline MCTS+Learning under-performs. GREEN: SA+Pred. Chowdhury et al. (2022), BLUE: MCTS
Neto et al. (2022), RED: ABC-RL.
ABC-RL+MCNC agent: For 6 out of 12 MCNC benchmarks, ABC-RL guided by the MCNC agent
demonstrated improved performance compared to the benchmark-wide agent. This suggests that
the hyper-parameters (δth and T ) derived from the validation dataset led to optimized α values for
MCNC benchmarks. However, the performance of the MCNC agent was comparatively lower on
EPFL arithmetic and random control benchmarks.
ABC-RL+ARITH agent: Our EPFL arith agent resulted in better ADP reduction compared to the
benchmark-wide agent only on A4 (sqrt). This indicates that the benchmark-wide agent is able to learn
more from a diverse set of benchmarks, resulting in better ADP reduction. On MCNC benchmarks, we
observe that the ARITH agent performed the best amongst all agents on C6 (m4) and C10 (c7552) because
these are arithmetic circuits.
ABC-RL+RC agent: Our RC agent's performance on the EPFL random control benchmarks is not as strong
as the benchmark-wide agent's. This is primarily because the EPFL random control benchmarks contain
hardware designs performing unique functionality, and hence learning from history does not help much.
But ABC-RL ensures that performance does not deteriorate compared to pure MCTS.
MCNC Benchmarks: In Fig. 10, we depict the performance comparison among MCTS+finetune
agent, ABC-RL, and pure MCTS. Remarkably, ABC-RL outperforms MCTS+finetune on 11 out of
12 benchmarks, approaching MCTS+finetune’s performance on b9. A detailed analysis of circuits
where MCTS+finetune performs worse than pure MCTS (i9, m4, pair, c880, max1024,
and c7552) reveals that these belong to 6 out of 8 MCNC designs where MCTS+learning performs
suboptimally compared to pure MCTS. This observation underscores the fact that although finetuning
contributes to a better geometric mean over MCTS+learning (23.3% over 20.7%), it still falls short
on 6 out of 8 benchmarks. For the remaining two benchmarks, alu4 and apex4, MCTS+finetune
performs comparably to pure MCTS for alu4 and slightly better for apex4. Thus, ABC-RL
emerges as a more suitable choice for scenarios where fine-tuning is resource-intensive, yet we seek
a versatile agent capable of appropriately guiding the search away from unfavorable trajectories.
EPFL Benchmarks: In Fig. 11 and 12, we present the performance comparison with MCTS+finetune.
Notably, for designs bar and div, MCTS+finetune achieved equivalent ADP as ABC-RL, main-
taining the same iso-QoR speed-up compared to MCTS. These designs exhibited strong perfor-
mance with baseline MCTS+Learning, thus aligning with the expectation of favorable results with
MCTS+finetune. On square, MCTS+finetune nearly matched the ADP reduction achieved by pure
MCTS. This suggests that fine-tuning contributes to policy improvement from the pre-trained agent,
resulting in enhanced performance compared to baseline MCTS+Learning. In the case of sqrt,
MCTS+finetune approached the performance of ABC-RL. Our fine-tuning experiments affirm its
ability to correct the model policy, although it require more samples to converge towards ABC-RL
performance.
Figure 10: Area-delay product reduction (in %) compared to resyn2 on MCNC benchmarks (a) alu4, (b) apex1,
(c) apex2, (d) apex4, (e) b9, (f) c880, (g) prom1, (h) i9, (i) m4, (j) pair, (k) max1024 and (l) c7552. YELLOW:
MCTS+Finetune, BLUE: MCTS Neto et al. (2022), RED: ABC-RL.
Figure 11: Area-delay product reduction (in %) compared to resyn2 on EPFL arithmetic benchmarks (a) bar,
(b) div, (c) square and (d) sqrt. YELLOW: MCTS+Finetune, BLUE: MCTS Neto et al. (2022), RED: ABC-RL.
Figure 12: Area-delay product reduction (in %) compared to resyn2 on EPFL random control benchmarks
(a) cavlc, (b) mem_ctrl, (c) router and (d) voter. YELLOW: MCTS+FT, BLUE: MCTS Neto et al. (2022),
RED: ABC-RL.
Next, we report the nearest neighbor retrieval performance of ABC-RL, which is the key mechanism for
setting α to tune the pre-trained agent's recommendation during MCTS search. We report the similarity
score, i.e., the cosine distance between the test AIG and the nearest neighbour retrieved from the training
set. We also report the training circuit to which each test AIG is closest. Based on our validation dataset,
we set T = 100 and δ_th = 0.007.
Table 6: Similarity score (×10−2 ) of nearest neighbour retrieved using ABC-RL for test designs. Nearest
neighbour denotes the training design closest to test-time design
ABC-RL uses a BERT-based recipe encoder to extract two kinds of information: 1) the contextual
relationship between the current synthesis transformation and previous ones, and 2) which synthesis
transformations need more attention than others depending on their position. For example, a
rewrite operation at the start of a synthesis recipe tends to optimize more AIG nodes than the
same operation placed later in the recipe Yu (2020); Neto et al. (2022). Similarly, transformations
like balance are intended to reduce the delay of the design, whereas transformations like
rewrite, refactor and resub are intended for area optimization. Thus, selective attention to
transformations and to their positions relative to other synthesis transformations needs to be learned,
which makes BERT an ideal choice for encoding synthesis recipes. As part of an additional ablation study,
we encode our synthesis recipe with an LSTM network with input sequence length L = 10 and apply
zero-padding for recipes shorter than L.
Table 7: Area-delay reduction (in %). ABC-RL−BERT is ABC-RL trained with a naive synthesis encoder instead
of BERT. MCTS+L+FT indicates MCTS+Learning with online fine-tuning.
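A minimal sketch of a transformer-based recipe encoder in this spirit, using a small randomly initialized torch TransformerEncoder over recipe tokens; the vocabulary, dimensions, and mean pooling are assumptions and do not reproduce the paper's exact BERT configuration.

```python
# Sketch of a position-aware transformer encoder for (partial) synthesis recipes.
import torch
import torch.nn as nn

VOCAB = {"<pad>": 0, "b": 1, "rw": 2, "rw -z": 3, "rf": 4, "rf -z": 5, "rs": 6, "rs -z": 7}

class RecipeEncoder(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4, n_layers: int = 2, max_len: int = 10):
        super().__init__()
        self.tok = nn.Embedding(len(VOCAB), d_model, padding_idx=0)
        self.pos = nn.Embedding(max_len, d_model)   # position matters, as argued above
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: [batch, L] with zero-padding for recipes shorter than L.
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        h = self.tok(token_ids) + self.pos(positions)
        h = self.encoder(h, src_key_padding_mask=(token_ids == 0))
        return h.mean(dim=1)   # simple mean pooling into one recipe embedding
```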
We now present a wall-time comparison of ABC-RL versus SOTA methods on test designs for 100
iterations. We note that for all online search schemes, the runtime is dominated by the number
of online synthesis runs (typically 9.5 seconds per run) as opposed to the inference cost of the deep
network (e.g., 11 milliseconds for ABC-RL). Thus, as observed in the table below, ABC-RL's
runtime is within 1.5% of MCTS and SA+Pred. Table 8 presents the wall-time comparison of ABC-RL
versus existing SOTA methods for 100 iterations. Overall, ABC-RL's runtime overhead over MCTS
and SA+Pred. (over 100 iterations) has a geometric mean of 1.51% and 2.09%, respectively. In terms
of wall-time, ABC-RL achieves a 3.75× geo. mean iso-QoR speed-up.
Table 8: Wall-time overhead of ABC-RL over SOTA methods for 100 iterations (Budget: 100 synthesis runs).
We report iso-QoR wall-time speed-up with respect to baseline MCTS Neto et al. (2022).
Next, we present ABC-RL's performance on training and validation circuits and compare it with
baseline MCTS. For training circuits, ABC-RL sets α = 0, indicating that the search is augmented with
the full recommendation from the pre-trained agent. For validation circuits, ABC-RL sets α and performs
search with the α-tuned recommendation from the pre-trained agent.
Table 9: Area-delay reduction compared to resyn2 on MCNC training and validation circuits. We compare
results of MCTS and ABC-RL approach.
Table 10: Area-delay reduction over resyn2 on EPFL arithmetic (left) and random control (right) training and
validation benchmarks using MCTS and ABC-RL