Figure 1: The Circuit Representation as DAG

Last but most important, existing GNN models are general solutions designed to extract information from all kinds of graphs, while circuit graphs are a unique type of graph with logic relationships between nodes. In this work, we design a dedicated GNN model for circuit graphs, significantly enhancing the learning effectiveness.

We summarize the contributions of this work as follows:

• To the best of our knowledge, DeepGate is the first work on the general and effective circuit representation learning problem. Specifically, we propose a novel design flow to tackle this problem: (i) circuit transformation into AIG form; (ii) supervision with logic-simulated probabilities; (iii) representation learning with a dedicated GNN model for circuit graphs.
• We propose a novel GNN model for circuit graphs that exploits unique circuit properties, including attention mechanisms that mimic the logic computation procedure and reversed propagation layers that consider logic implication effects.
• Reconvergence structures are inevitable due to logic sharing in multi-level logic networks, and they are the main challenge for logic analysis [17]. We treat them as first-class citizens and introduce novel solutions in our GNN model.

We learn the representations of logic gates with many small sub-circuits extracted from benchmark circuits. Experimental results on large circuits show the efficacy and generalization capability of DeepGate. We organize the remainder of this paper as follows. We review related works in Section 2. Section 3 introduces the DeepGate architecture, while Section 4 presents the experimental results on various circuits. Finally, Section 5 concludes this paper.

2 RELATED WORKS

2.1 Graph Neural Networks

Graph neural networks [10, 14] have received a lot of attention for their effectiveness in modeling non-structured data. By learning vectorial representations on graphs via feature propagation and aggregation, GNNs show convincing results in various domains [9, 12, 24]. The most popular GNN model employs a message-passing neural network architecture, which computes representation/hidden states h_v^ℓ for node v in a graph G at every layer ℓ and a final graph representation h_G, as in [9]:

h_v^ℓ = COMBINE^ℓ(h_v^{ℓ−1}, AGGREGATE^ℓ({h_u^{ℓ−1} | u ∈ N(v)})),  ℓ = 1, ..., L    (1)

h_G = READOUT({h_v^L | v ∈ V})    (2)

wherein N(v) denotes the neighboring nodes of node v and L is the number of layers. The parameterized function AGGREGATE^ℓ aggregates messages from the neighboring nodes N(v), and COMBINE^ℓ obtains an updated hidden state after aggregation. Finally, the function READOUT retrieves the states of all nodes V and produces the graph neural representation. A notable GNN architecture is the graph attention network (GAT) [23], which considers the importance of different neighbors during aggregation.

Directed acyclic graphs (DAGs) are a special type of graph, yet they appear broadly across many domains, including circuit modeling (see Fig. 1). Recently, a few studies have been dedicated to DAG-GNN designs [21, 27], which propagate messages following the topological ordering between nodes and only consider the predecessors P(v) in the AGGREGATE^ℓ function, as shown in Equation (3):

h_v^ℓ = COMBINE^ℓ(h_v^{ℓ−1}, AGGREGATE^ℓ({h_u^ℓ | u ∈ P(v)})),  ℓ = 1, ..., L    (3)

The major difference between Eq. (3) and Eq. (1) is that in DAG-GNNs, the aggregation function for v is only executed after the hidden states of all of its predecessors have been computed.

Besides stacking L layers to increase the depth of the network, one can also apply the same model T times in a recurrent fashion to generate the final embedding [3]:

h_v^t = COMBINE(h_v^{t−1}, AGGREGATE({h_u^t | u ∈ P(v)})),  t = 1, ..., T    (4)

Using the taxonomy defined in [25], we name the two variants of DAG-GNNs described in Equations (3)–(4) as DAG-ConvGNNs and DAG-RecGNNs, respectively.
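To make the propagation order in Equations (3)–(4) concrete, the sketch below (our illustration, not code from [3, 21, 27]) processes a toy DAG in topological order so that every node is updated only after all of its predecessors; the aggregate and combine functions are placeholders rather than learned, parameterized modules.

```python
from graphlib import TopologicalSorter

# Toy DAG given as predecessor lists: node -> list of predecessors P(v).
predecessors = {
    "a": [], "b": [],          # primary inputs
    "c": ["a", "b"],
    "d": ["b", "c"],
    "e": ["c", "d"],
}

def aggregate(msgs):
    # Placeholder for AGGREGATE: here simply sum the predecessor states.
    return sum(msgs)

def combine(h_prev, m):
    # Placeholder for COMBINE: here a fixed blend of the old state and the message.
    return 0.5 * h_prev + 0.5 * m

h = {v: 1.0 for v in predecessors}                     # h_v^0, toy scalar states
order = list(TopologicalSorter(predecessors).static_order())

T = 3                                                  # recurrent iterations, as in Eq. (4)
for t in range(T):
    for v in order:                                    # topological order guarantees that all
        preds = predecessors[v]                        # u in P(v) are already up to date
        if preds:
            m = aggregate([h[u] for u in preds])
            h[v] = combine(h[v], m)

print(h)
```

Because the node states are updated in place while walking the topological order, a node sees its predecessors' states from the current iteration t, exactly as in Equation (4).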
2.2 GNN-Based Solutions for EDA Problems

Existing GNN-based EDA solutions use an end-to-end flow for specific EDA tasks, wherein the labels are usually extracted from commercial EDA tools.

To the best of our knowledge, the first GNN-based EDA technique was applied to the test point insertion (TPI) problem, which is formulated as a binary node classification problem and solved with a graph convolutional network [16]. The ground-truth labels are collected from commercial TPI tools, revealing whether a particular node is "easy to observe" or not. CongestionNet [15] models the circuit as an undirected graph and trains a GAT model to predict the congestion of the final physical design on a per-cell basis. GRANNITE [28] conducts power estimation using a DAG-GNN model: gate netlists are mapped onto graphs with per-node (gate) and per-edge (net) features, achieving good accuracy (less than 5.5% error across a diverse set of benchmarks) for fast (<1 second) average power estimation on designs of up to 50k gates. Recently, [26] proposes a GAT-based model named Net2 for pre-placement net length estimation.

To solve a particular EDA problem, the above techniques typically pre-compute many node/edge features (e.g., SCOAP testability measures in [16]) and use existing GNN models to aggregate these features for solution finding. Consequently, the learned node features cannot be transferred among related tasks, despite using the same circuit graphs as inputs. More importantly, an effective representation for circuits should be aware of their logic functions. However, existing solutions ignore this and only consider the structural information in their learning procedure.
Figure 2: The overview of the DeepGate solution (circuits collection → AIG optimization → circuit graphs with labels from logic simulation, i.e., simulated probabilities y_1, ..., y_5 → initialized node embeddings → DeepGate model with forward and reverse layers → regressor → predicted values ŷ_1, ..., ŷ_5)
Motivated by the above, we propose to learn a general and effective circuit representation without pre-computing any specific features, as detailed in the following section.

3 PROPOSED SOLUTION

3.1 Overview of DeepGate

Figure 2 presents the overview of the proposed DeepGate solution, consisting of two stages for the neural representation learning of logic gates:

• Circuit Data Preparation: Given a pool of circuit designs, we use logic synthesis tools to transform them into a unified AIG format. We then perform logic simulations on the circuits with sufficient random patterns to obtain the signal probability (i.e., the probability of a node being logic '1') on every node as supervision. We elaborate on the details in Section 3.2.
• Probability Prediction with DeepGate: Given a circuit dataset and the logic-simulated probabilities as the supervision task, we introduce a novel GNN model dedicated to circuit graph analysis to learn the neural representations of logic gates, as detailed in Section 3.3.

3.2 Circuit Data Preparation

Some circuits are at the register-transfer level, while others are gate-level netlists mapped with various libraries. Such heterogeneity across circuits is a challenge for GNN model development. To tackle this problem, we resort to the logic synthesis tool ABC [5] and transform all circuits into the unified AIG format. If the original circuit is too large, we extract small sub-circuits with sizes ranging from 30 to 3k gates. Note that we test the effectiveness of DeepGate on much larger circuits to assess its generalization capability.

The benefits of such a circuit pre-processing flow include: (i) only two logic gate types (i.e., the 2-input And gate and the 1-input Not gate) are considered, which dramatically reduces the representation learning difficulty; (ii) applying logic synthesis introduces a strong relational inductive bias into the resulting circuit graphs; (iii) the constraint on circuit size facilitates efficient GNN training, with both reduced sizes of circuit graphs and less time for preparing supervision labels.

There are many possibilities to annotate a circuit, e.g., the satisfiability of the circuit [3]. However, a good supervision task should satisfy the following conditions: the labels should be easily obtained while retaining rich information about both the logic function and the structural information of the circuits. In DeepGate, we propose to use the signal probability on every node as supervision, which satisfies the above requirements: (i) it is relatively easy to obtain highly accurate probability values by running logic simulations on many random input patterns, especially when the circuit size is limited; (ii) a unique yet important property of logic circuits that makes circuit analysis challenging is the reconvergence structure, and logic simulation is arguably the only way to obtain the actual value for such structures; (iii) the logic probability of each gate itself plays an essential role in many EDA tasks.
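As a rough illustration of how these supervision labels can be produced, the sketch below simulates random patterns on a small, hypothetical AIG-style netlist (primary inputs, And and Not gates) and estimates the per-node probability of being logic '1'. The netlist and the pattern count are illustrative only; the actual data preparation relies on ABC and up to 100k patterns, as described in Section 4.1.

```python
import random

# A tiny AIG-style netlist: each node is (type, fan-in list).
# "PI" nodes have no fan-ins; "AND" has two; "NOT" has one.
netlist = {
    "x1": ("PI", []), "x2": ("PI", []), "x3": ("PI", []),
    "n1": ("AND", ["x1", "x2"]),
    "n2": ("NOT", ["n1"]),
    "n3": ("AND", ["n2", "x3"]),
}
topo_order = ["x1", "x2", "x3", "n1", "n2", "n3"]  # PIs first, then gates

def simulate(pattern):
    """Evaluate every node for one input pattern (dict of PI values)."""
    value = dict(pattern)
    for node in topo_order:
        kind, fanin = netlist[node]
        if kind == "AND":
            value[node] = value[fanin[0]] & value[fanin[1]]
        elif kind == "NOT":
            value[node] = 1 - value[fanin[0]]
    return value

num_patterns = 10_000            # hypothetical; the paper uses up to 100k patterns
ones = {node: 0 for node in netlist}
for _ in range(num_patterns):
    pattern = {pi: random.randint(0, 1) for pi in ("x1", "x2", "x3")}
    for node, v in simulate(pattern).items():
        ones[node] += v

probability = {node: ones[node] / num_patterns for node in netlist}
print(probability)               # e.g., n1 ≈ 0.25, n2 ≈ 0.75, n3 ≈ 0.375
```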
3.3 GNN Model in DeepGate

Given circuit graphs in AIG form, the objective of our GNN model is to estimate the probability of every node such that it is as close to the genuine signal probability as possible. Different from existing DAG-ConvGNN [21, 27] and DAG-RecGNN [3] models that focus on learning the topological information in the graph, DeepGate is designed to learn both the circuit structural information and the computational behaviour of logic circuits, and to embed them as vectors on every logic gate.

We now elaborate on the detailed GNN model design in DeepGate. Given a circuit graph G, we embed the gate type of each node v with a one-hot encoding in x_v. To be specific, as only primary inputs (PIs), And gates, and Not gates are present in AIGs, we assign a 3-d vector to each node according to its gate type. It should be noted that, instead of relying on the probability-based measurements used in previous works [16, 28], our model only requires gate type information for the representation learning. We also maintain a hidden state h_v for every node, which is initialized randomly. Given these, DeepGate resorts to an attention-based aggregation design [21, 23] and the gated recurrent unit (GRU) [27] as the update function.

Aggregation. We use the attention mechanism in the additive form to instantiate the AGGREGATE function in Equation (3), wherein the aggregated message m_v^t for a node v at the t-th iteration is computed by:

m_v^t = Σ_{u∈P(v)} α_uv^t h_u^t,  where α_uv^t = softmax_{u∈P(v)}(w_1^⊤ h_v^{t−1} + w_2^⊤ h_u^t)    (5)
where α_uv^t is a weighting coefficient that is computed following the query–key design of standard attention mechanisms. To be specific, h_v^{t−1} serves as the query, and the representations of the predecessors from the current iteration t, h_u^t, serve as the keys. The intuition behind using the attention mechanism for aggregation is that, in the logic computation of digital circuits, the controlling value of a logic gate determines the output of that gate. Therefore, controlling values are far more important than non-controlling values. To mimic this behaviour, the attention mechanism can learn to assign high weights to the controlling inputs of gates and give less importance to the rest of the inputs.

Combine. We then use the GRU to instantiate the COMBINE function for updating the hidden state of the target node v:

h_v^t = GRU([m_v^t, x_v], h_v^{t−1})    (6)

wherein m_v^t and x_v are concatenated and treated as the input, while h_v^{t−1} is the past state of the GRU.
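A minimal sketch of this per-node update in PyTorch is given below. The module name, tensor shapes, and the stand-alone usage are illustrative assumptions: the sketch instantiates Eq. (5) with additive attention and Eq. (6) with a GRU cell, but it is not the exact DeepGate implementation.

```python
import torch
import torch.nn as nn

class NodeUpdate(nn.Module):
    """One attention-based aggregation (Eq. 5) plus GRU combine (Eq. 6) step."""
    def __init__(self, dim=64, num_gate_types=3):
        super().__init__()
        self.w1 = nn.Linear(dim, 1, bias=False)   # scores the query h_v^{t-1}
        self.w2 = nn.Linear(dim, 1, bias=False)   # scores each key h_u^t
        self.gru = nn.GRUCell(dim + num_gate_types, dim)

    def forward(self, h_v_prev, h_preds, x_v):
        # h_v_prev: (dim,) previous state of node v
        # h_preds:  (k, dim) current states of the k predecessors of v
        # x_v:      (num_gate_types,) one-hot gate type of v
        scores = self.w1(h_v_prev) + self.w2(h_preds).squeeze(-1)   # (k,)
        alpha = torch.softmax(scores, dim=0)                        # attention over P(v)
        m_v = (alpha.unsqueeze(-1) * h_preds).sum(dim=0)            # aggregated message
        inp = torch.cat([m_v, x_v], dim=-1)                         # [m_v, x_v] as GRU input
        return self.gru(inp.unsqueeze(0), h_v_prev.unsqueeze(0)).squeeze(0)

update = NodeUpdate(dim=64)
h_v = torch.randn(64)                       # randomly initialized hidden state
h_preds = torch.randn(2, 64)                # two predecessors (a 2-input And gate)
x_v = torch.tensor([0.0, 1.0, 0.0])         # one-hot gate type: [PI, And, Not]
h_v_next = update(h_v, h_preds, x_v)
print(h_v_next.shape)                       # torch.Size([64])
```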
On the one hand, DeepGate adopts the recursive DAG-GNN formulation defined in Equation (4). The reasons for using the recurrent architecture are two-fold: (i) it is unrealistic for GNNs to capture a circuit's functional and structural information with a single forward propagation; (ii) the recurrent learning procedure facilitates reaching stabilized node embeddings quickly.

On the other hand, our proposed GNN model differs from previous DAG-GNNs [3, 21, 27] that initialize h_v^0 as x_v and treat the aggregated message as the state of the recurrent function. In contrast, we fix the gate type information of nodes, x_v, as the input for all iterations. Such a design avoids the vanishing of gate-type information during long-term recursive propagation.

Reversed Propagation Layer. In DeepGate, we also consider backward information propagation, i.e., processing the graph in reversed topological order. One of the main reasons to introduce the backward layers in our framework is that logic implication and backtracking in the reversed order can be highly useful for predicting the states of nodes. It also helps stabilize training, as shown in sequence-to-sequence learning tasks [19].
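Putting the pieces together, the recurrent schedule can be sketched as below: each of the T iterations runs a forward layer over the topological order and then a reverse layer over the reversed order. The `update_fwd` and `update_rev` callables stand in for the attention/GRU machinery above and are assumptions for illustration.

```python
def propagate(order, h, x, predecessors, successors, update_fwd, update_rev, T=10):
    """Recurrent DAG propagation: a forward layer followed by a reverse layer per iteration.

    order:      nodes in topological order
    h, x:       per-node hidden states and fixed gate-type encodings
    update_fwd: h_v <- f(h_v, {h_u : u in P(v)}, x_v)   (Eqs. 5-6)
    update_rev: same form, but aggregating from successors
    """
    for _ in range(T):
        for v in order:                       # forward layer: follow the logic flow
            if predecessors[v]:
                h[v] = update_fwd(h[v], [h[u] for u in predecessors[v]], x[v])
        for v in reversed(order):             # reverse layer: propagate implications back
            if successors[v]:
                h[v] = update_rev(h[v], [h[u] for u in successors[v]], x[v])
    return h

# Toy usage with a placeholder update rule (blend of message and old state):
def blend(h_v, msgs, x_v):
    return 0.5 * h_v + 0.5 * sum(msgs) / len(msgs)

preds = {"a": [], "b": [], "c": ["a", "b"]}
succs = {"a": ["c"], "b": ["c"], "c": []}
states = propagate(["a", "b", "c"],
                   {"a": 1.0, "b": 0.0, "c": 0.5},
                   {"a": None, "b": None, "c": None},
                   preds, succs, blend, blend, T=2)
print(states)
```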
3.4 Skip Connection for Reconvergence Structure with Positional Encoding

In the previous section, we described the core components of DeepGate necessary for predicting the logic probabilities of nodes. However, logic inference on reconvergence nodes differs from that on normal nodes, and such structures are inevitable due to logic sharing in multi-level logic networks. Hence, they are the main challenge for logic probability analysis. To accommodate their impact, we introduce an improvement into DeepGate to enable special processing for reconvergence nodes, as shown in Figure 3.

Figure 3: Information Propagation at Reconvergence Node

Firstly, we record the information of reconvergence nodes during circuit data preparation, including their corresponding source fan-out nodes and the logic level difference between the source nodes and the reconvergence nodes. Secondly, we add direct edges between the fan-out node and the reconvergence node, named skip connections here. The new edges facilitate the information exchange from fan-out nodes to reconvergence nodes. Last but not least, we leverage the positional encoding technique [22] to differentiate the skip connections from the normal connections. To be specific, the function γ(D) maps the logic level difference D between the source fan-out node and the reconvergence node into a higher-dimensional space R^{2L}:

γ(D) = (sin(2^0 πD), cos(2^0 πD), ..., sin(2^{L−1} πD), cos(2^{L−1} πD))    (7)

The impact of the fan-out node on the reconvergence node depends upon the distance between them: generally speaking, the longer the distance, the lesser the impact on the reconvergence node. The above function injects into the model the knowledge of how much the fan-out node can impact the result of the reconvergence node. We assign the encoded vector as the edge attribute of the skip connection and incorporate it into the coefficient calculation described in Equation (5) as a third input.
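A small sketch of this encoding, assuming NumPy and the L = 8 setting used later in Section 4, is shown below; the fan-out/reconvergence levels are hypothetical, and attaching the vector to the skip edge is indicated only schematically.

```python
import numpy as np

def gamma(D, L=8):
    """Positional encoding of the logic level difference D (Eq. 7); output size 2L."""
    freqs = 2.0 ** np.arange(L) * np.pi * D          # 2^0*pi*D, ..., 2^(L-1)*pi*D
    enc = np.empty(2 * L)
    enc[0::2] = np.sin(freqs)                        # interleave sin/cos pairs
    enc[1::2] = np.cos(freqs)
    return enc

# Hypothetical skip connection: source fan-out node at logic level 3,
# reconvergence node at logic level 7, so D = 4.
edge_attr = gamma(7 - 3)
print(edge_attr.shape)   # (16,)
# In DeepGate this vector is attached to the skip edge and enters the
# attention coefficient of Eq. (5) as an additional term.
```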
4 EXPERIMENTS

4.1 Datasets

We extract many sub-circuits from four circuit benchmark suites: ITC'99 [7], IWLS'05 [1], EPFL [2], and OpenCores [20], and follow the circuit data preparation flow described in Section 3.2 to transform all circuits into a unified AIG format. We conduct logic simulations with up to 100k random input patterns to obtain an accurate signal probability on every node.

Table 1 presents the statistics of the circuit dataset. #Subcircuits shows the total number of sub-circuits extracted from each benchmark. As shown in the table, the constructed circuit dataset covers circuit sizes ranging from tens to thousands of nodes with different logic levels. In total, there are 10,824 circuits, and we create 90/10 training/test splits for model training and evaluation.
Table 1: The Statistics of Circuit Training Dataset

Benchmark     #Subcircuits    #Node          #Level
EPFL          828             [52–341]       [4–17]
ITC99         7,560           [36–1,947]     [3–23]
IWLS          1,281           [41–2,268]     [5–24]
Opencores     1,155           [51–3,214]     [4–18]
Total         10,824          [36–3,214]     [3–24]

We evaluate all models with the average prediction error between the simulated probability y_v and the predicted probability ŷ_v of every node, as defined in Equation (8), where N = |V| is the number of nodes; the lower the error, the better the model performs.

Avg. Prediction Error = (1/N) Σ_{v∈V} |y_v − ŷ_v|    (8)
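For concreteness, the metric in Equation (8) is simply the mean absolute error over all nodes of a circuit; a minimal version with made-up values is sketched below.

```python
import numpy as np

def avg_prediction_error(y_true, y_pred):
    """Average prediction error of Eq. (8): mean |y_v - y_hat_v| over all nodes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.abs(y_true - y_pred)))

# Hypothetical simulated vs. predicted probabilities for five gates.
print(avg_prediction_error([0.25, 0.75, 0.375, 0.5, 0.5],
                           [0.22, 0.78, 0.40, 0.52, 0.47]))   # ≈ 0.027
```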
We consider three baseline GNN models: GCN, DAG-ConvGNN, and DAG-RecGNN. The GCN model treats the circuit graphs as undirected graphs in representation learning. The DAG-ConvGNN model follows the setting defined in Equation (3). For the DAG-RecGNN model, we adopt the same COMBINE function and the reversed propagation layer design as in DeepGate, as described in Section 3.3. As for the GNN model in DeepGate, it additionally contains the attention mechanism and the skip connections (SC). Under every setting, we evaluate four different aggregator designs, which include representative works for DAG learning, i.e., Convolutional Sum (abbreviated as Conv. Sum) [18], Attention [21, 23], GatedSum [27], and DeepSet [3].

To make the comparison fair, we instantiate all models with d = 64 for the node hidden states h_v and design the other parameterized functions to have a similar number of tunable parameters. For DAG-RecGNNs and our DeepGate model, a forward layer is followed by a reversed layer, and T = 10 iterations of message passing are performed to obtain the final embeddings. We choose L = 8 in Equation (7) for the skip connection setting. For training, all models are optimized for 60 epochs using the ADAM optimizer with a learning rate of 1 × 10^−4. We use the topological batching technique introduced in [21] to accelerate the training.
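The settings above reduce to a handful of hyperparameters; a hedged PyTorch-style setup is sketched below, where the model is a stand-in placeholder and the training loss is our assumption rather than a detail stated in the text.

```python
import torch

# Hyperparameters from the settings above; the dict keys are our own naming.
config = {
    "hidden_dim": 64,      # d = 64 node hidden states
    "iterations": 10,      # T = 10 rounds of forward + reversed message passing
    "pe_levels": 8,        # L = 8 in the positional encoding of Eq. (7)
    "epochs": 60,
    "lr": 1e-4,
}

# A stand-in module so the snippet runs; the real DeepGate network is the one
# described in Section 3 and is not reproduced here.
model = torch.nn.Linear(3, config["hidden_dim"])
optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])
# An L1 training loss is one plausible choice given the Eq. (8) metric;
# the exact training objective is not spelled out in this section.
loss_fn = torch.nn.L1Loss()
```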
As we observe that DAG-RecGNN with the DeepSet aggregator (abbreviated as DeepSet in the following discussion) performs better than the other baselines, in later results we only compare DeepGate (w/ skip connection) with it.
4.3 Probability Prediction

4.3.1 Comparison of DeepGate with Baseline Solutions.

Table 2: The Performance Comparison of DeepGate with other GNN models for Logic Probability Prediction

Model                 Aggregator          Avg. Prediction Error
GCN                   Conv. Sum           0.1386
                      Attention           0.1840
                      DeepSet             0.2541
                      GatedSum            0.1995
DAG-ConvGNN           Conv. Sum           0.2215
                      Attention           0.2398
                      DeepSet             0.2431
                      GatedSum            0.2333
DAG-RecGNN (T=10)     Conv. Sum           0.0328
                      DeepSet             0.0302
                      GatedSum            0.0329
DeepGate (T=10)       Attention w/o SC    0.0234
                      Attention w/ SC     0.0204

Table 2 compares DeepGate with the other baseline solutions in terms of prediction error. From this table, we have several observations. First, both GCN and DAG-ConvGNN suffer from poor performance for probability prediction, mainly due to their lack of ability to model the computational behaviours of circuits. For instance, the best GCN model, equipped with Conv. Sum, gives a prediction error of 0.1386, which is even higher than that of the worst-performing DAG-RecGNN model. Therefore, only by incorporating the logical ordering into the model design and conducting the propagation recurrently does the model perform well. It shows the advantage of the DAG-RecGNN implementation, with the dedicated recurrent scheme and reversed layer design discussed in Section 3.3, over simpler GNN architectures. Second, among all models, DeepGate with attention alone achieves a significant prediction error reduction: it brings a 22.76% relative improvement compared with the best baseline solution, which is the DAG-RecGNN model equipped with the DeepSet aggregator. Hence, using the dedicated attention mechanism benefits logic representation learning. Third, equipped with the skip connection design, DeepGate further reduces the prediction error from 0.0234 to 0.0204, which reveals the efficacy of introducing the reconvergence knowledge into the model design. To summarize, with only the gate type information and the connectivity between gates, DeepGate learns to predict highly accurate probabilities for logic gates.

4.3.2 Results on Large Circuits.

Table 3: The Performance Comparison of DeepGate and DeepSet on Five Large Circuits

Design             #Nodes    Levels    DeepSet    DeepGate    Reduction
Arbiter            23.7K     173       0.0277     0.0073      73.56%
Squarer            36.0K     373       0.0495     0.0346      30.16%
Multiplier         47.3K     521       0.0220     0.0159      27.94%
80386 Processor    13.2K     122       0.0534     0.0387      27.56%
Viper Processor    40.5K     133       0.0520     0.0389      25.18%

Furthermore, we evaluate DeepGate on five circuit designs that are substantially larger than the circuits it saw during training. The circuit statistics and the prediction errors of both DeepGate and DeepSet are shown in Table 3. The number of gates in these designs is two orders of magnitude larger than that of the training circuits. We observe that DeepGate achieves prediction accuracy similar to that on small circuits, and it outperforms DeepSet on these large circuits by a large margin. Such results clearly demonstrate the generalization capability of DeepGate. In particular, DeepGate achieves a 73.56% prediction error reduction on Arbiter. This is because the Arbiter circuit is designed to arbitrate access among multiple requests, and it contains repetitive logic units with many reconvergence structures. As DeepGate treats such structures as first-class citizens in the GNN model, it can generate much more accurate predictions.
Table 4: The Performance of DeepGate with and without Circuit Transformation

Benchmark    w/o Tran.    w/ Tran.    Pre-trained
EPFL         0.0442       0.0292      0.0142
IWLS         0.0447       0.0342      0.0209

4.4 Discussion

4.4.1 Effectiveness of Circuit Transformation.

DeepGate uses the logic synthesis tool to transform circuits from different sources into a unified AIG form. One may wonder about the performance of DeepGate if the network is directly trained on the original circuits, wherein other gate types (e.g., XOR, NAND, NOR, and OR) are also included. To investigate the effectiveness of the circuit transformation in DeepGate, we conduct controlled experiments on the EPFL and IWLS benchmarks, as shown in Table 4.

Take EPFL as an example: we extract 375 sub-circuits from the original designs and develop two versions, one with the original six gate types and the other with the AIG transformation. For each version, we train the DeepGate model from scratch. The only difference is that for the former version of the dataset, we assign a 7-d one-hot encoding to the node feature x_v. As can be observed from Table 4, DeepGate trained on AIGs outperforms the one trained on the original circuits by a large margin (33.94% relative prediction error reduction on EPFL). The same observation can be made on the IWLS circuits.

Such improvements originate from the benefit of circuit transformation: when only two logic gate types are considered, the representation learning difficulty is reduced dramatically without any impact on circuit functionality. Also, we manually check the usage frequency of different gate types in the original formats and observe that some gate types (e.g., XOR and NAND) are used much less frequently. Such imbalanced gate distributions may lead to insufficient training, causing higher prediction errors.

Additionally, we directly apply the DeepGate model pre-trained on the merged AIG dataset described in Section 4.1 for comparison. As can be observed, DeepGate trained on the dataset consisting of different benchmarks further reduces the prediction errors by 51.37%. This supports the claim that unifying different circuit designs into a common intermediate representation helps the model learn a better representation of logic gates.

4.4.2 Impact of Recurrence Iterations.

During inference, the number of iterations T can be set to different values. The higher the value, the higher the computational cost. We enumerate T from 1 to 50 and observe that our GNN model decreases the prediction loss as T increases. However, the prediction error converges quickly at around T = 10, regardless of the circuit size. Such experimental results further demonstrate the scalability of the proposed DeepGate solution.

5 CONCLUSION AND FUTURE WORK

This paper proposes DeepGate, a novel representation learning solution that effectively embeds both the logic function and the structural information of a circuit as vectors on each gate. In DeepGate, we construct easy-to-learn circuit graphs by transforming circuits into a unified AIG format and introduce a novel GNN model with circuit knowledge as priors for effective representation learning. Using the informative signal probability as the supervision task on small sub-circuits, we show that DeepGate can generalize to large circuits with accurate predictions without any pre-computed features.

While showing promising results, the current DeepGate model is still in its infancy. For example, we could introduce other informative supervision tasks (e.g., logic inference and Boolean satisfiability) to achieve better representations for logic gates. We could also add more circuits for training to build a large-scale foundation model for logic circuits [4]. Moreover, in our future work, we plan to apply the representations learned in DeepGate to many downstream EDA tasks (e.g., power estimation, logic reduction, and equivalence checking). These tasks are directly related to signal probability analysis, and we believe DeepGate can achieve satisfactory results without much effort in finetuning the model.

6 ACKNOWLEDGEMENT

This work was supported in part by Huawei Technologies Co., Ltd.

REFERENCES
[1] Christoph Albrecht. 2005. IWLS 2005 benchmarks. In IWLS.
[2] Luca Amarú, Pierre-Emmanuel Gaillardon, and Giovanni De Micheli. 2015. The EPFL combinational benchmark suite. In IWLS.
[3] Saeed Amizadeh, Sergiy Matusevych, and Markus Weimer. 2019. Learning To Solve Circuit-SAT: An Unsupervised Differentiable Approach. In ICLR.
[4] Rishi Bommasani et al. 2021. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 (2021).
[5] Robert Brayton and Alan Mishchenko. 2010. ABC: An academic industrial-strength verification tool. In CAV. Springer, 24–40.
[6] Tom B. Brown et al. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165 (2020).
[7] Scott Davidson. 1999. Characteristics of the ITC'99 benchmark circuits. In ITSW.
[8] Jacob Devlin et al. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
[9] Justin Gilmer et al. 2017. Neural message passing for quantum chemistry. In ICML.
[10] William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In NIPS.
[11] Xu Han et al. 2021. Pre-trained models: Past, present and future. AI Open (2021).
[12] Weihua Hu et al. 2020. Open graph benchmark: Datasets for machine learning on graphs. arXiv preprint arXiv:2005.00687 (2020).
[13] Guyue Huang et al. 2021. Machine learning for electronic design automation: A survey. TODAES (2021).
[14] Thomas N. Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
[15] Robert Kirby et al. 2019. CongestionNet: Routing congestion prediction using deep graph neural networks. In VLSI-SoC. IEEE.
[16] Yuzhe Ma et al. 2019. High performance graph convolutional networks with applications in testability analysis. In DAC.
[17] M. W. Roberts and P. K. Lala. 1987. Algorithm to detect reconvergent fanouts in logic circuits. IEEE Proceedings Computers and Digital Techniques (1987).
[18] Daniel Selsam et al. 2018. Learning a SAT Solver from Single-Bit Supervision. In ICLR.
[19] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to Sequence Learning with Neural Networks. arXiv:1409.3215.
[20] Opencores Team. [n. d.]. Opencores. https://opencores.org/.
[21] Veronika Thost and Jie Chen. 2021. Directed Acyclic Graph Neural Networks. In ICLR.
[22] Ashish Vaswani et al. 2017. Attention is all you need. In NIPS.
[23] Petar Veličković et al. 2017. Graph Attention Networks. ICLR (2017).
[24] Le Wu et al. 2018. SocialGCN: An efficient graph convolutional network based model for social recommendation. arXiv preprint arXiv:1811.02815 (2018).
[25] Zonghan Wu et al. 2020. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems (2020).
[26] Zhiyao Xie et al. 2021. Net2: A Graph Attention Network Method Customized for Pre-Placement Net Length Estimation. In ASP-DAC. IEEE.
[27] Muhan Zhang et al. 2019. D-VAE: A Variational Autoencoder for Directed Acyclic Graphs. arXiv:1904.11088.
[28] Yanqing Zhang, Haoxing Ren, and Brucek Khailany. 2020. GRANNITE: Graph neural network inference for transferable power estimation. In DAC. IEEE.