
Wu Et Al. - 2022 - AI-assisted Synthesis in Next Generation EDA Promises, Challenges, and Prospects

The document discusses the integration of AI techniques, specifically graph neural networks (GNNs) and reinforcement learning (RL), into electronic design automation (EDA) to enhance hardware development efficiency. It highlights the challenges of current EDA tools in providing fast and accurate quality-of-result evaluations and proposes methods to improve performance predictions and design exploration. The authors emphasize the potential of machine learning to drive the next generation of EDA tools, ultimately aiming to reduce chip development cycles significantly.


2022 IEEE 40th International Conference on Computer Design (ICCD)

AI-assisted Synthesis in Next Generation EDA: Promises, Challenges, and Prospects
2022 IEEE 40th International Conference on Computer Design (ICCD) | 978-1-6654-6186-3/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICCD56317.2022.00039

Nan Wu (1), Yuan Xie (2), Cong Hao (3)

(1) University of California - Santa Barbara, (2) Alibaba DAMO Academy, (3) Georgia Institute of Technology
[email protected], [email protected], [email protected]

Abstract—Despite the great advance achieved by electronic design automation (EDA) tools, there is still a long way towards hardware agile development, whose ultimate goal is to reduce chip development cycles from years to months or even weeks. Hardware development typically involves many optimization-evaluation iterations, indicating that (1) fast and accurate quality-of-result (QoR) evaluation and (2) efficient optimization, either independently or integrally, will conspicuously improve the development efficiency. Specifically, targeting high-level synthesis and logic synthesis, we investigate (1) the power of exploiting graph neural networks (GNNs) for generalizable and accurate performance predictions, (2) the efficacy of applying reinforcement learning (RL) for design exploration, and (3) the superiority of combining GNN and RL to solve EDA problems. Experimental results demonstrate the promises of infusing intelligence into design synthesis and EDA tools. On top of current endeavors, we summarize the challenges in the respective EDA contexts and the prospects toward next generation EDA tools.

Index Terms—machine learning for EDA, high-level synthesis, logic synthesis, graph neural network, reinforcement learning, performance modeling, design space exploration

Fig. 1. Development flow from behavioral programs in high-level languages to circuit designs. [Figure omitted. Stages shown: Applications in C/C++/SystemC -> High-Level Synthesis (intermediate representation (IR); scheduling, binding, etc.) -> RTL (Verilog/VHDL) -> Logic Synthesis (technology mapping) -> Physical Synthesis -> Layout/Bitstream. Each stage is paired with Evaluation and Optimization. HLS reports give fast but inaccurate QoR estimations; post-P&R QoRs are accurate but time-consuming.]

I. INTRODUCTION
Moore's law [1], [2], under which transistor density doubles every 18 months, has been powering integrated circuit revolutions since the 1960s. Even if the target cadence of Moore's law is slipping [3], the electronic industry continues to move to larger-scale, more complex, and heterogeneous designs and systems, to keep pace with the exponentially growing compute demand of different applications [4]. However, hardware designs are growing more complex: from the time-to-market aspect, nearly 70% of application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA) projects were completed behind schedule in 2020 [5]; from the cost aspect, the development costs of leading-edge electronic designs are skyrocketing [6]; from the tool aspect, existing electronic design automation (EDA) tools cannot adequately address emerging hardware development [7]. These all herald the necessity of hardware agile development, with the ultimate goal of reducing chip development cycles from years to months or even weeks. One example is the Intelligent Design of Electronic Assets (IDEA) program [8], which aims to accelerate the development cycle of next-generation electronic systems with reduced labor, costs, and design complexity barriers.
Hardware development is an iterative process involving many optimization-evaluation iterations. Fig. 1 depicts the typical design flow from behavioral programs to circuit designs (e.g., layout/bitstream), including high-level synthesis (HLS), logic synthesis, and physical implementation. Notably, every design stage is associated with evaluation phases, which should accurately assess the quality of results (QoRs) of circuit designs to guide design exploration, and optimization phases, which should sufficiently explore design knobs to meet specified performance requirements. Traditional EDA tools usually provide either accurate yet time-consuming or fast yet inaccurate QoR estimations [9]–[12], and extensive manual efforts are required for design space exploration (DSE) to satisfy diverse performance, resource, and power targets. These all result in long time-to-market, which is further exacerbated by the explosion of modern hardware system complexity and technology scaling. Given the demand for hardware agile development and a productivity boost, it is highly desirable to infuse more intelligence into EDA tools to enable fast and accurate QoR evaluation and efficient optimization, either independently or integrally, so that design iterations are conspicuously sped up to improve development efficiency.
Recent years have witnessed the emergence of machine learning (ML) applied to computer architecture and systems [13], revealing the great potential of ML-based performance modeling and ML-assisted design optimization. In this paper, we investigate how ML techniques can be embraced

in HLS and logic synthesis for (1) fast and accurate QoR evaluation and (2) efficient optimization.
• In Section II and Section III, we explore the power of various graph neural networks (GNNs) for fast, accurate, and generalizable performance predictions in HLS and logic synthesis [12], [14], since data are naturally represented in graphs for many EDA tasks.
• In Section IV, we study the efficacy of flexible and automatic design exploration in HLS enabled by reinforcement learning (RL), whose superiority is further enhanced by the assistance of GNNs to transfer knowledge and past experiences to new or unseen applications [11], [15].
• In Section V, we discuss the challenges and prospects on top of current endeavors, regarding data collection, model selection, and deployment scenarios. We envision ML-based techniques could be the impetus to next-generation EDA and hardware agile development.

II. HLS PERFORMANCE PREDICTIONS WITH GNNS
HLS expedites ASIC/FPGA development by automatic transformation from behavioral descriptions in high-level languages (C/C++, etc.) to functionally equivalent RTL designs with various resource/performance trade-offs.

A. Why, How, and When to Predict
There are three fundamental questions for HLS performance predictions: why, how, and when.
Why to predict? Though HLS tools can greatly speed up circuit design, they still require minutes to hours for design synthesis before place and route, and the reported hardware performance metrics are far from accurate [9]–[12]. This prevents designers from sufficient design exploration and optimization. Thus, a quick and accurate QoR evaluation at early design stages, even before HLS, is highly desirable.
How to predict? For HLS-based hardware designs, post-implementation metrics (e.g., resource usage and timing) are of high interest. Classic approaches generally use analytical models [16]–[18], which are more suitable for well-structured data flows. Existing ML-based approaches attempt linear regression, ANN, support vector machine, random forest, and ensemble models [9], [19]–[21], which are promising but require heavy feature engineering after HLS execution. One major concern of these approaches is their limited generalization capability, since they have to re-run HLS or implementation for each new and unseen design to collect features. In this regard, more advanced methods are needed for generalizable and accurate HLS performance predictions.
When to predict? HLS performance predictions can be conducted at different synthesis stages, based on the features employed in the ML-based predictors. There are three major sources for feature extraction: HLS directives [22]–[24], intermediate representations (IRs) after HLS front-end compilation [10], [25], [26], and HLS synthesis reports [9], [20], [27]. In general, an early and timely prediction benefits agile development, but little domain-specific knowledge is exposed at this stage, which probably hurts prediction accuracy. Thus, there awaits a comprehensive comparison among prediction strategies at different HLS stages in terms of prediction accuracy and timeliness.

B. Three Prediction Strategies using GNNs
In response to existing limitations (i.e., weak generalizability, unreconciled accuracy and timeliness), we propose three GNN-based prediction approaches [12] with various trade-offs between timeliness and accuracy (as shown in Fig. 2). To make predictions timely, we base them on the IR graph of a program, i.e., the data flow graph (DFG) and control data flow graph (CDFG), since these IR graphs can be extracted within seconds after the front-end compilation [28]. To make predictions generalizable, we exploit the inductiveness of GNNs to make accurate predictions for completely unseen designs without retraining. Specifically, the three approaches are:
• Off-the-shelf approach. The first approach directly predicts post-implementation performance metrics based on IR graphs. The features are extracted right after HLS front-end compilation, leading to the earliest predictions but with compromised accuracy due to the absence of hardware-specific information.
• Knowledge-rich approach. The second approach draws support from domain information distilled from intermediate HLS results (i.e., partial execution of HLS but no implementation): the resource usage associated with each node in IR graphs. Armed with rich domain knowledge, this approach places more emphasis on prediction accuracy, especially for resource estimation, yet compromises timeliness and efficiency since HLS tools do take some time to generate intermediate results.
• Knowledge-infused approach. The third approach is a hierarchical GNN-based prediction strategy that reaps the advantages of the previous two approaches: it not only makes the earliest prediction but also benefits from domain knowledge with almost zero overhead during inference. The knowledge infusion is achieved by decoupling the prediction task into two steps: the first step is node-level classification for resource types, in which the domain knowledge is infused during the training phase; the second step is graph-level regression that estimates the numerical resource usage and timing on top of the self-inferred domain knowledge.

C. Promises in Accuracy, Timeliness, and Generalizability
Predicting post-implementation resource/timing from IR graphs amounts to using GNNs to approximate the set of sophisticated heuristics and mapping rules used by HLS scheduling/binding and logic/physical synthesis during the design flow. We organize the discussion around three aspects: (1) how different applications (i.e., graph structures) influence prediction accuracy; (2) which properties of existing GNN models would help improve accuracy; (3) what domain-specific insights can be derived to facilitate future graph representation learning on fast and generalizable QoR evaluation.
Different graph structures. In the off-the-shelf approach, we screen 14 state-of-the-art GNN models [12]. Table I
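The two-step decoupling of the knowledge-infused approach in Section II-B can be illustrated with a minimal NumPy sketch. This is not the authors' model: the mean-aggregation layer, the symmetric toy adjacency matrix, the random (untrained) weights, and all function names are assumptions chosen only to show how the node-level classification output feeds the graph-level regression.

```python
import numpy as np

def message_pass(adj, feats):
    """One round of mean aggregation over each node's neighbours plus itself.
    Assumption: a stand-in for a real GNN layer such as PNA or RGCN."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1, keepdims=True)
    return (a_hat @ feats) / deg

def classify_nodes(adj, feats, w_cls):
    """Step 1: node-level classification of the resource type of each node."""
    logits = message_pass(adj, feats) @ w_cls
    return logits.argmax(axis=1)

def regress_graph(adj, feats, node_types, n_types, w_reg):
    """Step 2: graph-level regression on node features augmented with the
    self-inferred resource types from step 1."""
    one_hot = np.eye(n_types)[node_types]
    augmented = np.concatenate([feats, one_hot], axis=1)
    pooled = message_pass(adj, augmented).sum(axis=0)  # sum pooling over nodes
    return pooled @ w_reg

# Hypothetical toy IR graph: 5 nodes, 4 features per node, 3 resource types,
# 4 regression targets (e.g. DSP, LUT, FF, CP). Weights are random, not trained.
rng = np.random.default_rng(0)
n_nodes, n_feats, n_types, n_targets = 5, 4, 3, 4
adj = np.array([[0, 1, 0, 0, 0],
                [1, 0, 1, 0, 0],
                [0, 1, 0, 1, 1],
                [0, 0, 1, 0, 0],
                [0, 0, 1, 0, 0]], dtype=float)
feats = rng.normal(size=(n_nodes, n_feats))
w_cls = rng.normal(size=(n_feats, n_types))
w_reg = rng.normal(size=(n_feats + n_types, n_targets))

node_types = classify_nodes(adj, feats, w_cls)
prediction = regress_graph(adj, feats, node_types, n_types, w_reg)
```

At inference time only the IR graph is needed: the resource types are inferred by the first step rather than read from HLS intermediate results, which is why the approach keeps the earliest prediction point.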

Fig. 2. Our three proposed approaches: (a) off-the-shelf approach, which makes predictions at the earliest stage based on IR graphs; (b) knowledge-infused approach, which breaks the prediction task into two steps, node-level resource type classification and graph-level resource usage and timing regression, striking a balance between timeliness (i.e., making predictions at the earliest stage with IR graphs) and accuracy (i.e., using self-inferred domain-specific information); (c) knowledge-rich approach, which needs to obtain auxiliary information after partial execution of HLS, producing accurate but relatively late predictions. [Figure omitted.]

exhibits the mean absolute percentage error (MAPE) of predictions on resource usage (DSP/LUT/FF) and critical path timing (CP) of synthetic programs. In general, CDFGs have larger MAPE, since message-passing-based GNN models have limited expressiveness on graphs with many loops [29], and the additional nodes/edges representing control states and dependency introduced by control signals easily confuse GNN models during resource prediction.

TABLE I
TESTING MAPE WITH DIFFERENT GNN MODELS ON SYNTHETIC PROGRAMS. THE TOP TWO PERFORMANT MODELS ARE MARKED IN BOLD.

         DFG                                   CDFG
Model    DSP      LUT      FF       CP        DSP      LUT      FF       CP
GCN      16.31%   16.49%   21.27%   6.12%     25.30%   28.64%   38.34%   8.79%
GCN-V    15.72%   15.93%   21.64%   6.36%     17.31%   33.93%   39.94%   8.13%
SGC      42.12%   23.93%   30.61%   7.92%     44.01%   60.87%   53.50%   10.32%
SAGE     15.18%   14.01%   17.11%   6.12%     17.01%   28.09%   39.11%   8.25%
ARMA     19.12%   13.46%   16.87%   6.50%     18.47%   25.21%   32.15%   8.42%
PAN      15.24%   14.13%   17.23%   6.38%     16.88%   32.65%   44.36%   8.54%
GIN      15.52%   16.10%   22.08%   6.58%     15.47%   28.48%   38.82%   8.76%
GIN-V    15.04%   16.17%   23.09%   6.40%     17.94%   29.40%   48.64%   8.59%
PNA      12.65%   11.64%   14.41%   6.26%     14.71%   22.86%   26.47%   8.87%
GAT      26.22%   22.64%   27.74%   8.30%     28.66%   46.19%   54.73%   10.32%
GGNN     15.40%   13.64%   16.94%   6.47%     16.28%   28.05%   31.88%   8.50%
RGCN     13.27%   13.03%   15.09%   6.14%     15.03%   26.33%   25.52%   8.72%
UNet     18.40%   14.90%   19.17%   6.61%     18.92%   32.83%   53.06%   9.02%
FiLM     20.05%   12.50%   16.94%   6.27%     17.42%   26.97%   27.35%   8.67%

GNN model analysis. PNA and RGCN generally show superior performance, implying two takeaways: (1) the relational information (i.e., edge information) is important in IR graphs, since it represents data or control dependency, or a mix of both, which is a critical basis in logic synthesis and impacts resource allocation; (2) equipped with multiple aggregators, PNA is better able to characterize different neighborhood information, thus providing better prediction accuracy.
Domain-specific insights. (1) Resource. The key to making precise DSP predictions is to distinguish the major computation nodes most likely to use DSPs. Similarly, effective extraction of memory-related nodes will greatly benefit FF predictions. As LUTs are involved in the entire graph (as compute units and glue logic to circuit components), graph-level understanding is important. (2) Timing. Compared with resource predictions, CP predictions show relatively lower MAPE and better consistency between DFGs and CDFGs, which is probably because timing is local information and insensitive to graph sizes.
Accurate, timely, and generalizable. Intuitively, the more domain information is leveraged, the more accurate the predictions are, but the longer feature collection takes. The knowledge-infused approach strikes a good balance between accuracy and timeliness. As shown in Fig. 2(b), with hierarchical training both the node-level and the graph-level GNN models approximate simplified design heuristics: the node-level classification aims to understand the preference of resource types on different nodes; the graph-level regression focuses on globally estimating resource sharing and interference among nodes. With hierarchical inference, the domain knowledge infused during training can be self-inferred when encountering unseen designs, leading to improved prediction accuracy from the earliest design stage.
Table II shows MAPE of the three proposed approaches and Vitis HLS [30] on real-case applications [31]–[33]. Compared with Vitis HLS, the PNA-based knowledge-infused approach (denoted as PNA-I) and knowledge-rich approach (denoted as PNA-R) reduce prediction errors by 1.2× to 40.6× and 1.7× to 51.4×, respectively. Such results empirically demonstrate (1) generalization capability not only from seen to unseen designs but also from synthetic to realistic applications, and (2) accuracy and timeliness conspicuously surpassing HLS tools.

TABLE II
TESTING MAPE ON REALISTIC APPLICATIONS. -I IS KNOWLEDGE-INFUSED APPROACH; -R IS KNOWLEDGE-RICH APPROACH.

Metric   HLS       RGCN      RGCN-I   RGCN-R   PNA      PNA-I    PNA-R
DSP      26.07%    45.61%    40.89%   32.90%   40.06%   21.95%   15.20%
LUT      871.56%   66.23%    30.91%   24.08%   56.34%   21.45%   16.96%
FF       322.86%   101.20%   38.75%   27.72%   47.65%   20.10%   17.42%
CP       32.09%    8.13%     5.35%    5.83%    8.68%    4.80%    3.97%

III. LOGIC SYNTHESIS QOR PREDICTIONS WITH MULTI-MODAL GRAPH LEARNING
Logic synthesis converts RTL designs into optimized gate-level representations. The goal is to reduce the amount of
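The error metric and the reduction factors quoted for Table II can be checked with a few lines of Python. The sketch below is illustrative only: the `mape` helper uses the standard definition of mean absolute percentage error, the dictionary copies the HLS, PNA-I, and PNA-R columns of Table II, and the function and variable names are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (standard definition)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# MAPE values (in percent) copied from Table II.
TABLE_II = {
    "DSP": {"HLS": 26.07, "PNA-I": 21.95, "PNA-R": 15.20},
    "LUT": {"HLS": 871.56, "PNA-I": 21.45, "PNA-R": 16.96},
    "FF": {"HLS": 322.86, "PNA-I": 20.10, "PNA-R": 17.42},
    "CP": {"HLS": 32.09, "PNA-I": 4.80, "PNA-R": 3.97},
}

def reduction_factors(method):
    """Ratio of the Vitis HLS error to the error of `method`, per metric."""
    return {metric: row["HLS"] / row[method] for metric, row in TABLE_II.items()}

pna_i = reduction_factors("PNA-I")
pna_r = reduction_factors("PNA-R")
print(f"PNA-I: {min(pna_i.values()):.1f}x to {max(pna_i.values()):.1f}x")
print(f"PNA-R: {min(pna_r.values()):.1f}x to {max(pna_r.values()):.1f}x")
```

The smallest factor comes from DSP and the largest from LUT in both cases, which matches the observation that the HLS report is least reliable for LUT usage.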

Fig. 3. The proposed approach to predicting QoR after applying logic synthesis flows on hardware designs. (a) GNN-S: the proposed GNN with supernode (circuit as a graph and synthesis flow as a supernode). (b) GNN-H: the proposed hybrid GNN with LSTM (circuit as a graph and synthesis flow as a sequence). [Figure omitted.]

required hardware or the critical path delay by sequences of logic transformations, referred to as logic synthesis flows.

A. Challenges in ML-assisted Logic Optimization
Recently, ML-based approaches have been employed to predict QoRs of logic synthesis flows [34], [35], but there remain unresolved challenges and requirements.
• There is no one-for-all solution. Commercial EDA tools usually provide reference synthesis flows [36] developed by experts, but such flows do not uniformly perform well. This suggests the importance of design-specific synthesis flows.
• The transformation order in synthesis flows should be well captured, since it largely determines final QoRs.
• Existing approaches do not generalize across designs or flow lengths [34], [35]. For practical use of ML-based performance modeling, generalization across different designs and flow lengths is a necessity.

B. Hybrid Graph Models using Spatio-Temporal Information
To address the aforementioned issues, we present a fast, accurate, and generalizable ML approach for QoR estimations of logic synthesis flows, exploiting spatio-temporal information, namely LOSTIN [14]. Two models are explored: (1) a GNN for spatial information learning, armed with a supernode to encode temporal information (denoted as GNN-S); (2) a hybrid model, composed of a GNN for spatial learning and an LSTM for temporal learning (denoted as GNN-H).
GNN-S: GNN with supernode. Inspired by the idea that introducing a supernode to graphs can collect and redistribute global information [37], we propose to leverage a supernode to represent synthesis flows. As the supernode is virtually connected to all the nodes in the original graph, temporal information is directly injected into the circuit graph (Fig. 3(a)).
GNN-H: GNN with LSTM. Since synthesis flows are naturally represented as sequences, an alternative is to leverage a sequence processing model to distill the temporal information. The specific model employed is LSTM, which excels at handling order dependence and variable-length flows. As shown in Fig. 3(b), we separately generate a sequence embedding for synthesis flow representation and a graph embedding for circuit representation, which are then concatenated for downstream predictions. This scheme not only significantly reduces the training complexity and memory overheads but also fuses each source of input information more efficiently.

C. Promises with Multi-Modal Graph Learning
We select circuit designs from the EPFL benchmark [38]. The logic synthesis flows are generated by the logic synthesis tool ABC [39]. Table III shows MAPE of QoR predictions on designs unseen during training. (1) The LSTM-based model [35] suffers from a large accuracy degradation for unseen designs, indicating limited generalization capability. (2) GNN-S slightly outperforms the LSTM-based model, lowering mean MAPE by about 3 and 9 percentage points in area and delay prediction, respectively. (3) GNN-H maintains its high prediction accuracy, demonstrating extraordinary generalization capability.

TABLE III
COMPARISON WITH LSTM [35] IN THE INDUCTIVE SCENARIO.

             Area (MAPE)                  Delay (MAPE)
Design       LSTM     GNN-S    GNN-H     LSTM     GNN-S    GNN-H
multiplier   57.82%   9.39%    2.45%     38.21%   17.89%   1.75%
sin          66.09%   64.48%   2.34%     45.94%   54.44%   2.32%
sqrt         29.03%   39.25%   4.83%     38.03%   15.75%   2.09%
square       38.59%   13.96%   2.86%     47.52%   31.34%   2.41%
voter        27.38%   76.49%   3.08%     42.19%   46.54%   0.96%
MEAN         43.78%   40.71%   3.11%     42.38%   33.20%   1.91%

GNN-S vs. GNN-H. We compare GNN-S and GNN-H regarding temporal information characterization. In GNN-S, first, the supernode embedding is insensitive to the order of logic transformations; second, with message passing, the original temporal information in the supernode gradually fades as it propagates to other nodes; third, simply adding a supernode to original graphs may not be an efficient approach to fusing information from different modalities. By contrast, GNN-H takes advantage of both GNN and LSTM to extract spatio-temporal information in a decoupled manner: the LSTM directly characterizes temporal information from synthesis flows; the GNN focuses on representing spatial structures of circuit designs. These separately learned embeddings have better expressiveness for each source of input information, thus providing a better foundation for downstream tasks.
Multi-modal graph representation learning. Graph representation learning has evolved from single-modal to multi-modal [40], which inspires the adoption of multi-modal graph learning for circuit quality evaluation, since the final QoR of circuit designs is dependent on both circuit structures and logic synthesis flows. Our investigation with GNN-S and GNN-H shows that efficient approaches to extracting features and fusing information from different modalities can conspicuously improve representation power. Multi-modal graph representation learning, which integrates the knowledge from other
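The two fusion schemes of Section III-B differ mainly in where the flow information enters the model. The following NumPy sketch illustrates both under simplifying assumptions: the circuit is a dense adjacency matrix, the embeddings are plain vectors with made-up values, and the helper names `add_supernode` and `fuse_embeddings` are hypothetical rather than taken from LOSTIN.

```python
import numpy as np

def add_supernode(adj):
    """GNN-S style: append one supernode connected to every original node.
    The supernode (last row/column) would carry the synthesis-flow encoding."""
    n = adj.shape[0]
    out = np.zeros((n + 1, n + 1), dtype=adj.dtype)
    out[:n, :n] = adj   # keep the original circuit edges
    out[n, :n] = 1      # supernode -> every circuit node
    out[:n, n] = 1      # every circuit node -> supernode
    return out

def fuse_embeddings(graph_emb, seq_emb):
    """GNN-H style: concatenate the graph embedding (from a GNN) with the
    flow embedding (from a sequence model) for the downstream predictor."""
    return np.concatenate([graph_emb, seq_emb])

# Toy circuit with three nodes in a chain.
circuit = np.array([[0, 1, 0],
                    [1, 0, 1],
                    [0, 1, 0]])
with_super = add_supernode(circuit)

# Assumed embeddings: 4-dim graph embedding, 2-dim flow embedding.
graph_emb = np.array([0.1, 0.2, 0.3, 0.4])
seq_emb = np.array([0.9, 0.8])
fused = fuse_embeddings(graph_emb, seq_emb)
```

In the supernode variant the flow encoding must travel through message passing to reach the circuit nodes, whereas in the hybrid variant the two embeddings stay separate until the final concatenation.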

Fig. 4. IRONMAN is a learning-based framework composed of CT (code transformer), GPP (GNN-based performance predictor), and RLMD (reinforcement learning based multi-objective DSE). During training, IRONMAN takes HLS C/C++ code and IRs as inputs and the actual RTL performance (e.g., resource and timing) as the ground truth to train GPP and RLMD. Specifically, GPP is trained to predict LUT/DSP/CP based on DFGs and applied directives as node features; RLMD concatenates the graph embeddings generated by GPP with the metadata of input DFGs to compose state representations, outputs a binary probability distribution of whether to apply resource pragmas, and receives predicted LUT/DSP/CP from GPP to derive rewards. During inference, the well-trained GPP provides graph embeddings and performance predictions to RLMD; the trained RLMD either finds optimized directives that satisfy user-specified design constraints such as available resources, or generates Pareto solutions with various trade-offs between different resource types. [Figure omitted.]

learning schemes with the conventional graph representation learning, is expected to provide more versatility for EDA tasks.
Generalization to other transformations. Though the main focus is generalization across different circuit designs, which is the more practical case since synthesis tools usually hold a fixed set of transformations to be applied [36], [41], we briefly discuss the feasibility of generalizing to additional transformations. First, one of the preprocessing steps for LSTM-based models, i.e., the tokenization of logic transformations, includes a special token designed for transformations that are unknown during training yet met in testing. Second, some out-of-vocabulary techniques [42] can be adopted to improve the generalization capability to new transformations.

IV. FLEXIBLE, FINE-GRAINED, AND EFFICIENT DSE IN HLS USING GNN AND RL
Despite the great success of HLS tools, we observe several unresolved challenges: (1) hard-to-predict quality of the generated RTL designs, (2) concealed optimization opportunities due to the high-level abstractions, and (3) inflexible or non-automatic design exploration among different objectives. The GNN-based performance predictors (GPP) discussed in Section II address the first challenge well. In this section, we introduce how the remaining challenges are solved by RL.

A. Challenges in HLS Design Exploration
Concealed optimization opportunities. The high-level abstraction in HLS conceals further optimization opportunities. While guidelines of HLS code optimization toward different design objectives are well investigated [43], they often focus on coarse-grained optimization at the loop/array/function level, and manual efforts for fine-grained exploration (such as at the operator level) are still required. Motivated by the necessity of fine-grained DSE, we propose a code transformer (CT), which breaks up the high-level abstractions by exposing operations in behavioral descriptions.
Inflexible DSE among different objectives. Traditional HLS DSE uses meta-heuristics, such as genetic algorithms (GA) [44], simulated annealing (SA) [45], particle swarm optimization (PSO) [46], and ant colony optimization (ACO) [47]. For ML-based DSE, active learning [23] and Bayesian optimization [48], [49] are popular options. However, there are several limitations. (1) Meta-heuristics require exploration from scratch for every new design and do not benefit from previous experiences, leading to long search time and degraded solution quality. (2) Many DSE approaches need to invoke HLS and the implementation process to validate newly generated solutions during optimization, which is time-consuming. (3) Not all DSE approaches are suitable for large design spaces. (4) Current DSE usually sacrifices design latency for fewer resources or vice versa [50], leaving flexible trade-offs among other objectives unexplored. One potential alternative is to trade one type of resource for another (e.g., LUT and DSP in FPGA), which can only be done through tedious manual efforts. Motivated by the necessity and difficulty of efficient and flexible DSE, we propose an RL-based Multi-objective DSE (RLMD) engine for optimal resource allocation strategies under user-specified constraints, which can also provide Pareto solutions between different objectives.

B. IRONMAN: GNN + RL for DSE
As shown in Fig. 4, we integrate CT, GPP, and RLMD into a framework, IRONMAN [11], [15]. CT is the interface between GPP/RLMD and HLS tools, which extracts DFGs after HLS front-end compilation to release more optimization opportunities and then re-generates synthesizable C/C++ code based on the optimized DFGs. GPP is a highly accurate GNN-based performance predictor as introduced in Section II. RLMD is an RL-based DSE engine. As a case study of IRONMAN, the specific problem solved is to find a resource allocation solution that strictly meets DSP constraints, or to find Pareto solutions between DSPs and LUTs (or CP timing) on FPGAs, without sacrificing the compute latency. Thus, RLMD observes the raw DFG, its graph embedding, and user-specified constraints as states, and takes actions to decide whether to use LUTs to implement a multiplication node by assigning resource pragmas. The reward function is defined as a negative weighted sum of multiple metrics, such as the predicted LUT usage, the predicted CP timing, and the discrepancy between predicted DSP usage and target DSP usage, which automatically enables multi-objective optimization by adjusting the weights of different metrics. The optimization goal is to maximize the average expected rewards over all training DFGs, such that RLMD can figure out various trade-offs across various DFGs and DSP constraints. RLMD is equipped with two different RL methods, actor-critic (AC) and
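The reward described in Section IV-B, a negative weighted sum of predicted LUT usage, predicted CP timing, and the DSP discrepancy, can be sketched as follows. The exact weighting is not given in this section, so the form below is an assumption: `mu` splits importance between LUT usage and CP timing (mirroring the role of μ in Fig. 5), `dsp_weight` is a hypothetical penalty coefficient, and the LUT and CP inputs are assumed to be normalized to comparable ranges.

```python
def rlmd_reward(pred_lut, pred_cp, pred_dsp, target_dsp, mu=0.5, dsp_weight=1.0):
    """Negative weighted sum of the metrics named in Section IV-B.

    Assumed form (not the authors' exact formula):
        -(mu * LUT + (1 - mu) * CP + dsp_weight * |DSP_pred - DSP_target|)
    pred_lut and pred_cp are assumed normalized, e.g. to [0, 1].
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    dsp_gap = abs(pred_dsp - target_dsp)
    return -(mu * pred_lut + (1.0 - mu) * pred_cp + dsp_weight * dsp_gap)

# Two hypothetical candidate allocations that both meet a target of 8 DSPs.
lut_lean = dict(pred_lut=0.2, pred_cp=0.8, pred_dsp=8, target_dsp=8)
cp_lean = dict(pred_lut=0.8, pred_cp=0.2, pred_dsp=8, target_dsp=8)

# With mu = 0.9 the LUT-lean candidate scores higher; with mu = 0.1 the
# CP-lean candidate does.
prefers_lut_lean = rlmd_reward(mu=0.9, **lut_lean) > rlmd_reward(mu=0.9, **cp_lean)
prefers_cp_lean = rlmd_reward(mu=0.1, **cp_lean) > rlmd_reward(mu=0.1, **lut_lean)
```

Because the agent maximizes this reward, changing the weights shifts which trade-off it converges to without any change to the search procedure itself.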

211

Authorized licensed use limited to: XIDIAN UNIVERSITY. Downloaded on December 10,2024 at 09:10:52 UTC from IEEE Xplore. Restrictions apply.
     


  


  

  


Fig. 5. Pareto solutions found by RLMD, PSO, GA, SA, and ACO on four real-case benchmarks, gemm, kernel 2mm, spmv, and kernel adi, with unchanged
latency (i.e., the number of clock cycles of the synthesized design). The toolbox of RLMD involves AC, PG, either with or without a fine-tuning (FT) step.
Different settings of μ indicate that different importance is assigned to LUT utilization and CP timing during the optimization.

policy gradient (PG) [51], providing the flexibility to choose a more proper optimization scheme for different cases.

RLMD fine-tuning. Given a new DFG, the simplest way is to directly apply the pre-trained RLMD for inference. When higher quality solutions are desired, the pre-trained RLMD can be further fine-tuned on a particular DFG. The fine-tuning step provides the flexibility to balance between a quick solution with the pre-trained RLMD (which has learned rich knowledge of resource allocation strategies on other DFGs) and a slower yet better one for a particular DFG.

C. Promises in Flexible, Fine-grained, and Efficient DSE

We demonstrate the end-to-end benefits on benchmarks from real-world applications in Fig. 5. RLMD, with either the AC or the PG method, outperforms GA, SA, PSO, and ACO by a large margin. In terms of multi-objective optimization, given DSP usage constraints, the solutions found with μ = 0.9 often consume fewer LUTs than those found with μ = 0.1. This indicates that RLMD can properly balance LUT usage and CP timing when different importance is assigned to different metrics, whereas the heuristic-based methods cannot explicitly leverage the trade-offs among multiple objectives.

These promising results show the great potential of applying RL for DSE in HLS. We briefly explain the reasons for the superiority of IRONMAN: (1) the design space grows exponentially with the size of DFGs, different graph topologies, and various data precisions; RL agents can explore the design space proactively and learn from past experiences; after training, they can generalize to new problems with minimal fine-tuning efforts, revealing better scalability and efficacy; (2) by carefully defining reward functions, RL agents can achieve multi-objective optimization automatically, eliminating manual efforts to craft useful heuristics; (3) with the help of CT, RLMD can conduct fine-grained DSE that is not supported by any of the existing DSE approaches; with the help of GPP, the informative state representations not only significantly benefit the learning process of RLMD but also enable RLMD to better generalize across different DFG topologies.

V. CHALLENGE AND PROSPECT

In this section, we discuss challenges and prospects of exploiting ML techniques for EDA problems, which span data, algorithms/models, deployment, and long-term targets.

A. Data Collection

Data scarcity. In some EDA problems, such as place and route in physical synthesis, the simulation is extremely expensive.
As ML models usually require enough data to learn underlying statistics and make decisions, this gap between small data and big data often limits the capability of ML-based techniques. From the algorithm side, algorithms that can work with small data remain to be developed. From the data side, generative methods can be used to generate synthetic data [52].

Non-perfect data. Even if some EDA tools produce a lot of data (such as simulation-based testing), the data are not always properly labeled or presented in a form suitable for ML models. Thus, possible alternatives are unsupervised learning, semi-supervised learning [53], self-supervised learning [54], or a combination of supervised and unsupervised techniques [55].

Generalization to out-of-distribution data. Though synthetic data can help mitigate the data gap, it is noteworthy that the data distribution varies between synthetic and real-case data [56], which often causes data drift or concept drift [57]. This calls for incorporating out-of-distribution methods [58].

B. Model/Algorithm Development

Multi-level abstraction and optimization. Classical EDA methods usually adopt a bottom-up or top-down procedure, encouraging ML-based techniques to distill hierarchical structures of hardware designs. Potential methods toward multi-level design abstraction and optimization are (1) hierarchical RL [59], which has flexible goal specifications and can learn goal-directed behaviors in complex environments with sparse feedback, and (2) multi-agent RL [60], where agents can be fully cooperative, fully competitive, or a mix of the two, enabling versatility of system optimization.

Interpretability. The absence of interpretation regarding model behaviors and decisions limits wider adoption of ML for EDA tasks, since these explanations are important to identify and expose potential problems during training and ensure fidelity of models/algorithms. Thus, efforts in interpretable ML [61], [62] are highly expected to promote production-ready applications of ML for EDA.

C. Implementation and Deployment Improvement

Online vs. offline. When deploying ML-based techniques for EDA tasks, it is crucial to deliberate design constraints under different scenarios. (1) ML-based techniques are deployed online or during runtime, no matter whether the training phase is online or offline. In this case, the model complexity and runtime overheads are strictly limited by specific constraints. If online learning is further desired, the design constraint will be more stringent. One promising approach is to employ semi-online learning models, which have been applied to solve some classical combinatorial optimization problems, such as bipartite matching [63] and caching [64]. These models enable smooth interpolation between the best possible online and offline training algorithms. (2) ML-based techniques are applied offline to guide hardware design, and once the design phase is completed, ML models will not be invoked again. Thus, the offline applications can tolerate relatively higher overheads.

Model maintenance. In the case of offline training and online deployment, ML models require regular maintenance and updating to meet performance expectations. (1) ML models can be retrained either at a regular interval or when key performance indicators are below certain thresholds. Retraining models regularly, regardless of their performance, is a more direct way, but it requires a clear understanding of how frequently a model should be updated under its own scenario. The model performance will decline in the interim if retraining intervals are too spaced out. Monitoring key performance indicators relies on a comprehensive panel of measurements that explicitly demonstrate model drift, but this may introduce additional hardware/software overhead, and incorrect selection of measurements often defeats the intention of this method. (2) During the retraining of ML models, there is often a trade-off between newly collected data and previous data. Properly assigning importance of input data would improve retraining efficacy [65].

D. General, Portable, and Agile Hardware Development

Infusing more intelligence into EDA will make great strides toward the landing of agile hardware development.

General. We envisage an ML-based system-wise and holistic framework with a panoramic vision: it should be able to leverage information from different levels of hardware designs in synergy, to thoroughly characterize the behaviors as well as their intrinsically hierarchical abstractions; it should also be able to make decisions at different granularities, to control and improve the hardware precisely and comprehensively.

Portable. Well-designed interfaces between EDA tools and ML-based techniques are expected to facilitate portability across different platforms, since ML models can perform well without explicit descriptions of the target domain.

Agile. The proliferation of ML-based techniques has more or less transformed the EDA workflow. We expect GNNs to make better use of naturally graphical data in the EDA field; we expect deep RL to be a powerful and general-purpose tool for many EDA optimization problems, especially when the exact heuristic or objective is obscure; we expect more intelligence to be infused into next-generation EDA tools, to enhance designers’ productivity and to thrive in the community.

VI. CONCLUSION

In this paper, we target HLS and logic synthesis, and discuss (1) the power of GNNs for fast, accurate, and generalizable QoR predictions, and (2) the efficacy of RL-enabled flexible and automatic design exploration. Standing on current endeavors, we provide a future vision of challenges and prospects of infusing more intelligence for next-generation EDA.

REFERENCES

[1] G. E. Moore, “Cramming more components onto integrated circuits,” Proc. IEEE, 1998.
[2] C. A. Mack, “Fifty years of moore’s law,” IEEE Trans. Semicond. Manuf., 2011.
[3] M. M. Waldrop, “The chips are down for moore’s law,” Nature News, 2016.
[4] OpenAI. (Accessed: 2022-08) AI and compute. [Online]. Available: https://openai.com/blog/ai-and-compute/
[5] H. Foster. (Accessed: 2022-08) The 2020 wilson research group functional verification study. [Online]. Available: https://blogs.sw.siemens.com/verificationhorizons/2020/10/27/prologue-the-2020-wilson-research-group-functional-verification-study/
[6] F. Schirrmeister et al. (Accessed: 2022-08) Next generation verification for the era of ai/ml and 5g. Design and Verification Conference and Exhibition, US (DVCon), 2020. [Online]. Available: https://dvcon-proceedings.org/document/next-generation-verification-for-the-era-of-ai-ml-and-5g/
[7] M. Rosker. (Accessed: 2022-08) Evolving the electronics resurgence initiative (eri 2.0). [Online]. Available: https://www.ndia.org/-/media/sites/ndia/divisions/electronics/eri2_ndia_20210421_releaseapproved_34584.ashx
[8] J. Wilson. (Accessed: 2022-08) Intelligent design of electronic assets (idea). [Online]. Available: https://www.darpa.mil/program/intelligent-design-of-electronic-assets
[9] S. Dai et al., “Fast and accurate estimation of quality of results in high-level synthesis with machine learning,” in Proc. FCCM, 2018.
[10] E. Ustun et al., “Accurate operation delay prediction for fpga hls using graph neural networks,” in Proc. ICCAD, 2020.
[11] N. Wu et al., “Ironman: Gnn-assisted design space exploration in high-level synthesis via reinforcement learning,” in Proc. GLSVLSI, 2021.
[12] N. Wu et al., “High-level synthesis performance prediction using gnns: Benchmarking, modeling, and advancing,” in Proc. DAC, 2022.
[13] N. Wu and Y. Xie, “A survey of machine learning for computer architecture and systems,” ACM Comput. Surveys, 2022.
[14] N. Wu et al., “Lostin: Logic optimization via spatio-temporal information with hybrid graph models,” in Proc. ASAP, 2022.
[15] N. Wu et al., “Ironman-pro: Multi-objective design space exploration in hls via reinforcement learning and graph neural network based modeling,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., 2022.
[16] J. Zhao et al., “Comba: A comprehensive model-based analysis framework for high level synthesis of real applications,” in Proc. ICCAD, 2017.
[17] A. B. Perina et al., “Lina: Timing-constrained high-level synthesis performance estimator for fast dse,” in Proc. ICFPT, 2019.
[18] J. Zhao et al., “Performance modeling and directives optimization for high-level synthesis on fpga,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., 2019.
[19] K. O’Neal et al., “Hlspredict: Cross platform performance prediction for fpga high-level synthesis,” in Proc. ICCAD, 2018.
[20] H. M. Makrani et al., “Pyramid: Machine learning framework to estimate the optimal timing and resource usage of a high-level synthesis design,” in Proc. FPL, 2019.
[21] H. M. Makrani et al., “Xppe: cross-platform performance estimation of hardware accelerators using machine learning,” in Proc. ASP-DAC, 2019.
[22] H.-Y. Liu and L. P. Carloni, “On learning-based methods for design-space exploration with high-level synthesis,” in Proc. 50th DAC, 2013.
[23] P. Meng et al., “Adaptive threshold non-pareto elimination: Re-thinking machine learning for system level design space exploration on fpgas,” in Proc. DATE, 2016.
[24] J. Kwon and L. P. Carloni, “Transfer learning for design-space exploration with high-level synthesis,” in Proc. MLCAD, 2020.
[25] D. Koeplinger et al., “Automatic generation of efficient accelerators for reconfigurable hardware,” in Proc. ISCA, 2016.
[26] J. Zhao et al., “Machine learning based routing congestion prediction in fpga high-level synthesis,” in Proc. DATE, 2019.
[27] Z. Lin et al., “Hl-pow: A learning-based power modeling framework for high-level synthesis,” in Proc. ASP-DAC, 2020.
[28] A. V. Aho et al., Compilers: principles, techniques, & tools. Pearson Education India, 2007.
[29] H. Maron et al., “Provably powerful graph networks,” Proc. NeurIPS, 2019.
[30] Vitis, Vitis High-Level Synthesis User Guide (UG1399), Accessed: 2022-08, https://docs.xilinx.com/r/en-US/ug1399-vitis-hls.
[31] B. Reagen et al., “Machsuite: Benchmarks for accelerator design and customized architectures,” in Proc. IISWC, 2014.
[32] Y. Hara et al., “Proposal and quantitative analysis of the chstone benchmark program suite for practical c-based high-level synthesis,” Journal of Information Processing, 2009.
[33] L.-N. Pouchet and T. Yuki. (2016) Polyhedral benchmark suite. [Online]. Available: http://web.cs.ucla.edu/~pouchet/software/polybench/
[34] C. Yu et al., “Developing synthesis flows without human knowledge,” in Proc. DAC, 2018.
[35] C. Yu and W. Zhou, “Decision making in synthesis cross technologies using lstms and transfer learning,” in Proc. MLCAD, 2020.
[36] Synopsys. (Accessed: 2022-08) Lynx design system. [Online]. Available: https://www.synopsys.com/content/dam/synopsys/implementation&signoff/datasheets/lynx-design-system-ds.pdf
[37] J. Gilmer et al., “Neural message passing for quantum chemistry,” in Proc. ICML, 2017.
[38] L. Amarú et al., “The epfl combinational benchmark suite,” in Proc. 24th Int. Workshop on Logic & Synthesis, 2015.
[39] R. Brayton and A. Mishchenko, “Abc: an academic industrial-strength verification tool,” in Proc. CAV, 2010.
[40] A. Holzinger et al., “Towards multi-modal causability with graph neural networks enabling information fusion for explainable ai,” Information Fusion, 2021.
[41] Cadence, “Genus synthesis solution,” Accessed: 2022-08. [Online]. Available: https://www.cadence.com/en_US/home/tools/digital-design-and-signoff/synthesis/genus-synthesis-solution.html
[42] Z. Hu et al., “Few-shot representation learning for out-of-vocabulary words,” in Proc. ACL, 2019.
[43] J. de Fine Licht et al., “Transformations of high-level synthesis codes for high-performance computing,” IEEE Trans. Parallel Distrib. Syst., 2020.
[44] B. C. Schafer, “Parallel high-level synthesis design space exploration for behavioral ips of exact latencies,” ACM TODAES, 2017.
[45] B. C. Schafer et al., “Adaptive simulated annealer for high level synthesis design space exploration,” in Proc. VLSI-DAT, 2009.
[46] Y. Zhang et al., “A comprehensive survey on particle swarm optimization algorithm and its applications,” Math. Problems in Eng., 2015.
[47] D. Liu and B. C. Schafer, “Efficient and reliable high-level synthesis design space explorer for fpgas,” in Proc. FPL, 2016.
[48] Q. Sun et al., “Correlated multi-objective multi-fidelity optimization for hls directives design,” in Proc. DATE, 2021.
[49] A. Mehrabi et al., “Prospector: synthesizing efficient accelerators via statistical learning,” in Proc. DATE, 2020.
[50] B. C. Schafer and Z. Wang, “High-level synthesis design space exploration: Past, present, and future,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., 2019.
[51] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT press, 2018.
[52] Y. Ding et al., “Generative and multi-phase learning for computer systems optimization,” in Proc. ISCA, 2019.
[53] J. E. Van Engelen and H. H. Hoos, “A survey on semi-supervised learning,” Mach. Learn., 2020.
[54] D. Hendrycks et al., “Using self-supervised learning can improve model robustness and uncertainty,” Proc. NeurIPS, 2019.
[55] M. Alawieh et al., “Efficient hierarchical performance modeling for integrated circuits via bayesian co-learning,” in Proc. DAC, 2017.
[56] N. Wu et al., “Program-to-circuit: Exploiting gnns for program representation and circuit translation,” arXiv preprint arXiv:2109.06265, 2021.
[57] A. Tsymbal, “The problem of concept drift: definitions and related work,” Computer Science Department, Trinity College Dublin, 2004.
[58] H. Li et al., “Ood-gnn: Out-of-distribution generalized graph neural network,” IEEE Trans. Knowl. Data Eng., 2022.
[59] T. D. Kulkarni et al., “Hierarchical deep reinforcement learning: integrating temporal abstraction and intrinsic motivation,” in Proc. NeurIPS, 2016.
[60] K. Zhang et al., “Multi-agent reinforcement learning: A selective overview of theories and algorithms,” Handbook of Reinforcement Learning and Control, 2021.
[61] L. H. Gilpin et al., “Explaining explanations: An overview of interpretability of machine learning,” in Proc. DSAA, 2018.
[62] D. V. Carvalho et al., “Machine learning interpretability: A survey on methods and metrics,” Electronics, 2019.
[63] R. Kumar et al., “Semi-online bipartite matching,” in Proc. 10th Innovations in Theor. Comput. Sci. Conf., 2019.
[64] R. Kumar et al., “Interleaved caching with access graphs,” in Proc. 14th ACM-SIAM Symp. on Discrete Algorithms. SIAM, 2020.
[65] J. Byrd and Z. Lipton, “What is the effect of importance weighting in deep learning?” in Proc. ICML, 2019.
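
As a supplementary illustration of the RLMD reward discussed in Section IV, which combines the predicted LUT usage, the predicted CP timing, and the discrepancy between predicted and target DSP usage, the following minimal Python sketch shows how a weight μ could shift which design is preferred. The functional form, the normalization of LUT usage and CP timing to [0, 1], and the names `Prediction`, `reward`, `best_candidate`, and `dsp_penalty` are assumptions made for illustration only; they are not the exact formulation used by RLMD.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Prediction:
    """Predicted metrics of one candidate design (hypothetical container)."""
    lut: float        # predicted LUT usage, assumed normalized to [0, 1]
    cp_timing: float  # predicted critical-path timing, assumed normalized to [0, 1]
    dsp: int          # predicted DSP usage (absolute count)


def reward(pred: Prediction, target_dsp: int, mu: float,
           dsp_penalty: float = 1.0) -> float:
    """Illustrative weighted reward; higher is better.

    mu weights LUT usage against CP timing, and dsp_penalty scales the
    gap between predicted and target DSP usage. This linear form is an
    assumption, not the formula from the paper.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    cost = mu * pred.lut + (1.0 - mu) * pred.cp_timing
    dsp_gap = abs(pred.dsp - target_dsp)
    return -(cost + dsp_penalty * dsp_gap)


def best_candidate(candidates: Iterable[Prediction], target_dsp: int,
                   mu: float) -> Prediction:
    """Return the candidate with the highest reward for the given mu."""
    return max(candidates, key=lambda p: reward(p, target_dsp, mu))


if __name__ == "__main__":
    low_lut = Prediction(lut=0.2, cp_timing=0.8, dsp=4)
    low_cp = Prediction(lut=0.8, cp_timing=0.2, dsp=4)
    for mu in (0.9, 0.1):
        print(mu, best_candidate([low_lut, low_cp], target_dsp=4, mu=mu))
```

For two hypothetical candidates that both meet the DSP target, μ = 0.9 selects the design with lower LUT usage and μ = 0.1 selects the one with shorter CP timing, which mirrors the trend reported for Fig. 5. In an RL setting, a scalar of this kind would be the signal returned to the agent after each resource-allocation decision.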
