Wu et al., 2022 - AI-assisted Synthesis in Next Generation EDA: Promises, Challenges, and Prospects
... design automation (EDA) tools, there is still a long way towards hardware agile development, whose ultimate goal is to reduce chip development cycles from years to months or even weeks. ... of combining GNN and RL to solve EDA problems. Experimental results demonstrate the promises of infusing intelligence into next-generation EDA.

[Fig. 1: the typical design flow from behavioral programs (C/C++/SystemC) through High-Level Synthesis and Technology Mapping to circuit designs, with Evaluation and Optimization phases attached to each stage.]
I. INTRODUCTION
Moore's law [1], [2] has been powering the integrated circuit revolutions since the 1960s, doubling the transistor density every 18 months. Even if the target cadence of Moore's law is slipping [3], the electronic industry continues to move to larger-scale, more complex, and heterogeneous designs and systems, to keep pace with the exponentially growing compute demand of different applications [4]. However, with the increasing complexity in hardware designs, from the time-to-market aspect, nearly 70% of application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA) projects were completed behind schedule in 2020 [5]; from the cost aspect, the development costs of leading-edge electronic designs are skyrocketing [6]; from the tool aspect, existing electronic design automation (EDA) tools cannot adequately address emerging hardware development [7]. These all herald the necessity of hardware agile development, with the ultimate goal to reduce chip development cycles from years to months or even weeks. One example is the Intelligent Design of Electronic Assets (IDEA) program [8], aiming to accelerate the development cycle of next-generation electronic systems with reduced labor, costs, and design complexity barriers.

Hardware development is an iterative process involving many optimization-evaluation iterations. Fig. 1 depicts the typical design flow from behavioral programs to circuit designs (e.g., layout/bitstream), including high-level synthesis (HLS), logic synthesis, and physical implementation. Notably, every design stage is associated with evaluation phases, which should accurately assess the quality of results (QoRs) of circuit designs to guide design exploration, and optimization phases, which should sufficiently explore design knobs to meet specified performance requirements. Traditional EDA tools usually provide either accurate yet time-consuming or fast yet inaccurate QoR estimations [9]–[12], and extensive manual efforts are required for design space exploration (DSE) to satisfy diverse performance, resource, and power targets. These all result in long time-to-market, which is further exacerbated by the explosion of modern hardware system complexity and technology scaling. Given the avidity toward hardware agile development and productivity boosts, it is highly desirable to infuse more intelligence into EDA tools to enable fast and accurate QoR evaluation and efficient optimization, either independently or integrally, so that design iterations are conspicuously sped up to improve development efficiency.

Recent years have witnessed the emergence of machine learning (ML) applied to computer architecture and systems [13], revealing the great potential of ML-based performance modeling and ML-assisted design optimization. In this paper, we investigate how ML techniques can be embraced ...
Fig. 2. Our three proposed approaches: (a) off-the-shelf approach, which makes predictions at the earliest stage based on IR graphs; (b) knowledge-infused approach, which breaks the prediction task into two steps, node-level resource type classification and graph-level resource usage and timing regression, striking a balance between timeliness (i.e., making predictions at the earliest stage with IR graphs) and accuracy (i.e., using self-inferred domain-specific information); (c) knowledge-rich approach, which needs to obtain auxiliary information after partial execution of HLS, producing accurate but relatively late predictions.
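As a concrete illustration of the knowledge-infused approach, the following is a minimal two-step sketch, assuming PyTorch Geometric; the layer types, feature dimensions, and the four-way resource-type set are illustrative assumptions, not the exact models behind Fig. 2.

```python
# Sketch of the knowledge-infused two-step idea (illustrative only):
# step 1 classifies each IR-graph node into a resource type; step 2 feeds the
# inferred types back as extra node features for graph-level QoR regression.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool

class NodeTypeClassifier(torch.nn.Module):
    def __init__(self, in_dim=16, hidden=64, num_types=4):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, num_types)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)           # per-node type logits

class QoRRegressor(torch.nn.Module):
    def __init__(self, in_dim=16, num_types=4, hidden=64, num_targets=4):
        super().__init__()
        self.conv1 = GCNConv(in_dim + num_types, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = torch.nn.Linear(hidden, num_targets)   # e.g., DSP/LUT/FF/CP

    def forward(self, x, type_logits, edge_index, batch):
        # Self-inferred resource types act as domain-specific node features.
        h = torch.cat([x, type_logits.softmax(dim=-1)], dim=-1)
        h = F.relu(self.conv1(h, edge_index))
        h = F.relu(self.conv2(h, edge_index))
        return self.head(global_mean_pool(h, batch))       # graph-level QoR
```

Step 1's predicted types play the role of the self-inferred domain-specific information; step 2 consumes them alongside the raw IR-graph features, so predictions are still made before HLS runs.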
... of memory-related nodes will greatly benefit FF predictions. As LUTs are involved in the entire graph (as compute units and glue logic to circuit components), graph-level understanding ...

[Table (column headers lost in extraction): prediction errors for DSP, LUT, FF, and CP across seven models.]
DSP   26.07%  45.61%  40.89%  32.90%  40.06%  21.95%  15.20%
LUT  871.56%  66.23%  30.91%  24.08%  56.34%  21.45%  16.96%
FF   322.86% 101.20%  38.75%  27.72%  47.65%  20.10%  17.42%
CP    32.09%   8.13%   5.35%   5.83%   8.68%   4.80%   3.97%
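Prediction quality throughout is reported as a percentage error; assuming these figures are mean absolute percentage errors (MAPE), as Table III below states explicitly, the metric over n test designs with measured QoR y_i and prediction ŷ_i is:

\[
\mathrm{MAPE} \;=\; \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{y_i} \right|
\]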
... required hardware or the critical path delay by sequences of logic transformations, referred to as logic synthesis flows. ... different designs and flow lengths is a necessity.

B. Hybrid Graph Models using Spatio-Temporal Information

To address the aforementioned issues, we present a fast, accurate, and generalizable ML approach for QoR estimations of logic synthesis flows, exploiting spatio-temporal information, namely LOSTIN [14]. Two models are explored: 1) a GNN for spatial information learning, armed with a supernode to encode temporal information (denoted as GNN-S); 2) a hybrid model, composed of a GNN for spatial learning and an LSTM for temporal learning (denoted as GNN-H).

GNN-S: GNN with supernode. Inspired by the idea that introducing a supernode to graphs can collect and redistribute global information [37], we propose to leverage a supernode to represent synthesis flows. As the supernode is virtually connected to all the nodes in the original graph, temporal information is directly injected into the circuit graph (Fig. 3(a)).

GNN-H: GNN with LSTM. Since synthesis flows are naturally represented as sequences, an alternative is to leverage a sequence processing model to distill the temporal information. The specific model employed is an LSTM, which excels at handling order dependence and variable-length flows. As shown in Fig. 3(b), we separately generate a sequence embedding for synthesis flow representation and a graph embedding for circuit representation, which are then concatenated for downstream predictions. This scheme not only significantly reduces the training complexity and memory overheads but is also more efficient at fusing each source of input information.

Fig. 3. The proposed approach to predicting QoR after applying logic synthesis flows on hardware designs. (a) GNN-S: the proposed GNN with supernode. (b) GNN-H: the proposed hybrid GNN with LSTM.

TABLE III
COMPARISON WITH LSTM [35] IN THE INDUCTIVE SCENARIO.

                 Area (MAPE)                 Delay (MAPE)
            LSTM     GNN-S    GNN-H     LSTM     GNN-S    GNN-H
multiplier  57.82%    9.39%   2.45%    38.21%   17.89%   1.75%
sin         66.09%   64.48%   2.34%    45.94%   54.44%   2.32%
sqrt        29.03%   39.25%   4.83%    38.03%   15.75%   2.09%
square      38.59%   13.96%   2.86%    47.52%   31.34%   2.41%
voter       27.38%   76.49%   3.08%    42.19%   46.54%   0.96%
MEAN        43.78%   40.71%   3.11%    42.38%   33.20%   1.91%

C. Promises with Multi-Modal Graph Learning

We select circuit designs from the EPFL benchmark [38]. The logic synthesis flows are generated by the logic synthesis tool ABC [39]. Table III shows the MAPE of QoR predictions on designs unseen during training. 1) The LSTM-based model [35] suffers from a large accuracy degradation on unseen designs, indicating limited generalization capability. 2) GNN-S slightly outperforms the LSTM-based model, by 3% and 9% in area and delay prediction, respectively. 3) GNN-H maintains its high prediction accuracy, demonstrating extraordinary generalization capability.

GNN-S vs. GNN-H. We compare GNN-S and GNN-H regarding temporal information characterization. In GNN-S, first, the supernode embedding is insensitive to the order of logic transformations; second, with message passing, the original temporal information in the supernode gradually fades in other nodes; third, simply adding a supernode to the original graph may not be an efficient approach to fusing information from different modalities. By contrast, GNN-H takes advantage of both the GNN and the LSTM to extract spatio-temporal information in a decoupled manner: the LSTM directly characterizes temporal information from synthesis flows, while the GNN focuses on representing the spatial structures of circuit designs. These separately learned embeddings have better expressiveness for each source of input information, thus providing a better foundation for downstream tasks.
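To make the two designs concrete, here is a minimal sketch, assuming PyTorch Geometric for the GNN and integer tokens for logic transformations; layer sizes, the supernode wiring, and the fusion head are illustrative assumptions, not the exact LOSTIN [14] architecture.

```python
# Illustrative sketch of the GNN-S and GNN-H ideas (not the exact LOSTIN models).
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool

def add_supernode(x, edge_index, flow_emb):
    """GNN-S: append one supernode carrying the synthesis-flow embedding and
    connect it to every original node, injecting temporal information directly
    into the circuit graph. Assumes flow_emb matches the node-feature width."""
    n = x.size(0)
    x = torch.cat([x, flow_emb.view(1, -1)], dim=0)
    nodes = torch.arange(n)
    sup = torch.full((n,), n, dtype=torch.long)
    extra = torch.stack([torch.cat([sup, nodes]),    # supernode -> nodes
                         torch.cat([nodes, sup])])   # nodes -> supernode
    return x, torch.cat([edge_index, extra], dim=1)

class GNNH(torch.nn.Module):
    """GNN-H: a GNN encodes the circuit graph (spatial), an LSTM encodes the
    synthesis flow (temporal), and the two embeddings are concatenated."""
    def __init__(self, node_dim, num_transforms, hidden=64):
        super().__init__()
        self.conv1 = GCNConv(node_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.embed = torch.nn.Embedding(num_transforms, hidden)
        self.lstm = torch.nn.LSTM(hidden, hidden, batch_first=True)
        self.head = torch.nn.Linear(2 * hidden, 2)   # e.g., area and delay

    def forward(self, x, edge_index, batch, flow_tokens):
        h = F.relu(self.conv1(x, edge_index))
        h = F.relu(self.conv2(h, edge_index))
        g = global_mean_pool(h, batch)               # circuit (spatial) embedding
        _, (s, _) = self.lstm(self.embed(flow_tokens))
        return self.head(torch.cat([g, s[-1]], dim=-1))
```

Concatenating the two embeddings keeps the spatial and temporal encoders decoupled, which is exactly the property argued for above.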
Multi-modal graph representation learning. Graph representation learning has evolved from single-modal to multi-modal [40], which inspires the adoption of multi-modal graph learning for circuit quality evaluation, since the final QoR of circuit designs depends on both circuit structures and logic synthesis flows. Our investigation with GNN-S and GNN-H shows that efficient approaches to extracting features and fusing information from different modalities can conspicuously improve representation power. Multi-modal graph representation learning, which integrates knowledge from other learning schemes with conventional graph representation learning, is expected to provide more versatility for EDA tasks.

[Fig. 4 (the IronMan framework): applications in C/C++/SystemC and user-defined resource constraints (e.g., # of DSPs) feed GPP and RLMD; the CT (Code Transformer) emits transformed C/C++ with optimized directives; HLS produces RTL (Verilog/VHDL) and logic synthesis produces the layout/bitstream, whose actual resource usage, timing, etc. are fed back for training.]

... discuss the feasibility of generalizing to additional transformations. First, one of the preprocessing steps for LSTM-based models, i.e., the tokenization of logic transformations, ... generalization capability to new transformations.
Fig. 5. Pareto solutions found by RLMD, PSO, GA, SA, and ACO on four real-case benchmarks, gemm, kernel_2mm, spmv, and kernel_adi, with unchanged latency (i.e., the number of clock cycles of the synthesized design). The toolbox of RLMD involves AC and PG, either with or without a fine-tuning (FT) step. Different settings of μ indicate that different importance is assigned to LUT utilization and CP timing during the optimization.
... policy gradient (PG) [51], providing the flexibility to choose a more proper optimization scheme for different cases.

RLMD fine-tuning. Given a new DFG, the simplest way is to directly apply the pre-trained RLMD for inference. When higher-quality solutions are desired, the pre-trained RLMD can be further fine-tuned on a particular DFG. The fine-tuning step provides the flexibility to balance between a quick solution from the pre-trained RLMD (which has learned rich knowledge of resource allocation strategies on other DFGs) and a slower yet better one for a particular DFG.

C. Promises in Flexible, Fine-grained, and Efficient DSE

We demonstrate the end-to-end benefits on benchmarks from real-world applications in Fig. 5. Obviously, RLMD, with either the AC or the PG method, outperforms GA, SA, PSO, and ACO by a large margin. In terms of multi-objective optimization, given DSP usage constraints, the solutions found with μ = 0.9 often consume fewer LUTs than those found with μ = 0.1. This indicates that RLMD can properly balance between LUT usage and CP timing when different importance is assigned to different metrics, whereas the heuristic-based methods cannot explicitly leverage the trade-offs among multiple objectives.

These promising results show the great potential of applying RL for DSE in HLS. We briefly explain the reasons for the superiority of IronMan: 1) the design space grows exponentially with the size of DFGs, different graph topologies, and various data precisions; RL agents can explore the design space proactively and learn from past experiences, and after training, they can generalize to new problems with minimal fine-tuning efforts, revealing better scalability and efficacy. 2) By carefully defining reward functions, RL agents can achieve multi-objective optimization automatically, eliminating the manual effort of crafting useful heuristics (a sketch of such a reward follows below). 3) With the help of CT, RLMD can conduct fine-grained DSE that is not supported by any of the existing DSE approaches; with the help of GPP, the informative state representations not only significantly benefit the learning process of RLMD but also enable RLMD to better generalize across different DFG topologies.
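To illustrate how a single scalar reward can encode the μ trade-off described above, here is a minimal sketch; the function name, normalization constants, and penalty scheme are assumptions for illustration, not the exact reward used by IronMan [11], [15].

```python
def rlmd_reward(lut, cp, dsp, dsp_budget, mu=0.9,
                lut_ref=1.0, cp_ref=1.0, penalty=10.0):
    """Scalarized multi-objective reward: mu weights (normalized) LUT usage
    against critical-path (CP) timing, and violating the user-defined DSP
    constraint is penalized. All conventions here are illustrative."""
    r = -(mu * lut / lut_ref + (1.0 - mu) * cp / cp_ref)
    if dsp > dsp_budget:                   # constraint violated
        r -= penalty * (dsp - dsp_budget)
    return r
```

With μ = 0.9, LUT usage dominates the reward, which matches the observation in Fig. 5 that μ = 0.9 solutions consume fewer LUTs.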
V. CHALLENGES AND PROSPECTS

In this section, we discuss challenges and prospects of exploiting ML techniques for EDA problems, which span data, algorithms/models, deployment, and long-term targets.

A. Data Collection

Data scarcity. In some EDA problems, such as place and route in physical synthesis, simulation is extremely expensive. As ML models usually require enough data to learn the underlying statistics and make decisions, this gap between small data and big data often limits the capability of ML-based techniques. From the algorithm side, algorithms that can work with small data await development. From the data side, generative methods can be used to produce synthetic data [52].

Non-perfect data. Even if some EDA tools produce a lot of data (such as simulation-based testing), the data are not always properly labeled or presented in a form suitable for ML models. Thus, possible alternatives are unsupervised learning, semi-supervised learning [53], self-supervised learning [54], or combining supervised with unsupervised techniques [55].

Generalization to out-of-distribution data. Though synthetic data can help mitigate the data gap, it is noteworthy that data distributions vary between synthetic and real-case data [56], which often causes data drift or concept drift [57]. This calls for incorporating out-of-distribution methods [58].

B. Model/Algorithm Development

Multi-level abstraction and optimization. Classical EDA methods usually adopt a bottom-up or top-down procedure, encouraging ML-based techniques to distill hierarchical structures of hardware designs. Potential methods toward multi-level design abstraction and optimization are 1) hierarchical RL [59], which has flexible goal specifications and can learn goal-directed behaviors in complex environments with sparse feedback, and 2) multi-agent RL [60], where agents can be fully cooperative, fully competitive, or a mix of the two, enabling versatile system optimization.

Interpretability. The absence of interpretation regarding model behaviors and decisions limits wider adoption of ML for EDA tasks, since these explanations are important to identify and expose potential problems during training and to ensure fidelity of models/algorithms. Thus, efforts in interpretable ML [61], [62] are highly expected to promote production-ready applications of ML for EDA.

C. Implementation and Deployment Improvement

Online vs. offline. When deploying ML-based techniques for EDA tasks, it is crucial to deliberate design constraints under different scenarios. 1) ML-based techniques may be deployed online or during runtime, no matter whether the training phase is online or offline. Obviously, the model complexity and runtime overheads are strictly limited by specific constraints. If online learning is further desired, the design constraints will be even more stringent. One promising approach is to employ semi-online learning models, which have been applied to solve some classical combinatorial optimization problems, such as bipartite matching [63] and caching [64]. These models enable smooth interpolation between the best possible online and offline training algorithms. 2) ML-based techniques may be applied offline to guide hardware design, and once the design phase is completed, the ML models will not be invoked again. Thus, offline applications can tolerate relatively higher overheads.

Model maintenance. In the case of offline training and online deployment, ML models require regular maintenance and updating to meet performance expectations. 1) ML models can be retrained either at a regular interval or when key performance indicators are below certain thresholds. Retraining models regularly, regardless of their performance, is a more direct way, but it requires a clear understanding of how frequently a model should be updated in its own scenario; model performance will decline if retraining intervals are too spaced out. Monitoring key performance indicators relies on a comprehensive panel of measurements that explicitly demonstrate model drift, whereas this may introduce additional hardware/software overhead, and incorrect selection of measurements often defeats the intention of this method. 2) During the retraining of ML models, there is often a trade-off between newly collected data and previous data. Properly assigning importance to input data would improve retraining efficacy [65].
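A minimal sketch of the two maintenance triggers, assuming a `monitor` callback that returns a rolling error indicator and a `retrain` callback that accepts an importance weight for previously collected data; all names and thresholds are hypothetical.

```python
import time

def maintain(model, monitor, retrain,
             interval_s=7 * 24 * 3600,    # scheduled retraining period
             kpi_threshold=0.10,          # e.g., tolerated rolling MAPE
             old_data_weight=0.5):        # importance of previous data [65]
    """Retrain on a fixed schedule or when the monitored key performance
    indicator signals model drift, whichever comes first."""
    last = time.time()
    while True:
        scheduled = time.time() - last >= interval_s
        drifted = monitor(model) > kpi_threshold
        if scheduled or drifted:
            model = retrain(model, old_data_weight=old_data_weight)
            last = time.time()
        time.sleep(60)                    # polling period
```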
D. General, Portable, and Agile Hardware Development

Infusing more intelligence into EDA will make great strides toward the landing of hardware agile development.

General. We envisage an ML-based, system-wise, and holistic framework with a panoramic vision: it should be able to leverage information from different levels of hardware designs in synergy, to thoroughly characterize their behaviors as well as their intrinsically hierarchical abstractions; it should also be able to make decisions at different granularities, to control and improve the hardware precisely and comprehensively.

Portable. Well-designed interfaces between EDA tools and ML-based techniques are expected to facilitate portability across different platforms, since ML models can perform well without explicit descriptions of the target domain.

Agile. The proliferation of ML-based techniques has more or less transformed the EDA workflow. We expect GNNs to make better use of the naturally graphical data in the EDA field; we expect deep RL to be a powerful and general-purpose tool for many EDA optimization problems, especially when the exact heuristic or objective is obscure; and we expect more intelligence to be infused into next-generation EDA tools, to enhance designers' productivity and to help the community thrive.

VI. CONCLUSION

In this paper, we target HLS and logic synthesis, and discuss 1) the power of GNNs for fast, accurate, and generalizable QoR predictions, and 2) the efficacy of RL-enabled flexible and automatic design exploration. Standing on current endeavors, we provide a future vision of the challenges and prospects of infusing more intelligence into next-generation EDA.

REFERENCES

[1] G. E. Moore, “Cramming more components onto integrated circuits,” Proc. IEEE, 1998.
[2] C. A. Mack, “Fifty years of Moore's law,” IEEE Trans. Semicond. Manuf., 2011.
[3] M. M. Waldrop, “The chips are down for Moore's law,” Nature News, 2016.
[4] OpenAI. (Accessed: 2022-08) AI and compute. [Online]. Available: https://openai.com/blog/ai-and-compute/
[5] H. Foster. (Accessed: 2022-08) The 2020 Wilson Research Group functional verification study. [Online]. Available: https://blogs.sw.siemens.com/verificationhorizons/2020/10/27/prologue-the-2020-wilson-research-group-functional-verification-study/
[6] F. Schirrmeister et al. (Accessed: 2022-08) Next generation verification for the era of AI/ML and 5G. Design and Verification Conference and Exhibition, US (DVCon), 2020. [Online]. Available: https://dvcon-proceedings.org/document/next-generation-verification-for-the-era-of-ai-ml-and-5g/
[7] M. Rosker. (Accessed: 2022-08) Evolving the electronics resurgence initiative (ERI 2.0). [Online]. Available: https://www.ndia.org/-/media/sites/ndia/divisions/electronics/eri2_ndia_20210421_releaseapproved_34584.ashx
[8] J. Wilson. (Accessed: 2022-08) Intelligent design of electronic assets (IDEA). [Online]. Available: https://www.darpa.mil/program/intelligent-design-of-electronic-assets
[9] S. Dai et al., “Fast and accurate estimation of quality of results in high-level synthesis with machine learning,” in Proc. FCCM, 2018.
[10] E. Ustun et al., “Accurate operation delay prediction for FPGA HLS using graph neural networks,” in Proc. ICCAD, 2020.
[11] N. Wu et al., “IronMan: GNN-assisted design space exploration in high-level synthesis via reinforcement learning,” in Proc. GLSVLSI, 2021.
[12] N. Wu et al., “High-level synthesis performance prediction using GNNs: Benchmarking, modeling, and advancing,” in Proc. DAC, 2022.
[13] N. Wu and Y. Xie, “A survey of machine learning for computer architecture and systems,” ACM Comput. Surveys, 2022.
[14] N. Wu et al., “LOSTIN: Logic optimization via spatio-temporal information with hybrid graph models,” in Proc. ASAP, 2022.
[15] N. Wu et al., “IronMan-Pro: Multi-objective design space exploration in HLS via reinforcement learning and graph neural network based modeling,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., 2022.
[16] J. Zhao et al., “COMBA: A comprehensive model-based analysis framework for high level synthesis of real applications,” in Proc. ICCAD, 2017.
[17] A. B. Perina et al., “Lina: Timing-constrained high-level synthesis performance estimator for fast DSE,” in Proc. ICFPT, 2019.
[18] J. Zhao et al., “Performance modeling and directives optimization for high-level synthesis on FPGA,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., 2019.
[19] K. O'Neal et al., “HLSPredict: Cross platform performance prediction for FPGA high-level synthesis,” in Proc. ICCAD, 2018.
[20] H. M. Makrani et al., “Pyramid: Machine learning framework to estimate the optimal timing and resource usage of a high-level synthesis design,” in Proc. FPL, 2019.
[21] H. M. Makrani et al., “XPPE: Cross-platform performance estimation of hardware accelerators using machine learning,” in Proc. ASP-DAC, 2019.
[22] H.-Y. Liu and L. P. Carloni, “On learning-based methods for design-space exploration with high-level synthesis,” in Proc. 50th DAC, 2013.
[23] P. Meng et al., “Adaptive threshold non-Pareto elimination: Re-thinking machine learning for system level design space exploration on FPGAs,” in Proc. DATE, 2016.
[24] J. Kwon and L. P. Carloni, “Transfer learning for design-space exploration with high-level synthesis,” in Proc. MLCAD, 2020.
[25] D. Koeplinger et al., “Automatic generation of efficient accelerators for reconfigurable hardware,” in Proc. ISCA, 2016.
[26] J. Zhao et al., “Machine learning based routing congestion prediction in FPGA high-level synthesis,” in Proc. DATE, 2019.
[27] Z. Lin et al., “HL-Pow: A learning-based power modeling framework for high-level synthesis,” in Proc. ASP-DAC, 2020.
[28] A. V. Aho et al., Compilers: Principles, Techniques, & Tools. Pearson Education India, 2007.
[29] H. Maron et al., “Provably powerful graph networks,” Proc. NeurIPS, 2019.
[30] Vitis, Vitis High-Level Synthesis User Guide (UG1399), Accessed: 2022-08, https://docs.xilinx.com/r/en-US/ug1399-vitis-hls.
[31] B. Reagen et al., “MachSuite: Benchmarks for accelerator design and customized architectures,” in Proc. IISWC, 2014.
[32] Y. Hara et al., “Proposal and quantitative analysis of the CHStone benchmark program suite for practical C-based high-level synthesis,” Journal of Information Processing, 2009.
[33] L.-N. Pouchet and T. Yuki. (2016) Polyhedral benchmark suite. [Online]. Available: http://web.cs.ucla.edu/~pouchet/software/polybench/
[34] C. Yu et al., “Developing synthesis flows without human knowledge,” in Proc. DAC, 2018.
[35] C. Yu and W. Zhou, “Decision making in synthesis cross technologies using LSTMs and transfer learning,” in Proc. MLCAD, 2020.
[36] Synopsys. (Accessed: 2022-08) Lynx design system. [Online]. Available: https://www.synopsys.com/content/dam/synopsys/implementation&signoff/datasheets/lynx-design-system-ds.pdf
[37] J. Gilmer et al., “Neural message passing for quantum chemistry,” in Proc. ICML, 2017.
[38] L. Amarú et al., “The EPFL combinational benchmark suite,” in Proc. 24th Int. Workshop on Logic & Synthesis, 2015.
[39] R. Brayton and A. Mishchenko, “ABC: An academic industrial-strength verification tool,” in Proc. CAV, 2010.
[40] A. Holzinger et al., “Towards multi-modal causability with graph neural networks enabling information fusion for explainable AI,” Information Fusion, 2021.
[41] Cadence, “Genus synthesis solution,” Accessed: 2022-08. [Online]. Available: https://www.cadence.com/en_US/home/tools/digital-design-and-signoff/synthesis/genus-synthesis-solution.html
[42] Z. Hu et al., “Few-shot representation learning for out-of-vocabulary words,” in Proc. ACL, 2019.
[43] J. de Fine Licht et al., “Transformations of high-level synthesis codes for high-performance computing,” IEEE Trans. Parallel Distrib. Syst., 2020.
[44] B. C. Schafer, “Parallel high-level synthesis design space exploration for behavioral IPs of exact latencies,” ACM TODAES, 2017.
[45] B. C. Schafer et al., “Adaptive simulated annealer for high level synthesis design space exploration,” in Proc. VLSI-DAT, 2009.
[46] Y. Zhang et al., “A comprehensive survey on particle swarm optimization algorithm and its applications,” Math. Problems in Eng., 2015.
[47] D. Liu and B. C. Schafer, “Efficient and reliable high-level synthesis design space explorer for FPGAs,” in Proc. FPL, 2016.
[48] Q. Sun et al., “Correlated multi-objective multi-fidelity optimization for HLS directives design,” in Proc. DATE, 2021.
[49] A. Mehrabi et al., “Prospector: Synthesizing efficient accelerators via statistical learning,” in Proc. DATE, 2020.
[50] B. C. Schafer and Z. Wang, “High-level synthesis design space exploration: Past, present, and future,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., 2019.
[51] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.
[52] Y. Ding et al., “Generative and multi-phase learning for computer systems optimization,” in Proc. ISCA, 2019.
[53] J. E. Van Engelen and H. H. Hoos, “A survey on semi-supervised learning,” Mach. Learn., 2020.
[54] D. Hendrycks et al., “Using self-supervised learning can improve model robustness and uncertainty,” Proc. NeurIPS, 2019.
[55] M. Alawieh et al., “Efficient hierarchical performance modeling for integrated circuits via Bayesian co-learning,” in Proc. DAC, 2017.
[56] N. Wu et al., “Program-to-circuit: Exploiting GNNs for program representation and circuit translation,” arXiv preprint arXiv:2109.06265, 2021.
[57] A. Tsymbal, “The problem of concept drift: Definitions and related work,” Computer Science Department, Trinity College Dublin, 2004.
[58] H. Li et al., “OOD-GNN: Out-of-distribution generalized graph neural network,” IEEE Trans. Knowl. Data Eng., 2022.
[59] T. D. Kulkarni et al., “Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation,” in Proc. NeurIPS, 2016.
[60] K. Zhang et al., “Multi-agent reinforcement learning: A selective overview of theories and algorithms,” Handbook of Reinforcement Learning and Control, 2021.
[61] L. H. Gilpin et al., “Explaining explanations: An overview of interpretability of machine learning,” in Proc. DSAA, 2018.
[62] D. V. Carvalho et al., “Machine learning interpretability: A survey on methods and metrics,” Electronics, 2019.
[63] R. Kumar et al., “Semi-online bipartite matching,” in Proc. 10th Innovations in Theor. Comput. Sci. Conf., 2019.
[64] R. Kumar et al., “Interleaved caching with access graphs,” in Proc. 14th ACM-SIAM Symp. on Discrete Algorithms. SIAM, 2020.
[65] J. Byrd and Z. Lipton, “What is the effect of importance weighting in deep learning?” in Proc. ICML, 2019.