Domain-Specific Computer Architectures for Emerging Applications
With the end of Moore's Law, domain-specific architecture (DSA) has become a crucial mode of implementing future computing architectures. This book discusses the system-level design methodology of DSAs and their applications, providing a unified design process that guarantees functionality, performance, energy efficiency, and real-time responsiveness for the target application.

DSAs often start from domain-specific algorithms or applications, analyzing the characteristics of algorithmic applications, such as computation, memory access, and communication, and proposing a heterogeneous accelerator architecture suitable for that particular application. This book places particular focus on accelerator hardware platforms and distributed systems for various novel applications, such as machine learning, data mining, neural networks, and graph algorithms, and also covers RISC-V open-source instruction sets. It briefly describes the system design methodology based on DSAs and presents the latest research results from academia on domain-specific acceleration architectures.

Providing cutting-edge discussion of big data and artificial intelligence scenarios in contemporary industry and typical DSA applications, this book appeals to industry professionals as well as academics researching the future of computing in these areas.

Dr. Chao Wang is a Professor with the University of Science and Technology of China and the Vice Dean of its School of Software Engineering. He serves as an Associate Editor of ACM TODAES and IEEE/ACM TCBB. Dr. Wang received the ACM China Rising Star Honorable Mention, a Best IP nomination at DATE 2015, and a Best Paper Candidate at CODES+ISSS 2018. He is a senior member of ACM, a senior member of IEEE, and a distinguished member of CCF.
Domain-Specific Computer Architectures for Emerging Applications
Machine Learning and Neural Networks

Chao Wang
First edition published 2024
by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

and by CRC Press
2385 NW Executive Center Drive, Suite 320, Boca Raton FL 33431

© 2024 Chao Wang

CRC Press is an imprint of Informa UK Limited

The right of Chao Wang to be identified as author of this work has been asserted in accordance with
sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form
or by any electronic, mechanical, or other means, now known or hereafter invented, including
photocopying and recording, or in any information storage or retrieval system, without permission
in writing from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact mpkbookspermissions@tandf.co.uk

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.

British Library Cataloguing‑in‑Publication Data


A catalogue record for this book is available from the British Library

ISBN: 978-0-367-37453-2 (hbk)
ISBN: 978-1-032-76895-3 (pbk)
ISBN: 978-0-429-35508-0 (ebk)

DOI: 10.1201/9780429355080

Typeset in Palatino
by codeMantra
Contents

Preface

1 Overview of Domain-Specific Computing
   1.1 Background and Current Status of Domain-Specific Computing
   1.2 Current Domain-Specific Acceleration Means and Platforms
      1.2.1 Current Acceleration Means
      1.2.2 Current Domain-Specific Acceleration Platforms
   1.3 Metrics to Measure the Effectiveness of Domain-Specific Platforms
   1.4 Content Structure of This Book

2 Machine Learning Algorithms and Hardware Accelerator Customization
   2.1 Overview of Machine Learning
      2.1.1 Introduction to Machine Learning
      2.1.2 Classification of Machine Learning Algorithms
      2.1.3 Machine Learning Algorithm Acceleration Focus
      2.1.4 Commonly Used Public Data Sets for Machine Learning
   2.2 Design of FPGA-Based Machine Learning Accelerators
      2.2.1 Designing Accelerators for Specific Problems
      2.2.2 Designing Accelerators for Specific Algorithms
      2.2.3 Designing Accelerators for Algorithm Common Features
      2.2.4 Designing a Generic Accelerator Framework Using Hardware Templates
   2.3 Conclusion and Outlook
      2.3.1 Conclusion
      2.3.2 Outlook
   References

3 Hardware Accelerator Customization for Data Mining Recommendation Algorithms
   3.1 Background on Recommendation Algorithms and Their Hardware Acceleration
   3.2 Introduction to Collaborative Filtering Recommendation Algorithms
      3.2.1 Collaborative Filtering Recommendation Algorithm Based on Neighborhood Model
      3.2.2 User-Based Collaborative Filtering Recommendation Algorithm
      3.2.3 Item-Based Collaborative Filtering Recommendation Algorithm
      3.2.4 SlopeOne Recommendation Algorithm
   3.3 Hardware Acceleration Principle and Method
      3.3.1 Hardware Acceleration Principle
      3.3.2 Commonly Used Hardware Acceleration Methods
   3.4 Analysis of Collaborative Filtering Recommendation Algorithms Based on Neighborhood Models
      3.4.1 Analysis of the Training Phase
      3.4.2 Analysis of the Prediction Phase
   3.5 Hardware Acceleration System Hierarchy
      3.5.1 Training Accelerator Prototype Implementation
      3.5.2 Prediction Accelerator Prototype Implementation
      3.5.3 Device Driver Implementation
   3.6 Experimental Results and Analysis
      3.6.1 Acceleration Ratio of the Training Accelerator
      3.6.2 Power Efficiency
      3.6.3 Energy Efficiency
   3.7 Conclusions
   References

4 Customization and Optimization of Distributed Computing Systems for Recommendation Algorithms
   4.1 Application Context of Recommendation Algorithms in Distributed Systems
      4.1.1 Recommendation Systems
      4.1.2 Distributed Systems
   4.2 Algorithmic Details
      4.2.1 Concept of Recommendation Systems
      4.2.2 Collaborative Filtering Recommendation Algorithms
      4.2.3 Content-Based Recommendation Algorithms
      4.2.4 Model-Based Recommendation Algorithms
      4.2.5 Evaluation Metrics
   4.3 Deployment of Recommendation Systems
      4.3.1 Overall Framework of Recommendation Systems
      4.3.2 Weighted Hybrid Subsystems
      4.3.3 Cross-Harmonization System
   4.4 Conclusions
   References

5 Hardware Customization for Clustering Algorithms
   5.1 Hardware Customization of Clustering Algorithms
   5.2 Clustering Algorithm Details
      5.2.1 K-Means Algorithm
      5.2.2 K-Medoid Algorithm
      5.2.3 SLINK Algorithm
      5.2.4 DBSCAN Algorithm
   5.3 Hardware Deployment/Acceleration Customization-Related Work
      5.3.1 Introduction to FPGA Acceleration Technology
      5.3.2 Functional Division of Hardware and Software for Accelerated Systems
      5.3.3 Introduction to the Framework Structure of the Accelerator
   5.4 Chapter Summary
   References

6 Hardware Accelerator Customization Techniques for Graph Algorithms
   6.1 Graph Algorithms Background
      6.1.1 Traditional Graph Computation Algorithms
      6.1.2 Graph Neural Network Algorithm
   6.2 Graph Algorithm Model
      6.2.1 Graph Computation Models
      6.2.2 Synchronous and Asynchronous Computation Methods
      6.2.3 Introduction to Graph Computation Systems and Graph Algorithms
      6.2.4 Graph Neural Network Algorithm
   6.3 Hardware Deployment/Acceleration Customization-Related Work
      6.3.1 Distributed Graph Computing System
      6.3.2 Stand-Alone Graph Computation System
      6.3.3 Graph Computation Accelerator
      6.3.4 Graph Neural Network Accelerator
   6.4 Chapter Summary
   References

7 Overview of Hardware Acceleration Methods for Neural Network Algorithms
   7.1 Neural Network Algorithms and Their Hardware Acceleration Background
      7.1.1 Principles of Neural Network Algorithms
      7.1.2 Hardware Acceleration Background of Neural Network Algorithms
   7.2 Architectures of Hardware Accelerators
      7.2.1 The ASIC Heterogeneous Accelerators of Deep Learning
      7.2.2 The GPU Accelerators of Neural Networks
      7.2.3 The FPGA Heterogeneous Accelerators of Neural Networks
      7.2.4 The Modern Storage Accelerators
   7.3 Common Optimization Methods in Hardware Customization
      7.3.1 Optimizing Calculation
      7.3.2 Optimizing Storage
      7.3.3 Optimizing the Area and Power Consumption
      7.3.4 Programming Framework
      7.3.5 The New Method Applied to Neural Networks
      7.3.6 Applications Using Neural Networks
      7.3.7 Others
   7.4 Parallel Programming Models and the Middleware of Neural Networks
   7.5 Latest Developments and Conclusions
      7.5.1 Summary of the Latest Developments
      7.5.2 Conclusions and Outlook
   References

8 Customization of FPGA-Based Hardware Accelerators for Deep Belief Networks
   8.1 Background and Significance
   8.2 Deep Belief Networks
      8.2.1 Introduction to Deep Belief Networks
      8.2.2 Algorithm Analysis
   8.3 Hardware Deployment/Acceleration Customization-Related Work
      8.3.1 Single-FPGA Acceleration System PIE
      8.3.2 Multi-FPGA Acceleration System
   8.4 Chapter Summary
   References

9 FPGA-Based Hardware Accelerator Customization for Recurrent Neural Networks
   9.1 Background and Significance
   9.2 Recurrent Neural Networks
      9.2.1 Introduction to Recurrent Neural Networks
      9.2.2 Algorithm Analysis
   9.3 Hardware Deployment/Acceleration Customization-Related Work
      9.3.1 The Overall Implementation Architecture
      9.3.2 Matrix-Vector Multiplication Module
      9.3.3 Element-Wise Module
      9.3.4 Activation Function Module
   9.4 Chapter Summary
   References

10 Hardware Customization/Acceleration Techniques for Impulse Neural Networks
   10.1 Background of Impulse Neural Network Application
   10.2 Details of the Impulse Neural Network Algorithm
      10.2.1 Impulse Neuron Model
      10.2.2 Impulse Neural Network Topology
      10.2.3 Impulse Coding Approach
      10.2.4 Impulse Neural Network Learning Algorithm
   10.3 Hardware Deployment/Acceleration Customization-Related Work
      10.3.1 Digital-Analog Hybrid Implementation of Impulse Neural Networks
      10.3.2 Pure Digital Circuit Implementation of Impulse Neural Networks
   References

11 Accelerators for Big Data Genome Sequencing
   11.1 Background on Genome Sequencing and Its Hardware Acceleration
      11.1.1 Distributed Systems
      11.1.2 GPU Platforms
      11.1.3 FPGA Platforms
      11.1.4 Conclusion
   11.2 Genome Sequencing Algorithms and Their Hardware Acceleration Principles
      11.2.1 Gene Sequencing
      11.2.2 KMP and BWA
      11.2.3 The Principle of Hardware-Based Acceleration
   11.3 The Design of the Accelerator
      11.3.1 System Analysis
      11.3.2 IP Core
      11.3.3 The Implementation of the Accelerator
   11.4 Conclusion
   References

12 RISC-V Open Source Instruction Set and Architecture
   12.1 RISC-V Architecture Principles
      12.1.1 Introduction to RISC-V
      12.1.2 RISC-V Instruction Set Features
      12.1.3 Status of RISC-V Research
   12.2 RISC-V-Based Accelerator Customization Solutions
      12.2.1 Related Work
      12.2.2 RISC-V Extended Instruction Design
      12.2.3 RISC-V Instruction Set Extensions
      12.2.4 Examples
   12.3 Chapter Summary
   References

13 Compilation Optimization Methods in the Customization of Reconfigurable Accelerators
   13.1 High-Level Synthesis Tools and Hardware Customization
   13.2 Source Code to RTL Optimization
      13.2.1 High-Level Synthesis (HLS) Principles
      13.2.2 Summary
   13.3 Source Code to Source Code Optimization
      13.3.1 Merlin Compiler
      13.3.2 Polyhedral Models
      13.3.3 Summary
   13.4 Domain Customization Languages and Intermediate Representations
      13.4.1 Domain Customization Languages
      13.4.2 Intermediate Representations
   13.5 Accelerator Template Mapping
      13.5.1 Streaming Architecture Stream Dataflow
      13.5.2 Systolic Arrays Spatial Dataflow
   13.6 Chapter Summary
   References

Index
Preface

Domain-specific computing is one of the most talked-about concepts in computer science and engineering in recent years. It refers to customizing and optimizing computing architectures, with dedicated hardware and software, for the needs of specific application domains, so as to achieve higher performance and energy efficiency than general-purpose computing methods. Compared with general-purpose computing, domain-specific computing focuses on solving domain-specific problems: the system design is tailored to the characteristics and needs of the target application domain to achieve higher performance or efficiency on specific tasks. This customization and optimization involves custom design on the hardware side, such as dedicated hardware accelerators for specific algorithms or operations, as well as custom development on the software side, such as optimized algorithms and code written for the requirements of the specific domain.
Currently, domain-specific computing is applied widely across fields, from scientific research to industrial production. In artificial intelligence (AI) and machine learning, for example, specialized hardware accelerators such as graphics processing units and tensor processing units can be designed to speed up the training and inference of complex algorithms such as neural networks. In image and video processing, domain-specific computing also plays an important role: specialized hardware can accelerate image processing, video encoding and decoding, image recognition, and other tasks, improving performance and energy efficiency. And in emerging computing scenarios built on AI and image and video processing, such as autonomous driving and intelligent transportation, domain-specific computing can support perception, decision-making, and control with real-time, high-performance processing that helps ensure vehicle safety and smooth traffic.
Common training materials and textbooks, in China and abroad, tend to introduce general customization methods and design ideas at the macro level; they rarely discuss how to apply domain-specific computing methods flexibly in light of the characteristics of different real computing scenarios. Classic textbooks related to domain-specific computing, such as Domain-Specific Processors: Systems, Architectures, Modeling, and Simulation by Jörg Henkel et al., Domain-Specific Languages by Martin Fowler et al., and Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann, usually focus on either hardware design or software design alone and lack analysis and discussion of co-optimizing the underlying hardware and the upper software layers within the same system. Because these books were published years ago, the theories and methods they cover differ somewhat from the algorithms and applications that attract the most attention today, and hardware acceleration techniques based on new devices and semiconductor processes are not included. Domain-specific computing and its design theory and methods are core technologies for the development and upgrading of manufacturing. Especially in the current era of AI, in which compute-intensive and data-intensive algorithms of all kinds continue to emerge, they are of key significance for reducing energy consumption and hardware costs, and a large number of highly trained professionals are urgently needed in this field. At present, however, the relevant education at Chinese universities is relatively weak, and the capacity for training hands-on talent is insufficient.
To address the lack of practical content in classic textbooks, and drawing on our years of teaching and research practice, this book surveys the technical methods and development directions of the field from the perspective of computer science and technology, taking the customization needs of different algorithmic application areas as a thread and discussing the corresponding domain-specific system design issues category by category. The book covers mainstream algorithm types including neural networks, data mining, and graph computing, combining macro-level theory with concrete cases, and is organized around the construction of domain-specific accelerator microarchitectures and acceleration systems based on field-programmable gate arrays. System-level optimization analysis and concrete hardware and software customization methods for different algorithmic scenarios are discussed in detail in the respective chapters. Owing to space limitations, it is impossible to cover all methods and ideas for every application field on every hardware platform; for more detailed and comprehensive domain-specific system customization methods for individual applications, interested readers may further consult AI Computing Systems edited by Chen Yunji et al. and the Handbook of Signal Processing Systems edited by Shuvra S. Bhattacharyya et al.
The publication of this book reflects the efforts of many teachers and students in the Energy Efficient Intelligent Computing Lab at the University of Science and Technology of China. Dr. Lei Gong, Prof. Xuehai Zhou, Prof. Xi Li, Haoran Li, Haoyu Cai, Yingxue Gao, Yang Yang, Songsong Li, Wenqi Lou, Xuan Wang, Jiali Wang, Yang Yang, Yangyang Zhao, Haijie Fang, Wenbin Teng, Zheyuan Zou, Yuxing He, Qiaochu Liang, Jize Pang, Hanyuan Gao, and many other researchers took part in preparing the manuscript. The material in this book draws on a large number of relevant textbooks, course materials, and academic papers from China and abroad; I express my heartfelt thanks to the authors of the cited works and apologize to any authors whose sources are missing. Given the author's limited knowledge, imperfections surely remain in the book, and readers are invited to point them out and correct them. For any comments and suggestions, please contact [email protected].
The compilation of this book was supported by the National Natural Science Foundation, the Key Research and Development Program of the Ministry of Science and Technology, and the Youth Innovation Promotion Association. Mrs. Randi Slack also did a great deal of editorial work for the publication of this book, for which I express my sincere thanks.

Chao Wang
University of Science and Technology of China
1 Overview of Domain-Specific Computing

1.1 Background and Current Status of Domain-Specific Computing
With the explosive growth of data and the widespread use of data mining applications, we have entered the era of big data. This era brings not only opportunities but also challenges: accessing data efficiently and reliably, and accelerating the execution of data mining applications, have become key problems that academia and industry urgently need to solve. In the emerging field of big data, machine learning, data mining, and artificial intelligence algorithms, as the core components of next-generation applications, are attracting ever more attention from researchers, and designing new algorithm-oriented architectures with existing hardware and software techniques has become a hot research topic.

Accelerating these new algorithms in the era of big data is very different from the past. Many factors are driving users to abandon the traditional CPU-based single-node processing platform and turn to other platforms and techniques to accelerate data mining and machine learning applications, including the following: (1) Massive data: the potential data scale in many application fields is so large that processing it on a single machine is impractical. (2) High data dimensionality: in some data mining applications, instance data have a large number of features, and machine learning algorithms may need to partition the feature space in order to process them. (3) Complex models and algorithms: some high-accuracy machine learning algorithms have complex model representations and often require a large amount of computation. (4) Inference time constraints: some data mining applications, such as speech recognition and visual object detection, have real-time requirements that single-machine processing cannot meet. (5) Multi-level prediction: some machine learning algorithms take the form of multi-stage pipelines in which multiple classifiers must work in parallel, which single-node CPU platforms often cannot support.


To this end, Hennessy and Patterson's 2017 Turing Award Lecture introduced the concept of domain-specific architectures, arguing that domain-specific computing will bring about a new golden age of computer architecture. For a long time, computer architecture designers focused on general-purpose computing. With the explosion of domain applications, however, a variety of specialized computing devices must be built for a wide range of novel applications and algorithms to meet requirements for performance, energy efficiency, scalability, and much else.
Many different acceleration platforms on the market can be used to implement machine learning algorithms that handle massive amounts of data with high efficiency. Generally, these platforms fall into four categories: custom logic circuits (e.g., field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs)), general-purpose graphics processing units (GPGPUs), cloud computing platforms, and heterogeneous computing platforms. These platforms exhibit different parallelism granularities, suit different application scenarios, and can be combined into heterogeneous systems to fully exploit the processing power of different acceleration devices.
An acceleration platform cannot rely on its hardware alone; it also needs a set of supporting software systems. Many software systems target different acceleration platforms, such as Hadoop, Spark, DryadLINQ, Pregel, and PowerGraph for cloud computing platforms, and the compute unified device architecture (CUDA), OpenCL, and OpenACC for GPGPU platforms. These software systems take full advantage of the platform's capabilities while remaining user-friendly: users only need to write applications that follow the corresponding specifications and use the provided interfaces to obtain substantial acceleration.
Cloud computing platforms and GPGPUs are currently the most widely used general-purpose acceleration platforms, while FPGAs and ASICs are typically used to build problem-specific accelerators for hardware acceleration. Heterogeneous computing platforms that combine the central processing unit (CPU), graphics processing unit (GPU), and FPGA, such as Axel, OptiML, and Lime, should in theory offer good acceleration potential; however, because of implementation difficulty and many other problems they remain at the research stage and have not been widely adopted. Cloud computing platforms currently consist mainly of large clusters of CPU-based single-node machines and mainly exploit coarse-grained task-level parallelism to accelerate applications. GPGPUs, by contrast, mainly exploit fine-grained data-level parallelism, while FPGAs/ASICs exploit fine-grained data-level parallelism and pipelining. The software systems for cloud computing platforms mainly include Hadoop and Spark, based on the MapReduce programming model, and Pregel and PowerGraph, based on the graph computing programming model. For GPGPUs, the software systems are the single program multiple data (SPMD)-based CUDA, OpenCL, and OpenACC. For FPGAs and ASICs, there is currently no well-suited programming model or parallel platform; developers must exploit the acceleration potential of each problem and algorithm themselves, implementing the corresponding hardware structures in a hardware description language such as Verilog or VHDL.
On cloud computing platforms and GPGPUs, current general-purpose CPUs are not ideal for machine learning algorithms, which are both data- and computation-intensive, and the data communication overhead of cloud platforms built from many CPUs has become a stumbling block to further efficiency gains. GPUs, in turn, tend to be less efficient on highly correlated data and to have higher power consumption. Therefore, using FPGAs to design accelerator architectures for machine learning algorithms is a less explored but very promising research direction.

1.2 Current Domain‑Specific Acceleration Means and Platforms


1.2.1 Current Acceleration Means
Broadly speaking, the current means of accelerating machine learning algorithms fall into three categories: optimization at the software level, parallelization of the algorithms, and improvement at the hardware level.
Optimization at the software level includes improving the machine learning algorithm itself and optimizing its runtime environment. Improving the algorithm itself means proposing a new mathematical formulation that speeds up a particular algorithm, for example the sequential minimal optimization (SMO) method for the support vector machine (SVM). Optimizing the runtime environment means further tuning the software stack in which the algorithm runs, such as the runtime library and operating system, to execute machine learning algorithms more efficiently.
Parallelization is currently the most common means of acceleration: machine learning algorithms are parallelized and distributed so that they achieve task-level and data-level parallelism on a particular hardware platform. Many machine learning algorithms can be parallelized relatively easily and run well on multi-core, multi-node hardware; the main parallel platforms today are cloud computing platforms and GPGPUs.
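
To make the two granularities concrete, here is a minimal sketch (ours, not from the book) of how a typical machine learning kernel can be parallelized on a multi-core node: the data set is split into shards (data-level parallelism) and each shard is handed to a worker process (task-level execution). The data set size, shard count, and query point are illustrative assumptions.

```python
# Sketch: parallelizing a distance computation with multiprocessing.
# All values (100k rows, 8 shards, the query point) are illustrative.
from multiprocessing import Pool

import numpy as np

QUERY = np.full(8, 0.5)  # fixed query point so every worker sees the same value

def chunk_distances(chunk):
    """Euclidean distance from each row of `chunk` to the query point."""
    return np.linalg.norm(chunk - QUERY, axis=1)

if __name__ == "__main__":
    data = np.random.rand(100_000, 8)    # instance data: 100k rows, 8 features
    shards = np.array_split(data, 8)     # data-level split into 8 shards
    with Pool(processes=8) as pool:      # one worker task per shard
        parts = pool.map(chunk_distances, shards)
    distances = np.concatenate(parts)
    print(distances[:5])
```
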
Improvement at the hardware level mainly means adapting existing processor architectures to the characteristics of machine learning algorithms so that they execute these algorithms efficiently and quickly. Current general-purpose CPU architectures are poorly suited to machine learning problems because of three characteristics of the algorithms: combined data and computation intensity, streaming data transfer with iterative computation, and a low proportion of branch instructions. Machine learning algorithms are both data- and computation-intensive, leading to frequent memory accesses and to high-intensity, large-scale, complex operations on the data; CPU memory access efficiency and computational power often cannot meet the requirements of large-scale machine learning applications. These algorithms generally read and process data sequentially as a stream and often iterate over the entire data set; that is, one pass over the whole data set must finish before the next round of computation can begin. This access pattern produces a high cache miss ratio under the CPU's least recently used (LRU) replacement strategy, making execution inefficient. Finally, branch instructions account for a low proportion of machine learning code, which keeps the algorithms relatively straightforward but also means that the branch prediction hardware occupying a considerable share of the CPU's chip area is largely wasted.
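
The cache behavior described above is easy to reproduce. The following small simulation (our illustration, not from the book) streams a data set cyclically through an LRU cache that is smaller than the data set; every access misses, which is exactly the pathology described:

```python
# Sketch: cyclic sequential scans thrash an LRU cache when the data set
# exceeds cache capacity. Cache size and data size are illustrative.
from collections import OrderedDict

def lru_hit_ratio(num_blocks: int, cache_size: int, passes: int) -> float:
    cache = OrderedDict()          # ordered dict as an LRU: oldest entry first
    hits = accesses = 0
    for _ in range(passes):
        for block in range(num_blocks):   # one streaming pass over the data
            accesses += 1
            if block in cache:
                hits += 1
                cache.move_to_end(block)  # mark as most recently used
            else:
                cache[block] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict least recently used
    return hits / accesses

# A data set of 1,000 blocks streamed through a 256-block LRU cache:
print(lru_hit_ratio(num_blocks=1000, cache_size=256, passes=5))  # -> 0.0
```
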

1.2.2 Current Domain‑Specific Acceleration Platforms


At present, acceleration platforms for machine learning algorithms fall into four main categories: cloud computing platforms, GPGPU platforms, FPGA/ASIC platforms, and heterogeneous computing platforms that combine characteristics of the other three. These platforms exhibit different parallelism granularities and suit different machine learning problems.
Cloud computing platforms are currently the most widely deployed; they can be used to distribute data processing and parallelize machine learning algorithms. A cloud platform generally consists of a large number of homogeneous single-node CPU-based servers that cooperate, and it typically applies task-level and data-level parallelism to a problem. Cloud programming models fall roughly into two types: the MapReduce-based model and the graph-computing-based model. A MapReduce program is abstracted into two phases, Map and Reduce; this model suits data with relatively few dependencies and handles highly dependent data poorly. A graph computing program is abstracted as computation over a graph, where each node is updated from the information on its neighboring edges and nodes; this model suits problems with a high degree of data interdependence.
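
As a minimal illustration of the two-phase abstraction, the sketch below (ours, not taken from any particular framework) implements a word count in plain Python: the Map phase emits (word, 1) pairs, a shuffle step groups the pairs by key, and the Reduce phase sums each group. A real framework such as Hadoop distributes these same phases across cluster nodes.

```python
# Sketch: the MapReduce abstraction on a single machine (word count).
from collections import defaultdict
from itertools import chain

def map_phase(document: str):
    """Map: emit a (key, value) pair for every word."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)

documents = ["big data needs big machines", "machines learn from data"]
pairs = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # e.g., {'big': 2, 'data': 2, 'machines': 2, ...}
```
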
GPGPUs are well suited to data-level parallel processing thanks to their specialized architecture. A GPGPU is composed of multiple streaming multiprocessors (SMs), each consisting of multiple streaming processors (SPs). The SMs share a global memory, while the SPs within each SM share a register file and a shared memory. A GPGPU is essentially a many-core architecture whose memory hierarchy, unlike a CPU's, is not maintained automatically but is managed explicitly by the programmer, which makes it very effective at parallelizing problems at the data level. Programming specifications such as CUDA, OpenCL, and OpenACC have made GPU programming simple and fast, so the GPU has become a widely used parallel acceleration platform.
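
In the SPMD style shared by these specifications, every thread executes the same kernel on a different data element. The sketch below illustrates that style using the numba package's CUDA bindings; it is our example, not code from the book, and it assumes a CUDA-capable GPU with numba installed.

```python
# Sketch: SPMD data-level parallelism in the CUDA style, via numba.cuda.
# Assumes a CUDA-capable GPU; the array size and block size are illustrative.
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(x, y, out):
    i = cuda.grid(1)            # this thread's global index in the 1-D grid
    if i < out.size:            # guard: the grid may be larger than the data
        out[i] = x[i] + y[i]

n = 1_000_000
x = np.random.rand(n).astype(np.float32)
y = np.random.rand(n).astype(np.float32)
out = np.zeros_like(x)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](x, y, out)   # one thread per element
print(np.allclose(out, x + y))  # True
```
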
FPGAs and ASICs are currently used mainly to design specialized hardware accelerators for specific algorithms and problems. The FPGA often serves as an intermediate device for verifying and simulating a designed accelerator architecture; once verification is complete, a specialized ASIC accelerator can be fabricated. Thanks to its flexible programmability and reconfigurability, the FPGA can also act as an acceleration device in its own right, and because it can be reconfigured to best fit each problem, it has great acceleration potential. However, FPGA and ASIC platforms are not very popular, owing to factors such as high design difficulty, and they are found mainly in embedded devices or domain-specific applications.
Heterogeneous computing platforms combine CPUs, GPUs, and FPGAs, often as clusters of heterogeneous computing nodes. Open problems remain, however, in how to use these computing resources well and how to give users a concise programming model; such platforms are still immature and at the research stage. Existing prototypes include Axel, OptiML, and Lime.

1.3 Metrics to Measure the Effectiveness of Domain-Specific Platforms
There are many different metrics to measure the acceleration effectiveness of
domain‑specific platforms, and these metrics tend to reflect different aspects
of the acceleration platforms, some of which are speedup, efficiency, scalabil‑
ity, and resource utilization.

Speedup is the ratio of the running time of the serial version of a program to that of the parallel version. Parallelization makes sense only when this ratio is greater than 1, and the larger the ratio, the greater the benefit of parallelizing the program.
Efficiency is the ratio of the program's speedup to the number of processing units and reflects how well those units are utilized; the higher the efficiency, the higher the utilization of the processing units.
Scalability describes how the efficiency of a program fluctuates as the
number of processing units increases. Scalability is generally related to effi‑
ciency; the higher the efficiency, the better the scalability of the program, and
vice versa.
Resource utilization applies primarily to acceleration on FPGA platforms. When designing accelerator architectures on an FPGA, hardware resources are very limited, so they cannot be spent blindly; the design must strike a balance between resources and performance.
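
In symbols, writing T_s for the serial running time, T_p for the parallel running time, and p for the number of processing units, the first two metrics are (standard definitions, consistent with the text above):

   S = T_s / T_p,    E = S / p = T_s / (p · T_p)

For example, if the serial program takes T_s = 100 seconds and the parallel version takes T_p = 20 seconds on p = 8 units, then S = 5 and E = 0.625: parallelization is worthwhile (S > 1), but the eight units are only 62.5% utilized.
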

1.4 Content Structure of This Book


This book covers the major aspects of domain-specific computing, including domain-specific computing architectures and typical applications. It targets the big data and artificial intelligence scenarios common in industry today, analyzes and summarizes representative applications in those scenarios, characterizes the applications, and proposes a series of domain-specific accelerator design methods and concrete architectures. The content of this book is structured as follows:

Chapter 2, Machine Learning Algorithms and Hardware Accelerator Customization, first analyzes some common machine learning algorithms and how they can be accelerated. Based on the characteristics of these algorithms, the chapter surveys common acceleration methods and hardware platforms, such as specialized chips, FPGAs, GPUs, and distributed systems; each hardware acceleration architecture framework reflects the advantages of the corresponding technical means.
Chapter 3, Hardware Accelerator Customization for Data Mining Recommendation Algorithms, focuses on the common neighborhood-based collaborative filtering (CF) methods among recommendation algorithms and designs a specialized hardware architecture implementing a training accelerator and a prediction accelerator. The training accelerator supports five similarity measures that can be used in the training phases of user-based CF and item-based CF and in different phases of computing SlopeOne. The prediction accelerator supports the accumulation and weighted-average operations of these three algorithms in the prediction phase. In addition, buses and interconnects are designed between peripherals such as the host CPU, memory, the hardware accelerators, and direct memory access (DMA). To facilitate programmatic calls by users, this chapter also describes the device driver created and encapsulated under the Linux operating system, as well as the user-layer function call interfaces for these hardware accelerators and DMA in that environment.
Chapter 4, Customization and Optimization of Distributed Computing Systems for Recommendation Algorithms, focuses on the typical features of recommendation algorithms and on how to build a domain-specific computing platform on a Spark-based distributed system. The chapter includes a basic introduction to CF, content-based, and model-based recommendation algorithms, as well as the general deployment of distributed recommendation systems, including key techniques such as weighted hybrid and cross-blending.
Chapter 5, Hardware Customization for Clustering Algorithms, analyzes the features of typical clustering algorithms, including K-Means, K-Medoid, SLINK, and DBSCAN, portraying the basic algorithmic flow and characteristics and laying the groundwork for hardware deployment and accelerator customization, such as hardware/software function partitioning, the hardware/software co-design flow, and code locality analysis. This serves as the basis for the hardware implementation of acceleration systems, execution abstraction, and code mapping for clustering algorithms.
Chapter 6, Hardware Accelerator Customization Techniques for Graph Algorithms, addresses both traditional graph computing algorithms and newer graph neural network algorithms. The chapter analyzes the models underlying graph computing systems and algorithms, examines commonly used graph algorithms such as PageRank, BFS, and WCC, and introduces the mechanisms of graph convolution and graph attention. On this basis, on the hardware deployment side, it introduces distributed graph computing systems, stand-alone graph computing systems, graph computation accelerators, and graph neural network accelerators, with particular analysis of typical hardware acceleration systems on GPU, ASIC, and FPGA.
Chapter 7, Overview of Hardware Acceleration Methods for Neural
Network Algorithms, focuses on different neural network accelera‑
tion methods, including ASIC, GPU, FPGA, modern memory, and
parallel programming models and middleware for neural networks.
Chapter 8, Customization of FPGA-Based Hardware Accelerators for Deep Belief Networks, mainly summarizes papers on neural network accelerators published in the field of EDA in recent years and then classifies and analyzes the key techniques in each paper. Typical optimization goals include computation, storage, and performance/power optimization; typical techniques include pruning, weight compression, data sharing, data parallelism, and approximate computing. Finally, some new research hotspots and development trends for neural networks are given.
Chapter 9, FPGA-Based Hardware Accelerator Customization for Recurrent Neural Networks, begins with an introduction to deep belief networks, an analysis of the restricted Boltzmann machine, and common compute/memory-access pipelining techniques. On this basis, a hardware-customized acceleration platform for deep belief networks is introduced, and the basic implementation of the acceleration system framework, inner-product computation, parallel processing, and the pipeline mechanism are discussed. Various optimization methods are then described, including storage optimization, structure reuse, and multi-platform parallel acceleration.
Chapter 10, Hardware Customization/Acceleration Techniques for Impulse Neural Networks, first introduces the basic principles of impulse (spiking) neural networks; analyzes impulse neuron models, including the HH, LIF, and SRM neuron models; and describes impulse neural network topologies and learning algorithms. On this basis, hardware deployment/acceleration customization for impulse neural networks is introduced, digital-analog hybrid and pure digital circuit implementations are described, and existing new accelerators are reviewed and analyzed.
Chapter 11, Accelerators for Big Data Genome Sequencing, first analyzes the Knuth-Morris-Pratt (KMP) and Burrows-Wheeler Alignment (BWA) algorithms and describes the typical operators in their processing flows. On this basis, the principles of hardware acceleration are introduced, the specific design and implementation of the accelerators are described, and methods for building a hardware/software co-design framework, together with concrete task mapping schemes, are presented.
Chapter 12, RISC‑V Open Source Instruction Set and Architecture,
begins with an introduction to the reduced instruction set com‑
puter‑V (RISC‑V) architecture, analyzing in detail the characteristics
of the RISC‑V architecture in comparison to traditional instruc‑
tion set architectures. Then, the current research status of RISC‑V
in industry and academia is investigated and summarized, and the
extended instruction sets and processors based on RISC‑V are intro‑
duced. On this basis, this chapter implements a convolutional neural
network (CNN) extension based on the RISC‑V instruction set and
introduces the overall architecture, unit design, performance evalu‑
ation, and other aspects.
Chapter 13, Compilation Optimization Methods in the Customization
of Reconfigurable Accelerators, first introduces high‑level synthesis
tools that go from source code to register transfer level code; then
introduces key techniques such as source‑to‑source optimization
mechanisms, domain‑specific languages, intermediate representations,
and accelerator template mapping; and provides a comprehensive
summary and analysis of hardware accelerator customization and
optimization from the compilation perspective.
2
Machine Learning Algorithms and
Hardware Accelerator Customization

2.1 Overview of Machine Learning


2.1.1 Introduction to Machine Learning
Machine learning is concerned with using data to construct appropriate
predictive models to make predictions about unknown data. In general, the
main task of machine learning is to select a function f from a certain class
of function models and learn it based on a data set so that the function can
accurately map the input domain X to the output domain Y, which is f: X→Y.
The input domain X often represents a collection of multiple sets of data, and
the output domain Y represents the label or result corresponding to each
set of data.
Depending on the data used for learning, machine learning algorithms
can be categorized into two types: supervised learning and unsupervised
learning. In supervised learning, each set of training data in the data set
has a known label or outcome, and the algorithm uses the training data to
construct a function f that is then used to make predictions on data whose
label or outcome is unknown. In unsupervised learning, the labels or
outcomes of the input data are unknown; most unsupervised learning
algorithms assume that the data follow some joint probability distribution
and use this assumption to find the function f that best fits the input
training data. Supervised learning has two main forms, namely, classifica‑
tion and regression. In classification, the output domain Y of the function f
consists of a set of discrete values; in regression, the output domain Y is a
range of continuous real values. Unsupervised learning focuses on data
clustering, which is the process of grouping data from an unclassified
data set into classes according to attributes such as distance and
similarity.


Both supervised and unsupervised learning share the distinction between
learning and inference. Learning refers to the process of determining a pre‑
diction function f, while inference refers to the process of computing f(x)
for a given instance x in X. Therefore, for machine learning algorithms, we
can choose whether to accelerate the learning process, the inference pro‑
cess, or both in parallel, depending on the specific application scenario.
In addition, according to the characteristics of the algorithm itself,
machine learning algorithms can also be divided into two forms: batch
learning and online learning. Batch learning is learning in the traditional
sense: a training set is given first, f is trained on it, and then f is applied to
the test data. Online learning, in contrast, learns while making predictions
as the data arrive, so it often has real‑time requirements. Accelerating
online learning algorithms is therefore often more meaningful than
accelerating batch learning algorithms.
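To make the batch/online distinction concrete, the following minimal Python sketch contrasts a batch least‑squares fit with an online stochastic‑gradient update of the same linear model (the function names and toy data are ours for illustration, not from any cited work); the online version emits a prediction for each sample as it arrives.

import numpy as np

def batch_fit(X, y):
    """Batch learning: solve for the weights once over the whole training set."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares solution of X w = y
    return w

def online_step(w, x, y, lr=0.5):
    """Online learning: predict first, then update the weights from one sample."""
    y_hat = w @ x                  # inference on the incoming instance
    w = w - lr * (y_hat - y) * x   # SGD update for squared loss
    return w, y_hat

# Toy data: y = 2*x + 1, with a bias column appended.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 1, 100), np.ones(100)])
y = 2 * X[:, 0] + 1

w_batch = batch_fit(X, y)            # learn once, then reuse f for inference
w_online = np.zeros(2)
for x_i, y_i in zip(X, y):           # learn while predicting
    w_online, _ = online_step(w_online, x_i, y_i)
print(w_batch, w_online)             # both should approach [2, 1]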

2.1.2 Classification of Machine Learning Algorithms


According to the similarity of their formulation and implementation,
machine learning algorithms can be grouped into families such as
Bayesian‑based algorithms and neural network‑based algorithms. Of course,
the scope of machine learning is so vast that some algorithms are difficult to
place explicitly in a particular class, and for some classifications, algo‑
rithms of the same class can target different types of problems. Some
researchers [1] classified most machine learning algorithms into 12
types, where each type tends to share similar models and solutions, and it
is possible to design accelerators that extract the common features of a
particular class of machine learning algorithms in order to accelerate that
class. These 12 types of machine learning algorithms are listed in the
following.

2.1.2.1 Regression Algorithms


Regression algorithms are a class of algorithms that attempt to explore rela‑
tionships between variables using a measure of error. Regression algorithms
are powerful tools for statistical machine learning. In the field of machine
learning, people talk about regression, sometimes in reference to a class of
problems and sometimes in reference to a class of algorithms, something that
often confuses beginners (Figure 2.1).
Common regression algorithms include ordinary least squares, logistic
regression, stepwise regression, multi‑variate adaptive regression splines
(MARS), and locally estimated scatterplot smoothing (LOESS).
FIGURE 2.1
Using regression algorithms to make predictions about data.
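Ordinary least squares, the first method named above, can be written in closed form via the normal equations; the short Python sketch below is purely illustrative (names and data are ours, not from the book).

import numpy as np

def ols(X, y):
    """Ordinary least squares: w = (X^T X)^{-1} X^T y via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)   # solve() is more stable than an explicit inverse

# Fit y = 3x - 2 from noisy samples.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
X = np.column_stack([x, np.ones_like(x)])      # feature column plus intercept
y = 3 * x - 2 + 0.1 * rng.normal(size=200)
print(ols(X, y))                               # approximately [3, -2]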

2.1.2.2 Example‑Based Algorithms


Instance‑based algorithms are often used to model decision‑making prob‑
lems: a batch of sample data is selected, and new data is compared to the
samples according to some similarity measure in order to find the best
match. For this reason, example‑based algorithms are often referred to as
“winner‑take‑all learning” or “memory‑based learning”
(Figure 2.2).
Common example‑based algorithms include k‑nearest neighbor, learning
vector quantization (LVQ), and self‑organizing map (SOM).
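A minimal k‑nearest neighbor classifier, the first algorithm in this list, fits in a few lines of Python; the sketch below is illustrative (the data and names are ours, not from the book).

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every sample
    nearest = np.argsort(dists)[:k]               # indices of the k closest samples
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.8, 0.9])))   # prints 1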

2.1.2.3 Regularization Methods


Regularization methods are extensions of other algorithms (usually regres‑
sion algorithms) that adjust the algorithms based on their complexity.
Regularization methods typically reward simple models and penalize
complex ones (Figure 2.3).
Common regularization algorithms include ridge regression, least abso‑
lute shrinkage and selection operator, and elastic net.
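Ridge regression, the first method in this list, is ordinary least squares plus an L2 penalty that shrinks the weights toward zero; a minimal closed‑form Python sketch (illustrative only) follows.

import numpy as np

def ridge(X, y, lam=1.0):
    """Ridge regression: w = (X^T X + lam*I)^{-1} X^T y.

    The lam*I term penalizes large weights, which is exactly the
    "reward simple models" behavior described above.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

Increasing lam trades a little training error for a smoother, less overfitted model, which is the effect illustrated in Figure 2.3.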
FIGURE 2.2
Matching an objective function using an instance‑based algorithm.

FIGURE 2.3
Reducing overfitting with regularization methods.

2.1.2.4 Decision Tree Algorithms


Decision tree algorithms use a tree structure to build a decision model based
on the attributes of the data, and decision tree models are often used to solve
classification and regression problems (Figure 2.4).
FIGURE 2.4
Solving classification and regression problems using decision tree algorithms. (The example tree asks “Customized acceleration?” at the root and “Reconfigurable?” below it, leading to FPGA‑based, ASIC‑based, and CPU/GPU‑based leaves.)

FIGURE 2.5
Solving classification and regression problems using Bayesian methods.

Common decision tree algorithms include classification and regression
tree (CART), ID3 (Iterative Dichotomiser 3), C4.5, chi‑squared automatic
interaction detection, decision stump, random forest, MARS, and gradient
boosting machine (GBM).

2.1.2.5 Bayesian Approach


Bayesian methods are a class of algorithms based on Bayes’ theorem and are
mainly used to solve classification and regression problems (Figure 2.5).

FIGURE 2.6
Solving classification and regression problems using kernel‑based algorithms.

Common algorithms based on the Bayesian approach include the naive
Bayes algorithm, averaged one‑dependence estimators (AODE), and
Bayesian belief networks.

2.1.2.6 Kernel‑Based Algorithms


The most famous kernel‑based algorithm is the support vector machine
(SVM). Kernel‑based algorithms map the input data into a higher‑dimensional
vector space, in which some classification or regression problems can be solved
more easily (Figure 2.6).
Common kernel‑based algorithms include SVM, radial basis function, and
linear discriminate analysis.

2.1.2.7 Clustering Algorithms


Clustering, like regression, is described sometimes as a class of problems
and sometimes as a class of algorithms. Clustering algorithms usually group
input data according to centroids or hierarchies. So clustering algorithms try
to find the intrinsic structure of the data in order to group the data according
to the greatest common denominator (Figure 2.7).
Common clustering algorithms include the K‑Means algorithm and
expectation maximization (EM).
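The K‑Means algorithm just mentioned alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its cluster; a compact Python sketch (illustrative, not from the book) is given below.

import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's K-Means: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]   # initialize from data points
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assignment step: label each point with its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its cluster
        # (keep the old centroid if a cluster went empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels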

2.1.2.8 Association Rule Learning


Association rule learning identifies useful association rules in large
multi‑variate data sets by finding the rules that best explain the relationships
between data variables (Figure 2.8).

FIGURE 2.7
Data categorization using clustering algorithms.

FIGURE 2.8
Using association rule learning to extract association rules in a multi‑variate data set.

Common association rule learning algorithms include the Apriori
algorithm and the Eclat algorithm.

2.1.2.9 Artificial Neural Networks


Artificial neural network algorithms mimic biological neural networks
and are a class of pattern matching algorithms. They are usually used to
solve classification and regression problems. Artificial neural networks are
a huge branch of machine learning with hundreds of different algorithms
(Figure 2.9).
Important artificial neural network algorithms include perceptron neural
network, back propagation, Hopfield network, SOM, and LVQ.
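The perceptron, the first algorithm in this list, is the simplest artificial neural network: a weighted sum followed by a threshold, with the weights nudged after every misclassification. The Python sketch below is illustrative (the toy data and names are ours).

import numpy as np

def perceptron_train(X, y, epochs=20, lr=1.0):
    """Train a single perceptron; labels y must be in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i + b) <= 0:   # misclassified (or on the boundary)
                w += lr * y_i * x_i        # move the decision boundary toward x_i
                b += lr * y_i
    return w, b

# Linearly separable toy problem: positive only when both inputs are 1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = perceptron_train(X, y)
print(np.sign(X @ w + b))   # matches y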

2.1.2.10 Deep Learning


Deep learning algorithms are a development of artificial neural networks.
They have attracted a great deal of attention in recent years, especially
since major companies such as Baidu began to focus on deep learning.
With computing power becoming increasingly cheap, deep learning
attempts to build much larger and more complex neural networks. Many
deep learning algorithms are semi‑supervised learning algorithms, used to
deal with large data sets in which only a small amount of the data is
labeled (Figure 2.10).

FIGURE 2.9
Artificial neural network structure (input layer, hidden layer, output layer).
FIGURE 2.10
Neural network structure in deep learning (convolutional layers followed by fully connected layers).

Common deep learning algorithms include restricted Boltzmann machines,
deep belief networks, convolutional networks, and auto‑encoders.

2.1.2.11 Dimensionality Reduction Algorithms


Like clustering algorithms, dimensionality reduction algorithms attempt
to analyze the intrinsic structure of the data, but dimensionality reduction
algorithms attempt to generalize or explain the data using less information
in an unsupervised learning manner. These types of algorithms can be used
to visualize high‑dimensional data or used to simplify the data for use in
supervised learning (Figure 2.11).
Common dimensionality reduction algorithms include principal component
analysis, partial least squares regression, Sammon mapping,
multi‑dimensional scaling (MDS), and projection pursuit.
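Principal component analysis, the first method in this list, projects centered data onto the directions of maximum variance; the minimal Python sketch below (illustrative) computes those directions with the singular value decomposition.

import numpy as np

def pca(X, n_components=2):
    """Project X onto its top principal components via the SVD."""
    Xc = X - X.mean(axis=0)                       # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                # directions of maximum variance
    return Xc @ components.T                      # low-dimensional representation

The returned rows can be plotted directly, which is the visualization use of dimensionality reduction mentioned above.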

FIGURE 2.11
Using dimensionality reduction algorithms to analyze the intrinsic structure of data.
2.1.2.12 Integration Algorithms


Integration (ensemble) algorithms use relatively weak learning models that
are trained independently on the same samples and then integrate the
results to make an overall prediction. The main difficulty with integration
algorithms is deciding which independent weak learners to integrate and
how to combine their results. This is a very powerful and popular class of
algorithms (Figure 2.12).
Common algorithms include boosting, bootstrapped aggregation (bagging),
AdaBoost, stacked generalization (blending), GBM, and random forest,
among others. A minimal bagging sketch follows.
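This sketch is illustrative; the choice of depth‑1 decision stumps from scikit‑learn as the weak model is our assumption, not the book's. It trains the weak learners on independent bootstrap resamples and combines them by majority vote.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_models=25, seed=0):
    """Train weak learners (depth-1 stumps) on bootstrap resamples of the data."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), len(X))   # sample with replacement
        models.append(DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Majority vote across the independently trained weak models
    (class labels are assumed to be small non-negative integers)."""
    votes = np.stack([m.predict(X) for m in models])
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)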

FIGURE 2.12
Integration of independent learning models for integrated prediction using integration algorithms. (In the example, random subsets of the data are drawn, and at each node a small random subset of variables is chosen to find the variable and value that optimize the split.)

2.1.3 Machine Learning Algorithm Acceleration Focus

Accelerating the execution of machine learning algorithms is not simply a
matter of speeding up arbitrary parts of an algorithm; there are rules to
follow. Since machine learning algorithms are both data‑intensive and
computation‑intensive, acceleration can start from two directions:
accelerating data communication and transmission, and accelerating the
computational execution of the algorithm.
Based on a survey of more than 30 papers and our own analysis, machine
learning algorithms can be accelerated from four angles: accelerating the
computational core of the algorithm, abstracting the common
characteristics of algorithms, parallelizing machine learning algorithms,
and optimizing the data communication and transmission of machine
learning algorithms. Accelerating the computational core and parallelizing
the algorithm both target the computational execution of the algorithm;
optimizing data communication targets data movement; and abstracting
and accelerating common features touches both aspects.
These four points of view are not independent of each other but closely
related. For example, optimizing data communication is a special case of
abstracting and accelerating common features of machine learning
algorithms; parallelization can be applied to the computational core of an
algorithm; an abstracted common feature may or may not coincide with
the computational core, and accelerating common features that are not
computationally central may not be worthwhile. Starting from these four
points can help and guide us in accelerating machine learning algorithms.

2.1.3.1 Accelerating the Computational Core of an Algorithm


Regardless of the type of machine learning algorithm, different parts of
the algorithm have different impacts on the execution time of the whole
algorithm.
The computational core of the algorithm (kernel) is the most time‑consum‑
ing part, and accelerating the kernel can significantly reduce the execution
time of the whole algorithm. Therefore, for the kernel, we can either utilize
many computing units, such as a general‑purpose graphics processing unit
(GPGPU), to perform parallel computation on different data, or use an
FPGA to implement the kernel as multiple dedicated computing units to
accelerate execution.
Paper [2] lists the top three most time‑consuming kernels for 15 machine
learning algorithms, as shown in Table 2.1.
There are many more algorithms for machine learning than those listed
above. Since we cannot enumerate them all, we will focus on summariz‑
ing the computational core of a particular class of algorithms in our future
research work.
TABLE 2.1
Statistics of the Most Time‑Consuming Computational Kernels for Various Machine
Learning Algorithms
Application Kernel 1 (%) Kernel 2 (%) Kernel 3 (%) Sum of Top Three (%)
k‑Means Distance (68) Clustering (21) minDist (10) 99
Fuzzy K‑Means Clustering (58) Distance (39) fuzzySum (1) 98
BIRCH Distance (54) Variance (22) Redistribution (10) 86
HOP Density (39) Search (30) Gather (23) 92
Naive Bayesian ProbCal (49) Variance (38) dataRead (10) 97
ScalParC Classify (37) giniCalc (36) Compare (24) 97
Apriori Subset (58) dataRead (14) Increment (8) 80
Eclat Intersect (39) addClass (23) invertClass (10) 71
SNP CompScore (68) updateScore (20) familyScore (2) 90
GeneNet CondProb (55) updateScore (31) familyScore (9) 95
SEMPHY bestBrnchLen (59) Expectation (39) IenOpt (1) 99
Rsearch Covariance (90) Histogram (6) dbRead (3) 99
SVM‑RFE quotMatrx (57) quadGrad (38) quotUpdate (2) 97
PLSA pathGridAssgn (51) fillGridCache (34) backPathFind (14) 99
Utility dataRead (46) Subsequence (29) Main (23) 98

2.1.3.2 Abstracting the Common Features of Algorithms


Many machine learning algorithms share a number of common features,
and accelerating these common features can achieve good acceleration
results while remaining relatively general. The common features of
machine learning algorithms can be roughly summarized into five points:
large‑scale linear algebra operations, synchronous/asynchronous iterative
operations, multiply‑accumulate (multiply‑add) operations, the use of
commonly used excitation functions, and abstraction based on graph
models.
Large‑scale linear algebra operations refer to the fact that most machine
learning algorithms involve a large number of large‑scale linear algebra
operations, and accelerating the execution of these operations can improve
the performance of the whole algorithm. Paper [3] designed an accelerator
device to speed up matrix multiplication operations and achieved good
acceleration results on a variety of machine learning algorithms.
Synchronous/asynchronous iterative operations refer to the fact that many
machine learning algorithms repeatedly perform synchronous or
asynchronous iterations over the data, and optimizing the iterative process
can significantly improve the performance of the algorithm. Paper [4]
designed an FPGA‑based asynchronous iterative accelerator structure,
which can be used to accelerate the execution of many machine learning
algorithms. A blocked matrix‑multiplication sketch of the linear‑algebra
feature is given below.
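This sketch is plain Python and illustrative; it is not the design of paper [3]. Tiling is the software analogue of what such accelerators do in hardware: operating on sub‑blocks that fit in fast local memory to reduce data movement.

import numpy as np

def blocked_matmul(A, B, tile=64):
    """Blocked (tiled) matrix multiply: C = A @ B computed tile by tile.

    Each (i, j) tile of C accumulates products of an A row-block and a
    B column-block, mirroring how an accelerator streams sub-blocks
    through its on-chip buffers.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m), dtype=A.dtype)
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
    return C

A, B = np.random.rand(200, 300), np.random.rand(300, 150)
assert np.allclose(blocked_matmul(A, B), A @ B)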
The multiply‑accumulate feature proposed in paper [5] mainly refers to the
fact that parts of many machine learning algorithms, in the learning or
inference process, are expressed as multiply‑accumulate operations, and
each multiplication typically has a low degree of data dependence, making
parallelization convenient.
Commonly used excitation functions, in turn, indicate that many machine
learning algorithms use the same auxiliary functions, such as the sigmoid
function, at some step of their execution, and accelerating these commonly
used excitation functions can achieve a certain speed‑up effect.
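As a concrete example of accelerating a common excitation function, hardware designs often replace the exact sigmoid with a cheap piecewise‑linear approximation; the Python sketch below (illustrative, not taken from any cited paper) shows one such table‑and‑interpolate scheme and its worst‑case error.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_pwl(x):
    """Piecewise-linear sigmoid: 16 segments over [-4, 4], clamped to the
    endpoint values outside. An FPGA can evaluate this with a small
    lookup table and one multiply-add instead of an exponential."""
    breakpoints = np.linspace(-4.0, 4.0, 17)
    values = sigmoid(breakpoints)             # precomputed once (the lookup table)
    return np.interp(x, breakpoints, values)  # interp also clamps at the ends

x = np.linspace(-8, 8, 1000)
print(np.max(np.abs(sigmoid(x) - sigmoid_pwl(x))))   # worst-case error, roughly 0.02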
The abstraction based on the graph model proposed in paper [6] shows
that the graph computation model can effectively handle data mining
algorithms with a high degree of data dependency; therefore, the process
of abstracting data into a graph and then performing graph‑based vertex
computation can be accelerated.
It is important to note that, as mentioned earlier, common features
abstracted from multiple algorithms may or may not belong to the
computational core of those algorithms. If the abstracted feature is the
computational core of many algorithms, it makes sense to accelerate its
execution; conversely, if the abstracted feature is just a common but
non‑central computational step, it makes relatively little sense to design an
accelerator structure for it.

2.1.3.3 Parallelizing Machine Learning Algorithms


Parallelizing machine learning algorithms is currently the most widely
used acceleration approach; task‑level parallelism, data‑level parallelism,
or a mixture of both can be applied to most machine learning algorithms.
Essentially, parallelizing a machine learning algorithm means parallelizing
its core computation in order to achieve better acceleration results.
As mentioned in Section 2.2, the three main platforms for parallelizing
machine learning algorithms are cloud computing platforms, GPGPUs,
and FPGA/application‑specific integrated circuit (ASIC) platforms. Cloud
computing platforms mainly exploit task‑level parallelism and data‑level
parallelism, and the parallel granularity is relatively coarse. For example,
the Map and Reduce processes in the MapReduce model can be executed
in parallel, and vertices with no dependencies in the graph computation
model can also be executed in parallel. GPGPU platforms mainly exploit
data‑level parallelism, and the parallel granularity is relatively fine. The
parallelism achievable on FPGA/ASIC platforms depends on the
architecture of the designed accelerator, which can exploit both task‑level
and data‑level parallelism. In addition, pipelining techniques are often
used in accelerators to increase throughput.

2.1.3.4 Optimizing Data Communication Transmission


Since machine learning algorithms are both computation‑ and data‑inten‑
sive, accelerating the computation‑intensive part of an algorithm alone is
not enough; data communication, such as memory access, often becomes
the bottleneck to improving performance. Optimizing the data
communication, transmission, and memory‑access patterns of machine
learning algorithms is the starting point for accelerating their
data‑intensive part.
The three existing acceleration platforms all face data communication
problems to varying degrees.
For cloud computing platforms, the parallel acceleration of certain machine
learning algorithms may be less than ideal, which is often rooted in the huge
overhead of data communication. Cloud computing platforms utilize a dis‑
tributed file system to store data, with compute nodes connected via
Ethernet. If an algorithm requires data distributed across multiple nodes, or
if the algorithm needs to access the data frequently, the data transfer
communication overhead can be significant.
For a GPGPU, accelerating machine learning algorithms also requires
data‑transfer considerations. The data needed by a program is often stored
on the node's disk and must be transferred through host memory to the
GPGPU's global memory, a process that incurs significant time overhead.
In addition, GPGPUs have an internal memory hierarchy of registers,
shared memory, L1 cache, and so on, so parallelizing algorithms on
GPGPUs requires attention to how these different storage components are
used.
For FPGAs and ASICs, specialized accelerators designed on these
platforms often face the process of transferring data from host memory to
device memory. Moreover, FPGAs contain internal memory resources of
different speeds, so designers need to focus on the design of the
accelerator's storage subsystem, for example, providing appropriate caches
for the intermediate values used in iterative computation.

2.1.4 Commonly Used Public Data Sets for Machine Learning


Currently, there are many public data sets on the web, which often have high
credibility and are used relatively frequently. Therefore, when designing an
accelerator, we can consider utilizing these public data sets to test and
compare the accelerator prototype.
Commonly used public data sets include MNIST, Tsukuba, Maricopa, and
Tweet.
2.2 Design of FPGA‑Based Machine Learning Accelerators


This section briefly summarizes nearly 30 previously surveyed papers on
the design of accelerator components for machine learning algorithms
using FPGAs. Overall, these papers start from different problems and
design different accelerator solutions, which can be categorized into four
groups according to the kind of problem to be solved: designing
accelerators for specific problems, designing accelerators for specific
algorithms, designing accelerators for common features of algorithms, and
designing general accelerator frameworks using hardware templates.
These four categories move from the specific to the general, and the design
difficulty tends to increase accordingly. For the first two categories,
designing accelerators is more common and less difficult; the last two
categories, especially the last one, are more difficult, still at the research
stage, and not yet widespread.
From a research point of view, we should aim at designing a generalized
accelerator architecture for machine learning algorithms instead of
limiting ourselves to specific application scenarios or machine learning
algorithms; this is very difficult, but indeed very meaningful.

2.2.1 Designing Accelerators for Specific Problems


Designing accelerators for specific problems is currently the most
widespread use of FPGA‑based acceleration. An accelerator designed
specifically for a particular problem not only fits the needs of the problem
well, but is also relatively less difficult to design. Accelerators for specific
problems usually target the inference process of machine learning
algorithms rather than the learning process.
A case study of an accelerator designed for a specific problem is given
below. In paper [7], a dedicated accelerator is designed using an FPGA to
execute the C4.5 decision tree algorithm and accelerate the solution of the
online traffic classification problem. Online traffic classification is the
problem of determining which application established a TCP connection
or UDP flow based on the first packets transmitted; for example, by
analyzing eight packets transmitted over a TCP connection, it can be
determined that the connection was established by QQ.
The accelerator architecture designed in the paper is shown in Figure 2.13.
The overall architecture is divided into two parts: the discretization
module on the left and the classification module on the right. The
discretization module preprocesses the input data, while the classification
module actually makes classification decisions on the input data.
FIGURE 2.13
FPGA‑based C4.5 algorithm accelerator architecture designed in paper [7] (a discretizer stage followed by a pipelined C4.5 classifier stage).

The attribute vectors of the data are fed from the left into the
discretization module; after each stage of discretization processing units,
the value of one particular attribute is discretized. The data is then fed to
the classification module, and after each stage, the data descends one level
in the decision tree. Each classification unit stores all the intermediate/leaf
nodes of its level of the decision tree in its local memory; the next‑level
classification unit receives the parameters (the data attribute set and the
intermediate‑node address) and then finds the corresponding intermediate
node to continue the classification.
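In software terms, this stage‑per‑level organization looks like the sketch below (illustrative Python; the node layout and field names are our assumptions, not the paper's): each pipeline stage holds one level of the tree in its local memory and forwards the feature vector together with the next node's index to the following stage.

def make_stage(level_nodes):
    """One pipeline stage: the nodes of one tree level, kept in local memory."""
    def stage(features, node_idx):
        node = level_nodes[node_idx]
        if "label" in node:                    # leaf: classification finished
            return features, node["label"], True
        go_left = features[node["attr"]] <= node["thr"]
        nxt = node["left"] if go_left else node["right"]   # index into the next level
        return features, nxt, False
    return stage

# A depth-2 tree, one dict per node, grouped by level (hypothetical layout).
levels = [
    [{"attr": 0, "thr": 5.0, "left": 0, "right": 1}],                  # root level
    [{"label": "A"}, {"attr": 1, "thr": 2.0, "left": 0, "right": 1}],  # level 1
    [{"label": "B"}, {"label": "C"}],                                  # level 2
]
stages = [make_stage(lvl) for lvl in levels]

def classify(features):
    idx = 0
    for stage in stages:           # in hardware these stages overlap across inputs
        features, idx, done = stage(features, idx)
        if done:
            return idx             # idx now holds the label
    return idx

print(classify([7.0, 1.0]))   # root: 7 > 5 -> right child; level 1: 1 <= 2 -> "B"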
The specific accelerator structure designed in this paper still has some
shortcomings. For example, in the classification module, each layer of the
decision tree is handled by one PE; since the number of nodes differs from
layer to layer, this inevitably leads to an imbalance of computational
resources, and the accelerator may therefore hit performance bottlenecks
when the input data size is relatively large.

2.2.2 Designing Accelerators for Specific Algorithms


Designing accelerators for particular machine learning algorithms using
FPGAs is also a common application area for FPGAs. Accelerators
designed for specific machine learning algorithms can be applied to a
specific problem, often needing only the configuration of specific
parameters or some small changes to adapt well to the problem.
The research surveyed here is not exhaustive; it covers five machine
learning algorithms, namely, SVM, Apriori, decision tree, K‑Means, and
DAG‑based Bayesian networks, which are discussed in the following.
2.2.2.1 SVM Algorithm


The SVM algorithm is one of the most famous kernel‑based machine learn‑
ing algorithms. Most current papers design accelerator devices for the
inference process of the SVM algorithm. In the inference process, each
data item to be classified must be multiplied with all the support vectors
to obtain intermediate values, which are then sent to the kernel function to
produce the final result. Therefore, for SVM inference, we can choose to
accelerate the multiply‑and‑sum part, the kernel‑function part, or both.
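A minimal sketch of this split (illustrative Python; not paper [8]'s actual design) separates the multiply‑accumulate stage, the natural candidate for hardware offload, from the kernel‑function stage left on the CPU.

import numpy as np

def mac_stage(SV, x):
    """Multiply-accumulate stage: dot products of x with every support vector.
    This dense, regular computation is what the FPGA portion accelerates."""
    return SV @ x

def rbf_from_dots(dots, sv_sq, x, gamma=0.5):
    """Kernel stage (CPU side), here an RBF kernel rebuilt from the dot
    products: ||sv - x||^2 = ||sv||^2 - 2*sv.x + ||x||^2."""
    return np.exp(-gamma * (sv_sq - 2.0 * dots + x @ x))

def svm_infer(SV, alpha_y, b, x):
    """Decision function: sign(sum_i alpha_i * y_i * K(sv_i, x) + b)."""
    dots = mac_stage(SV, x)                        # offloadable part
    k = rbf_from_dots(dots, np.sum(SV * SV, axis=1), x)
    return np.sign(alpha_y @ k + b)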
Paper [8] proposes an accelerator architecture for the inference process of
the SVM algorithm. The architecture mainly accelerates the
multiply‑accumulate between the vector to be classified and the support
vectors, while the computation of the kernel function is still executed on
the CPU.
The overall accelerator architecture is shown in Figure 2.14. There are
multiple vector processor clusters on the FPGA; each vector processor
cluster consists of multiple vector processing element (VPE) arrays; each
VPE array consists of multiple VPEs; and each VPE is a vector processing
unit that handles the dot product between two vectors. During the
execution of the accelerator, large‑scale matrices are streamed in,
small‑scale matrices are stored in on‑chip memory, and all VPEs in each
VPE array store one column of the small‑scale matrix. In addition, the
accelerator performs dot products at a finer granularity: each vector dot
product is divided into multiple chunk dot products, and the chunk size is
chosen so that data transfer between the FPGA and the central processing
unit (CPU) does not become a bottleneck.

FIGURE 2.14
Accelerator architecture for FPGA‑based SVM algorithm inference proposed in paper [8].
FIGURE 2.15
Improvements made in paper [9] for FPGA‑based SVM inference accelerators.

The accelerator architecture designed in paper [8] does not accelerate the
computation of kernel functions and does not support operations on
heterogeneous data. Paper [9] addresses these shortcomings and also
proposes a novel cascaded SVM accelerator architecture.
The improved structure proposed in paper [9] is shown in Figure 2.15. In
this accelerator structure, multiple Classifier Hypertiles act as the PEs of
the accelerator. For a given piece of test data, each PE handles the
operation between the test data and a portion of the support vectors. All
the support vectors are stored in the on‑chip memory of the FPGA, and all
the test data are stored in the off‑chip memory of the FPGA; each piece of
test data is streamed into multiple PEs. Each Classifier Hypertile (PE)
essentially also performs a multiply‑accumulate operation. Unlike the
earlier MAC unit, the Hypertile has a finer granularity: it uses a multiplier
for each attribute that matches the precision of that attribute, so it can
handle heterogeneous data better. In addition, the accelerator architecture
uses specialized computational units to accelerate the computation of
kernel functions.
Beyond this improvement, paper [9] also proposes a novel SVM
accelerator structure, namely a cascaded SVM accelerator, as shown in
Figure 2.16. The cascade is a pipelined concatenation of multiple SVM
classifiers, each of which may have a different classification model and
different classification capability. This applies the idea of the boosting
approach: multiple weak classifiers are combined to form a strong
classifier. If a given level of the cascade cannot determine the class of the
input value with sufficient confidence, the input is passed to the next level;
logically, each later classifier should be stronger than the one before it.
FIGURE 2.16
FPGA‑based cascaded SVM accelerator structure designed in paper [9] (a fast low‑precision classifier passes unclassified data to a high‑precision classifier).

The paper designs a two‑level classifier. The first‑level classifier handles
points far from the hyperplane, uses a simple kernel function, and runs
fast; the second‑level classifier handles points near the hyperplane (those
the first level cannot decide), may use a more complex kernel function,
and runs somewhat slower. A sketch of this cascade logic follows.
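The cascade logic itself can be sketched in a few lines (illustrative Python; the decision functions and the margin threshold are stand‑ins, not the design of paper [9]): the fast first‑level classifier commits only when its decision value is far from the hyperplane, otherwise the sample is escalated.

import numpy as np

def cascade_classify(x, fast_decision, strong_decision, margin=1.0):
    """Two-level cascade: trust the cheap classifier only far from the hyperplane.

    Both arguments return a signed decision value f(x); |f(x)| is small
    near the separating hyperplane, where the cheap model is unreliable.
    """
    f = fast_decision(x)
    if abs(f) > margin:                          # confident: decide at level 1
        return int(np.sign(f))
    return int(np.sign(strong_decision(x)))     # borderline: escalate to level 2

fast = lambda x: x[0] - x[1]                     # stand-in simple-kernel decision
strong = lambda x: np.tanh(3.0 * (x[0] - x[1]))  # stand-in complex-kernel decision
print(cascade_classify(np.array([2.0, 0.0]), fast, strong))   # decided at level 1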
The widespread use of SVM algorithms makes accelerating them
particularly worthwhile. Work on the inference process of SVM algorithms
is plentiful and relatively mature, while relatively little work has been done
on accelerating the learning process. In addition, the inference process is
often preceded by preprocessing steps such as orthogonalization and
regularization, which tend to be inefficient and to occupy a high
proportion of the execution time on a CPU. Accelerating the execution of
this preprocessing is therefore also a promising research direction.

2.2.2.2 Apriori Algorithm


The Apriori algorithm is an important algorithm for association analysis.
It is mainly used to discover associations between items, measuring the
degree of association by counting how often items appear together.
Paper [10] designed an accelerator structure for the first half of the Apriori
algorithm to accelerate the process of obtaining frequent itemsets, as
shown in Figure 2.17.

FIGURE 2.17
Structure of the FPGA‑based accelerator for the Apriori algorithm designed in paper [10].

The paper divides the first half of Apriori, the support calculation, into
three parts: candidate generation, candidate pruning, and support
calculation. The accelerator structure can be reused across all three phases
and shows good acceleration results.
Candidate generation produces candidate frequent itemsets: if two
K‑frequent itemsets (itemsets already known to be frequent) share the
same first K − 1 items, a (K + 1)‑candidate itemset can be generated from
them. Candidate pruning pre‑checks each newly generated
(K + 1)‑candidate: for each of its K + 1 items, remove that item and test
whether the remaining K‑itemset belongs to the set of K‑frequent itemsets
already generated; if any such K‑subset is not frequent, the new
(K + 1)‑itemset cannot be frequent and is discarded. Support calculation
then counts, over the whole data set, the frequency of each
(K + 1)‑candidate that passed the pre‑check; only when its frequency
reaches the support threshold is the candidate considered frequent and
added to the set of (K + 1)‑frequent itemsets.
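The three phases map naturally onto a short Python sketch (illustrative; itemsets are kept as lexicographically sorted tuples, consistent with the dictionary‑order preprocessing mentioned below).

from itertools import combinations

def generate_candidates(freq_k):
    """Candidate generation: join two K-itemsets sharing their first K-1 items."""
    freq_k = sorted(freq_k)                      # tuples in dictionary order
    cands = set()
    for i in range(len(freq_k)):
        for j in range(i + 1, len(freq_k)):
            a, b = freq_k[i], freq_k[j]
            if a[:-1] == b[:-1]:                 # same first K-1 items
                cands.add(tuple(sorted(set(a) | set(b))))
    return cands

def prune(cands, freq_k):
    """Candidate pruning: every K-subset of a (K+1)-candidate must be frequent."""
    freq_set = set(freq_k)
    return {c for c in cands
            if all(sub in freq_set for sub in combinations(c, len(c) - 1))}

def support_count(cands, transactions, min_support):
    """Support calculation: keep candidates that are frequent over the data set."""
    return {c for c in cands
            if sum(set(c) <= t for t in transactions) >= min_support}

transactions = [{1, 2, 3}, {1, 2}, {2, 3}, {1, 3}, {1, 2, 3}]
freq_2 = [(1, 2), (1, 3), (2, 3)]                # 2-frequent itemsets (given)
c3 = prune(generate_candidates(freq_2), freq_2)
print(support_count(c3, transactions, min_support=2))   # {(1, 2, 3)}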
Research on accelerating the Apriori algorithm is sparse; only this one
paper was surveyed. However, since the Apriori algorithm is essentially a
counting and statistics process, using FPGAs to accelerate it should also
have good prospects. In addition, most Apriori implementations require
the data to be pre‑sorted in dictionary order before processing, so this
sorting preprocessing step is also a potential acceleration point.
2.2.2.3 Decision Tree Algorithm


The decision tree algorithm is a fairly general machine learning algorithm
that also has two processes: learning and inference. The computational
core of the learning process is the calculation of the Gini coefficient (for
the CART algorithm) or the entropy‑based gain ratio (for the C4.5
algorithm). There is a lot of research on both the learning and inference
aspects of decision tree algorithms.
Paper [11] proposes an accelerator structure for accelerating Gini
coefficient computation in the learning process of the C4.5 decision tree
algorithm. The structure is shown in Figure 2.18. The Gini coefficient of
each continuous attribute can be computed by its own Gini unit in the
FPGA, and all the Gini unit results are then connected hierarchically
through comparator components so that the minimum Gini coefficient can
be selected.
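For reference, the quantity these Gini units compute for one continuous attribute can be sketched as follows (illustrative Python, not the paper's hardware): the weighted Gini impurity of the two partitions induced by each candidate threshold, minimized over thresholds.

import numpy as np

def gini_impurity(labels):
    """Gini impurity: 1 - sum_c p_c^2 over the class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(values, labels):
    """Scan thresholds on one continuous attribute; return (threshold, gini).

    Each candidate threshold plays the role of one Gini unit; the final
    minimum mirrors the hierarchical comparators that select the
    smallest Gini coefficient.
    """
    order = np.argsort(values)
    v, y = values[order], labels[order]
    best = (None, np.inf)
    for i in range(1, len(v)):
        if v[i] == v[i - 1]:
            continue
        thr = (v[i] + v[i - 1]) / 2.0
        g = (i * gini_impurity(y[:i]) + (len(v) - i) * gini_impurity(y[i:])) / len(v)
        if g < best[1]:
            best = (thr, g)
    return best

values = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
labels = np.array([0, 0, 0, 1, 1, 1])
print(best_split(values, labels))   # (6.5, 0.0): a perfect split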
This paper was published relatively early; more advanced accelerator
structures for decision trees have since improved on it greatly and should
be able to perform the entire decision tree learning process rather than a
small part of it, which reduces data communication delays. In addition,
most decision tree algorithms require the input data to be discrete, so the
preprocessing process of discretizing the input data can also be
accelerated.

FIGURE 2.18
FPGA‑based accelerator structure for accelerated Gini coefficient computation proposed in
paper [11].
Another Random Scribd Document
with Unrelated Content
refinement, no notion of the amenities of social life.—Bickerstaff, The
Maid of the Mill.

Giles (1 syl.), serving-boy to Claud Halcro.—Sir W. Scott, The


Pirate (time William III.).

Giles (1 syl.), warder of the Tower.—Sir W. Scott, Fortunes of Nigel


(time, James I.).

Giles (1 syl.), jailer of Sir Reginald Front de Bœuf.—Sir W. Scott,


Ivanhoe (time, Richard I.).

Giles (Will), apprentice of Gibbie Girder, the cooper at Wolf’s Hope


village.—Sir W. Scott, Bride of Lammermoor (time,f William III.).

Giles, the “farmer’s boy,” “meek, fatherless, and poor,” the hero of
Robert Bloomfield’s principal poem, which is divided into “Spring,”
“Summer,” “Autumn,” and “Winter” (1798).

Giles of Antwerp, Giles Coignet, the painter (1530-1600).

Gillfillan (Habakkuk), called “Gifted Gilfillan,” a Camero´nian


officer and enthusiast.—Sir W. Scott, Waverley (time, George II.).

Gill (Harry), a farmer, who forbade old Goody Blake to carry home
a few sticks, which she had picked up from his land, to light a wee-
bit fire to warm herself by. Old Goody Blake cursed him for his
meanness, saying he should never from that moment cease from
shivering with cold; and, sure enough, from that hour, a-bed or up,
summer or winter, at home or abroad, his teeth went “chatter,
chatter, chatter still.” Clothing was of no use, fires of no avail, for,
spite of all, he muttered, “Poor Harry Gill is very cold.”—Wordsworth,
Goody Blake and Harry Gill (1798).

Gill (Mrs. Peter). Bustling matron with a genius for innovation. She
conducts her household affairs according to sanitary and sanatory
principles; discovers that condiments are pernicious and that beans
are excellent for the complexion; is bent upon a water-cure, and
finds out and invents so many “must bes” and “don’ts” as to ruin the
comfort of husband and children.—Robert B. Roosevelt, Progressive
Petticoats (1874).

Gil´lamore (3 syl.) or Guillamur, king of Ireland, being slain in


battle by Arthur, Ireland was added by the conqueror to his own
dominions.

How Gillamore again to Ireland he pursued ...


And having slain the king, the country waste he laid.
Drayton, Polyolbion, iv. (1612).

Gil´lian, landlady of Don John and Don Frederic.—Beaumont and


Fletcher, The Chances (1620).

Gillian (Dame), tirewoman to Lady Eveline, and wife of Raoul the


huntsman.—Sir W. Scott, The Betrothed (time, Henry II.).

Gilliflowers. A nosegay of these flowers was given by the fairy


Amazo´na to Carpil´lona in her flight. The virtue of this nosegay
was, that so long as the princess had it about her person, those who
knew her before would not recognize her.—Comtesse D’Aunoy, Fairy
Tales (“Princess Carpillona,” 1682).

Gills (Solomon), ship’s instrument maker. A slow, thoughtful old


man, uncle of Walter Gay, who was in the house of Mr. Dombey,
merchant. Gills was very proud of his stock-in-trade, but never
seemed to sell anything.—C. Dickens, Dombey and Son (1846).

Gilpin (John), a linen-draper and train-band captain, living in


London. His wife said to him, “Though we have been married twenty
years, we have taken no holiday;” and at her advice the well-to-do
linen-draper agreed to make a family party, and dine at the Bell, at
Edmonton. Mrs. Gilpin, her sister, and four children went in the
chaise, and Gilpin promised to follow on horseback. As madam had
left the wine behind, Gilpin girded it in two stone bottles to his belt,
and started on his way. The horse, being fresh, began to trot, and
then to gallop; and John, being a bad rider, grasped the mane with
both his hands. On went the horse, off flew John Gilpin’s cloak,
together with his hat and wig. The dogs barked, the children
screamed, the turnpike men (thinking he was riding for a wager)
flung open their gates. He flew through Edmonton, and never
stopped till he reached Ware, when his friend the calender gave him
welcome, and asked him to dismount. Gilpin, however, declined,
saying his wife would be expecting him. So the calender furnished
him with another hat and wig, and Gilpin harked back again, when
similar disasters occurred, till the horse stopped at his house in
London.—W. Cowper, John Gilpin (1786).
⁂ John Gilpin was a Mr. Beyer, of Paternoster Row, who died in
1791, and it was Lady Austin who told the anecdote to the poet. The
marriage adventure of Commodore Trunnion in Peregrine Pickle is a
similar adventure.

Gines de Passamonte, one of the galley-slaves set free by Don


Quixote. Gines had written a history of his life and adventures. After
being liberated, the slaves set upon the knight; they assaulted him
with stones, robbed him and Sancho of everything they valued,
broke to pieces “Mambrino’s helmet,” and then made off with all
possible speed, taking Sancho’s ass with them. After a time the ass
was recovered (pt. I. iv. 3).
“Hark ye, friend,” said the galley-slave, “Gines is my name, and Passamonte the
title of my family.”—Cervantes, Don Quixote, I. iii. 8 (1605).

⁂ This Gines re-appears in pt. II. ii. 7 as “Peter the showman,”


who exhibits the story of “Melisendra and Don Gayferos.” The helmet
also is presented whole and sound at the inn, where it becomes a
matter of dispute whether it is a basin or a helmet.

Gineura, the troth-plight bride of Ariodantês, falsely accused of


infidelity, and doomed to die unless she found within a month a
champion to do battle for her honor. The duke who accused her felt
confident that no champion would appear, but on the day appointed
Ariodantês himself entered the lists. The duke was slain, the lady
vindicated, and the champion became Gineura’s husband.—Arisoto,
Orlando Furioso (1516).
Shakespeare, in Much Ado about Nothing, makes Hero falsely
accused of infidelity, through the malice of Don John, who induces
Margaret (the lady’s attendant) to give Borachio a rendezvous at the
lady’s chamber window. While this was going on, Claudio, the
betrothed lover of Hero, was brought to a spot where he might
witness the scene, and, believing Margaret to be Hero, was so
indignant, that next day at the altar he denounced Hero as unworthy
of his love. Benedict challenged Claudio for slander, but the combat
was prevented by the arrest and confession of Borachio. Don John,
finding his villainy exposed, fled to Messina.
Spencer has introduced a similar story in his Faëry Queen, v. 11
(the tale of “Irena,” q.v.).

Gin´evra, the young Italian bride who, playing hide-and-seek, hid


herself in a large trunk. The lid accidentally fell down, and was held
fast by a spring-lock. Many years afterwards the trunk was sold and
the skeleton discovered.—Rogers, Italy (1792).
T. Haynes Bayley wrote a ballad called The Mistletoe Bough, on
the same tradition. He calls the bridegroom “young Lovell.”
A similar narrative is given by Collet, in his Causes Célèbres.
Marwell Old Hall, once the residence of the Seymours, and
subsequently of the Dacre family, has a similar tradition attached to
it, and “the very chest is now the property of the Rev. J. Haygarth,
rector of Upham.”—Post-Office Directory.
Bramshall, Hampshire, has a similar tale and chest.
The same tale is also told of the great house at Malsanger, near
Basingstoke.

Gingerbread (Giles), the hero of an English nursery tale.


Jack the Giant-killer, Giles Gingerbread, and Tom Thumb will flourish in wide-
spreading and never-ceasing popularity.—Washington Irving.
Ginn or Jân (singular masculine Jinnee, feminine Jinniyeh), a
species of beings created long before Adam. They were formed of
“smokeless fire” or fire of the simoom, and were governed by
monarchs named suleyman, the last of whom was Jân-ibn-Jân or
Gian-ben-Gian, who “built the pyramids of Egypt.” Prophets were
sent to convert them, but on their persistent disobedience, an army
of angels drove them from the earth. Among the Ginn was one
named Aza´zel. When Adam was created, and God commanded the
angels to worship him, Azazel refused, saying, “Why should the
spirits of fire worship a creature made of earth?” Whereupon God
changed him into a devil, and called him Iblis or Eblis (“despair”).
Spelt also Djinn.

Gi´ona, a leader of the anabaptists, once a servant of Comte


d’Oberthal, but discharged from his service for theft. He joined the
rebellion of the anabaptists, but, with the rest of the conspirators,
betrayed the “prophet-king,” John of Leyden, when the emperor
arrived with his army.—Meyerbeer, Le Prophète (1849).

Giovan´ni (Don), a Spanish libertine of the aristocratic class. His


valet, Leporello, says, “He had 700 mistresses in Italy, 800 in
Germany, 91 in France and Turkey, and 1003 in Spain.” When the
measure of his iniquity was full, a legion of foul fiends carried him
off to the devouring gulf.—Mozart’s opera, Don Giovanni (1787).
(The libretto of this opera is by Lorenzo da Ponte).
⁂ The origin of this character was Don Juan Teno´rio, of Seville,
who lived in the fourteenth century. The traditions concerning him
were dramatized by Tirso de Mo´lina; thence passed into Italy and
France. Glück has a musical ballad called Don Juan (1765); Molière,
a comedy on the same subject (1665); and Thomas Corneille
(brother of the Grand Corneille) brought out, in 1673, a comedy on
the same subject, called Le Festin de Pierre, which is the second title
of Molière’s Don Juan. Goldoni, called “The Italian Molière,” has also
a comedy on the same favorite hero.

Gipsey, the favorite greyhound of Charles I.


One evening his [Charles I.] dog scraping at the door, he commanded me [Sir
Philip Warwick] to let in Gipsey.—Memoirs, 329.

Gypsey Ring, a flat gold ring, with stones let into it, at given
distances. So called because the stones were originally Egyptian
pebbles—that is, agate and jasper.
⁂ The tale is, that the gypsies are wanderers because they
refused to shelter the Virgin and Child in their flight into Egypt.—
Aventinus, Annales Boiorum, viii.

Giralda of Seville, called by the Knight of the Mirrors a giantess,


whose body was of brass, and who, without ever shifting her place,
was the most unsteady and changeable female in the world. In fact,
this Giralda was no other than the brazen statue on a steeple in
Seville, serving for a weathercock.
“I fixed the changeable Giralda ... I obliged her to stand still; for during the
space of a whole week no wind blew but from the north.”—Cervantes, Don
Quixote, II. i. 14 (1615).

Girder (Gibbie, i.e. Gilbert), the cooper at Wolf’s Hope village.


Jean Girder, wife of the cooper.—Sir W. Scott, Bride of
Lammermoor (time, William III.).

Girdle (Armi´da’s), a cestus worn by Armi´da, which, like that of


Venus, possessed the magical charm of provoking irresistible love.—
Tasso, Jerusalem Delivered (1575).

Girdle (Flor´imel’s), the prize of a grand tournament, in which Sir


Sat´yrane (3 syl.), Sir Brianor, Sir Sanglier, Sir Artĕgal, Sir Cambel,
Sir Tri´amond, Brit´omart, and others took part. It was accidentally
dropped by Florimel in her flight (bk. iii. 7, 31), picked up by Sir
Satyrane, and employed by him for binding the monster which
frightened Florimel to flight, but afterwards came again into Sir
Satyrane’s possession, when he placed it for safety in a golden
coffer. It was a gorgeous girdle, made by Vulcan for Venus, and
embossed with pearls and precious stones; but its chief merit was
It gave the virtue of chaste love
And wifehood true to all that it did bear;
But whosoever contrary doth prove,
Might not the same about her middle wear,
But it would loose, or else asunder tear.
Spenser, Faëry Queen, iii. 7 (1590).

Girdle (Venus’s), a girdle on which was embroidered the passions,


desires, joys, and pains of love. It was usually called a cestus, which
means “embroidered,” and was worn lower down than the cin
´gulum or matron’s girdle, but higher up than the zone or maiden’s
girdle. It was said to possess the magical power of exciting love.
Homer describes it thus:

In this was every art, and every charm,


To win the wisest, and the coldest warm;
Fond love, the gentle vow, the gay desire,
The kind deceit, the still reviving fire,
Persuasive speech, and more persuasive sighs,
Silence that spoke, and eloquence of eyes.
Pope, Iliad, xiv.

Girdle of Opakka, foresight and prudence.


“The girdle of Opakka, with which Kifri the enchanter is endued, what is it,” said
Shemshelnar, “but foresight and prudence—the best ‘girdle’ for the sultans of the
earth?”—Sir G. Morell [i.e. J. Ridley], Tales of the Genii (“History of Mahoud,” tale
vii., 1751).

Girdles, impressed with mystical characters, were bound with


certain ceremonies round women in gestation, to accelerate the birth
and alleviate the pains of labor. It was a Druid custom, observed by
the Gaels, and continued in practice till quite modern times.
Aldo offered to give Erragon “a hundred steeds, children of the rein; a hundred
hawks with fluttering wing, ... and a hundred girdles to bind high-bosomed maids,
friends of the births of heroes.”—Ossian, The Battle of Lora.
Girnington (The laird of), previously Frank Hayston, laird of
Bucklaw, the bridegroom of Lucy Ashton. He is found wounded by
his bride on the wedding night, recovers and leaves the country; but
the bride goes mad and dies.—Sir W. Scott, Bride of Lammermoor
(time, William III.).

Giulia (Donna), suspected wife of Don Alonzo in Richard


Mansfield’s play Don Juan. She becomes the fast friend of the
youthful lovers, although forced by her husband’s brutality to decoy
Juan into the trap set for him by Alonzo (1891).

Gjallar, Heimdall’s horn, which he blows to give the gods notice


when any one approaches the bridge Bifröst.—Scandinavian
Mythology.

Gladiator (The Dying). This famous statue, found at Nettuno (the


ancient Antium), was the work of Agasĭas, a sculptor of Ephesus.

Glads´moor (Mr.), almoner of the earl of Glenallan, at Glenallan


House.—Sir W. Scott, The Antiquary (time, George III.).

Glamorgan, according to British fable, is gla or glyn Morgan


(valley or glen of Morgan). Cundah´ and Morgan (says Spenser)
were sons of Goneril and Regan, the two elder daughters of King
Leyr. Cundah chased Morgan into Wales, and slew him in the glen
which perpetuates his name.

Then gan the bloody brethren both to raine:


But fierce Cundah gan shortly to envy
His brother Morgan ...
Raisd warre, and him in batteill overthrew;
Whence as he to those woody hilles did fly,
Which hight of him Gla-morgan, there him slew.
Spenser, Faëry Queen, ii. 10, 33 (1590).

This is not quite in accordance with Geoffrey’s account:


Some restless spirits ... inspired Margan with vain conceits, ... who marched
with an army through Cunedagius’s country, and began to burn all before him; but
he was met by Cunedagius, with all his forces, who attacked Margan ... and,
putting him to flight, ... killed him in a town of Kambria, which since his death has
been called Margan to this day.—British History, ii. 15 (1142).

Glasgow (The Bishop of).—Sir W. Scott, Castle Dangerous, xix.


(time, Henry I.).

Glasgow Arms, an oak tree with a bird above it, and a bell
hanging from one of the branches; at the foot of the tree a salmon
with a ring in its mouth. The legend is that St. Kentigern built the
city and hung a bell in an oak tree to summon the men to work. This
accounts for the “oak and bell.” Now for the rest: A Scottish queen,
having formed an illicit attachment to a soldier, presented her
paramour with a ring, the gift of her royal husband. This coming to
the knowledge of the king, he contrived to abstract it from the
soldier while he was asleep, threw it into the Clyde, and then asked
his queen to show it him. The queen, in great alarm, ran to St.
Kentigern, and confessed her crime. The father confessor went to
the Clyde, drew out a salmon with the ring in its mouth, handed it to
the queen, and by this means both prevented a scandal and
reformed the repentant lady.
A similar legend is told of Dame Rebecca Berry, wife of Thomas
Elton of Stratford Bow, and relict of Sir John Berry, 1696. She is the
heroine of the ballad called The Cruel Knight. The story runs thus: A
knight, passing by a cottage, heard the cries of a woman in labor. By
his knowledge of the occult sciences, he knew that the infant was
doomed to be his future wife; but he determined to elude his
destiny. When the child was of a marriageable age, he took her to
the seaside, intending to drown her, but relented, and, throwing a
ring into the sea, commanded her never to see his face again, upon
pain of death, till she brought back that ring with her. The damsel
now went as cook to a noble family, and one day, as she was
preparing a cod-fish for dinner, she found the ring in the fish, took it
to the knight, and thus became the bride of Sir John Berry. The
Berry arms show a fish, and in the dexter chief a ring.
Glass (Mrs.), a tobacconist, in London, who befriended Jeanie
Deans while she sojourned in town, whither she had come to crave
pardon from the queen for Effie Deans, her half-sister, lying under
sentence of death for the murder of her infant born before wedlock.
—Sir W. Scott, Heart of Midlothian (time, George II.).

Glass Armor. When Cherry went to encounter the dragon that guarded the singing apple, he arrayed himself in glass armor, which
reflected objects like a mirror. Consequently, when the monster
came against him, seeing its reflection in every part of the armor, it
fancied hundreds of dragons were coming against it, and ran away
in alarm into a cave, which Cherry instantly closed up, and thus
became master of the situation.—Comtesse D’Aunoy, Fairy Tales
(“Princess Fairstar,” 1682).

Glasse (Mrs.), author of a cookery-book immortalized by the saying, “First catch [skin] your hare, then cook it.” Mrs. Glasse is the nom de plume of Dr. John Hill (1716-1775).

Glas´tonbury, in Arthurian romance, was the burial place of King Arthur. Selden, in his Illustrations of Drayton, gives an account of
Arthur’s tomb “betwixt two pillars,” and says that “Henry II. gave
command to Henry de Bois (then abbot of Glastonbury) to make
great search for the body of the British king, which was found in a
wooden coffin some 16 foote deepe, and afterwards they found a
stone on whose lower side was fixed a leaden cross with the name
inscribed.”
Glastonbury Thorn. The legend is that Joseph of Arimathēa stuck
his staff into the ground in “the sacred isle of Glastonbury,” and that
this thorn blossoms “on Christmas Day” every year. St. Joseph was
buried at Glastonbury.

Not great Arthur’s tomb, nor holy Joseph’s grave,
From sacrilege had power their sacred bones to save ...
[Here] trees in winter bloom and bear their summer’s green.
Drayton, Polyolbion, iii. (1612).
Glatisant, the questing beast. It had the head of a serpent, the
body of a libbard, buttocks of a lion, foot of a hart, and in its body
“there was a noise like that of thirty couple of hounds questing” (i.e.
in full cry). Sir Palomi´dês the Saracen was forever following this
beast.—Sir T. Malory, History of Prince Arthur, ii. 52, 53, 149 (1470).

Glau´ce (2 syl.), nurse of the Princess Brit´omart. She tried by charms to “undo” her lady’s love for Sir Artegal, “but love that is in
gentle heart begun, no idle charm can remove.” Finding her sorcery
useless, she took the princess to consult Merlin, and Merlin told her
that by marrying Artegal she would found a race of kings from which
would arise “a royal virgin that shall shake the power of Spain.” The
two now started in quest of the knight, but in time got separated.
Glaucê became “the squire” of Sir Scu´damore, but re-appears (bk.
iii. 12) after the combat between Britomart and Artegal, reconciles
the combatants, and the princess consents “to be the love of
Artegal, and to take him for her lord” (bk. iv. 5, 6).—Spenser, Faëry
Queen (1590, 1596).

Glaucus, accomplished young Athenian, whose house in Pompeii is a marvel of beauty and taste. He loves Ione, and is beloved by
Nydia, the blind flower-girl. He is rescued from a terrible fate in the
amphitheatre by the eruption of Vesuvius, escapes from the city,
guided by Nydia, and weds Ione.—E. L. Bulwer, Last Days of Pompeii
(1834).

Glaucus, a fisherman of Bœo´tia. He observed that all the fish which he laid on the grass received fresh vigor, and immediately
leaped into the sea. This grass had been planted by Kronos, and
when Glaucus tasted it, he also leaped into the sea, and became a
prophetic marine deity. Once a year he visited all the coasts of
Greece, to utter his predictions. Glaucus is the sailors’ patron deity.
[By] old soothsaying Glaucus’ spell.
Milton, Comus, 874 (1634).
As Glaucus, when he tasted of the herb
That made him peer among the ocean gods.
Dante, Paradise, i. (1311).

Glaucus, son of Hippolytus. Being smothered in a tub of honey, he was restored to life by [a] dragon given him by Escula´pios (probably a medicine so called).—Apollodorus, Bibliotheca, 23.

Glaucus, of Chios, inventor of the art of soldering metal.—Pausanias, Itinerary of Greece.
A second Glaucus, one who ruins himself by horses. This refers to
Glaucus, son of Sis´yphos, who was killed by his horses. Some say
he was trampled to death by them, and some that he was eaten by
them.
Glauci et Diomēdis permutatio, a very foolish exchange. Homer
(Iliad, vi.) tells us that Glaucus changed his golden armor for the
iron one of Diomēdês. The French say, C’est le troc de Glaucus et de
Diomede. This Glaucus was the grandson of Bellerophon. (In Greek,
“Glaukos.”)

Glegg (Mrs.), one of the Dodson sisters in George Eliot’s Mill on the Floss, and the least amiable. When displeased or thwarted she takes to her bed, reads Baxter’s Saints’ Rest, and lives on water-gruel.

Glenallan (Joscelind, dowager countess of), whose funeral takes place by torchlight in the Catholic chapel.
The earl of Glenallan, son of the dowager countess.—Sir W. Scott,
The Antiquary (time, George III.).

Glenalvon, heir of Lord Randolph. When young Norval, the son of Lady Randolph, makes his unexpected appearance, Glenalvon
sees in him a rival, whom he hates. He pretends to Lord Randolph
that the young man is a suitor of Lady Randolph’s, and, having
excited the passion of jealousy, contrives to bring his lordship to a
place where he witnesses their endearments. A fight ensues, in
which Norval slays Glenalvon, but is himself slain by Lord Randolph,
who then discovers too late that the supposed suitor was his wife’s
son.—Home, Douglas (1757).

Glencoe (2 syl.), the scene of the massacre of M’Ian and thirty-eight of his glenmen, in 1692. All Jacobites were commanded to
submit to William III. by the end of December, 1691. M’Ian was
detained by a heavy fall of snow, and Sir John Dalrymple, the master
of Stair, sent Captain Campbell to make an example of “the rebel.”
⁂ Talfourd has a drama entitled Glencoe, or the Fall of the
M’Donalds.

Glendale (Sir Richard), a papist conspirator with Redgauntlet.—Sir W. Scott, Redgauntlet (time, George III.).

Glendin´ning (Elspeth) or Elspeth Brydone (2 syl.), widow of Simon Glendinning, of the Tower of Glendearg.
Halbert and Edward Glendinning, sons of Elspeth Glendinning.—Sir
W. Scott, The Monastery (time, Elizabeth).
Glendinning (Sir Halbert), the knight of Avenel, husband of Lady
Mary of Avenel (2 syl.).—Sir W. Scott, The Abbot (time, Elizabeth).

Glendoveer´, plu. Glendoveers, the most beautiful of the good spirits of Hindû mythology.

... the glendoveers.
The loveliest of all of heavenly birth.
Southey, Curse of Kehama, vi. 2 (1809).

Glendow´er (Owen), a Welsh nobleman, descended from Llewellyn (last of the Welsh kings). Sir Edmund Mortimer married
one of his daughters. Shakespeare makes him a wizard, but very
highly accomplished.—Shakespeare, 1 Henry IV. (1597).
Glengar´ry. So M’Donald of Glengarry (who gave in his adhesion
to William III.) is generally called.

Glenpro´sing (The old lady), a neighbor of old Jasper Yellowley.—Sir W. Scott, The Pirate (time, William III.).

Glenthorn (Lord), the hero of Miss Edgeworth’s novel called Ennui. Spoiled by indolence and bad education, he succeeds, by a
course of self-discipline, in curing his mental and moral faults, and in
becoming a useful member of society (1809).
The history of Lord Glenthorn affords a striking picture of ennui, and contains
some excellent delineations of character.—Chambers, English Literature, ii. 569.

Glenvar´loch (Lord), or Nigel Olifaunt, the hero of Scott’s novel called The Fortunes of Nigel (time, James I.).

Glitner, the palace of Forseti “the peace-maker,” son of Balder. It was raised on pillars of gold, and had a silver roof.

Gloria´na, “the greatest glorious queen of Faëry-land.”


By Gloriana I mean [true] Glory in my general intention, but in my particular I
conceive the most excellent and glorious person of our sovereign the queen
[Elizabeth], and her kingdom is Faerye-land.—Spenser, Introduction to The Faëry
Queen (1590).

Glorious John, John Dryden (1631-1701).

Glorious Preacher (The), St. John Chrysostom (i.e. John Goldenmouth, 354-407).

Glory (Old), Sir Francis Burdett (1770-1844).

Glory (McWhirk). Irish girl rescued from wretched dependence by a benevolent woman, and made at home in a comfortable dwelling.
She has a big, warm heart that yearns over everything helpless and
hurt, and, whereas, in her childhood, she mourned over “the good
times” she was “not in,” she comes to rejoice constantly in the
blessed truth that she is “in” them all.—A.D.T. Whitney, Faith
Gartney’s Girlhood (1863).

Glossin (Mr. Gilbert), a lawyer, who purchases the Ellangowan estate, and is convicted by Counsellor Pleydell of kidnapping Henry
Bertram, the heir. Both Glossin and Dirk Hatteraick, his accomplice,
are sent to prison, and in the night Hatteraick first strangles the
lawyer and then hangs himself.—Sir W. Scott, Guy Mannering (time,
George II.).

Gloucester (The duke of), brother of Charles II.—Sir W. Scott, Woodstock (time, Commonwealth).

Gloucester (Richard, duke of), in the court of King Edward IV.—Sir W. Scott, Anne of Geierstein (time, Edward IV.).

Gloucester (The earl of), in the court of King Henry II.—Sir W. Scott, The Betrothed (time, Henry II.).

Glover (Simon), the old glover of Perth, and father of the “fair
maid.”
Catharine Glover, “the fair maid of Perth,” daughter of Simon the
glover, and subsequently bride of Henry Smith the armorer.—Sir W.
Scott, Fair Maid of Perth (time, Henry IV.).

Glover (Heins), the betrothed of Trudchen [i.e. Gertrude] Pavillon, daughter of the syndic’s wife.—Sir W. Scott, Quentin Durward (time, Edward IV.).

Glowrowrum (The old lady), a friend of Magnus Troil.—Sir W. Scott, The Pirate (time, William III.).

Glück, a German musical composer, greatly patronized by Marie Antoinette. Young France set up against him the Italian Piccini.
Between 1774 and 1780 every street, coffee-house, school and
drawing-room in Paris canvassed the merits of these two composers,
not on the score of their respective talents, but as the
representatives of the German and Italian schools of music. The
partisans of the German school were called Glückists, and those of
the Italian school Piccinists.

Est-ce Glück, est-ce Piccini,
Que doit couronner Polymnie?
Donc entre Glück et Piccini
Tout le Parnasse est désuni.
L’un soutient ce que l’autre nie,
Et Clio veut battre Uranie,
Pour moi, qui crains toute manie,
Plus irrésolu que Babouc
N’épousant Piccini ni Glück,
Je n’y connais rien: ergo Glück.

⁂ A similar contest raged in England between the Bononcinists and Handelists. The prince of Wales was the leader of the Handel or
German party, and the duke of Marlborough of the Bononcini or
Italian school. (See Tweedledum.)

Glumdalca, queen of the giants, captive in the court of King Arthur. The king cast love-glances at her, and made Queen Dollallolla
jealous; but the giantess loved Lord Grizzle, and Lord Grizzle loved
the Princess Huncamunca, and Huncamunca loved the valiant Tom
Thumb.—Tom Thumb, by Fielding the novelist (1730), altered by
O’Hara, author of Midas (1778).

Glum-dal´clitch, a girl nine years old “and only forty feet high.”
Being such a “little thing,” the charge of Gulliver was committed to
her during his sojourn in Brobdingnag.—Swift, Gulliver’s Travels.

Soon as Glumdalclitch missed her pleasing care,
She wept, she blubbered, and she tore her hair.
Pope.

Glumms, the male population of the imaginary country Nosmnbdsgrsutt, visited by Peter Wilkins. The Glumms, like the females, called gawreys (q.v.), had wings, which served both for flying and dress.—R. Pultock, Peter Wilkins (1750).

Glutton (The), Vitellius, the Roman emperor (born a.d. 15, reigned 69, died 69). Visiting the field after the battle of Bedriac, in Gaul, he exclaimed, “The body of a dead enemy is a delightful perfume.”
⁂ Charles IX. of France, when he went in grand procession to
visit the gibbet on which Admiral Coligny was hanging, had the
wretched heartlessness to exclaim, in doggerel verse:

Fragrance sweeter than the rose
Rises from our slaughtered foes.

Glutton (The), Gabius Apicius, who lived during the reign of Tiberius. He spent £800,000 on the luxuries of the table, and when
only £80,000 of his large fortune remained, he hanged himself,
thinking death preferable to “starvation on such a miserable
pittance.”

Glynn (The Marshes of). Title of a poem by Sidney Lanier, descriptive of a marsh on the Southern coast.

The creeks overflow; a thousand rivulets run
’Twixt the roots of the sod; the blades of the marsh-grass stir,
Passeth a hurrying sound of wings that westward whir;
Passeth, and all is still, and the currents cease to run,
And the sea and the marsh are one.
Poems, by Sidney Lanier (1884).

Gna, the messenger of Frigga.—Scandinavian Mythology.

Goats. The Pleiades are called in Spain The Seven Little Goats.
So it happened that we passed close to the Seven Little Goats.—Cervantes, Don
Quixote, II. iii. 5 (1615).
⁂ Sancho Panza affirmed that two of the goats were of a green
color, two carnation, two blue, and one motley; “but,” he adds, “no
he-goat or cuckold ever passes beyond the horns of the moon.”

Goatsnose, a prophet, born deaf and dumb, who uttered his predictions by signs.—Rabelais, Pantag´ruel, iii. 20 (1545).

Gobbo (Old), the father of Launcelot. He was stone blind.
Launcelot Gobbo, son of Old Gobbo. He left the service of Shylock
the Jew for that of Bassa´nio, a Christian. Launcelot Gobbo is one of
the famous clowns of Shakespeare.—Shakespeare, Merchant of
Venice (1598).

Gob´ilyve (Godfrey), the assumed name of False Report. He is described as a dwarf, with great head, large brows, hollow eyes,
crooked nose, hairy cheeks, a pied beard, hanging lips, and black
teeth. His neck was short, his shoulders awry, his breast fat, his
arms long, his legs “kewed,” and he rode “brigge-a-bragge on a little
nag.” He told Sir Graunde Amoure he was wandering over the world
to find a virtuous wife, but hitherto without success. Lady Correction
met the party, and commanded Gobilyve (3 syl.) to be severely
beaten for a lying varlet.—Stephen Hawes, The Passe-tyme of
Plesure, xxix., xxxi., xxxii. (1515).

Gobseck, a grasping money-lender, the hero and title of one of Balzac’s novels.

God.
Full of the god, full of wine, partly intoxicated.
God made the country, and man made the town.—Cowper’s Task
(“The Sofa”). Varro, in his De Re Rustica, has: “Divina Natura agros
dedit, ars humana ædificavit urbes.”
God sides with the strongest. Napoleon I. said, “Le bon Dieu est toujours du côté des gros bataillons.” Julius Cæsar made the same
remark.
Godam, a nickname applied by the French to the English, in
allusion to a once popular oath.

Godfrey (de Bouillon), the chosen chief of the allied crusaders, who went to wrest Jerusalem from the hands of the Saracens. He
was calm, circumspect, prudent, and brave. Godfrey despised
“worldly empire, wealth, and fame.”—Tasso, Jerusalem Delivered
(1575).

Godfrey (Sir Edmondbury), a magistrate killed by the papists. He was very active in laying bare their nefarious schemes, and his body
was found pierced with his own sword, in 1678.—Sir W. Scott, Peveril
of the Peak (time, Charles II.).
⁂ Dryden calls Sir Edmondbury “Agag,” and Dr. Titus Otes he calls
“Corah.”

Corah might for Agag’s murder call,
In terms as coarse as Samuel used to Saul.
Absalom and Achitophel, i. (1681).

Godfrey (Miss), an heiress, daughter of an Indian governor.—Sam. Foote, The Liar (1761).

God´inez (Doctor), a schoolmaster, “the most expert flogger in Oviedo” [Ov.e.a´do]. He taught Gil Blas, and “in six years his worthy pupil understood a little Greek, and was a tolerable Latin scholar.”—Lesage, Gil Blas, i. (1716).

Godi´va or Godgifu, wife of Earl Leofric. The tale is that she begged her husband to remit a certain tax which oppressed the
people of Coventry. Leofric said he would do so only on one
condition—that she would ride naked through the city at midday. So
the lady gave orders that all people should shut up their windows
and doors; and she rode naked through the town, and delivered the
people from the tax. The tale further says that all the people did as
the lady bade them except Peeping Tom, who looked out, and was
struck blind.
⁂ This legend is told at length by Drayton in his Polyolbion, xiii.
(1613).

Godless Florins, English two-shilling pieces issued by Shiel when master of the mint. He was a Roman Catholic, and left out F.D.
(defender of the faith) from the legend. They were issued and called
in the same year (1849).

Godmanchester Hogs and Huntingdon Sturgeon.


During a very high flood in the meadows between Huntingdon and
Godmanchester, something was seen floating, which the Godmanchester people
thought was a black hog, and the Huntingdon folk declared was a sturgeon. When
rescued from the waters, it proved to be a young donkey.—Lord Braybrooke
(Pepys, Diary, May 22, 1667).

Godmer, a British giant, son of Albion, slain by Canu´tus, one of the companions of Brute.

Those three monstrous stones...
Which that huge son of hideous Albion,
Great Godmer, threw in fierce contention
At bold Canutus; but of him was slain.
Spenser, Faëry Queen, ii. 10 (1590).

Goëmot or Goëmagot, a British giant, twelve cubits high, and of such prodigious strength that he could pull up a full-grown oak at
one tug. Same as Gogmagog (q.v.).
On a certain day, when Brutus was holding a solemn festival to the gods ... this
giant, with twenty more of his companions, came in upon the Britons, among
whom he made a dreadful slaughter; but the Britons at last ... killed them every
one but Goëmagot ... him Brutus preserved alive, out of a desire to see a combat
between the giant and Corineus, who took delight in such encounters.... Corineus
carried him to the top of a high rock, and tossed him into the sea.—Geoffrey,
British History, i. 16 (1142).

Goëmagot’s Leap, or “Lam Goemagot,” now called Haw, near Plymouth; the place where the giant fell when Corin’eus (3 syl.)
tossed him down the craggy rocks, by which he was mangled to
pieces.—Geoffrey, British History, i. 16 (1142).
⁂ Southey calls the word Lan-gœ-mā-gog. (See Gogmagog).

Goer´vyl, sister of Prince Madoc, and daughter of Owen, late king of North Wales. She accompanied her brother to America, and
formed one of the colony of Caer-madoc, south of the Missouri
(twelfth century).—Southey, Madoc (1805).

Goetz von Berlichingen, or Gottfried of the Iron Hand, a famous German burgrave, who lost his right hand at the siege of
Landshut. The iron hand which replaced the one he had lost is still
shown at Jaxthausen, the place of his birth. Gottfried took a
prominent part in the wars of independence against the electors of
Brandenburg and Bavaria, in the sixteenth century (1480-1562).
⁂ Goethe has made this the title and subject of an historical
drama.

Goffe (Captain), captain of the pirate vessel.—Sir W. Scott, The Pirate (time, William III.).

Gog, according to Ezek. xxxviii., xxxix., was “prince of Magog” (a country or people). Calmet says Camby´sês, king of Persia, is meant; but others think Antiochus Epiph´anês is alluded to.

Gog, in Rev. xx. 7-9, means Antichrist. Gog and Magog, in conjunction, mean all princes of the earth who are enemies of the Christian Church.
⁂ Sale says Gog is a Turkish tribe.—Al Korân, xviii. note.

Gog and Magog. Prester John in his letter to Manuel Comnēnus, emperor of Constantinople, speaks of Gog and Magog as two
separate nations tributary to him. These, with thirteen others, he
says, are now shut up behind inaccessible mountains, but at the end
of the world they will be let loose, and overrun the whole earth.—
Albericus Trium Fontium, Chronicles (1242).
Sale tells us that Gog and Magog are called by the Arabs “Yajûj”
and “Ma-jûj,” which are two nations or tribes descended from
Japhet, son of Noah. Gog, according to some authorities, is a Turkish
tribe; and Magog is the tribe called “Gilân” by Ptolemy, and “Geli” or
“Gelæ” by Strabo.—Al Korân, xviii. note.
Respecting the re-appearance of Gog and Magog, the Korân says:
“They [the dead] shall not return ... till Gog and Magog have a
passage opened for them, and they [the dead] shall hasten from
every high hill,” i.e. the resurrection (ch. xxi.).

Gog and Magog. The two statues of Guildhall so called are in reality the statues of Gogmagog or Goëmagot and Corineus, referred
to in the next article. (See also Corineus.) The Albion giant is known
by his pole-axe and spiked ball. Two statues so called stood on the
same spot in the reign of Henry V.; but those now seen were made
by Richard Saunders, in 1708, and are fourteen feet in height.
In Hone’s time, children and country visitors were told that every day, when the
giants heard the clock strike twelve, they came down to dinner.—Old and New
London, i. 387.

Another tale was that they then fell foul of each other in angry
combat.

Gog´magog, king of the Albion giants, eighteen feet in height, killed by Corin in a wrestling match, and flung by him over the Hoe
or Haw of Plymouth. For this achievement, Brute gave his follower
all that horn of land now called Cornwall, Cor´n[w]all, a contraction
of Corinall. The contest is described by Drayton in his Polyolbion, i.
(1612).