Presentation slides from ISMVL (the International Symposium on Multiple-Valued Logic), May 22nd, 2017, in Novi Sad, Serbia. The talk presents a machine-learning accelerator that realizes high performance at low power.
FPT17: An object detector based on multiscale sliding window search using a f... (Hiroki Nakahara)
1) The document describes an object detection system that uses a multiscale sliding window approach with fully pipelined binarized convolutional neural networks (BCNNs) implemented on an FPGA.
2) The system detects and classifies multiple objects in images by applying BCNNs to windows at different scales and locations, and suppresses overlapping detections.
3) Experimental results on a Zynq UltraScale+ MPSoC FPGA demonstrate that the proposed pipelined BCNN architecture can achieve higher accuracy than GPU-based detectors while using less than 5W of power.
A digital spectrometer using an FPGA is proposed for use on a radio telescope. The spectrometer would provide high-resolution spectral analysis of wideband radio frequency signals received by the telescope. To achieve high throughput on the FPGA, a nested residue number system is used to implement the fast Fourier transforms in the spectrometer. This decomposes large moduli into smaller nested ones, allowing uniform circuit sizes and enabling fully parallel implementation of the arithmetic.
FCCM2020: High-Throughput Convolutional Neural Network on an FPGA by Customiz... (Hiroki Nakahara)
This document presents a method for high-throughput convolutional neural network (CNN) inference on an FPGA using customized JPEG compression. It decomposes convolutions using channel shift and pointwise operations, employs binary weight quantization, and uses a fully pipelined architecture. Experimental results show the proposed JPEG compression achieves an 82x speedup with 0.3% accuracy drop. When implemented on an FPGA, the CNN achieves 3,321 frames per second at 75 watts, providing over 100x and 10x speedups over CPU and GPU respectively.
FPGA2018: A Lightweight YOLOv2: A binarized CNN with a parallel support vecto... (Hiroki Nakahara)
This document presents a mixed-precision convolutional neural network (CNN) called a Lightweight YOLOv2 for real-time object detection on an FPGA. The network uses binary precision for the feature extraction layers and half precision for the localization and classification layers. An FPGA implementation of the network achieves 40.81 FPS for object detection, outperforming an embedded GPU and CPU. Future work will apply this approach to other CNN-based applications such as semantic segmentation and pose estimation.
ISCAS'18: A Deep Neural Network on the Nested RNS (NRNS) on an FPGA: Applied ... (Hiroki Nakahara)
The document discusses implementing a deep neural network object detector called YOLOv2 on an FPGA using a technique called Nested Residue Number System (NRNS). Key points:
1. YOLOv2 is used for real-time object detection but requires high performance and low power.
2. NRNS decomposes large integer operations into smaller ones using a nested set of prime moduli, enabling parallelization on the FPGA (see the sketch after this list).
3. The authors implemented a Tiny YOLOv2 model using NRNS on a NetFPGA-SUME board, achieving 3.84 FPS at 3.5W power and 1.097 FPS/W efficiency.
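The residue arithmetic that NRNS builds on can be illustrated with a single-level sketch: an integer is represented by its remainders modulo small pairwise-coprime moduli, and addition and multiplication then act independently on each small residue, which is what allows uniform, parallel circuits. A minimal C++ sketch, with illustrative moduli rather than those of the paper:

    #include <cstdio>

    // Illustrative single-level RNS: pairwise-coprime moduli (not the paper's).
    const int M[3] = {7, 11, 13};          // dynamic range 7 * 11 * 13 = 1001

    struct Rns { int r[3]; };

    Rns encode(int x) {
        Rns v;
        for (int i = 0; i < 3; ++i) v.r[i] = x % M[i];
        return v;
    }

    // Multiplication becomes three small independent multiplications, which
    // is what enables uniform, fully parallel circuits on an FPGA.
    Rns mul(Rns a, Rns b) {
        Rns v;
        for (int i = 0; i < 3; ++i) v.r[i] = (a.r[i] * b.r[i]) % M[i];
        return v;
    }

    int decode(Rns v) {                    // Chinese remainder theorem, brute force
        for (int x = 0; x < 7 * 11 * 13; ++x)
            if (x % M[0] == v.r[0] && x % M[1] == v.r[1] && x % M[2] == v.r[2])
                return x;
        return -1;
    }

    int main() {
        Rns p = mul(encode(25), encode(37));
        printf("%d\n", decode(p));         // prints 925 = 25 * 37
    }

The nesting in NRNS repeats this decomposition on the residues themselves; the sketch stops at one level and recovers the result by brute-force CRT search for clarity.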
ISMVL2018: A Ternary Weight Binary Input Convolutional Neural Network (Hiroki Nakahara)
This document summarizes a research paper that proposes a ternary weight binary input convolutional neural network (CNN).
The paper proposes using ternary (-1, 0, +1) weights instead of binary weights to improve recognition accuracy over binary CNNs. By setting many weights to zero, computations can be skipped, reducing operations. Experimental results show the ternary CNN model reduced non-zero weights to 5.3% while maintaining accuracy comparable to binary CNNs. Implementation on an ARM processor demonstrated the ternary CNN was 8 times faster than a binary CNN.
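The zero-skipping argument is easy to make concrete. A minimal sketch (illustrative only, not the paper's code): storing ternary weights as sparse index lists means the roughly 95% zero weights cost nothing, and the surviving ±1 weights need only additions and subtractions:

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Illustrative ternary-weight dot product: weights in {-1, 0, +1} are
    // kept as sparse lists of indices; zeros are skipped entirely, and the
    // remaining work is additions/subtractions only (no multiplications).
    struct TernaryRow {
        std::vector<int> plus;   // indices with weight +1
        std::vector<int> minus;  // indices with weight -1
    };

    int dot(const TernaryRow& w, const std::vector<int8_t>& x) {
        int acc = 0;
        for (int i : w.plus)  acc += x[i];
        for (int i : w.minus) acc -= x[i];
        return acc;
    }

    int main() {
        TernaryRow w{{0, 3}, {2}};          // weights: +1, 0, -1, +1
        std::vector<int8_t> x{5, 9, 2, 4};
        printf("%d\n", dot(w, x));          // 5 - 2 + 4 = 7
    }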
This document summarizes the paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". It introduces batch normalization, which normalizes layer inputs to speed up training of neural networks. Batch normalization reduces internal covariate shift by normalizing layer inputs. It computes normalization statistics over each mini-batch and applies them to the inputs. This allows higher learning rates and acts as a regularizer. Experiments show batch normalization stabilizes and accelerates the training of neural networks on ImageNet classification.
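The normalization itself fits in a few lines; a minimal sketch of the training-time batch-norm transform described above, for a single feature over one mini-batch, with learned scale gamma and shift beta (epsilon guards against division by zero):

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Minimal batch-norm forward pass for one feature over a mini-batch:
    // y_i = gamma * (x_i - mean) / sqrt(var + eps) + beta
    std::vector<float> batch_norm(const std::vector<float>& x,
                                  float gamma, float beta, float eps = 1e-5f) {
        float mean = 0.0f, var = 0.0f;
        for (float v : x) mean += v;
        mean /= x.size();
        for (float v : x) var += (v - mean) * (v - mean);
        var /= x.size();
        std::vector<float> y;
        for (float v : x)
            y.push_back(gamma * (v - mean) / std::sqrt(var + eps) + beta);
        return y;
    }

    int main() {
        for (float v : batch_norm({1.0f, 2.0f, 3.0f, 4.0f}, 1.0f, 0.0f))
            printf("%.3f ", v);   // zero-mean, unit-variance outputs
        printf("\n");
    }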
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/altera/embedded-vision-training/videos/pages/may-2015-embedded-vision-summit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Deshanand Singh, Director of Software Engineering at Altera, presents the "Efficient Implementation of Convolutional Neural Networks using OpenCL on FPGAs" tutorial at the May 2015 Embedded Vision Summit.
Convolutional neural networks (CNN) are becoming increasingly popular in embedded applications such as vision processing and automotive driver assistance systems. The structure of CNN systems is characterized by cascades of FIR filters and transcendental functions. FPGA technology offers a very efficient way of implementing these structures by allowing designers to build custom hardware datapaths that implement the CNN structure. One challenge of using FPGAs revolves around the design flow that has been traditionally centered around tedious hardware description languages.
In this talk, Deshanand gives a detailed explanation of how CNN algorithms can be expressed in OpenCL and compiled directly to FPGA hardware. He gives detail on code optimizations and provides comparisons with the efficiency of hand-coded implementations.
A Platform for Accelerating Machine Learning Applications (NVIDIA Taiwan)
Robert Sheen from HPE gave a presentation on machine learning applications and accelerating deep learning. He provided a quick introduction to neural networks, discussing their structure and how they are inspired by biological neurons. Deep learning requires high performance computing due to its computational intensity during training. Popular deep learning frameworks like CogX were also discussed, which provide tools and libraries to help build and optimize neural networks. Finally, several enterprise use cases for machine learning and deep learning were highlighted, such as in finance, healthcare, security, and geospatial applications.
This slide deck introduces the concepts behind TensorFlow based on a source-code study, covering tensors, operations, computation graphs, and execution.
Towards Machine Comprehension of Spoken Content (NVIDIA Taiwan)
Hung-yi Lee gave a presentation on developing machine comprehension of spoken content. He discussed using deep learning for speech recognition, spoken content retrieval, key term extraction, summarization, question answering, and organizing information. The goal is for machines to understand spoken audio data by recognizing speech, extracting useful information, and interacting with users to provide relevant results. Several challenges were mentioned, such as the lack of annotated training data for many languages. Preliminary research on learning directly from audio without transcription was also presented.
This document discusses optimizations for deep learning frameworks on Intel CPUs and Fugaku processors. It introduces oneDNN, an Intel performance library for deep neural networks. JIT assembly using Xbyak is proposed to generate optimized code depending on parameters at runtime. Xbyak has been extended to AArch64 as Xbyak_aarch64 to support Fugaku. AVX-512 SIMD instructions are briefly explained.
This document discusses deep learning initiatives at NECSTLab focused on hardware acceleration of convolutional neural networks using FPGAs. It proposes a framework called CNNECST that provides high-level APIs to design CNNs, integrates with machine learning frameworks for training, and generates customized hardware for FPGA implementation through C++ libraries and Vivado. Experimental results show speedups and energy savings for CNNs like LeNet and MNIST on FPGA boards compared to CPU. Challenges and future work include supporting more layer types and reduced precision computations.
This document discusses three important optimizations for GPU performance: thread mapping, device occupancy, and vectorization. Thread mapping involves assigning threads to data in a way that aligns with hardware and provides efficient memory access. Device occupancy refers to how fully the compute unit resources are utilized. Having enough active threads to hide memory latency impacts performance. Vectorization, or processing multiple data elements with each thread, is particularly important for AMD GPUs. Examples are provided of different thread mappings and how they affect memory access and performance.
Ted Willke, Senior Principal Engineer, Intel Labs at MLconf NYC (MLconf)
You Thought What?! The Promise of Real-Time Brain Decoding: What can faster machine learning and new model-based approaches tell us about what someone is really thinking? Recently, Intel joined up with some of the pioneers of brain decoding to understand exactly that. Using functional MRI as our microscope, we began analyzing large amounts of high-dimensional 4-D image data to uncover brain networks that support cognitive processes. But existing image preprocessing, feature selection, and classification techniques are too slow and inaccurate to facilitate the most exciting breakthroughs. In this talk, we’ll discuss the promise of accurate real-time brain decoding and the computational headwinds. And we’ll look at some of the approaches to algorithms and optimization that Intel Labs and its partners are taking to reduce the barriers.
Deep Learning with TensorFlow: Understanding Tensors, Computations Graphs, Im... (Altoros)
1. The elements of Neural Networks: Weights, Biases, and Gating functions
2. MNIST (handwriting recognition) using a simple NN in TensorFlow (introduces Tensors, Computation Graphs)
3. MNIST using Convolution NN in TensorFlow
4. Understanding words and sentences as Vectors
5. word2vec in TensorFlow
This document discusses how work groups are scheduled for execution on GPU compute units. It explains that work groups are broken down into hardware schedulable units known as warps or wavefronts. These group threads together and execute instructions in lockstep. The document covers thread scheduling, effects of divergent control flow, predication, warp voting, and optimization techniques like maximizing occupancy.
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/auvizsystems/embedded-vision-training/videos/pages/may-2015-embedded-vision-summit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Nagesh Gupta, CEO and Founder of Auviz Systems, presents the "Trade-offs in Implementing Deep Neural Networks on FPGAs" tutorial at the May 2015 Embedded Vision Summit.
Video and images are a key part of Internet traffic—think of all the data generated by social networking sites such as Facebook and Instagram—and this trend continues to grow. Extracting usable information from video and images is thus a growing requirement in the data center. For example, object and face recognition are valuable for a wide range of uses, from social applications to security applications. Deep convolutional neural networks (CNNs) are currently the most popular approach used in data centers for such applications. 3D convolutions are a core part of CNNs. Nagesh presents alternative implementations of 3D convolutions on FPGAs, and discusses trade-offs among them.
[DL Reading Group] Incorporating group update for speech enhancement based on convolutio... (Deep Learning JP)
1. The document discusses a research paper on speech enhancement using a convolutional gated recurrent network (CGRN) and ordered neuron long short-term memory (ON-LSTM).
2. The proposed method aims to improve speech quality by incorporating both time and frequency dependencies using CGRN, and handling noise with varying change rates using ON-LSTM.
3. CGRN replaces fully-connected layers with convolutions, allowing it to capture local spatial structures in the frequency domain. ON-LSTM groups neurons based on the change rate of internal information to model hierarchical representations.
This document discusses approaches to programming multiple devices in OpenCL, including using a single context with multiple devices or multiple contexts. With a single context, memory objects are shared but data must be explicitly transferred between devices. Multiple contexts allow splitting work by device but require extra communication. Load balancing work between heterogeneous CPUs and GPUs requires considering scheduling overhead and data location.
- POSTECH EECE695J, "Fundamentals of Deep Learning and Applications to Steel Manufacturing," 2017-11-10
- Contents: introduction to recurrent neural networks, LSTM, variants of RNN, implementation of RNN, case studies
- Video: https://youtu.be/pgqiEPb4pV8
[PR12] PR-036 Learning to Remember Rare Events (Taegyun Jeon)
This document summarizes a paper on learning to remember rare events using a memory-augmented neural network. The paper proposes a memory module that stores examples from previous tasks to help learn new rare tasks from only a single example. The memory module is trained end-to-end with the neural network on two tasks: one-shot learning on Omniglot characters and machine translation of rare words. The implementation uses a TensorFlow memory module that stores key-value pairs to retrieve examples similar to a query. Experiments show the memory module improves one-shot learning performance and handles rare words better than baselines.
Electricity price forecasting with Recurrent Neural Networks (Taegyun Jeon)
This document discusses using recurrent neural networks (RNNs) for electricity price forecasting with TensorFlow. It begins with an introduction to the speaker, Taegyun Jeon from GIST. The document then provides an overview of RNNs and their implementation in TensorFlow. It describes two case studies - using an RNN to predict a sine function and using one to forecast electricity prices. The document concludes with information on running and evaluating the RNN graph and a question and answer section.
This document discusses optimizations for implementing an N-body simulation algorithm on GPUs using OpenCL. It begins with an overview of the basic N-body algorithm and its parallel implementation. Two key optimizations are explored: using local memory to enable data reuse across work items, and unrolling the computation loop. Performance results on AMD and Nvidia GPUs show that data reuse provides significant speedup, and loop unrolling further improves performance on the AMD GPU. An example N-body application is provided to experiment with these optimization techniques.
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/auvizsystems/embedded-vision-training/videos/pages/may-2016-embedded-vision-summit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Nagesh Gupta, Founder and CEO of Auviz Systems, presents the "Semantic Segmentation for Scene Understanding: Algorithms and Implementations" tutorial at the May 2016 Embedded Vision Summit.
Recent research in deep learning provides powerful tools that begin to address the daunting problem of automated scene understanding. Modifying deep learning methods, such as CNNs, to classify pixels in a scene with the help of the neighboring pixels has provided very good results in semantic segmentation. This technique provides a good starting point towards understanding a scene. A second challenge is how such algorithms can be deployed on embedded hardware at the performance required for real-world applications. A variety of approaches are being pursued for this, including GPUs, FPGAs, and dedicated hardware.
This talk provides insights into deep learning solutions for semantic segmentation, focusing on current state of the art algorithms and implementation choices. Gupta discusses the effect of porting these algorithms to fixed-point representation and the pros and cons of implementing them on FPGAs.
The document shows Twitter and GitHub accounts, an IPSJ conference, and hardware including an Intel Core i7 and FPGA boards from Digilent and ScalableCore, along with code snippets for C programs and hardware designs, including a convolutional neural network layer.
MOA is a framework for online machine learning from data streams. It includes algorithms for classification, regression, clustering and frequent pattern mining that can incorporate data and update models on the fly. MOA is closely related to WEKA and includes tools for evaluating streaming algorithms on data from sensors and IoT devices. It provides an environment for designing and running experiments on streaming machine learning algorithms at massive scales.
This document summarizes recent progress and opportunities in analyzing data from global network cameras. It discusses the CAM2 system, a general-purpose computing platform for analyzing large amounts of image data from thousands of cameras worldwide. CAM2 has demonstrated the ability to analyze billions of images per day using cloud computing resources. It aims to provide abundant real-world image data and computing power for computer vision and machine learning applications. The document also outlines several challenges in managing and analyzing data from networked cameras at a large scale.
"The DEBS Grand Challenge 2017" as presented in the The 11th ACM International Conference on Distributed and Event-Based Systems, 19 - 23 June, 2017 held in Barcelona, Spain
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
This document summarizes key points from a presentation on real-time AI systems. It discusses how Moore's law is no longer extending processor performance and how heterogeneous computing platforms with specialized hardware are needed for efficient AI. Emerging platforms include multi-core CPUs with GPUs, FPGAs and custom hardware. The document also outlines techniques for optimizing neural networks for hardware like quantization, pruning and efficient data layout to meet timing constraints in real-time systems.
Real Time Human Posture Detection with Multiple Depth Sensors (Wassim Filali)
This thesis presents a comprehensive study of the state-of-the-art in human posture reconstruction, its contexts, and associated applications. The underlying research focuses on utilization of computer vision techniques for human activity recognition based on embedded system technologies and intelligent camera systems. It also focuses on human posture reconstruction as it plays a key role in subsequent activity recognition. In this work, we have relied on the latest technological advances in sensor technology, specifically on the advent of Kinect, an RGB-D sensor from Microsoft, to realize a low-level sensor fusion algorithm to fuse the outputs of multiple depth sensors for human posture reconstruction.
In this endeavor, the different challenges encountered are: (1) occlusions when using a single sensor; (2) the combinatorial complexity of learning a high dimensional space corresponding to human postures; and finally, (3) embedded systems constraints. The proposed system addresses and consequently resolves each of these challenges.
The fusion of multiple depth sensors gives better results than individual sensors, as it alleviates the majority of occlusions and resolves many incoherencies, thereby guaranteeing improved robustness and completeness of the observed scene. In this manuscript, we elaborate the low-level fusion strategy that constitutes the main contribution of this thesis. We adopt a learning technique based on decision forests. Our algorithm is applied to our own learning dataset, acquired with our multi-Kinect platform coupled to a commercial motion capture system.
The two principal features are sensor data fusion and supervised learning. Specifically, the data fusion technique comprises acquisition, segmentation, and voxelization, which generates a 3D reconstruction of the occupied space. The supervised learning is based on decision forests and uses appropriate descriptors extracted from the reconstructed data. Various experiments, including specific parameter-tuning runs, have been realized.
Qualitative and quantitative evaluations of human articulation reconstruction precision against state-of-the-art strategies have also been carried out.
The different algorithms were implemented in a personal computer environment, which helped identify the parts that need embedded hardware integration. The hardware integration study compared multiple approaches. The FPGA is a platform that meets both the performance and embeddability criteria, as it provides resources that offload the CPU. Our contribution here is a hierarchically prioritized design built from a layer of intermediary modules. Comparative studies were also carried out using a background subtraction implementation as a benchmark integrated on PC, GPU, and FPGA (the FPGA implementation is presented in detail).
This document provides tips and best practices for achieving high performance with Java. It discusses measuring performance, optimizing I/O, using memory-mapped files, reusing database connections, and employing techniques like concurrency to improve scaling. The document also presents a case study where various optimizations were applied to analyze call detail records within the required one hour time budget, including splitting workload across threads.
Slides from the workshop on parallel processing using GPU infrastructure
The country's first national cloud computing workshop
Vahid Amiri
vahidamiry.ir
Amirkabir University of Technology - 1391 (Iranian calendar, ~2012)
This document describes a web-based application called "Path Finding Visualizer" that visualizes shortest path algorithms like Dijkstra's algorithm and A* algorithm. It discusses the motivation, objectives and implementation of the project. The implementation involves creating a graph from a maze, building an adjacency matrix to represent the graph, and applying Dijkstra's algorithm to find the shortest path between nodes. Screenshots show the visualization of Dijkstra's algorithm finding the shortest path between a source and destination node. The technologies used include Visual Studio Code. The project aims to help users better understand how shortest path algorithms work through visualization.
This document summarizes a paper that proposes methods for continuous and parallel LiDAR point cloud clustering. It introduces Lisco, which continuously processes LiDAR data streams to cluster points. P-Lisco parallelizes Lisco's processing pipeline to further improve performance. Evaluation on synthetic and real LiDAR datasets shows P-Lisco achieves real-time processing and outperforms alternative methods like PCL. Future work involves specialized implementations and applying the continuous analysis approach to other related problems.
Efficient architecture to condensate visual information driven by attention ...Sara Granados Cabeza
- The document proposes a novel semidense representation map that condenses dense visual features while highlighting relevant information and preserving uniform region data.
- It applies sparse visual features to enhance relevant points and uses a regular grid in uniform regions. Experimental results show this reduces data requirements while extracting key information and inherently regularizes outputs.
- The method is implemented efficiently on FPGA hardware, providing over 20x bandwidth savings and 15x memory usage reduction compared to dense representations. It allows for real-time integration of feedback from tasks like attention, ground plane detection, and obstacle detection.
Wastian, Brunmeir - Data Analyses in Industrial Applications: From Predictive...Vienna Data Science Group
The document discusses various applications of data analysis and machine learning in industrial settings. It begins with an overview of the presenting organization and definitions of key concepts like machine learning, data mining, and deep learning. It then provides examples of applications including natural language processing on patents and political speeches, predictive maintenance on servers, and image understanding through techniques like HOG features and deep inspection of sensors.
GPUs are specialized processors designed for graphics processing. CUDA (Compute Unified Device Architecture) allows general purpose programming on NVIDIA GPUs. CUDA programs launch kernels across a grid of blocks, with each block containing multiple threads that can cooperate. Threads have unique IDs and can access different memory types including shared, global, and constant memory. Applications that map well to this architecture include physics simulations, image processing, and other data-parallel workloads. The future of CUDA includes more general purpose uses through GPGPU and improvements in virtual memory, size, and cooling.
This document discusses tools and services for data intensive research in the cloud. It describes several initiatives by the eXtreme Computing Group at Microsoft Research related to cloud computing, multicore computing, quantum computing, security and cryptography, and engaging with research partners. It notes that the nature of scientific computing is changing to be more data-driven and exploratory. Commercial clouds are important for research as they allow researchers to start work quickly without lengthy installation and setup times. The document discusses how economics has driven improvements in computing technologies and how this will continue to impact research computing infrastructure. It also summarizes several Microsoft technologies for data intensive computing including Dryad, LINQ, and Complex Event Processing.
Dsd int 2014 - data science symposium - application 1 - point clouds, prof. p...Deltares
This document describes a point cloud data management project involving several organizations. It outlines the goals of developing a scalable infrastructure for storing, managing, and querying massive point cloud datasets. It also describes plans to design benchmarks for testing functionality and performance, as well as potential standardization efforts. Key aspects of the project include developing a database extension to support point cloud data types, designing a web service protocol, and executing benchmarks on a range of systems and datasets.
Interactive Latency in Big Data Visualization (bigdataviz_bay)
Interactive Latency in Big Data Visualization
Zhicheng "Leo" Liu, Research Scientist at the Creative Technologies Lab at Adobe Research
January 22nd, 2014
Reducing interactive latency is a central problem in visualizing large datasets. I discuss two inter-related projects in this problem space. First, I present the imMens system and show how we can achieve real-time interaction at 50 frames per second for billions of data points by combining techniques such as data tiling and parallel processing. Second, I discuss an ongoing user study that aims to understand the effect of interactive latency on human cognitive behavior in exploratory visual analysis.
Big Data Visualization Meetup - South Bay
http://www.meetup.com/Big-Data-Visualisation-South-Bay/
Embedded system Design introduction _ Karakola (JohanAspro)
The document discusses embedded systems and provides examples of embedded system architectures. It defines embedded systems as computing systems designed to perform specific functions, in contrast to general purpose computers. The key characteristics of embedded systems include being single-purpose, requiring real-time performance, having physical size and cost constraints, and prioritizing low power usage. Common embedded system components include processors, memory, and input/output interfaces. The document also provides an example of designing an embedded system to compute the greatest common divisor of two numbers.
04 accelerating dl inference with (open)capi and posit numbers (Yutaka Kawai)
This was presented by Louis Ledoux and Marc Casas at OpenPOWER summit EU 2019. The original one is uploaded at:
https://static.sched.com/hosted_files/opeu19/1a/presentation_louis_ledoux_posit.pdf
The role of the lexical analyzer
Specification of tokens
Finite state machines
From a regular expression to an NFA
Converting an NFA to a DFA (see the sketch after this list)
Transforming grammars and regular expressions
Transforming automata to grammars
Language for specifying lexical analyzers
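As a concrete instance of the regex-to-NFA-to-DFA pipeline in the outline above, here is a hedged sketch: a hand-written table-driven DFA for the textbook example (a|b)*abb, the kind of table the subset construction produces, simulated directly in C++:

    #include <cstdio>

    // DFA for (a|b)*abb: state 3 is accepting. Rows are states, columns are
    // the inputs 'a' (0) and 'b' (1).
    const int next_state[4][2] = {
        {1, 0},   // state 0: no useful suffix seen
        {1, 2},   // state 1: seen trailing "a"
        {1, 3},   // state 2: seen trailing "ab"
        {1, 0},   // state 3: seen "abb" (accepting)
    };

    bool match(const char* s) {
        int state = 0;
        for (; *s; ++s) {
            if (*s != 'a' && *s != 'b') return false;  // outside the alphabet
            state = next_state[state][*s == 'b'];
        }
        return state == 3;
    }

    int main() {
        printf("%d %d\n", match("aababb"), match("abba"));  // prints 1 0
    }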
RICS Membership - (The Royal Institution of Chartered Surveyors).pdf (MohamedAbdelkader115)
Glad to be one of only 14 members inside Kuwait to hold this credential.
Please check the members inside Kuwait via this link:
https://www.rics.org/networking/find-a-member.html?firstname=&lastname=&town=&country=Kuwait&member_grade=(AssocRICS)&expert_witness=&accrediation=&page=1
π0.5: a Vision-Language-Action Model with Open-World Generalization (NABLAS株式会社)
This presentation introduces robot foundation models that integrate vision, language, and action.
Built on a Transformer combining diffusion and autoregression, π0.5 enables reasoning and planning in open-world settings.
☁️ GDG Cloud Munich: Build With AI Workshop - Introduction to Vertex AI! ☁️
Join us for an exciting #BuildWithAi workshop on the 28th of April, 2025 at the Google Office in Munich!
Dive into the world of AI with our "Introduction to Vertex AI" session, presented by Google Cloud expert Randy Gupta.
Fluid mechanics is the branch of physics concerned with the mechanics of fluids (liquids, gases, and plasmas) and the forces on them. Originally applied to water (hydromechanics), it found applications in a wide range of disciplines, including mechanical, aerospace, civil, chemical, and biomedical engineering, as well as geophysics, oceanography, meteorology, astrophysics, and biology.
It can be divided into fluid statics, the study of various fluids at rest, and fluid dynamics.
Fluid statics, also known as hydrostatics, is the study of fluids at rest, specifically when there's no relative motion between fluid particles. It focuses on the conditions under which fluids are in stable equilibrium and doesn't involve fluid motion.
Fluid kinematics is the branch of fluid mechanics that focuses on describing and analyzing the motion of fluids, such as liquids and gases, without considering the forces that cause the motion. It deals with the geometrical and temporal aspects of fluid flow, including velocity and acceleration. Fluid dynamics, on the other hand, considers the forces acting on the fluid.
Fluid dynamics is the study of the effect of forces on fluid motion. It is a branch of continuum mechanics, a subject which models matter without using the information that it is made out of atoms; that is, it models matter from a macroscopic viewpoint rather than from microscopic.
Fluid mechanics, especially fluid dynamics, is an active field of research, typically mathematically complex. Many problems are partly or wholly unsolved and are best addressed by numerical methods, typically using computers. A modern discipline, called computational fluid dynamics (CFD), is devoted to this approach. Particle image velocimetry, an experimental method for visualizing and analyzing fluid flow, also takes advantage of the highly visual nature of fluid flow.
Fundamentally, every fluid mechanical system is assumed to obey the basic laws:
Conservation of mass
Conservation of energy
Conservation of momentum
The continuum assumption
For example, the assumption that mass is conserved means that for any fixed control volume (for example, a spherical volume)—enclosed by a control surface—the rate of change of the mass contained in that volume is equal to the rate at which mass is passing through the surface from outside to inside, minus the rate at which mass is passing from inside to outside. This can be expressed as an equation in integral form over the control volume.
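In symbols (a standard form, not taken from this document): for a fixed control volume $V$ bounded by the control surface $S$ with outward normal $\mathbf{n}$, fluid density $\rho$, and velocity $\mathbf{u}$, conservation of mass reads

$$\frac{d}{dt}\int_{V} \rho \, dV \;=\; -\oint_{S} \rho\, \mathbf{u}\cdot\mathbf{n}\, dS,$$

i.e., the mass inside $V$ changes only by the net flux of mass through $S$.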
The continuum assumption is an idealization of continuum mechanics under which fluids can be treated as continuous, even though, on a microscopic scale, they are composed of molecules. Under the continuum assumption, macroscopic (observed/measurable) properties such as density, pressure, temperature, and bulk velocity are taken to be well-defined at "infinitesimal" volume elements—small in comparison to the characteristic length scale of the system, but large in comparison to the molecular length scale.
ELectronics Boards & Product Testing_Shiju.pdf (Shiju Jacob)
This presentation provides high-level insight into DFT analysis and test-coverage calculation, finalizing the test strategy, and the types of tests performed at different levels of the product.
Analysis of reinforced concrete deep beams is based on simplified approximate methods due to the complexity of exact analysis. The complexity is due to a number of parameters affecting the response. To evaluate some of these parameters, a finite element study of the structural behavior of reinforced self-compacting concrete deep beams was carried out using the Abaqus finite element modeling tool. The model was validated against experimental data from the literature. The parametric effects of varying concrete compressive strength, vertical web reinforcement ratio, and horizontal web reinforcement ratio were tested on eight (8) different specimens under four-point loads. The validation results showed good agreement with the experimental studies. The parametric study revealed that concrete compressive strength most significantly influenced the specimens' response, with averages of 41.1% and 49% increases in the diagonal cracking and ultimate loads, respectively, for a doubling of the concrete compressive strength. Although increasing the horizontal web reinforcement ratio from 0.31% to 0.63% led to an average 6.24% increase in the diagonal cracking load, it did not influence the ultimate strength or the load-deflection response of the beams. A similar variation in the vertical web reinforcement ratio led to averages of 2.4% and 15% increases in cracking and ultimate loads, respectively, with no appreciable effect on the load-deflection response.
Sorting Order and Stability in Sorting.
Concept of Internal and External Sorting.
Bubble Sort,
Insertion Sort,
Selection Sort,
Quick Sort and
Merge Sort,
Radix Sort, and
Shell Sort,
External Sorting, Time complexity analysis of Sorting Algorithms.
A Random Forest using a Multi-valued Decision Diagram on an FPGA
1. A Random Forest using a Multi-valued Decision Diagram on an FPGA
¹Hiroki Nakahara, ¹Akira Jinguji, ¹Shimpei Sato, ²Tsutomu Sasao
¹Tokyo Institute of Technology, JP; ²Meiji University, JP
May 22nd, 2017
@ISMVL2017
3. Machine Learning
Machine learning demands much computation power and big data.
Sources: (Left) “Single-Threaded Integer Performance,” 2016; (Right) Nakahara, “Trend of Search Engine on modern Internet,” 2014
5. Introduction
• Random Forest (RF)
  • Ensemble learning method
  • Consists of multiple decision trees (DTs)
  • Applications: segmentation, human pose detection
• It is based on binary DTs (BDTs)
  • A node is evaluated by an if-then-else statement
  • The same variable may appear several times on a path
• Multiple-valued decision diagram (MDD)
  • Each variable appears only once on a path
6. Introduction (Cont'd)
• Target platform
  • CPU: Too slow
  • GPU: Not suited to the RF → slow, and consumes much power
  • FPGA: Faster and lower power, but long turnaround time (TAT)
• High-level synthesis (HLS) for the RF using MDDs on an FPGA
  • Low power, high performance, short design time
8. Classification by a Binary Decision Tree (BDT)
• Partition of the feature map
[Figure: a 2-D feature map over (X1, X2), partitioned into regions of classes C1 and C2 at the values 0.09, 0.29, 0.53, 0.63, and 0.71, beside the corresponding BDT whose nodes test X2<0.53, X2<0.29, X1<0.09, X1<0.63, and X1<0.71]
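A minimal sketch of how such a BDT is evaluated in software (an illustrative encoding, not the authors' code; the thresholds follow the figure, and the leaf labels are a plausible guess):

    #include <cstdio>

    // Illustrative BDT evaluation. Internal nodes test one feature against a
    // threshold; leaves carry a class label. Note that the same variable
    // (here X2) may appear more than once on a path.
    struct Node {
        int feature;      // 0 = X1, 1 = X2; -1 marks a leaf
        float threshold;  // test: x[feature] < threshold
        int yes, no;      // child indices; for a leaf, 'yes' holds the class
    };

    int classify(const Node* t, const float x[2]) {
        int i = 0;
        while (t[i].feature >= 0)
            i = (x[t[i].feature] < t[i].threshold) ? t[i].yes : t[i].no;
        return t[i].yes;  // class label of the reached leaf
    }

    int main() {
        // A plausible encoding of the tree in the figure (1 = C1, 2 = C2).
        const Node tree[] = {
            {1, 0.53f, 1, 2},    // X2 < 0.53 ?
            {1, 0.29f, 3, 4},    // X2 < 0.29 ?
            {0, 0.09f, 5, 6},    // X1 < 0.09 ?
            {-1, 0.0f, 1, 0},    // leaf C1
            {0, 0.63f, 7, 8},    // X1 < 0.63 ?
            {-1, 0.0f, 1, 0},    // leaf C1
            {0, 0.71f, 9, 10},   // X1 < 0.71 ?
            {-1, 0.0f, 2, 0},    // leaf C2
            {-1, 0.0f, 1, 0},    // leaf C1
            {-1, 0.0f, 2, 0},    // leaf C2
            {-1, 0.0f, 1, 0},    // leaf C1
        };
        const float x[2] = {0.50f, 0.40f};
        printf("class C%d\n", classify(tree, x));  // X2<0.53, X2>=0.29, X1<0.63 -> C2
    }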
9. Training of a BDT
• It is built from randomized samples
• Recursively partition the dataset to maximize the information gain → the same variables may appear on a path
[Figure: the same feature-map partition and BDT as on the previous slide, obtained by recursive splitting]
10. Random Forest (RF)
• Ensemble learning
• Classification and regression
• Consists of multiple BDTs (see the voting sketch below)
[Figure: a BDT (Tree 1) with nodes such as X1<0.53, X3<0.71, and X2<0.63, beside the Random Forest: the input goes to Tree 1, Tree 2, …, Tree n, whose class outputs (C1, C2, …, C1) feed a voter that emits the final class C1]
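The voter in the figure is a plain majority vote over the per-tree class outputs; a minimal sketch (assuming each tree has already produced its class, e.g. via a routine like the BDT sketch above):

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // Majority vote over per-tree predictions: the forest's output is the
    // class predicted by the largest number of its decision trees.
    int vote(const std::vector<int>& tree_outputs, int num_classes) {
        std::vector<int> count(num_classes, 0);
        for (int c : tree_outputs) ++count[c];
        return std::max_element(count.begin(), count.end()) - count.begin();
    }

    int main() {
        // e.g. three trees predict C1, C2, C1 (encoded 1, 2, 1) -> forest says C1
        printf("C%d\n", vote({1, 2, 1}, 3));
    }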
11. Applications
• Key point matching [Lepetit et al., 2006]
• Object detection [Shotton et al., 2008][Gall et al., 2011]
• Hand-written character recognition [Amit & Geman, 1997]
• Visual word clustering [Moosmann et al., 2006]
• Pose recognition [Yamashita et al., 2010]
• Human detection [Mitsui et al., 2011][Dahang et al., 2012]
• Human pose estimation [Shotton, 2011]
12. Known Problem
• BDTs are built from randomized samples
• The same variable may appear several times on a path
• Evaluation tends to be slow, even on GPUs
[Figure: a BDT in which X2 is tested repeatedly along one path (X2<0.53, X2<0.29, X2<0.09), along with X1<0.63 and X1<0.71; each node is an if-then-else:]

if X2 < 0.09 then
  output C1;
else
  goto Child_node;
14. Binary Decision Diagram (BDD)
• Recursively apply Shannon expansion to a given logic function
• Non-terminal node: if-then-else statement
• Terminal node: set the function value
[Figure: a BDD over x1, …, x6 with non-terminal nodes and terminal nodes 0 and 1]
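For reference, the Shannon expansion applied at each non-terminal node is

$$f(x_1, x_2, \ldots, x_n) = \overline{x_1}\, f(0, x_2, \ldots, x_n) + x_1\, f(1, x_2, \ldots, x_n),$$

and recursing one variable per level while merging identical subfunctions yields the BDD.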
15. Measurement of a BDD
• Memory size: (# of nodes) × (size of a node)
• Worst-case performance: LPL (longest path length)
→ Dedicated fully pipelined hardware
[Figure: the BDD over x1, …, x6 again, with its longest path highlighted]
16. Multi-Valued Decision Diagram (MDD)
• MDD(k): 2^k outgoing edges per node
• Evaluates k variables at a time
[Figure: the BDD over x1, …, x6 beside the equivalent MDD(2) over X1={x1,x2}, X2={x3,x4}, X3={x5,x6}]
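Why MDD(k) shortens the path: each step consumes k input bits at once as an index into a 2^k-way edge table, so n binary variables need only n/k node visits. A minimal sketch with a hypothetical node layout (k = 2; not the authors' implementation):

    #include <cstdio>

    // Hypothetical MDD(2) node: 2^2 = 4 outgoing edges, indexed by two input
    // bits at a time. Negative entries encode terminals: ~node is the output.
    struct MddNode { int next[4]; };

    int evaluate(const MddNode* dd, const unsigned* bits) {
        int node = 0, i = 0;
        while (node >= 0) {                      // follow edges to a terminal
            unsigned idx = (bits[i] << 1) | bits[i + 1];
            node = dd[node].next[idx];
            i += 2;                              // two variables per step
        }
        return ~node;
    }

    int main() {
        // MDD(2) for f = (x1 AND x2) OR (x3 AND x4): node 0 reads {x1,x2},
        // node 1 reads {x3,x4}; ~0 and ~1 are the terminals for 0 and 1.
        const MddNode dd[2] = {
            {{1, 1, 1, ~1}},
            {{~0, ~0, ~0, ~1}},
        };
        unsigned bits[4] = {1, 0, 1, 1};         // x1..x4
        printf("%d\n", evaluate(dd, bits));      // (1&0)|(1&1) = 1
    }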
17. Comparison of the BDT with the MDD
[Figure: the BDT (X2<0.53 at the root, then X2<0.29, X1<0.09, X1<0.63, X1<0.71, with leaves C1/C2) beside the equivalent MDD, where X2 is evaluated once against the interval bounds <0.29, <0.53, <1.00 and X1 once against <0.09, <0.63, <0.71, <1.00, ending in terminals C1 and C2]
19. Complexities of the BDT and the MDD

        # Nodes       LPL
BDT     O(Σ|Xi|)      O(Σ|Xi|)
MDD     O(|Xi|^k)     O(n)

The RF prefers shallow decision trees to avoid overfitting.
21. FPGA (Field-Programmable Gate Array)
• Reconfigurable architecture
  • Look-up tables (LUTs)
  • Configurable routing channels
• Advantages
  • Faster than a CPU
  • Dissipates less power than a GPU
  • Shorter design time than an ASIC
24. System Design Tool
[Figure: tool flow with stages ①–④]
1. Behavioral design + pragmas
2. Profile analysis
3. IP core generation by HLS
4. Bitstream generation by the FPGA CAD tool
5. Middleware generation
↓ Automatically done
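As an illustration of step 1 (behavioral design plus pragmas), here is a hedged sketch in Vivado-HLS style; the slides do not name the HLS tool or its directive set, so treat the pragmas as placeholders rather than the authors' actual code:

    // Hypothetical HLS kernel: classify a stream of feature vectors with one
    // small decision tree. The pragma names follow Vivado HLS conventions and
    // are only illustrative; the tool and directives used in the talk may differ.
    void rf_classify(const float in[1024][2], int out[1024]) {
    #pragma HLS INTERFACE m_axi port=in
    #pragma HLS INTERFACE m_axi port=out
        for (int i = 0; i < 1024; ++i) {
    #pragma HLS PIPELINE II=1   // one sample per clock once the pipeline fills
            float x1 = in[i][0], x2 = in[i][1];
            int c;
            if (x2 < 0.53f)
                c = (x2 < 0.29f) ? 1 : ((x1 < 0.63f) ? 2 : 1);
            else
                c = (x1 < 0.09f) ? 1 : ((x1 < 0.71f) ? 2 : 1);
            out[i] = c;
        }
    }

Fully unrolling the tree into nested conditionals, as above, is what lets HLS schedule the whole evaluation as one pipelined datapath.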
28. Comparison of Platforms
• Implemented the RF on the following devices:
  • CPU: Intel Core i7 650
  • GPU: NVIDIA GeForce GTX Titan
  • FPGA: Terasic DE5-NET
• Measured dynamic power including the host PC
• Test bench: 10,000 random vectors
• Execution time includes communication time between the host PC and the devices
[Photos: the GPU card and the FPGA board]
30. Conclusion
• Proposed the RF using MDDs
  • Reduced the path length
  • Increased the column multiplicity
  • # of nodes: O(|X|^k)
  • A shallow decision diagram is recommended to avoid overfitting
• Developed a high-level synthesis design flow for the FPGA realization
  • 10.7x faster than the GPU
  • 14.0x faster than the CPU