
Machine learning for embedded deep dive

Presented By

Andy Luo
Sr. Product Marketing Manager
2018-10-01

© Copyright 2018 Xilinx


Key Machine Learning Applications for Xilinx

Edge ML: Surveillance, ADAS/AD, Robotics
Cloud ML: Data Center
And there are many more …
Xilinx Value Proposition in Edge/Embedded ML

1 Only HW/SW configurable device for fast-changing networks
2 High performance / low power with custom internal memory hierarchy
3 Future-proof to lower precisions
4 Low latency end-to-end
5 Scalable device family for different applications
Key Challenges for Xilinx in Edge/Embedded ML

1 Deploy ML to Xilinx FPGA easily and quickly

2 Expand ML into non-FPGA customers

3 Deliver excellent performance within power & cost constraints for diverse embedded applications



[Edge ML solution stack: Frameworks & Libraries; Machine Learning Development Tools; Platforms with USB3, HDMI, and MIPI interfaces]



Deephi Edge ML Solution



Unique, Patented Deep Learning Acceleration Techniques

˃ Best paper awards for breakthrough DL acceleration

˃ Deephi's compression technology (pruning and quantization) can:
‒ Reduce the DL accelerator footprint into smaller devices
‒ Increase performance per watt (higher performance and/or lower energy)

˃ Unique pruning technology provides a significant competitive advantage


DeePhi Solution Stack for Edge/Embedded ML
Models: Face detection, Pose estimation, Video analytics, Lane detection, Object detection, Segmentation

Framework: Darknet

Tools & IP
‒ Compression: Pruning, Quantization
‒ Compilation: Compiler, Assembler
‒ Runtime: Core API, Loader, Driver, Profiler

HW Platforms: Z7020 Board, Z7020 SOM, ZU2 SOM, ZU2/3 Card, ZU9 Card, ZCU102, ZCU104, Ultra96
Deephi also has LSTM IP for KU115/VU9P as a part of Cloud ML
DNNDK Overview

˃ DECENT (DEep ComprEssioN Tool)
˃ DNNC (Deep Neural Network Compiler)
˃ DNNAS (Deep Neural Network ASsembler)
˃ Runtime: N2Cube (Cube of Neural Network)
˃ DPU Simulator: internal tool
˃ Profiler: DSight

[Diagram: DECENT, DNNC, and DNNAS tools alongside N2Cube, the Simulator, and DSight, running on the OS across the host CPU and the DPU]



Framework Support

[Framework support matrix: framework names, shown as logos, were lost in extraction. Capabilities listed across the three columns include pruning (eval and internal versions), quantization, compilation, and a convertor for Caffe.]



DPU IP with High Efficiency

˃ Utilization > 50% for mainstream neural networks

[Block diagram: DPU core with instruction fetcher, decoder, dispatcher, register map, data mover, weights/image/write-back schedulers, smart memory fabric, PE array, and misc calc units (avg/max/ROI pooling, element-wise, ...), connected to the CPU, memory controller, and external memory over the system bus]

Compute utilization (bar chart; source: published results from Huawei):

Network      | Aristotle on 7020 FPGA | iPhone 8 Plus | Kirin 970
GoogleNet-V3 | 52%                    | 23%           | 14%
ResNet-50    | 51%                    | 24%           | 13%
VGG16        | 85%                    | 40%           | 18%



Supported Operators
• Arbitrary Input Image Size
• Conv
• Arbitrary Conv Kernel Size
• Arbitrary Conv Stride/Padding
• Dilation
• Pooling
• Max/Avg Pooling
• Arbitrary Max Pooling Size
• Avg Pooling kernel size: 2x2~7x7
• Arbitrary Pooling Stride/Padding
• ReLU / Leaky ReLU
• Concat
• Deconv
• Depthwise Conv
• Elementwise
• FC (Int8/FP32)
• Mean scale
• Upsampling
• Batch Normalization
• Split
• Reorg
• Resize (Optional)
• Softmax (Optional)
• Sigmoid (Optional)



Constraints Between Layers
[Table: layer type vs. next layer type, showing which combinations are supported; matrix content not recoverable from extraction]
●: Supported  ✕: Not supported  ○: Supported when selecting additional features


DPU Typical Options & Interfaces

˃ B1152
‒ Parallelism: 4 * 12 * 12
‒ Target: Z7020 / ZU2 / ZU3
‒ Interfaces: slave-axi and master-axi-0/1/2 (32-bit and 64-bit ports)

˃ B4096
‒ Parallelism: 8 * 16 * 16
‒ Target: ZU5 and above
‒ Interfaces: slave-axi and master-axi-0/1/2 (32-bit and 128-bit ports)
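The configuration names follow from the parallelism, counting each MAC as two operations per cycle (a quick sanity check, consistent with the MAC counts and the peak-performance footnote on the next slide):

$$\text{B1152}: \; 4 \times 12 \times 12 = 576 \ \text{MACs} \;\Rightarrow\; 2 \times 576 = 1152 \ \text{ops/cycle}$$
$$\text{B4096}: \; 8 \times 16 \times 16 = 2048 \ \text{MACs} \;\Rightarrow\; 2 \times 2048 = 4096 \ \text{ops/cycle}$$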



DPU Peak Perf & Power

Device | LUT    | Flip-Flops | Block RAM  | DSP 1) | DPU Config        | MACs 2)    | Peak perf 3) | Frequency | Power
Z7020  | 53200  | 106400     | 4.9Mb      | 220    | 1xB1152           | 576        | 230 GOPS     | 200MHz    | 2W
ZU2    | 47000  | 94000      | 5.3Mb      | 240    | 1xB1152           | 576        | 576 GOPS     | 500MHz    | 3.5W
ZU3    | 71000  | 141000     | 7.6Mb      | 360    | 1xB1152           | 576        | 576 GOPS     | 500MHz    | N/A
ZU5 4) | 117000 | 234000     | 5.1Mb+18Mb | 1248   | 1xB4096           | 2048       | 1350 GOPS    | 330MHz    | N/A
ZU7EV  | 230000 | 461000     | 11Mb+27Mb  | 1728   | 1xB4096 + 2xB1152 | 2048+2*576 | 2240 GOPS    | 350MHz    | N/A
ZU9    | 274000 | 548000     | 32.1Mb     | 2520   | 2xB4096           | 4096       | 2700 GOPS    | 330MHz    | 10W

1) One DSP48E is used for two int8 multiplications
2) MACs are built from DSPs, and from LUTs if there are not enough DSPs
3) Peak performance is calculated from MACs: GOPS = 2 * MACs * Frequency
4) Conservative performance projection
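As a worked check of footnote 3, using the ZU9 row above:

$$\text{Peak} = 2 \times \text{MACs} \times f = 2 \times 4096 \times 330\,\text{MHz} \approx 2700 \ \text{GOPS}$$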

DPU Utilization

Single B1152 on Z7020:
                  | LUT    | Slice_reg | Block RAM | DSPs
All logic         | 53200  | 106400    | 140       | 220
DPU               | 45535  | 56961     | 110.5     | 220
Utilization ratio | 85.59% | 53.53%    | 78.93%    | 100.00%

Single B1152 on ZU2:
                  | LUT    | Slice_reg | Block RAM | DSPs
All logic         | 47232  | 94464     | 150       | 240
DPU               | 40703  | 55083     | 112       | 240
Utilization ratio | 86.18% | 58.31%    | 74.67%    | 100.00%

Single B1152 on ZU3:
                  | LUT    | Slice_reg | Block RAM | DSPs
All logic         | 70560  | 141120    | 216       | 360
DPU_B1152         | 36560  | 68729     | 115.5     | 288
Utilization ratio | 51.81% | 48.70%    | 53.47%    | 66.67%

Dual B4096 on ZU9:
                  | LUT    | Slice_reg | Block RAM | DSPs
All logic         | 274080 | 548160    | 912       | 2520
DPU               | 156744 | 224650    | 501       | 2048
Utilization ratio | 57.19% | 40.98%    | 54.93%    | 81.27%
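The utilization ratio is simply the DPU's resources divided by the device total; for example, from the ZU9 rows above:

$$\frac{156744}{274080} \approx 57.19\% \ \text{LUTs}, \qquad \frac{2048}{2520} \approx 81.27\% \ \text{DSPs}$$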



Perf Improvement with the Next Version DPU
Performance comparison (FPS, without pruning):

Network   | Current B4096*2 | New B4096*3
VGG-SSD   | 12              | 28.3
VGG16     | 73              | 92
ResNet50  | 118             | 179
GoogLeNet | 313             | 445

* The VGG-SSD FPS is end-to-end performance
* The VGG16/ResNet50/GoogLeNet FPS covers the CONV part only (w/o FC layers)

Resource utilization comparison:

Config               | DSP  | LUT    | FF     | BRAM
Current B4096*2      | 2048 | 156744 | 224650 | 501
Next Version B4096*3 | 1926 | 110311 | 255020 | 748.5



DPU Scalability
Peak INT8 performance (OPS) by device:

6.8T  ZU15
5.5T  ZU11
4.1T  ZU9
3.5T  ZU7
2.9T  ZU6
2.8T  Z7100
2.4T  ZU5
1.7T  Z7045
1.6T  Z7035, ZU4
1.2T  ZU3
700G  Z7030
576G  ZU2
230G  Z7020
115G  Z7014S/Z7015
102G  Z7012S
56G   Z7010



* B256/288/512/3136 work in progress
DNNDK Dev Flow

Five steps with DNNDK:

01 Model Compression
02 Model Compilation
03 Programming
04 Hybrid Compilation
05 Execution

DECENT – Deephi Deep Compression Tool



Deep Compression Overview

Deep compression highlights:

‒ Makes the algorithm smaller and lighter: weight number 1/3, bandwidth load 1/10, model size 1/10, performance 3X
‒ Compression efficiency: the Deep Compression Tool can achieve significant compression on CNN and RNN
‒ Accuracy: the algorithm can be compressed 7 times without losing accuracy under the SSD object detection framework



Pruning Tool – decent_p

˃ 4 commands in decent_p
‒ ana: analyze the network
‒ prune: prune the network according to the config
‒ finetune: finetune the network to recover accuracy
‒ transform: transform the pruned model into a regular model

˃ Flow: origin model → ana → prune → finetune → (prune more? if yes, loop back to prune) → transform → pruned model



Pruning Results
Classification networks (Top-5 accuracy):

Network             | Baseline Top-5 | Result 1 Top-5 | ΔTop5  | ratio | Result 2 Top-5 | ΔTop5  | ratio
Resnet50 [7.7G]     | 91.65%         | 91.23%         | -0.42% | 40%   | 90.79%         | -0.86% | 32%
Inception_v2 [4.0G] | 91.07%         | 90.37%         | -0.70% | 60%   | 90.07%         | -1.00% | 55%
SqueezeNet [778M]   | 83.19%         | 82.46%         | -0.73% | 89%   | 81.57%         | -1.62% | 75%

Detection networks (mAP):

Network             | Baseline mAP | Result 1 mAP | ΔmAP  | ratio | Result 2 mAP | ΔmAP  | ratio
DetectNet [17.5G]   | 44.46        | 45.7         | +1.24 | 63%   | 45.12        | +0.66 | 50%
SSD+VGG [117G]      | 61.5         | 62.0         | +0.5  | 16%   | 60.4         | -1.1  | 10%
[A] SSD+VGG [173G]  | 57.1         | 58.7         | +1.6  | 40%   | 56.6         | -0.5  | 12%
[B] Yolov2 [198G]   | 80.4         | 81.9         | +1.5  | 28%   | 79.2         | -1.2  | 7%



Pruning Example - SSD

SSD+VGG @ Deephi surveillance, 4 classes: operations (G) and mAP (%) across 12 pruning steps:

Step           | 1    | 2    | 3    | 4    | 5    | 6  | 7    | 8    | 9    | 10   | 11   | 12
Operations (G) | 117  | 57   | 37   | 27   | 23   | 19 | 17   | 15.6 | 14.6 | 13.6 | 12.2 | 11.6
mAP (%)        | 61.5 | 63.4 | 63.5 | 63.4 | 62.4 | 62 | 61.5 | 61.1 | 61   | 60.8 | 59.2 | 60.4

Pruning speedup on the DPU (SSD, 2x DPU-4096 @ ZU9):

Operations | 117G | 19G | 11.6G
FPS        | 18   | 71  | 103
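Reading the two charts together (rough arithmetic from the figures above): pruning cuts the SSD workload from 117G to 11.6G operations while mAP ends about one point below the baseline, and the lighter model runs roughly 5.7x faster on the same DPU configuration:

$$\frac{117}{11.6} \approx 10\times \ \text{fewer operations}, \qquad \frac{103}{18} \approx 5.7\times \ \text{higher FPS}$$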

Makes a Big Difference with Pruning (SSD 480x360)

[Bar chart: FPS for SSD, SSD Pruned, and SSD GPU (batch=1) across platforms, showing the result of DeePhi pruning; bar values not recoverable from extraction]

Platform | Jetson TX2 | ZU9 | ZU5 | ZU2 | 7020
Power    | 10W        | 10W | 5W  | 3W  | 2W
Quantization Tool – decent_q

˃ 4 commands in decent_q
‒ quantize: quantize the network
‒ test: test network accuracy
‒ finetune: finetune the quantized network
‒ deploy: generate the model for the DPU

˃ Data needed to increase accuracy
‒ Calibration data (100-1000 images): quantize activations
‒ Training data: further increase accuracy

˃ Flow: pre-trained model (fp32) → quantize (with calibration data) → quantized model (Int16/Int8/...) → test → finetune if needed (with the original training data) → deploy → model for DPU



Quantization Results
˃ Uniform quantization
‒ 8-bit for both weights and activations
‒ A small set of images for calibration

Network             | Float32 Top1 | Float32 Top5 | 8-bit Top1 | ΔTop1  | 8-bit Top5 | ΔTop5
Inception_v1        | 66.90%       | 87.68%       | 66.62%     | -0.28% | 87.58%     | -0.10%
Inception_v2        | 72.78%       | 91.04%       | 72.40%     | -0.38% | 90.82%     | -0.23%
Inception_v3        | 77.01%       | 93.29%       | 76.56%     | -0.45% | 93.00%     | -0.29%
Inception_v4        | 79.74%       | 94.80%       | 79.42%     | -0.32% | 94.64%     | -0.16%
ResNet-50           | 74.76%       | 92.09%       | 74.59%     | -0.17% | 91.95%     | -0.14%
VGG16               | 70.97%       | 89.85%       | 70.77%     | -0.20% | 89.76%     | -0.09%
Inception-ResNet-v2 | 79.95%       | 95.13%       | 79.45%     | -0.51% | 94.97%     | -0.16%
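For orientation, a minimal sketch of what symmetric uniform 8-bit quantization of a tensor x looks like, with a generic per-tensor scale s (illustrative only; decent_q's exact scheme and scale selection are described in the DNNDK documentation):

$$q = \operatorname{clip}\!\left(\operatorname{round}\!\left(\frac{x}{s}\right),\,-128,\,127\right), \qquad \hat{x} = s \cdot q$$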



DNNDK API

˃ For more details, refer to the DNNDK User Guide: http://www.deephi.com/technology/dnndk

dpuOpen(), dpuClose()
dpuLoadKernel(), dpuDestroyKernel()
dpuCreateTask(), dpuRunTask(), dpuDestroyTask()
dpuEnableTaskProfile(), dpuGetTaskProfile(), dpuGetNodeProfile()
dpuGetInputTensor(), dpuGetInputTensorAddress(), dpuGetInputTensorSize(), dpuGetInputTensorScale(), dpuGetInputTensorHeight(), dpuGetInputTensorWidth(), dpuGetInputTensorChannel()
dpuGetOutputTensor(), dpuGetOutputTensorAddress(), dpuGetOutputTensorSize(), dpuGetOutputTensorScale(), dpuGetOutputTensorHeight(), dpuGetOutputTensorWidth(), dpuGetOutputTensorChannel()
dpuGetTensorSize(), dpuGetTensorAddress(), dpuGetTensorScale(), dpuGetTensorHeight(), dpuGetTensorWidth(), dpuGetTensorChannel()
dpuSetIntputTensorInCHWInt8(), dpuSetIntputTensorInCHWFP32(), dpuSetIntputTensorInHWCInt8(), dpuSetIntputTensorInHWCFP32()
dpuGetOutputTensorInCHWInt8(), dpuGetOutputTensorInCHWFP32(), dpuGetOutputTensorInHWCInt8(), dpuGetOutputTensorInHWCFP32()



Programming with DNNDK API
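As a rough illustration of how the API calls listed on the previous slide fit together, here is a minimal, hypothetical inference sketch. The header path, the kernel name "resnet50", the node names, the buffer sizes, and the exact parameter types are assumptions for illustration; the authoritative signatures are in the DNNDK User Guide.

    // Minimal DNNDK inference sketch (illustrative; names and signatures assumed).
    #include <cstdint>
    #include <vector>
    #include <dnndk/dnndk.h>                            // assumed DNNDK/N2Cube header location

    int main() {
        dpuOpen();                                      // attach to the DPU device and driver
        DPUKernel *kernel = dpuLoadKernel("resnet50");  // kernel produced by DNNC (placeholder name)
        DPUTask   *task   = dpuCreateTask(kernel, 0);   // 0 = default task mode (assumed)

        // Feed one preprocessed image, int8, HWC layout (placeholder node name and size).
        std::vector<int8_t> in(224 * 224 * 3);
        dpuSetIntputTensorInHWCInt8(task, "conv1", in.data(), in.size());

        dpuRunTask(task);                               // run all DPU nodes of the kernel

        // Read the result back as FP32 (the runtime applies the output tensor scale).
        std::vector<float> out(1000);
        dpuGetOutputTensorInHWCFP32(task, "fc1000", out.data(), out.size());

        dpuDestroyTask(task);
        dpuDestroyKernel(kernel);
        dpuClose();
        return 0;
    }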



DNNDK Hybrid Compilation Model



Optimization in DNNC



DNNDK Runtime Engine



Supported Networks
Application | Module | Algorithm / Model | Development / Compression / Deployment
Face | Face detection | SSD, Densebox | ✔ ✔ ✔
Face | Landmark Localization | Coordinates Regression | ✔ N/A ✔
Face | Face recognition | ResNet + Triplet / A-softmax Loss | ✔ ✔ ✔
Face | Face attributes recognition | Classification and regression | ✔ N/A ✔
Pedestrian | Pedestrian Detection | SSD | ✔ ✔ ✔
Pedestrian | Pose Estimation | Coordinates Regression | ✔ ✔ ✔
Pedestrian | Person Re-identification | ResNet + Loss Fusion | ✔
Video Analytics | Object detection | SSD, RefineDet | ✔ ✔ ✔
Video Analytics | Pedestrian Attributes Recognition | GoogleNet | ✔ ✔ ✔
Video Analytics | Car Attributes Recognition | GoogleNet | ✔ ✔ ✔
Video Analytics | Car Logo Detection | DenseBox | ✔ ✔
Video Analytics | Car Logo Recognition | GoogleNet + Loss Fusion | ✔ ✔
Video Analytics | License Plate Detection | Modified DenseBox | ✔ ✔ ✔
Video Analytics | License Plate Recognition | GoogleNet + Multi-task Learning | ✔ ✔ ✔
ADAS/AD | Object Detection | SSD, YOLOv2, YOLOv3 | ✔ ✔ ✔
ADAS/AD | 3D Car Detection | F-PointNet, AVOD-FPN | ✔
ADAS/AD | Lane Detection | VPGNet | ✔ ✔ ✔
ADAS/AD | Traffic Sign Detection | Modified SSD | ✔
ADAS/AD | Semantic Segmentation | FPN | ✔ ✔ ✔
ADAS/AD | Drivable Space Detection | MobilenetV2-FPN | ✔
ADAS/AD | Multi-task (Detection+Segmentation) | Deephi | ✔



Measured Performance

[Scatter chart: performance (FPS) vs. computation (GOP per image); labeled points below]

Network      | GOP/image | FPS
Inception v1 | 3.2       | 313
Tiny Yolov2  | 7         | 168
Tiny Yolov3  | 5.6       | 170
ResNet50     | 7.7       | 118
VGG16        | 30        | 73
Yolov2       | 36        | 42
Yolov3       | 65        | 25
SSD          | 117       | 19.7
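One way to read these points (a back-of-the-envelope note; the slide does not state the measurement platform): FPS multiplied by the per-image workload gives the effective sustained throughput, e.g. for Inception v1:

$$3.2 \ \text{GOP/image} \times 313 \ \text{FPS} \approx 1000 \ \text{GOP/s}$$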



Measured Performance (Cont.)

[Scatter chart: baseline vs. pruned networks; performance (FPS) vs. computation (GOP per image); labeled points below]

Network      | Baseline (GOP, FPS) | Pruned (GOP, FPS)
Inception v1 | (3.2, 313)          | (1.6, 481)
Tiny Yolov2  | (7, 168)            | -
Tiny Yolov3  | (5.6, 170)          | -
ResNet50     | (7.7, 118)          | (3.8, 150)
VGG16        | (30, 73)            | (20, 100)
Yolov2       | (36, 42)            | (16, 95)
Yolov3       | (65, 25)            | (17, 54)
SSD          | (117, 19.7)         | (11.6, 129)



Measured Performance (Cont.)

[Scatter chart: baseline, pruned, and Deephi-designed networks; performance (FPS) vs. computation (GOP per image); labeled points below]

Baseline: Inception v1 (3.2, 313), Tiny Yolov2 (7, 168), Tiny Yolov3 (5.6, 170), ResNet50 (7.7, 118), VGG16 (30, 73), Yolov2 (36, 42), Yolov3 (65, 25), SSD (117, 19.7)
Pruned: Inception v1 (1.6, 481), ResNet50 (3.8, 150), SSD (11.6, 129), VGG16 (20, 100), Yolov2 (16, 95), Yolov3 (17, 54)
Deephi designed: FPN (8.9, 120), VPGNet (10, 30)



Out-of-the-Box Supported Boards

˃ DP8000: Z7020 SOM
˃ DP2400: ZU9 PCIe card
˃ Deephi ZU2/3 board
˃ Xilinx ZCU102
˃ Xilinx ZCU104
˃ Avnet Ultra96



Video Surveillance ML Solutions

Intelligent Video Analytics

‒ IP Camera Solution: face recognition camera with Zynq 7020
‒ Acceleration Solution: 8-channel 1080P video analytics with ZU9EG



Video Surveillance ML Ref Design
‒ Detection & Tracking + Person Attributes: Gender: Female; Upper color: Yellow; Lower color: White; Hat: No; Backpack: No; Handbag: No; Other bag: No
‒ Detection & Tracking + Person Attributes: Gender: Male; Upper color: Black; Lower color: Black; Hat: No; Backpack: No; Handbag: No; Other bag: No
‒ Detection & Tracking + Car Attributes: Color: White; Type: BUICK
‒ Plate Detection + License Recognition: Color: Blue; Number: 渝C LC689



ADAS/AD ML Reference Design
2D/3D Object Detection
Pedestrian Detection
Lane Detection
Segmentation
Pose Estimation
Segmentation + Detection



8-ch Detection Demo
˃ Xilinx device
ZU9EG

˃ Network
SSD compact version

˃ Input image size to DPU


480 * 360

˃ Operations per frame


4.9G

˃ Performance
30fps per channel
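Putting those figures together (rough aggregate arithmetic, assuming all eight channels run the 4.9G-operation SSD at 30 fps):

$$8 \times 30 \ \text{fps} \times 4.9 \ \text{GOP} \approx 1176 \ \text{GOP/s}$$

of sustained DPU compute on the ZU9EG.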



4-ch Segmentation + Detection Demo
˃ Xilinx device
ZU9EG

˃ Network
FPN compact version
SSD compact version

˃ Input image size to DPU


FPN – 512 * 256
SSD – 480 * 360

˃ Operations per frame


FPN – 9G
SSD – 4.9G

˃ Performance
15fps per channel



ML Development with Deephi Solution



Development Method

‒ Traditional: Algorithm → RTL → FPGA
‒ OpenCL/HLS: Algorithm → C/C++ → RTL → FPGA
‒ DeePhi: Algorithm → Parameters + Instructions → FPGA

Two Development Flows of Using Deephi DPU IP

˃ Vivado & SDK
‒ Traditional flow
‒ Bottom-up approach
‒ Suitable for FPGA designers
‒ Fine-grained customization

˃ SDSoC
‒ New high-level abstraction flow
‒ Top-down approach
‒ Suitable for algorithm & software developers
‒ Higher productivity



HW Integration with Vivado IPI
˃ Steps

Add DPU IP into repository

Add DPU into block design

Configure DPU parameters

Connect DPU with MPSoC (for reference):


‒ M_AXI_HP0 <-> S_AXI_HP0_FPD (ZYNQ)
‒ M_AXI_HP2 <-> S_AXI_HP1_FPD (ZYNQ)
‒ M_Axi_GP0 <-> S_AXI_LPD(ZYNQ)
‒ s_axi <-> M_AXI_HPM0_LPD (ZYNQ)

Assign Reg address for DPU in address editor


‒ e.g. 0x80000000, 4K space for one DPU

HW Integration with Vivado IPI (Cont.)
˃ Steps(Cont.)

Create top wrapper

Generate bitstream

Generate BOOT.BIN using Petalinux etc.

˃ Note
The port data width is consistent with DPU
data width
For frequency > 333MHz, clock wizard is
needed between MPSoC and DPU
Interrupt configuration is encoded in binary:
[3]: 0 = pl_ps_irq0, 1 = pl_ps_irq1
[2:0]: interrupt number 0~7

SW Integration with SDK
˃ Device tree configuration
set interrupt number according to block design

set core-num

˃ OpenCV configuration
Enable in Filesystem Packages -> misc or libs

˃ Driver and DNNDK lib
‒ Provide kernel information & OpenCV version to Deephi
‒ Deephi will provide the driver and DNNDK package with an install script
‒ Install the driver and DNNDK lib

HW Integration with C-callable IP

˃ Steps to create the library
‒ Create the header file dpu.hpp with the function declarations, e.g.:
      void dpu_set_start(uint32_t start);
‒ Package the IP in Vivado
‒ Create a Makefile that calls sdx_pack to generate the static library libdpu.a
‒ Configure DPU parameters
‒ Build the application software in SDSoC (SDK/Vivado)

˃ Using the library:
      #include "dpu.hpp"
      void main() {
          ...
          uint32_t start = 0x1;
          dpu_set_start(start);
          ...
      }
      LFLAGS = -ldpu
      #LFLAGS = -ldpusw...

˃ Makefile (sdx_pack invocation):
      sdx_pack -header dpu.hpp -lib libdpu.a \
          -func dpu_set_start -map start=s_axi:in:0x10 -func-end \
          -ip ../iprepo/dpu/component.xml -control none \
          -add-ip-repo ../iprepo/src/ \
          -target-family zynquplus \
          -target-cpu cortex-a53 -target-os linux -verbose

˃ sdx_pack options
‒ -header <header.h/pp>: header file with function declarations; only one top header file is allowed
‒ -lib: create a lib.a
‒ -func <function_name> -map <swName>=<hwName>:direction:offset ... -func-end
‒ -ip <component.xml>: IP packaged by the Vivado IP integrator; only one top IP is allowed

˃ Note: the packaged IP must use supported AXI and control interfaces

Deephi DPU IP Integration with SDSoC

C-callable IP

How to Use DNNDK in SDSoC

Only 3 steps!

Write it Compile it Run it



Resnet50 Example with C-callable DPU IP in SDSoC



A Long Time for Every Build?

˃ The SDSoC compiler compares the new data-motion network with the previous one
˃ If they are the same, vpl is not called to rerun synthesis & implementation
˃ A rebuild only takes a few minutes if you:
‒ Use the same C-callable IP library
‒ Use the same platform
‒ Use the same project settings



Multiple Sensors & Networks with C-callable DPU IP

• SDSoC 2018.2, Linux, ZCU102 (ZU9)
• 4 CNN models: face detect, joint detect, traffic SSD, pedestrian SSD
• 30, 12, 15, 13 FPS respectively
• 3 live inputs + file input / HDMI output
• Under 10 Watts

[Block diagram: SDSoC application with Video Lib and App Stub on the ARM Cortex-A53 under Linux (V4L2, DM driver, DRM); USB3, HDMI, and MIPI/ISP-VPSS live inputs plus a file input feeding the four networks on a single Deephi DPU; HDMI output]



Availability



Basic and Professional Editions (pricing TBD)

˃ Timeframe
‒ Early Access: now
‒ Public Access: Jan 2019
˃ To be available on AWS in cloud editions
˃ Add-on design service

DeePhi Basic (free): everything you need to do it yourself
‒ Compiler
‒ Quantizer
‒ Pruned Models
‒ Unlimited Deployment

DeePhi Professional: access to pruning technology plus 3-day on-site training
‒ Pruning Tools
‒ Compiler
‒ Quantizer
‒ Pruned Models
‒ Unlimited Deployment
‒ 3-day on-site training by a top-notch ML expert
‒ 30-day evaluation with encrypted pruning output



Availability
˃ DNNDK
For DP8000(Z7020)/DP8020(ZU2) board, download from Deephi website
For other boards, separate package upon request
For pruning tool, separate upon request
˃ Demos & Ref Designs
General: Resnet50, Googlenet, VGG16, SSD, Yolo v2-v3, Tiny Yolo v2-v3, Mobilenet v2, etc.
Video surveillance: face detection & traffic structure
ADAS/AD: multi-channel detection & segmentation
C-callable DPU IP with SDSoC: Resnet50, Quad networks (Pedestrian, Pose, Face, Traffic)
˃ Documentation
DNNDK user guide
C-callable DPU IP with SDSoC user guide
DPU IP system integration user guide (Work in progress)
Pruning user guide (Work in progress)
˃ Request or Inquiry
Please contact Andy Luo, [email protected]
Key Takeaway

1 Edge/Embedded ML brings great opportunities and challenges for Xilinx

2 Xilinx offers a cutting-edge, end-to-end Edge/Embedded ML solution

3 Tools, IP, demos, and reference designs are available now for evaluation & development


