Machine learning for embedded deep dive

Presented By

Andy Luo
Sr. Product Marketing Manager
2018-10-01

© Copyright 2018 Xilinx

Key Machine Learning Applications for Xilinx

Edge ML: Surveillance, ADAS/AD, Robotics
Cloud ML: Data Center
And there are many more …
Xilinx Value Proposition in Edge/Embedded ML

1 Only HW/SW configurable device for fast-changing networks
2 High performance / low power with custom internal memory hierarchy
3 Future proof to lower precisions
4 Low latency end-to-end
5 Scalable device family for different applications
Key Challenges for Xilinx in Edge/Embedded ML

1 Deploy ML to Xilinx FPGAs easily and quickly

2 Expand ML into non-FPGA customers

3 Deliver excellent performance within power and cost constraints for diverse embedded applications

Machine Learning stack (figure): Frameworks & Libraries; Development tools; Platforms (USB3, HDMI, MIPI)

Deephi Edge ML Solution


Unique, Patented Deep Learning Acceleration Techniques

˃ Best paper awards for breakthrough DL acceleration

˃ Deephi's compression technology (pruning and quantization) can:
‒ Reduce the DL accelerator footprint so it fits into smaller devices
‒ Increase performance per watt (higher performance and/or lower energy)

Unique Pruning Technology Provides a Significant Competitive Advantage

Applications: Face detection, Pose estimation, Video analytics, Lane detection, Object detection, Segmentation
Framework: Darknet
Tools & IP:
‒ Compression: Pruning, Quantization
‒ Compilation: Compiler, Assembler
‒ Runtime: Core API, Loader, Driver, Profiler
HW Platforms: Z7020 Board, Z7020 SOM, ZU2 SOM, ZU2/3 Card, ZU9 Card, ZCU102, ZCU104, Ultra96

Deephi also has LSTM IP for KU115/VU9P as part of Cloud ML
DNNDK Overview

˃ DECENT (DEep ComprEssioN Tool)
˃ DNNC (Deep Neural Network Compiler)
˃ DNNAS (Deep Neural Network ASsembler)
˃ Runtime N2Cube (Cube of Neural Network)
˃ DPU Simulator: internal tool
˃ Profiler DSight

Figure: DECENT, DNNC (with simulator), DNNAS, N2Cube and DSight on top of the OS, running on the host CPU together with the DPU.

Framework Support

• Pruning • Pruning • Quantization & Compilation

• Eval version
• Quantization • Quantization
• Pruning
• Compilation • Convertor for caffe
• Internal version


DPU IP with High Efficiency

Figure: DPU block diagram. The CPU and the DPU reach external memory through the memory controller and bus. DPU blocks: instruction fetcher, decoder, register map and dispatcher; a data mover with weights, image and write-back (WB) write schedulers plus weights and image read schedulers; a smart memory fabric; a PE array driven by control signals; and miscellaneous calculation units (avg pool, max pool, ROI pool, element-wise, ...).

Utilization > 50% for mainstream neural networks

Network | Aristotle on 7020 FPGA | iPhone 8 Plus | Kirin 970
GoogleNet-V3 | 52% | 23% | 14%
ResNet-50 | 51% | 24% | 13%
VGG16 | 85% | 40% | 18%

Source: Published results from Huawei

Supported Operators
• Arbitrary Input Image Size
• Conv
• Arbitrary Conv Kernel Size
• Arbitrary Conv Stride/Padding
• Dilation
• Pooling
• Max/Avg Pooling
• Arbitrary Max Pooling Size
• Avg Pooling kernel size: 2x2~7x7
• Arbitrary Pooling Stride/Padding
• ReLU / Leaky ReLU
• Concat
• Deconv
• Depthwise conv
• Elementwise
• FC (Int8/FP32)
• Mean scale
• Upsampling
• Batch Normalization
• Split
• Reorg
• Resize (Optional)
• Softmax (Optional)
• Sigmoid (Optional)

Constraints Between Layers
Table: Layer Type vs. Next Layer Type

●: Supported ✕: Not supported ○: Supported when selecting additional features

˃ B1152
Parallelism: 4 * 12 * 12
Target: Z7020/ZU2/ZU3
Interfaces: slave-axi 32 bits, Master-axi-0 32 bits, Master-axi-1 64 bits, Master-axi-2 64 bits

˃ B4096
Parallelism: 8 * 16 * 16
Target: ZU5 and above
Interfaces: slave-axi 32 bits, Master-axi-0 32 bits, Master-axi-1 128 bits, Master-axi-2 128 bits
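The B-number appears to count operations per clock cycle: multiplying the three parallelism factors gives the number of MAC units, and each MAC counts as two operations (multiply and add), the same convention the peak-performance footnote uses. A minimal sketch under that reading; the factor names (assumed to be pixel, input-channel and output-channel parallelism) and the struct are illustrative, not part of DNNDK:

```cpp
#include <cstdint>

// Hypothetical description of a DPU configuration by its three parallelism
// factors, assumed here to be pixel, input-channel and output-channel parallelism.
struct DpuParallelism {
    std::uint32_t pixel;
    std::uint32_t in_ch;
    std::uint32_t out_ch;
};

// Number of int8 multiply-accumulate (MAC) units implied by the factors.
constexpr std::uint32_t macs_per_cycle(DpuParallelism p) {
    return p.pixel * p.in_ch * p.out_ch;
}

// Each MAC is counted as two operations (one multiply, one add).
constexpr std::uint32_t ops_per_cycle(DpuParallelism p) {
    return 2 * macs_per_cycle(p);
}

// The two configurations listed on the slide.
constexpr DpuParallelism kB1152{4, 12, 12};
constexpr DpuParallelism kB4096{8, 16, 16};

// If the "B" number means operations per cycle, these hold at compile time.
static_assert(ops_per_cycle(kB1152) == 1152, "B1152: 2 * 4 * 12 * 12");
static_assert(ops_per_cycle(kB4096) == 4096, "B4096: 2 * 8 * 16 * 16");
```

The resulting 576 and 2048 MACs match the MACs column of the peak-performance table.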

DPU Peak Perf & Power

Device | LUT | Flip-Flops | Block RAM | DSP(1) | DPU Config | MACs(2) | Peak Performance(3) | Frequency | Power
Z7020 | 53200 | 106400 | 4.9Mb | 220 | 1xB1152 | 576 | 230GOPS | 200MHz | 2W
ZU2 | 47000 | 94000 | 5.3Mb | 240 | 1xB1152 | 576 | 576GOPS | 500MHz | 3.5W
ZU3 | 71000 | 141000 | 7.6Mb | 360 | 1xB1152 | 576 | 576GOPS | 500MHz | N/A
ZU5(4) | 117000 | 234000 | 5.1Mb+18Mb | 1248 | 1xB4096 | 2048 | 1350GOPS | 330MHz | N/A
ZU7EV | 230000 | 461000 | 11Mb+27Mb | 1728 | 1xB4096+2xB1152 | 2048+2*576 | 2240GOPS | 350MHz | N/A
ZU9 | 274000 | 548000 | 32.1Mb | 2520 | 2xB4096 | 4096 | 2700GOPS | 330MHz | 10W

1) One DSP48E is used for two int8 multiplications
2) MACs are built from DSPs, plus LUTs when there are not enough DSPs
3) Peak performance is calculated from MACs: GOPS = 2 * MACs * Frequency
4) Conservative performance projection
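Footnote 3 can be checked directly against the table. A small sketch, taking the MAC count and the clock in MHz (function names are illustrative):

```cpp
#include <cassert>
#include <cmath>

// Peak int8 throughput per footnote 3: GOPS = 2 * MACs * frequency.
// freq_mhz is in MHz, so 2 * MACs * MHz is in MOPS; dividing by 1000 gives GOPS.
double peak_gops(unsigned macs, double freq_mhz) {
    assert(macs > 0 && freq_mhz > 0.0);
    return 2.0 * macs * freq_mhz / 1000.0;
}

// Rounded to the nearest whole GOPS, for comparison with the table.
long peak_gops_rounded(unsigned macs, double freq_mhz) {
    return std::lround(peak_gops(macs, freq_mhz));
}
```

With these inputs the Z7020, ZU2 and ZU7EV rows come out at 230.4, 576 and 2240 GOPS; the ZU5 and ZU9 rows compute to 1351.68 and 2703.36 GOPS, which the table appears to round down to 1350 and 2700.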
DPU Utilization

Single B1152 on Z7020 | LUT | Slice_reg | Block RAM | DSPs
All logic | 53200 | 106400 | 140 | 220
DPU | 45535 | 56961 | 110.5 | 220
Utilization ratio | 85.59% | 53.53% | 78.93% | 100.00%

Single B1152 on ZU2 | LUT | Slice_reg | Block RAM | DSPs
All logic | 47232 | 94464 | 150 | 240
DPU | 40703 | 55083 | 112 | 240
Utilization ratio | 86.18% | 58.31% | 74.67% | 100.00%

Single B1152 on ZU3 | LUT | Slice_reg | Block RAM | DSPs
All logic | 70560 | 141120 | 216 | 360
DPU_B1152 | 36560 | 68729 | 115.5 | 288
Utilization ratio | 51.81% | 48.70% | 53.47% | 66.67%

Dual B4096 on ZU9 | LUT | Slice_reg | Block RAM | DSPs
All logic | 274080 | 548160 | 912 | 2520
DPU | 156744 | 224650 | 501 | 2048
Utilization ratio | 57.19% | 40.98% | 54.93% | 81.27%
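Each utilization ratio is the DPU's resource count divided by the device total. A sketch that reproduces a few of the entries (helper names are illustrative):

```cpp
#include <cassert>
#include <cmath>

// Utilization as a percentage: resources used by the DPU divided by the
// total available on the device (LUTs, registers, BRAM or DSPs).
double utilization_pct(double used, double available) {
    assert(available > 0.0 && used >= 0.0);
    return 100.0 * used / available;
}

// Round to two decimals, the precision used in the table.
double round2(double x) {
    return std::round(x * 100.0) / 100.0;
}
```

For example, 45535 of 53200 LUTs on Z7020 gives 85.59%, and 2048 of 2520 DSPs on ZU9 gives 81.27%.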

Perf Improvement with the Next Version DPU

Performance Comparison (FPS), without pruning
Network | Current B4096*2 | Next Version B4096*3
VGG-SSD | 12 | 28.3
VGG16 | 73 | 92
ResNet50 | 118 | 179
GoogLeNet | 313 | 445

*The VGG-SSD FPS is end-to-end performance.
*The VGG16/ResNet50/GoogLeNet FPS covers the CONV part only (without FC layers).

Resource Utilization Comparison
Configuration | DSP | LUT | FF | BRAM
Current B4096*2 | 2048 | 156744 | 224650 | 501
Next Version B4096*3 | 1926 | 110311 | 255020 | 748.5

DPU Scalability

Peak INT8 performance (OPS) by device:
6.8T ZU15
5.5T ZU11
4.1T ZU9
3.5T ZU7
2.9T ZU6
2.8T Z7100
2.4T ZU5
1.7T Z7045
1.6T Z7035, ZU4
1.2T ZU3
700G Z7030
576G ZU2
230G Z7020
115G Z7014S/Z7015
102G Z7012S
56G Z7010

* B256/288/512/3136 work in progress
DNNDK Dev Flow

Five steps with DNNDK:
01 Model Compression
02 Model Compilation
03 Programming
04 Hybrid Compilation
05 Execution
DECENT – Deephi Deep Compression Tool


Deep Compression Overview

Deep compression highlights: makes algorithms smaller and lighter
1/3 weight number
1/10 bandwidth load
1/10 model size
3X performance

Compression efficiency: the Deep Compression Tool can achieve significant compression on CNNs and RNNs.

Accuracy: the algorithm can be compressed 7 times without losing accuracy under the SSD object detection framework.
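One plausible way to read the 1/10 model-size figure, assuming the 1/3 weight reduction comes from pruning and combines with storing weights in 8 bits instead of 32-bit floating point, and ignoring any indexing overhead:

```latex
\frac{\text{size}_{\text{compressed}}}{\text{size}_{\text{fp32}}}
  \approx \underbrace{\tfrac{1}{3}}_{\text{pruning}}
  \times \underbrace{\tfrac{8}{32}}_{\text{int8 vs.\ fp32}}
  = \tfrac{1}{12} \approx \tfrac{1}{10}
```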

Pruning Tool – decent_p

˃ 4 commands in decent_p
‒ ana: analyze the network
‒ prune: prune the network according to the config
‒ finetune: finetune the network to recover accuracy
‒ transform: transform the pruned model into a regular model

Flow: origin model → ana → prune → finetune → prune more? Y: back to prune; N: transform → pruned model
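The slides do not say which criterion the prune step uses to choose what to remove. As an illustration only, the sketch below uses a common, simpler stand-in: rank a convolution layer's output filters by the L1 norm of their weights and keep the largest fraction. It is not DeePhi's algorithm; the function name and the flat `[num_filters][filter_size]` weight layout are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the indices (ascending) of the output filters to keep.
// weights: flat buffer laid out as [num_filters][filter_size].
// keep_ratio: fraction of filters to keep, in (0, 1].
std::vector<std::size_t> select_filters_l1(const std::vector<float>& weights,
                                           std::size_t num_filters,
                                           double keep_ratio) {
    assert(num_filters > 0 && weights.size() % num_filters == 0);
    assert(keep_ratio > 0.0 && keep_ratio <= 1.0);
    const std::size_t filter_size = weights.size() / num_filters;

    // L1 norm of each filter: sum of absolute weight values.
    std::vector<double> norms(num_filters, 0.0);
    for (std::size_t f = 0; f < num_filters; ++f)
        for (std::size_t i = 0; i < filter_size; ++i)
            norms[f] += std::fabs(weights[f * filter_size + i]);

    // Order filter indices by descending norm (ties keep the lower index first).
    std::vector<std::size_t> order(num_filters);
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return norms[a] > norms[b]; });

    // Round the kept count up and keep at least one filter.
    std::size_t keep = static_cast<std::size_t>(std::ceil(keep_ratio * num_filters));
    keep = std::max<std::size_t>(1, std::min(keep, num_filters));

    std::vector<std::size_t> kept(order.begin(), order.begin() + keep);
    std::sort(kept.begin(), kept.end());
    return kept;
}
```

In the decent_p flow this selection corresponds to the prune step; the accuracy lost by removing channels is what finetune then tries to recover, and the loop repeats while more pruning is wanted.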

Pruning Results

Classification Networks | Baseline Top-5 | Result 1 Top-5 | ΔTop5 | Ratio | Result 2 Top-5 | ΔTop5 | Ratio
Resnet50 [7.7G] | 91.65% | 91.23% | -0.42% | 40% | 90.79% | -0.86% | 32%
Inception_v2 [4.0G] | 91.07% | 90.37% | -0.70% | 60% | 90.07% | -1.00% | 55%
SqueezeNet [778M] | 83.19% | 82.46% | -0.73% | 89% | 81.57% | -1.62% | 75%

Detection Networks | Baseline mAP | Result 1 mAP | ΔmAP | Ratio | Result 2 mAP | ΔmAP | Ratio
DetectNet [17.5G] | 44.46 | 45.7 | +1.24 | 63% | 45.12 | +0.66 | 50%
SSD+VGG [117G] | 61.5 | 62.0 | +0.5 | 16% | 60.4 | -1.1 | 10%
[A] SSD+VGG [173G] | 57.1 | 58.7 | +1.6 | 40% | 56.6 | -0.5 | 12%
[B] Yolov2 [198G] | 80.4 | 81.9 | +1.5 | 28% | 79.2 | -1.2 | 7%

Pruning Example - SSD

SSD+VGG @Deephi Surveillance, 4 classes: operations and mAP over 12 pruning iterations
Iteration | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
Operations (G) | 117 | 57 | 37 | 27 | 23 | 19 | 17 | 15.6 | 14.6 | 13.6 | 12.2 | 11.6
mAP (%) | 61.5 | 63.4 | 63.5 | 63.4 | 62.4 | 62 | 61.5 | 61.1 | 61 | 60.8 | 59.2 | 60.4

Pruning Speedup on DPU (SSD), 2x DPU-4096@ZU9
Operations | FPS
117G | 18
19G | 71
11.6G | 103

Chart: FPS (batch=1) of SSD on a GPU (Jetson TX2) versus SSD pruned with DeePhi pruning on ZU9, ZU5, ZU2 and 7020
Power: Jetson TX2 10W, ZU9 10W, ZU5 5W, ZU2 3W, 7020 2W
Quantization Tool – decent_q

˃ 4 commands in decent_q
‒ quantize: quantize the network
‒ test: test network accuracy
‒ finetune: finetune the quantized network
‒ deploy: generate the model for the DPU

˃ Data needs
‒ Calibration data (100-1000 images): used to quantize activations
‒ Original training data: used to further increase accuracy

Flow: pre-trained model (fp32) + calibration data → quantize → quantized model (Int16/Int8/...) → test → increase accuracy? Y: finetune with the training data; N: deploy → model for DPU

Quantization Results
˃ Uniform quantization
‒ 8-bit for both weights and activations
‒ A small set of images for calibration

Networks | Float32 Top1 | Float32 Top5 | 8-bit Top1 | ΔTop1 | 8-bit Top5 | ΔTop5
Inception_v1 | 66.90% | 87.68% | 66.62% | -0.28% | 87.58% | -0.10%
Inception_v2 | 72.78% | 91.04% | 72.40% | -0.38% | 90.82% | -0.23%
Inception_v3 | 77.01% | 93.29% | 76.56% | -0.45% | 93.00% | -0.29%
Inception_v4 | 79.74% | 94.80% | 79.42% | -0.32% | 94.64% | -0.16%
ResNet-50 | 74.76% | 92.09% | 74.59% | -0.17% | 91.95% | -0.14%
VGG16 | 70.97% | 89.85% | 70.77% | -0.20% | 89.76% | -0.09%
Inception-ResNet-v2 | 79.95% | 95.13% | 79.45% | -0.51% | 94.97% | -0.16%
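The slide names uniform 8-bit quantization calibrated on a small image set but not the exact scheme decent_q applies. As an illustration only, the sketch below assumes a signed fixed-point format with a power-of-two scale: the number of fractional bits is the largest value for which the biggest magnitude seen during calibration still fits in 127, and values are rounded and saturated to [-128, 127]. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Pick fractional bits so that max_abs * 2^frac_bits <= 127, where max_abs
// is the largest |value| observed on the calibration data.
int choose_frac_bits(const std::vector<float>& calib) {
    float max_abs = 0.0f;
    for (float v : calib) max_abs = std::max(max_abs, std::fabs(v));
    assert(max_abs > 0.0f);
    return static_cast<int>(std::floor(std::log2(127.0 / max_abs)));
}

// Quantize one value: scale by 2^frac_bits, round, saturate to int8.
std::int8_t quantize(float x, int frac_bits) {
    double q = std::round(std::ldexp(static_cast<double>(x), frac_bits));
    q = std::min(127.0, std::max(-128.0, q));
    return static_cast<std::int8_t>(q);
}

// Map an int8 code back to a real value.
float dequantize(std::int8_t q, int frac_bits) {
    return static_cast<float>(std::ldexp(static_cast<double>(q), -frac_bits));
}
```

For calibration data whose largest magnitude is 1.0, six fractional bits are chosen (scale 64), so 0.5 maps to 32 and 2.0 saturates at 127.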

DNNDK API
Device and kernel: dpuOpen(), dpuClose(), dpuLoadKernel(), dpuDestroyKernel()
Tasks: dpuCreateTask(), dpuRunTask(), dpuDestroyTask()
Profiling: dpuEnableTaskProfile(), dpuGetTaskProfile(), dpuGetNodeProfile()
Input tensor: dpuGetInputTensor(), dpuGetInputTensorAddress(), dpuGetInputTensorSize(), dpuGetInputTensorScale(), dpuGetInputTensorHeight(), dpuGetInputTensorWidth(), dpuGetInputTensorChannel()
Output tensor: dpuGetOutputTensor(), dpuGetOutputTensorAddress(), dpuGetOutputTensorSize(), dpuGetOutputTensorScale(), dpuGetOutputTensorHeight(), dpuGetOutputTensorWidth(), dpuGetOutputTensorChannel()
Generic tensor: dpuGetTensorSize(), dpuGetTensorAddress(), dpuGetTensorScale(), dpuGetTensorHeight(), dpuGetTensorWidth(), dpuGetTensorChannel()
Set input: dpuSetInputTensorInCHWInt8(), dpuSetInputTensorInCHWFP32(), dpuSetInputTensorInHWCInt8(), dpuSetInputTensorInHWCFP32()
Get output: dpuGetOutputTensorInCHWInt8(), dpuGetOutputTensorInCHWFP32(), dpuGetOutputTensorInHWCInt8(), dpuGetOutputTensorInHWCFP32()

˃ For more details, refer to the DNNDK User Guide: http://www.deephi.com/technology/dnndk
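Judging by their names, the set-input and get-output variants differ in the memory layout of the caller's buffer (channel-planar CHW versus channel-interleaved HWC) and in whether the data is int8 or FP32. As a sketch of what the layout difference means, assuming plain row-major buffers (the helper `chw_to_hwc` is hypothetical and not part of DNNDK):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Convert a channel-planar (CHW) buffer to channel-interleaved (HWC).
//   CHW index: (c * height + y) * width + x
//   HWC index: (y * width + x) * channels + c
template <typename T>
std::vector<T> chw_to_hwc(const std::vector<T>& chw,
                          std::size_t channels, std::size_t height, std::size_t width) {
    assert(chw.size() == channels * height * width);
    std::vector<T> hwc(chw.size());
    for (std::size_t c = 0; c < channels; ++c)
        for (std::size_t y = 0; y < height; ++y)
            for (std::size_t x = 0; x < width; ++x)
                hwc[(y * width + x) * channels + c] = chw[(c * height + y) * width + x];
    return hwc;
}
```

OpenCV stores multi-channel `cv::Mat` images interleaved (HWC, e.g. BGR per pixel), so camera frames arrive in HWC order, while frameworks such as Caffe keep blobs in CHW order.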

Programming with DNNDK API

DNNDK Hybrid Compilation Model

Optimization in DNNC

DNNDK Runtime Engine

Supported Networks
Module | Algorithm | Model Development / Compression / Deployment

Face
Face detection | SSD, Densebox | ✔ ✔ ✔
Landmark Localization | Coordinates Regression | ✔ N/A ✔
Face recognition | ResNet + Triplet / A-softmax Loss | ✔ ✔ ✔
Face attributes recognition | Classification and regression | ✔ N/A ✔

Pedestrian
Pedestrian Detection | SSD | ✔ ✔ ✔
Pose Estimation | Coordinates Regression | ✔ ✔ ✔
Person Re-identification | ResNet + Loss Fusion | ✔

Video Analytics
Object detection | SSD, RefineDet | ✔ ✔ ✔
Pedestrian Attributes Recognition | GoogleNet | ✔ ✔ ✔
Car Attributes Recognition | GoogleNet | ✔ ✔ ✔
Car Logo Detection | DenseBox | ✔ ✔
Car Logo Recognition | GoogleNet + Loss Fusion | ✔ ✔
License Plate Detection | Modified DenseBox | ✔ ✔ ✔
License Plate Recognition | GoogleNet + Multi-task Learning | ✔ ✔ ✔

ADAS/AD
Object Detection | SSD, YOLOv2, YOLOv3 | ✔ ✔ ✔
3D Car Detection | F-PointNet, AVOD-FPN | ✔
Lane Detection | VPGNet | ✔ ✔ ✔
Traffic Sign Detection | Modified SSD | ✔
Semantic Segmentation | FPN | ✔ ✔ ✔
Drivable Space Detection | MobilenetV2-FPN | ✔
Multi-task (Detection+Segmentation) | Deephi | ✔

Measured Performance

Chart: performance (FPS) vs. computation (GOP per image); each point is (GOP per image, FPS)
Inception v1 (3.2, 313)
Tiny Yolov2 (7, 168)
Tiny Yolov3 (5.6, 170)
ResNet50 (7.7, 118)
VGG16 (30, 73)
Yolov2 (36, 42)
Yolov3 (65, 25)
SSD (117, 19.7)

Measured Performance (Cont.)

Same chart, adding pruned networks to the baseline networks above:
Inception v1 (1.6, 481)
ResNet50 (3.8, 150)
SSD (11.6, 129)
VGG16 (20, 100)
Yolov2 (16, 95)
Yolov3 (17, 54)

Measured Performance (Cont.)

Same chart, adding Deephi designed networks to the baseline and pruned networks:
FPN (8.9, 120)
VPGNet (10, 30)
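Each point's sustained throughput is its computation per image multiplied by its frame rate. A sketch comparing that with the 2700 GOPS peak listed for 2x B4096 on ZU9, assuming that is the configuration behind these charts (the VGG16, ResNet50 and GoogLeNet/Inception v1 frame rates match the "Current B4096*2" bars) and that the GOP figures count a MAC as two operations, as the peak formula does:

```cpp
#include <cassert>
#include <cmath>

// Sustained throughput in GOPS: (GOP per image) * (images per second).
double sustained_gops(double gop_per_image, double fps) {
    assert(gop_per_image > 0.0 && fps > 0.0);
    return gop_per_image * fps;
}

// Fraction of the device's peak throughput actually used.
double efficiency(double gop_per_image, double fps, double peak_gops) {
    assert(peak_gops > 0.0);
    return sustained_gops(gop_per_image, fps) / peak_gops;
}
```

Under those assumptions VGG16 (30 GOP at 73 FPS, about 2190 GOPS) keeps roughly 81% of the peak busy, while ResNet50 (7.7 GOP at 118 FPS, about 909 GOPS) uses about a third of it.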

Out-of-box Supported Boards

˃ DP8000
Z7020 SOM

˃ DP2400
ZU9 PCIe card

˃ Deephi ZU2/3 board

˃ Xilinx ZCU102
˃ Xilinx ZCU104
˃ Avnet Ultra96


Video Surveillance ML Solutions

Intelligent Video Analytics
IP Camera Solution: face recognition camera with Zynq7020
Acceleration Solution: 8-channel 1080P video analytics with ZU9EG

Video Surveillance ML Ref Design
Detection & Tracking, Person Attributes:
Gender: Female; Upper color: Yellow; Lower color: White; Hat: No; Backpack: No; Handbag: No; Other bag: No

Detection & Tracking, Person Attributes:
Gender: Male; Upper color: Black; Lower color: Black; Hat: No; Backpack: No; Handbag: No; Other bag: No

Detection & Tracking, Car Attributes:
Color: White; Type: BUICK

Plate Detection, License Recognition:
Color: Blue; Number: 渝C LC689

ADAS/AD ML Reference Design
2D/3D Object Detection
Pedestrian Detection
Lane Detection
Segmentation
Pose Estimation
Segmentation + Detection

8-ch Detection Demo
˃ Xilinx device
ZU9EG

˃ Network
SSD compact version

˃ Input image size to DPU

480 * 360

˃ Operations per frame

4.9G

˃ Performance
30fps per channel

4-ch Segmentation + Detection Demo
˃ Xilinx device
ZU9EG

˃ Network
FPN compact version
SSD compact version

˃ Input image size to DPU

FPN – 512 * 256
SSD – 480 * 360

˃ Operations per frame

FPN – 9G
SSD – 4.9G

˃ Performance
15fps per channel
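As a rough sizing check, assuming the per-frame operation counts above are the whole DPU workload, the aggregate throughput each demo needs is:

```latex
\begin{aligned}
\text{8-ch detection:}\quad & 8 \times 30\ \text{fps} \times 4.9\ \text{GOP} = 1176\ \text{GOPS} \\
\text{4-ch segmentation + detection:}\quad & 4 \times 15\ \text{fps} \times (9 + 4.9)\ \text{GOP} = 834\ \text{GOPS}
\end{aligned}
```

Both fit within the 2700 GOPS peak listed for a ZU9 with two B4096 cores, if the demos use that configuration.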

ML Development with Deephi Solution

Development Method (figure): three flows from algorithm to FPGA
Traditional: Algorithm → RTL → FPGA
OpenCL/HLS: Algorithm → C/C++ → RTL → FPGA
DeePhi: Algorithm → Parameter / Instruction → RTL → FPGA

Traditional flow (Vivado & SDK) | New high-level abstraction flow (SDSoC)
Bottom-up approach | Top-down approach
Suitable for FPGA designers | Suitable for algorithm & software developers
Fine-grained customization | Higher productivity

HW Integration with Vivado IPI
˃ Steps
Add the DPU IP into the repository
Add the DPU into the block design
Configure DPU parameters
Connect the DPU with the MPSoC (for reference):
‒ M_AXI_HP0 <-> S_AXI_HP0_FPD (ZYNQ)
‒ M_AXI_HP2 <-> S_AXI_HP1_FPD (ZYNQ)
‒ M_AXI_GP0 <-> S_AXI_LPD (ZYNQ)
‒ s_axi <-> M_AXI_HPM0_LPD (ZYNQ)
Assign the register address for the DPU in the address editor
‒ e.g. 0x80000000, 4K space for one DPU
Create the top wrapper
Generate the bitstream
Generate BOOT.BIN using PetaLinux etc.

˃ Note
The port data width is consistent with the DPU data width.
For frequencies > 333MHz, a clock wizard is needed between the MPSoC and the DPU.
The interrupt configuration is encoded in binary:
[3]: 0 = pl_ps_irq0; 1 = pl_ps_irq1
[2:0]: interrupt number 0~7

set core-num
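The interrupt setting and the register window reduce to simple bit and address arithmetic. A sketch, assuming bit 3 selects pl_ps_irq0/pl_ps_irq1 and bits [2:0] carry the interrupt number as in the note, and assuming (hypothetically) that multiple DPUs take consecutive 4 KB windows starting at the example base 0x80000000; the function names are illustrative:

```cpp
#include <cassert>
#include <cstdint>

// 4-bit interrupt configuration from the note:
//   bit [3]   : 0 = pl_ps_irq0, 1 = pl_ps_irq1
//   bits [2:0]: interrupt number 0..7
std::uint32_t dpu_irq_config(unsigned irq_line, unsigned irq_number) {
    assert(irq_line <= 1 && irq_number <= 7);
    return (irq_line << 3) | irq_number;
}

// Register base for DPU core `index`, assuming consecutive 4 KB windows
// from the example base address (a hypothetical layout for several cores).
constexpr std::uint64_t kDpuBase = 0x80000000u;
constexpr std::uint64_t kDpuWindow = 0x1000;  // 4 KB per DPU

std::uint64_t dpu_reg_base(unsigned index) {
    return kDpuBase + static_cast<std::uint64_t>(index) * kDpuWindow;
}
```

For example, interrupt 5 on pl_ps_irq1 encodes as binary 1101 (0xD).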

˃ OpenCV configuration
Enable in Filesystem Packages -> misc or libs

˃ Driver and DNNDK lib
Provide kernel information & OpenCV version to Deephi
Deephi will provide the driver and DNNDK package with an install script
Install the driver and DNNDK lib
HW Integration with C-callable IP

˃ Steps
Create header file
Package IP in Vivado
Create Makefile to generate *.a
Configure DPU parameters
Build application software

Create a library (SDSoC, with SDK/Vivado)

Header file dpu.hpp (in ../include):

    #include <stdint.h>
    void dpu_set_start(uint32_t start);

Makefile (sdx_pack wraps the Vivado-packaged DPU IP into libdpu.a):

    sdx_pack -header dpu.hpp -lib libdpu.a \
        -func dpu_set_start -map start=s_axi:in:0x10 -func-end \
        -ip ../iprepo/dpu/component.xml -control none \
        -add-ip-repo ../iprepo/src/ \
        -target-family zynquplus \
        -target-cpu cortex-a53 -target-os linux -verbose

The packaged IP must use supported AXI and control interfaces.

sdx_pack options:
-header <header.h/pp>: header file with function declarations; only one top header file is allowed
-lib: create a lib.a
-func <function_name> -map <swName>=<hwName>:direction:offset -func-end
-ip <component.xml>: IP packaged by the Vivado IP integrator; only one top IP is allowed

Use the library:

    #include <stdint.h>
    #include "dpu.hpp"

    int main() {
        uint32_t start = 0x1;
        dpu_set_start(start);  // start is mapped to s_axi offset 0x10
        // ...
        return 0;
    }

Link with:

    LFLAGS = -ldpu
    #LFLAGS = -ldpusw...

Figure: the platform with the PS and I/O; sdx_pack turns the Vivado-packaged dpu IP into libdpu.a, and SDSoC places the dpu in the PL.
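The `-map start=s_axi:in:0x10` argument ties the `start` parameter of `dpu_set_start()` to byte offset 0x10 of the IP's slave AXI interface, so calling the function presumably amounts to a 32-bit write at that offset. The sketch below shows that idea against a plain array standing in for the register block, so it runs without hardware; `write_reg32`, `read_reg32` and the simulated register file are hypothetical, not what SDSoC generates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Offset of the `start` register, taken from "-map start=s_axi:in:0x10".
constexpr std::size_t kStartOffset = 0x10;

// Write a 32-bit value at a byte offset within a memory-mapped register block.
// `volatile` stops the compiler from optimizing the access away or reordering
// it relative to other volatile accesses, as required for device memory.
void write_reg32(volatile std::uint32_t* base, std::size_t byte_offset, std::uint32_t value) {
    assert(byte_offset % sizeof(std::uint32_t) == 0);  // registers are word aligned
    base[byte_offset / sizeof(std::uint32_t)] = value;
}

// Read a 32-bit register at a byte offset.
std::uint32_t read_reg32(const volatile std::uint32_t* base, std::size_t byte_offset) {
    assert(byte_offset % sizeof(std::uint32_t) == 0);
    return base[byte_offset / sizeof(std::uint32_t)];
}
```

On a real board `base` would point at the DPU's register window (for example one mapped by a driver), not at an ordinary array.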

C-callable IP

Only 3 steps!

Write it Compile it Run it

Resnet50 Example with C-callable DPU IP in SDSoC

A Long Time for Every Build?

˃ The SDSoC compiler compares the new data-motion network with the last one
˃ If they are the same, vpl is not called to rerun synthesis and implementation
˃ A rebuild takes only a few minutes if you use:
‒ the same C-callable IP library
‒ the same platform
‒ the same project settings

Multiple Sensors & Networks with C-callable DPU IP

• SDSoC 2018.2 Linux
• ZCU102 (ZU9)
• 4 CNN models: Face detect, Joint detect, Traffic SSD, Ped SSD
• 30, 12, 15, 13 FPS respectively
• 3 live inputs + file / HDMI output
• Under 10 Watts

Figure: the SDSoC application (video lib, app stub) runs under Linux on the ARM Cortex-A53 with V4L2, DM driver and DRM; USB3, HDMI and MIPI (via ISP/VPSS) inputs plus a file feed Face detect, Joint detect, Traffic SSD and Ped SSD on 1x Deephi DPU, with HDMI output.

Availability

Basic and Professional Editions (pricing TBD)

˃ Timeframe
Early Access: Now
Public Access: Jan 2019
˃ To be available on AWS in Cloud Editions
˃ Add-on design service

DeePhi Basic (Free): everything you need to do it yourself
‒ Compiler
‒ Quantizer
‒ Pruned Models
‒ Unlimited Deployment

DeePhi Professional
‒ Pruning Tools
‒ Compiler
‒ Quantizer
‒ Pruned Models
‒ Unlimited Deployment
‒ Access to pruning technology & 3-day on-site training by a top-notch ML expert
‒ 30-day evaluation with encrypted pruning output

Availability
˃ DNNDK
For DP8000 (Z7020)/DP8020 (ZU2) boards, download from the Deephi website
For other boards, separate package upon request
For the pruning tool, separate package upon request
˃ Demos & Ref Designs
General: Resnet50, Googlenet, VGG16, SSD, Yolo v2-v3, Tiny Yolo v2-v3, Mobilenet v2, etc.
Video surveillance: face detection & traffic structure
ADAS/AD: multi-channel detection & segmentation
C-callable DPU IP with SDSoC: Resnet50, Quad networks (Pedestrian, Pose, Face, Traffic)
˃ Documentation
DNNDK user guide
C-callable DPU IP with SDSoC user guide
DPU IP system integration user guide (work in progress)
Pruning user guide (work in progress)
˃ Request or Inquiry
Please contact Andy Luo.
Key Takeaway

1 Edge/Embedded ML brings great opportunities and challenges for Xilinx

2 Xilinx offers a cutting-edge, end-to-end Edge/Embedded ML solution

3 Tools, IP, demos and reference designs are available now for evaluation & development
