Combining Asynchronous Task Parallelism
and Intel SGX for Secure Deep Learning
19th European Dependable Computing Conference
Leuven, Belgium
10 April, 2024
Xavier Martorell
Universitat Politècnica de Catalunya
Valerio Schiavoni
University of Neuchâtel
Isabelly Rocha
University of Neuchâtel
Pascal Felber
University of Neuchâtel
Marcelo Pasin
University of Neuchâtel
Osman Unsal
Barcelona Supercomputing Center
[Practical Experience Report]
valerio.schiavoni@unine.ch - SGX-OmpSs - EDCC’24
/11
Secure Deep Learning
•Several deep-learning applications require private data
•Face recognition, speech recognition, self-driving cars,
genetic sequence modeling, NLP, etc.
•High accuracy, but very high training costs
•11.5 hours on commodity hardware for NLP models
•Performance vs. security trade-off
•Privacy, integrity
•We want to exploit hardware heterogeneity: how?
•Intel SGX protects CPU computations, but there are no security guarantees on GPUs
(see you in 2 years)
Performance: Task-level Parallelism
•Addresses the performance requirements
•Each task maps to a program statement
•Arbitrary granularity
•Tasks with no dependencies on other tasks run in parallel
•Native support for hardware heterogeneity
•Several frameworks exist
•OpenMP, Charm++, OmpSs
•https://pm.bsc.es/ompss
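The task model above can be sketched with standard OpenMP tasks, used here only as a stand-in for OmpSs: OpenMP writes dependencies as depend(in: ...) and depend(inout: ...) clauses, whereas OmpSs uses in(...) and inout(...). The function names square and sum_of_squares are hypothetical. The two producer tasks have no mutual dependency and may run in parallel; the consumer task is ordered after both by its input dependencies.

```c
#include <assert.h>

/* Hypothetical task body: squares *x in place. */
static void square(int *x)
{
    *x = *x * *x;
}

/* Sketch of dependency-driven tasking with OpenMP (stand-in for OmpSs).
 * Returns a*a + b*b. */
int sum_of_squares(int a, int b)
{
    int x = a, y = b, sum = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* Two producers with no mutual dependency: may run in parallel */
        #pragma omp task depend(inout: x)
        square(&x);
        #pragma omp task depend(inout: y)
        square(&y);

        /* Consumer: its in-dependences order it after both producers */
        #pragma omp task depend(in: x, y) depend(out: sum)
        sum = x + y;

        /* Barrier: wait for all tasks created above (like OmpSs taskwait) */
        #pragma omp taskwait
    }

    /* Holds only because the consumer ran after both producers */
    assert(sum == x + y);
    return sum;
}
```

Compiled with -fopenmp the producers can run concurrently; compiled without it, the pragmas are ignored and the same result is computed sequentially.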
Security: TEEs
•Trusted Execution Environments (TEEs)
•Address the security requirements
•Hardware-protected area that withstands powerful attacks
•The protected code and data form an enclave, shielded from:
•a compromised OS, compromised system libraries, and attackers
with physical access to the machine
•Several implementations exist today:
•Intel: TDX, SGX (SGX is the focus of this talk)
•ARM: TrustZone, CCA
•AMD: SEV, SEV-SNP
•RISC-V: Keystone, MultiZone
•Google: Trusty, …
Intel Software Guard Extensions
[Figure: Intel SGX execution flow. The untrusted part of the application creates the enclave and calls a trusted function through a call gate; the trusted function executes inside the enclave and returns (steps ➊ to ➐). The operating system stays outside the trusted boundary.]
•Available since 2015 (Skylake)
•Hardware-protected area on the CPU die
•Splits the program in two:
•untrusted part vs. trusted part (enclaves)
•Transparent memory encryption/decryption
•Code integrity
•Remote attestation via the Intel Attestation Service
•Memory limit: the Enclave Page Cache (EPC), up to 64 GB on recent server-grade
CPUs; older generations only ~100 MB
•Tooling: Intel SGX SDK (C/C++), Rust SDK, containers (Scone, SGX-LKL, …)
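The call-gate step in the figure can be illustrated with a pure-C simulation; no SGX is involved. It assumes the copy semantics that the SGX SDK's generated bridge code applies to [in] and [out] pointer parameters: inputs are copied into enclave memory before the trusted function runs, and results are copied back out afterwards. All names (ecall_sum_sim, trusted_sum, enclave_buf, ENCLAVE_CAP) are hypothetical. These extra copies are one source of the overheads that secure versions pay.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity of the simulated enclave buffer (in doubles). */
#define ENCLAVE_CAP 1024

/* Stand-in for enclave-private memory; in real SGX this lives in the EPC. */
static double enclave_buf[ENCLAVE_CAP];

/* Simulated trusted function: only reads enclave-private memory. */
static double trusted_sum(const double *buf, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += buf[i];
    return s;
}

/* Simulated call gate with [in]/[out] copy semantics:
 * copy the untrusted input in, execute, copy the result out. */
int ecall_sum_sim(const double *untrusted_in, size_t n, double *untrusted_out)
{
    if (untrusted_in == NULL || untrusted_out == NULL || n > ENCLAVE_CAP)
        return -1;                                        /* reject the call */
    memcpy(enclave_buf, untrusted_in, n * sizeof *untrusted_in);  /* copy in */
    double result = trusted_sum(enclave_buf, n);          /* execute */
    memcpy(untrusted_out, &result, sizeof result);        /* copy out */
    memset(enclave_buf, 0, n * sizeof *enclave_buf);      /* scrub before return */
    return 0;
}
```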
Do we need a new system?
•State-of-the-art systems for secure computation with SGX
•At the time of this work, none would fit the bill
SGX-OmpSs: example
int SGX_CDECL main(int argc, char *argv[])
{
    ...
    /* Each DIM x DIM matrix is a pointer to rows, so A[i][k] is a double */
    double (*A)[DIM] = malloc(DIM * DIM * sizeof(double));
    double (*B)[DIM] = malloc(DIM * DIM * sizeof(double));
    double (*C)[DIM] = malloc(DIM * DIM * sizeof(double));
    fill_random(A); fill_random(B); fill_random(C);
    for (i = 0; i < DIM; i++)
        for (j = 0; j < DIM; j++)
            for (k = 0; k < DIM; k++) {
                // OmpSs pragma: the task reads A[i][k] and B[k][j], updates C[i][j]
                #pragma omp task in(A[i][k], B[k][j]) inout(C[i][j]) no_copy_deps
                // SGX ecall: the task body runs inside the enclave
                ecall_matmul(global_eid, &A[i][k], &B[k][j], &C[i][j], BSIZE);
            }
    // OmpSs pragma: barrier that waits for all pending tasks
    #pragma omp taskwait
    …
}
•Matrix multiplication: 2 pragmas, 1 SGX ecall
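The in/inout clauses in the example are what the runtime uses to order tasks. As a simplified model (not the Nanos++ implementation), the sketch below tracks the last writer of each C[i][j] and measures the longest chain of dependent tasks created by the triple loop; matmul_critical_path is a hypothetical name. A and B are only read, so they never serialize tasks: the DIM × DIM × DIM tasks form DIM × DIM independent chains, each of length DIM.

```c
#include <stddef.h>
#include <stdlib.h>

/* Longest chain of dependent tasks (critical path, in tasks) for the
 * matrix-multiplication loop, considering only inout(C[i][j]):
 * in(A[i][k], B[k][j]) data is never written, so it adds no edges. */
size_t matmul_critical_path(size_t dim)
{
    /* depth[i * dim + j]: chain depth of the last task that wrote C[i][j] */
    size_t *depth = calloc(dim * dim, sizeof *depth);
    size_t longest = 0;
    if (depth == NULL)
        return 0;
    for (size_t i = 0; i < dim; i++)
        for (size_t j = 0; j < dim; j++)
            for (size_t k = 0; k < dim; k++) {
                /* inout(C[i][j]): this task must follow the previous writer */
                size_t d = depth[i * dim + j] + 1;
                depth[i * dim + j] = d;
                if (d > longest)
                    longest = d;
            }
    free(depth);
    return longest;
}
```

With DIM^3 tasks and a critical path of DIM tasks, the ideal average parallelism is DIM^2, far more than the few cores of a typical SGX machine.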
SGX-OmpSs: workflow
[Figure: SGX-OmpSs workflow. (1) The programmer annotates SGX tasks in an OmpSs application; (2, 3) the Mercurium compiler and GCC turn the source code + annotations (calls to Nanos++) into OmpSs.elf, while the SGX compiler builds the enclave (other kernel toolchains shown: DFiant HDL, CUDA, MaxJ); (4) Nanos enclave support creates the enclave, calls the trusted functions that process secrets, and returns. Untrusted and trusted sides are shown separately.]
•Main contributions of this work, called SGX-OmpSs:
1. Integration of Intel SGX with the task-based framework OmpSs
2. Application to deep neural network workloads
Evaluation
●Intel Xeon E3-1275 (SGX 1.0), 4 cores, 2 hardware threads per core, 92 MiB EPC
●See the paper for more results:
● micro-benchmarks
● energy considerations
● 5 lessons learned
●In the rest of this talk:
● one micro-benchmark
● secure task-based DL: YOLO-Pascal, LENET-MNIST
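A back-of-the-envelope check of the 92 MiB EPC, assuming that the three DIM × DIM double matrices of the matrix-multiplication example must all reside in enclave memory, and ignoring enclave code, heap metadata and other EPC overheads (so the real limit is lower). max_square_dim is a hypothetical helper. Beyond the EPC, SGX 1 must page enclave memory out and back in, which is expensive.

```c
#include <stdint.h>

/* Largest n such that `count` matrices of n x n doubles fit in `budget` bytes. */
uint64_t max_square_dim(uint64_t budget, uint64_t count)
{
    uint64_t per_elem = count * sizeof(double);  /* bytes per (i, j) position */
    if (per_elem == 0)
        return 0;
    uint64_t max_elems = budget / per_elem;      /* n * n must not exceed this */
    uint64_t n = 0;
    while ((n + 1) * (n + 1) <= max_elems)       /* integer search, no rounding issues */
        n++;
    return n;
}
```

For the evaluation machine, max_square_dim(92ULL * 1024 * 1024, 3) gives 2004: three 2004 × 2004 matrices of doubles take 96,384,384 bytes, just under the 96,468,992 bytes of 92 MiB.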
Microbenchmarks
Lesson 1: large overheads for “secure” versions
Runtime 🏃
[Figure: runtime [s] and difference [%] for YOLO-Pascal and LENET-MNIST; x-axis: sgx (baseline, no parallelism), 2, 4, 8; lower is better.]
•YOLO-Pascal: real-time object detection on the Pascal VOC 2012 dataset
•LENET-MNIST: hand-written digits, lightweight CNN
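The Difference [%] axis is read here, as an assumption, relative to the sequential sgx baseline: a negative value means a lower (better) runtime or energy. pct_diff is a hypothetical helper, and the numbers below are illustrative, not measurements from the talk.

```c
/* Percentage difference of `value` relative to `baseline`;
 * negative means lower than the baseline (better for runtime and energy). */
double pct_diff(double baseline, double value)
{
    return (value - baseline) * 100.0 / baseline;
}
```

For example, a runtime drop from 200 s to 50 s gives pct_diff(200.0, 50.0) = -75.0, and a rise from 40 kJ to 50 kJ gives +25.0.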
Energy 🔋🪫
[Figure: energy [kJ] and difference [%] for YOLO-Pascal and LENET-MNIST; x-axis: sgx, 2, 4, 8.]
Lesson 5: predicting performance is not easy and must be done case by case
(see the paper for Lessons 2-4)
Conclusion
•SGX-OmpSs can accelerate the execution of secure applications
•Easy to use in any application domain
•It exploits the asynchronous task parallelism paradigm
•For SGX-based applications, the effort needed to port to SGX-OmpSs is
minimal
•Taskified deep-learning workloads improve runtime (up to 94%)
and reduce energy requirements (up to 92%)
•In the (far) future: extend to FPGAs and secure GPUs