1 © NEC Corporation 2019
Brand-new Vector Supercomputing power in Server Chassis
SX-Aurora TSUBASA (Vector Engine)
Deepak Pathania, OEM Alliance
High Bytes per FLOPs or Evolving towards massive data processing
[Figure: Vector Processor, Massively Parallel Processor (GPGPU), and Multipurpose Processor positioned on compute performance and memory performance axes. Workload labels: voice recognition, image recognition, crash simulation, weather simulation, demand forecasting, price forecasting, recommendation, AI, massive data processing.]
High Bytes per FLOP’s Targeted Roadmap
[Figure: roadmap on compute performance and memory performance axes showing CPU, GPU, and Vector Engine, with Aurora 1, Aurora 2, and Aurora 3.]
Vector Processor on Card
(World’s Highest Memory Bandwidth Processor)
• Newly developed vector processor (derived from supercomputers)
• PCIe card implementation
• 8 cores / processor
• 2.15 TF performance (double precision)
• 1.2 TB/s memory bandwidth, 48 GB memory
• Normal programming with Fortran/C/C++
Core Architecture
[Figure: a single core consisting of a Scalar Processing Unit (SPU) and a group of vector units (VFMA0, VFMA1, VFMA2, ALU0, ALU1, DIV) repeated 32 times. Bandwidth labels: 1.2TB/s / processor (Ave. 150GB/s / core); 400GB/s / core.]
Open-Source
▌Materials for Learning and Training
  • Tuning guides and various manuals
▌VEOS or middleware
  • Heterogeneous Computing with Vectors
▌ML Libraries in C/C++
  • Frovedis library, a Spark-like machine learning framework in C/C++
▌Deep Learning using Vectors
Licensed
▌C/C++ Compiler
  • ISO/IEC 9899:2011 (aka C11)
  • ISO/IEC 14882:2014 (aka C++14)
▌Fortran Compiler
  • ISO/IEC 1539-1:2004 (aka Fortran 2003)
  • ISO/IEC 1539-1:2010 (aka Fortran 2008)
▌OpenMP and MPI
  • Version 4.5
▌Libraries
  • Glibc
  • MPI version 3.1 (fully tuned for Vector Engine Architecture)
  • NLC Libraries (BLAS, LAPACK, FFT, etc.)
▌Tools
  • FtraceViewer/PROGINF
  • Gprof
  • GDB, Eclipse Parallel Tools Platform
Ease of Programming
Automatic vectorization handles ordinary loops, and various tools help with vectorization at more advanced levels.
#include <stdlib.h>    /* malloc */
#include <sys/time.h>  /* struct timeval */

/* L, M, N (matrix dimensions) and init_matrix are defined elsewhere in sample.c. */

/* C = A * B, where A is l x m, B is m x n, and C is l x n (row-major). */
void matmul(float *A, float *B, float *C, int l, int m, int n){
    int i, j, k;
    for (i = 0; i < l; i++) {
        for (j = 0; j < n; j++) {
            float sum = 0.0;
            for (k = 0; k < m; k++)
                sum += A[i * m + k] * B[k * n + j];
            C[i*n+j] = sum;
        }
    }
}

/* Allocate an h x w matrix of floats. */
void alloc_matrix(float **m_h, int h, int w){
    *m_h = (float *)malloc(sizeof(float) * h * w);
}

// other function definitions …

int main(int argc, char *argv[]){
    float *Ah, *Bh, *Ch;
    struct timeval t1, t2;

    // prepare matrix A
    alloc_matrix(&Ah, L, M);
    init_matrix(Ah, L, M);

    // do it again for matrix B
    alloc_matrix(&Bh, M, N);
    init_matrix(Bh, M, N);

    // allocate spaces for matrix C
    alloc_matrix(&Ch, L, N);

    // call matmul function
    matmul(Ah, Bh, Ch, L, M, N);

    return 0;
}
[Compiler diagnostic message]
$ ncc sample.c -O4 -report-all -fdiag-vector=2
ncc: opt(1589): sample.c, line 11: Outer loop moved inside inner loop(s).: j
ncc: vec( 101): sample.c, line 11: Vectorized loop.
ncc: opt(1592): sample.c, line 13: Outer loop unrolled inside inner loop.: k
ncc: vec( 101): sample.c, line 13: Vectorized loop.
ncc: vec( 128): sample.c, line 14: Fused multiply-add operation applied.
ncc: vec( 101): sample.c, line 15: Vectorized loop.
No modification is necessary for vectorization
Just compile, and loops are vectorized automatically
[Format list]
 8:           void matmul(float *A, float *B, float *C, int l, int m, int n){
 9:             int i, j, k;
10: +------>    for (i = 0; i < l; i++) {
11: |X----->      for (j = 0; j < n; j++) {
12: ||              float sum = 0.0;
13: ||V---->        for (k = 0; k < m; k++)
14: ||V----  F        sum += A[i * m + k] * B[k * n + j];
15: ||              C[i*n+j] = sum;
16: |X-----       }
17: +------     }
18:           }
VEOS offload models
Run the application in the most suitable way using VEOS.
Models: OS Offload, VH call, VEO
[Figure: one diagram per model, each showing an x86 node with its OS alongside a Vector Engine, and where the VE application and the x86 application run.]
Study Reference: https://www.hpc.nec/api/v1/forum/file/download?id=LbGhNY
Vectors for Weather
NEC received a 50 million euro order from the Deutscher Wetterdienst (DWD) for a highly innovative European weather forecasting system using NEC SX-Aurora TSUBASA.
“The system combines forecasting based on observations with very demanding numerical weather prediction models in order for a more precise prediction of the development and the tracks of such small-scale weather events up to twelve hours into the future. This will enable better and earlier warnings for local populations,” said Mr. Detlev Majewski, Head of the Department of Meteorological Analysis and Numerical Modelling at the Deutscher Wetterdienst.
Source: NEC Press Release
Vectors for Material Science
NEC SX-Aurora TSUBASA was installed by the High Energy Accelerator Research Organization (KEK) and the National Institute for Environmental Studies, Japan, as their successive vector supercomputer systems.
“The organization promotes simulation research in high energy physics, and introduced SX-Aurora TSUBASA to implement the joint use program ‘Primary Particle Nuclear Space Simulation Program’,” says KEK (High Energy Accelerator Research Organization).
Source: NEC Press Release
NEC and HPE join forces in an Exclusive partnership
Vector optimized expertise meets Global HPC leader
• #1 HPC Market Share *
• Global market coverage
• Purpose built server portfolio
• Strong ISV partner eco-system throughout the world
• Specialized HPC expertise
  • System management and operation
  • Benchmarking
  • Solution integration
  • Hybrid infrastructure
• 30+ years in Vector compute architecture
• Silicon to software design expertise
• Optimized software stack
  • Compiler
  • SDK
  • Libraries
NEC and Colfax partner to provide groundbreaking HPC development at your desk
• Over 30 years of experience in delivering custom and HPC solutions
• Extensive customer base, especially academia and research labs
• Specialized HPC expertise
• Solution design and development
• HPC research and training
• Hybrid system design
• NEC and Colfax partnership aims to provide “personal supercomputing” power for leading-edge development
Legacy
▌Earth Simulator delivered the highest efficiency, or high Bytes/FLOPs.
▌Fairly easy to implement for harnessing full vector potential.
▌Vectors are today, and will remain tomorrow, the ultimate superpower for huge data processing in any supercomputing architecture.
Change for Good
▌Yet reduction in size has been an area for improvement over the legacy.
▌Supercomputers, unlike in the past, should now be available to everyone.
▌Pure vectors are needed for massive data processing, but with power and efficiency kept in mind.
▌Co-exist with others for solving problems.
▌Continue the legacy…
Adapt/Evolve
▌SX-Aurora TSUBASA (Wing): a supercomputing processor on a PCIe card, supporting heterogeneity.
▌HBM2 and 1.2 TB/s of bandwidth, with large vector pipes and cores delivering 2.15 TFLOPs DP per card.
▌Auto-vectorization and parallelization for ease of programming.
▌Continue the legacy to deliver high Bytes/FLOPs with power efficiency.
Wish to experience our NEC SX-Aurora TSUBASA?
Join our trial program today!
▌Contact Us: Info@hpc.jp.nec.com