SX-Aurora TSUBASA (Vector Engine): Brand-new Vector Supercomputing Power in a Server Chassis

© NEC Corporation 2019
Brand-new Vector Supercomputing Power in a Server Chassis
SX-Aurora TSUBASA (Vector Engine)
Deepak Pathania, OEM Alliance

High Bytes per FLOP: Evolving towards Massive Data Processing
[Figure: vector processor, massively parallel processor (GPGPU), and multipurpose processor plotted by computing performance vs. memory performance, with example workloads: voice and image recognition, simulation (crash, weather), and massive data processing (demand forecasting, AI, price forecasting, recommendation).]

High Bytes per FLOP: Targeted Roadmap
[Figure: roadmap of compute performance vs. memory performance for CPU, GPU, and Vector Engine, with the Vector Engine progressing through Aurora 1, Aurora 2, and Aurora 3.]

Vector Processor on Card
(World's Highest Memory Bandwidth Processor)
• Newly developed vector processor (derived from supercomputer designs)
• PCIe card implementation
• 8 cores per processor
• 2.15 TFLOPS peak performance (double precision)
• 1.2 TB/s memory bandwidth, 48 GB memory
• Standard programming with Fortran/C/C++
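The card figures above imply a bytes-per-FLOP (B/F) ratio, the metric this deck keeps returning to. The sketch below is a minimal illustration of that arithmetic, using only the quoted peak numbers (2.15 TFLOPS, 1.2 TB/s, 8 cores); the macro and function names are hypothetical.

```c
/* Bytes-per-FLOP (B/F) arithmetic for the quoted card figures.
   Macro and function names are hypothetical; the values are the
   quoted peak numbers, not measurements. */

#define VE_PEAK_FLOPS 2.15e12 /* double-precision peak, FLOP/s */
#define VE_MEM_BW     1.2e12  /* memory bandwidth, bytes/s */
#define VE_CORES      8       /* cores per processor */

/* Bytes of memory traffic available per FLOP when running at peak. */
double bytes_per_flop(double mem_bw, double peak_flops)
{
    return mem_bw / peak_flops;
}

/* Even share of a processor-wide figure for a single core. */
double per_core(double total, int cores)
{
    return total / cores;
}
```

For example, bytes_per_flop(VE_MEM_BW, VE_PEAK_FLOPS) comes to about 0.56 B/F, per_core(VE_MEM_BW, VE_CORES) gives 150 GB/s per core on average, and per_core(VE_PEAK_FLOPS, VE_CORES) gives about 268.75 GFLOPS per core.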

Core Architecture
[Figure: a single core comprises an SPU (Scalar Processing Unit) and 32 vector pipelines, each with three fused multiply-add units (VFMA0, VFMA1, VFMA2), two ALUs (ALU0, ALU1), and a divider (DIV). Memory bandwidth: 1.2 TB/s per processor (average 150 GB/s per core); 400 GB/s for a single core.]
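The pipeline layout of each core (32 vector pipelines, each with three FMA units) can be turned into a peak-performance estimate, since each FMA unit completes one multiply-add (2 FLOPs) per cycle. The sketch below assumes a 1.4 GHz clock, which this deck does not state; with that assumption the result reproduces the quoted 2.15 TFLOPS per card. Macro and function names are hypothetical.

```c
/* Peak double-precision FLOP/s estimate from the vector pipeline layout.
   ASSUMPTION: 1.4 GHz clock (not stated in the deck). Names are hypothetical. */

#define PIPES_PER_CORE 32    /* vector pipelines per core */
#define FMA_PER_PIPE    3    /* VFMA0, VFMA1, VFMA2 */
#define FLOPS_PER_FMA   2    /* one multiply + one add */
#define CORES           8    /* cores per processor */
#define CLOCK_HZ      1.4e9  /* assumed clock frequency */

/* FLOPs one core can complete per cycle when every FMA unit is busy. */
int flops_per_cycle_per_core(void)
{
    return PIPES_PER_CORE * FMA_PER_PIPE * FLOPS_PER_FMA; /* 192 */
}

/* Peak FLOP/s for one core at the given clock. */
double core_peak_flops(double clock_hz)
{
    return flops_per_cycle_per_core() * clock_hz;
}

/* Peak FLOP/s for the whole processor. */
double chip_peak_flops(double clock_hz)
{
    return CORES * core_peak_flops(clock_hz);
}
```

Under the assumed clock, core_peak_flops(CLOCK_HZ) is 268.8 GFLOPS and chip_peak_flops(CLOCK_HZ) is about 2.15 TFLOPS. A single core drawing 400 GB/s would therefore see roughly 400 / 268.8 ≈ 1.49 bytes per FLOP.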

Open-Source
▌Materials for Learning and Training
• Tuning guides and various manuals
▌VEOS and middleware
• Heterogeneous computing with vectors
▌ML libraries in C/C++
• Frovedis: a Spark-like machine learning framework in C/C++
▌Deep learning using vectors
Licensed
▌C/C++ Compiler
• ISO/IEC 9899:2011 (aka C11)
• ISO/IEC 14882:2014 (aka C++14)
▌Fortran Compiler
• ISO/IEC 1539-1:2004 (aka Fortran 2003)
• ISO/IEC 1539-1:2010 (aka Fortran 2008)
▌OpenMP and MPI
• OpenMP version 4.5
▌Libraries
• glibc
• MPI version 3.1 (fully tuned for the Vector Engine architecture)
• NLC (NEC Numeric Library Collection): BLAS, LAPACK, FFT, etc.
▌Tools
• FtraceViewer/PROGINF
• gprof
• GDB, Eclipse Parallel Tools Platform
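With OpenMP support, a loop can be split across the 8 cores while the compiler vectorizes each core's share. The sketch below is a minimal example, assuming an OpenMP-capable C compiler; if OpenMP is not enabled the pragma is ignored and the loop simply runs serially. The STREAM-style triad shown reads b[i] and c[i] and writes a[i] (24 bytes of double-precision traffic, ignoring cache effects) for one multiply and one add, i.e. 12 bytes per FLOP, which is why memory bandwidth rather than peak FLOPS bounds it. The function name triad is hypothetical.

```c
/* STREAM-style triad: a[i] = b[i] + s * c[i].
   The OpenMP pragma distributes iterations across cores; iterations are
   independent and unit-stride, so each core's chunk is a natural candidate
   for automatic vectorization. */
void triad(double *restrict a, const double *restrict b,
           const double *restrict c, double s, long n)
{
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        a[i] = b[i] + s * c[i];
}
```

For instance, with b = {1, 2, 3, 4}, c = {1, 1, 1, 1} and s = 2.0, triad fills a with {3, 4, 5, 6}.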

Ease of Programming
Automatic vectorization, together with tools that assist with more advanced vectorization
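#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
/* Matrix sizes below are illustrative values, not from the original */
#define L 512
#define M 512
#define N 512
void matmul(float *A, float *B, float *C, int l, int m, int n){
    int i, j, k;
    for (i = 0; i < l; i++) {
        for (j = 0; j < n; j++) {
            float sum = 0.0;
            for (k = 0; k < m; k++)
                sum += A[i * m + k] * B[k * n + j];
            C[i*n+j] = sum;
        }
    }
}
void alloc_matrix(float **m_h, int h, int w){
    *m_h = (float *)malloc(sizeof(float) * h * w);
}
/* Hypothetical stand-in: the original init_matrix definition is not shown */
void init_matrix(float *m_h, int h, int w){
    for (int i = 0; i < h * w; i++)
        m_h[i] = (float)(i % 10) * 0.1f;
}
// other function definitions …
int main(int argc, char *argv[]){
    float *Ah, *Bh, *Ch;
    struct timeval t1, t2;
    // prepare matrix A
    alloc_matrix(&Ah, L, M);
    init_matrix(Ah, L, M);
    // do it again for matrix B
    alloc_matrix(&Bh, M, N);
    init_matrix(Bh, M, N);
    // allocate spaces for matrix C
    alloc_matrix(&Ch, L, N);
    // call matmul function and measure the elapsed time
    gettimeofday(&t1, NULL);
    matmul(Ah, Bh, Ch, L, M, N);
    gettimeofday(&t2, NULL);
    printf("matmul: %.6f s\n",
           (double)(t2.tv_sec - t1.tv_sec) + (t2.tv_usec - t1.tv_usec) * 1e-6);
    free(Ah);
    free(Bh);
    free(Ch);
    return 0;
}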
[Compiler diagnostic message]
$ ncc sample.c -O4 -report-all -fdiag-vector=2
ncc: opt(1589): sample.c, line 11: Outer loop moved inside inner loop(s).: j
ncc: vec( 101): sample.c, line 11: Vectorized loop.
ncc: opt(1592): sample.c, line 13: Outer loop unrolled inside inner loop.: k
ncc: vec( 128): sample.c, line 14: Fused multiply-add operation applied.
No source modification is needed for vectorization:
just compile, and the loops are vectorized automatically.
[Format list]
8: void matmul(float *A, float *B, float *C, int l, int m, int n){
9: int i, j, k;
10: +------> for (i = 0; i < l; i++) {
11: |X-----> for (j = 0; j < n; j++) {
12: || float sum = 0.0;
13: ||V----> for (k = 0; k < m; k++)
14: ||V---- F sum += A[i * m + k] * B[k * n + j];
15: || C[i*n+j] = sum;
16: |X----- }
17: +------ }
18: }
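The diagnostics report that the j loop was moved inside the k loop and then vectorized (the X and V markers in the format list). For comparison, the sketch below is a hand-written C version of that i-k-j loop order. It approximates what the diagnostics describe rather than reproducing the compiler's actual output; the k-loop unrolling and FMA generation are left to the compiler. The function name matmul_ikj is hypothetical.

```c
/* i-k-j loop order: the innermost j loop walks B and C with unit stride,
   which makes it a natural candidate for vectorization. Each C[i*n+j]
   still accumulates its products over k in ascending order, as in the
   i-j-k matmul above. */
void matmul_ikj(const float *A, const float *B, float *C, int l, int m, int n)
{
    for (int i = 0; i < l; i++) {
        for (int j = 0; j < n; j++)
            C[i * n + j] = 0.0f;           /* clear row i of C */
        for (int k = 0; k < m; k++) {
            float a = A[i * m + k];        /* reused across the whole j loop */
            for (int j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];
        }
    }
}
```

For a 2×3 A = {1, 2, 3, 4, 5, 6} and a 3×2 B = {7, 8, 9, 10, 11, 12}, matmul_ikj(A, B, C, 2, 3, 2) produces C = {58, 64, 139, 154}, the same result as the original matmul.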

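VEOS offload models
Run each application in the way that suits it best using VEOS.
[Figure: three execution models on an x86 node paired with a Vector Engine. OS offload: the VE application runs on the Vector Engine while OS functions are handled on the x86 node. VH call: the VE application runs on the Vector Engine and calls x86 code on the x86 node. VEO: the x86 application runs on the x86 node and offloads functions to a VE application on the Vector Engine.]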
Reference: https://www.hpc.nec/api/v1/forum/file/download?id=LbGhNY

Vectors for Weather
NEC received a 50 million euro order from the Deutscher Wetterdienst (DWD) for a highly innovative European weather forecasting system using NEC SX-Aurora TSUBASA.
“The system combines forecasting based on observations with very demanding numerical weather prediction models in order for a more precise prediction of the development and the tracks of such small-scale weather events up to twelve hours into the future. This will enable better and earlier warnings for local populations,”
said Mr. Detlev Majewski, Head of the Department of Meteorological Analysis and Numerical Modelling at the Deutscher Wetterdienst.
Source: NEC Press Release
Vectors for Material Science
The High Energy Accelerator Research Organization (KEK) and the National Institute for Environmental Studies, Japan, installed NEC SX-Aurora TSUBASA as the successors to their vector supercomputer systems.
“The organization promotes simulation research in high energy physics, and introduced SX-Aurora TSUBASA to implement the joint use program ‘Primary Particle Nuclear Space Simulation Program’,”
says KEK (High Energy Accelerator Research Organization).
Source: NEC Press Release

NEC and HPE Join Forces in an Exclusive Partnership
Vector-optimized expertise meets the global HPC leader
HPE
• #1 HPC market share *
• Global market coverage
• Purpose-built server portfolio
• Strong ISV partner ecosystem throughout the world
• Specialized HPC expertise
  • System management and operation
  • Benchmarking
  • Solution integration
  • Hybrid infrastructure
NEC
• 30+ years in vector compute architecture
• Silicon-to-software design expertise
• Optimized software stack
  • Compiler
  • SDK
  • Libraries

NEC and Colfax Partner to Provide Groundbreaking HPC Development at Your Desk
• Over 30 years of experience in delivering custom and HPC solutions
• Extensive customer base, especially in academia and research labs
• Specialized HPC expertise
  • Solution design and development
  • HPC research and training
  • Hybrid system design
• The NEC and Colfax partnership aims to provide “personal supercomputing” power for leading-edge development

Legacy
▌The Earth Simulator delivered the highest efficiency through a high bytes-per-FLOP ratio.
▌Harnessing the full vector potential was fairly easy.
▌Vectors remain, today and tomorrow, the ultimate superpower of any supercomputing architecture for processing huge amounts of data.
Change for Good
▌Yet size reduction has been an area for improvement over the legacy systems.
▌Unlike in the past, supercomputers should now be available to everyone.
▌Pure vector processing is needed for massive data processing, with power and efficiency in mind.
▌Coexist with other architectures to solve problems.
▌Continue the legacy.
Adapt/Evolve
▌SX-Aurora TSUBASA (“tsubasa” is Japanese for “wing”): a supercomputing processor on a PCIe card that supports heterogeneous computing.
▌HBM2 with 1.2 TB/s of bandwidth, plus large vector pipes and cores, delivering 2.15 TFLOPS (double precision) per card.
▌Auto-vectorization and parallelization for ease of programming.
▌Continue the legacy of delivering a high bytes-per-FLOP ratio with power efficiency.

Want to experience NEC SX-Aurora TSUBASA?
Join our trial program today!
▌Contact us: Info@hpc.jp.nec.com
