The Age of Language Models in NLP
Tuesday | 23rd June, 2020
LIVE WEBINAR
Presented by
AGENDA
1. About Tyrone
 World's high-performing AI platform system – A100
 Development, training, and inference in one
 Era of Modern Mixed Workloads
 Tyrone Kubyts™
2. Word Embeddings
 How word embeddings create context-based relationships
 How to create word embeddings
3. Sequence Modelling
 Introduction to deep learning in NLP
 Overview of the model architectures to use
4. Advanced Language Models
 Overview of language models
 How they are created
 Transformers
 BERT, GPT-2, etc.
5. NLP Attention Mechanism
 Overview of the attention mechanism
6. Case Studies
Tyrone Systems at a Glance
NVIDIA HGX A100 PERFORMANCE
New Tensor Core for AI & HPC
New Multi-instance GPU
New Hardware Engines
Increase in GPU interconnect bandwidth
Increase in GPU memory
Increase in memory bandwidth
Speedup in AI performance
54 Billion XTORS
3rd Gen Tensor Cores
Sparsity Acceleration
Multi-Instance GPU
3rd Gen NVLink & NVSwitch
NVIDIA A100
Greatest Generational Leap – 20X Volta
54B XTOR | 826mm2 | TSMC 7N | 40GB Samsung HBM2 | 600 GB/s NVLink
Peak | vs Volta
FP32 Training: 312 TFLOPS | 20X
INT8 Inference: 1,248 TOPS | 20X
FP64 HPC: 19.5 TFLOPS | 2.5X
Multi-Instance GPU: 7X GPUs
New TF32 Tensor Cores on A100
20X Higher FLOPS for AI, Zero Code Change
20X faster than Volta FP32 | Works like FP32 for AI, with the range of FP32 and the precision of FP16
No code change required for end users | Supported in PyTorch, TensorFlow, and MXNet framework containers
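As an illustrative aside (not from the original slides), recent PyTorch releases expose explicit switches for TF32 Tensor Core math; the sketch below assumes an Ampere-class GPU such as the A100 and PyTorch 1.7 or later, and the matrix sizes are made up:

```python
import torch

# Allow TF32 Tensor Core math on Ampere GPUs such as the A100.
# (On some PyTorch versions matmul TF32 is already the default.)
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Ordinary FP32 code is unchanged; the matmul below can use TF32 Tensor Cores.
a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
c = a @ b
```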
Most Flexible AI Platform with Multi-Instance GPU (MIG)
Optimize GPU Utilization, Expand Access to More Users with Guaranteed Quality of Service
Up To 7 GPU Instances In a Single A100
Simultaneous Workload Execution With Guaranteed Quality of Service: all MIG instances run in parallel with predictable throughput & latency; flexibility to run any type of workload on a MIG instance
Right-Sized GPU Allocation: different-sized MIG instances based on target workloads
[Diagram: a single A100 partitioned into seven MIG instances, each with its own GPU compute and GPU memory, running independent workloads such as Amber]
ONE SYSTEM FOR ALL AI INFRASTRUCTURE
AI Infrastructure Re-Imagined, Optimized, and Ready for Enterprise AI-at-Scale
any job | any size | any node | anytime
Analytics → Training → Inference
Flexible AI infrastructure that adapts to the pace of the enterprise
• One universal building block for the AI data center
• Uniform, consistent performance across the data center
• Any workload on any node, any time
• Limitless capacity planning with predictably great performance at scale
Game-changing performance for innovators
• 8x NVIDIA A100 GPUs with 320GB total GPU memory; 12 NVLinks/GPU with 600GB/sec GPU-to-GPU bi-directional bandwidth
• 6x NVIDIA NVSwitches; 4.8TB/sec bi-directional bandwidth, 2X more than the previous-generation NVSwitch
• 9x Mellanox ConnectX-6 200Gb/s network interfaces; 450GB/sec peak bi-directional bandwidth
• Dual 64-core AMD Rome CPUs and 1TB RAM; 3.2X more cores to power the most intensive AI jobs
• 15TB Gen4 NVMe SSD; 25GB/sec peak bandwidth, 2X faster than Gen3 NVMe SSDs
2U GPU server with up to 4 NVIDIA HGX™ A100 GPUs
Camarero DAS7TGVQ-24RT
Tyrone NVIDIA A100-based Servers
• Supports 4x A100 40GB SXM4 GPUs
• Supports CPU TDP up to 280W
• Dual AMD EPYC™ 7002 Series processors with up to 128 cores
• Flexible storage with 4 hot-swap bays for SAS, SATA, or NVMe
• PCI-E Gen 4 and NVLink for fast GPU-GPU connections
• 32 DIMM slots that allow up to 8TB of 3200MHz DDR4 memory
• 4 hot-swap heavy-duty fans
• 2x 2200W redundant power supplies, Titanium level
PCI-E Gen 4
NEW LAUNCH
NVIDIA NVLink
4U GPU server with up to 8 NVIDIA HGX™ A100 GPUs
Tyrone NVIDIA A100-based Servers
NVIDIA NVLink & NVSwitch
NEW LAUNCH
• Supports up to 8 double-width GPUs
• Supports CPU TDP up to 280W
• Dual AMD EPYC™ 7002 Series processors with up to 128 cores
• Flexible storage with 4 hot-swap bays for SAS, SATA, or NVMe
• PCI-E Gen 4 and NVLink for fast GPU-GPU connections
• 32 DIMM slots that allow up to 8TB of 3200MHz DDR4 memory
• 4 hot-swap heavy-duty fans
• 2x 2000W redundant power supplies, Titanium level
4U GPU server with up to 8 NVIDIA HGX™ A100 GPUs
Tyrone NVIDIA A100-based Servers
NVIDIA NVLink
COMING SOON
• Supports Intel Xeon
• Supports NVLink
• 8x NVIDIA Tesla A100 SXM4
Delivers 4X faster training than other GPU-based systems
Your Personal AI Supercomputer
Power-on to Deep Learning in Minutes
Pre-installed with Powerful Deep Learning Software
Extend workloads from your Desk-to-Cloud in Minutes
Mixed Workloads: Convergence of AI | HPC | Cloud | Containers
The Era of Modern Mixed Workload
FLEXIBILITY: Is the usage going to be constant?
OPTIMIZATION: Is optimal utilization required?
RESILIENCE: Do we need the application to run all the time?
EASE: Is 'ease of maintenance' key?
SCALABILITY & SPEED: Do we have one size that fits all?
Connectivity and usage
[Diagram: laptops and virtual desktops connecting through the Tyrone Cloud Manager]
Run Multiple Applications Simultaneously
Flow Architecture: Revolutionizing the Deep Learning CPU-GPU Environment
[Chart: 10X to 70X speed with the Tyrone KUBYTS™ Client]
Compatible workstations have a repository of 50 containerized applications, with 100s of containers available in the cloud
Tyrone KUBITS: Revolutionizing the Deep Learning CPU-GPU Environment
• Run different applications simultaneously
• Check for Tyrone KUBITS-compatible workstations
• Get access to 100+ containers on Tyrone KUBITS Cloud
• High scalability
• Affordable price
• Both GPU- and CPU-optimized containers
• Design a simple workstation or large clusters with KUBITS technology
• Talk to our experts & build the right workstation within your budget
AGENDA
1. About Tyrone
 World's high-performing AI platform system – A100
 Development, training, and inference in one
 Era of Modern Mixed Workloads
 Tyrone Kubyts™
2. Word Embeddings
 How word embeddings create context-based relationships
 How to create word embeddings
3. Sequence Modelling
 Introduction to deep learning in NLP
 Overview of the model architectures to use
4. Advanced Language Models
 Overview of language models
 How they are created
 Transformers
 BERT, GPT-2, etc.
5. NLP Attention Mechanism
 Overview of the attention mechanism
6. Case Studies
Word Embedding
• Word embedding is a language modeling technique used for mapping words to vectors of real numbers.
• It represents words or phrases in a vector space with several dimensions.
• Word embeddings can be generated using various methods such as neural networks, co-occurrence matrices, and probabilistic models.
• Word2Vec consists of models for generating word embeddings. These models are shallow, two-layer neural networks with one input layer, one hidden layer, and one output layer. Word2Vec utilizes two architectures: CBOW and Skip-gram.
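For concreteness, here is a minimal sketch of generating Word2Vec embeddings with the gensim library; the toy corpus and parameter values are assumptions added for illustration (gensim 4.x argument names are assumed), not content from the webinar:

```python
from gensim.models import Word2Vec

# Toy corpus: each document is a list of tokens (assumed example data).
sentences = [
    ["language", "models", "learn", "word", "representations"],
    ["word", "embeddings", "map", "words", "to", "vectors"],
    ["vectors", "capture", "context", "based", "relationships"],
]

# sg=0 selects the CBOW architecture, sg=1 selects Skip-gram.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

print(model.wv["word"].shape)          # 50-dimensional embedding vector
print(model.wv.most_similar("word"))   # nearest neighbours in the vector space
```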
CBOW – Continuous Bag of Words
• The CBOW model predicts the current word given the context words within a specific window. The input layer contains the context words and the output layer contains the current word.
• The hidden layer has as many units as the number of dimensions in which we want to represent the current word present at the output layer.
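A minimal PyTorch sketch of the CBOW architecture described above follows; it is an illustration added here (the vocabulary size, embedding dimension, and dummy batch are assumptions), not code from the webinar:

```python
import torch
import torch.nn as nn

class CBOW(nn.Module):
    def __init__(self, vocab_size: int, embedding_dim: int):
        super().__init__()
        # Hidden layer size = number of dimensions used to represent each word.
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.output = nn.Linear(embedding_dim, vocab_size)

    def forward(self, context_ids: torch.Tensor) -> torch.Tensor:
        # context_ids: (batch, 2 * window) indices of surrounding words.
        hidden = self.embeddings(context_ids).mean(dim=1)  # average the context vectors
        return self.output(hidden)                         # scores for the current word

model = CBOW(vocab_size=10_000, embedding_dim=100)
loss_fn = nn.CrossEntropyLoss()
context = torch.randint(0, 10_000, (8, 4))   # dummy batch: 8 examples, window of 2
target = torch.randint(0, 10_000, (8,))      # the "current" word for each example
loss = loss_fn(model(context), target)
```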
Skip-Gram – Word Embeddings
• The Skip-gram model predicts the surrounding context words within a specific window given the current word. The input layer contains the current word and the output layer contains the context words.
• The hidden layer has as many units as the number of dimensions in which we want to represent the current word present at the input layer.
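Mirroring the CBOW sketch above, a minimal PyTorch illustration of Skip-gram (again with assumed sizes and dummy data) embeds the single current word and predicts a context word from it:

```python
import torch
import torch.nn as nn

class SkipGram(nn.Module):
    def __init__(self, vocab_size: int, embedding_dim: int):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)  # hidden layer
        self.output = nn.Linear(embedding_dim, vocab_size)

    def forward(self, center_ids: torch.Tensor) -> torch.Tensor:
        # center_ids: (batch,) index of the current (center) word.
        hidden = self.embeddings(center_ids)   # (batch, embedding_dim)
        return self.output(hidden)             # scores for one context word

model = SkipGram(vocab_size=10_000, embedding_dim=100)
center = torch.randint(0, 10_000, (8,))
context_word = torch.randint(0, 10_000, (8,))  # one (center, context) pair per row
loss = nn.CrossEntropyLoss()(model(center), context_word)
```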
Advanced Language Models and Transformers
ELMo | ULMFiT
BERT | Transformer
Transformer
BERT Architecture
• The Transformer is an attention-based architecture for NLP.
• The Transformer is composed of two parts: an encoding component and a decoding component.
• BERT is a multi-layer bidirectional Transformer encoder.
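One common way to use such a pretrained encoder, sketched here with the Hugging Face transformers library (an illustration added to this write-up, not part of the original slides), is to tokenize a sentence and read off the contextual vectors the multi-layer bidirectional encoder produces:

```python
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Language models learn context.", return_tensors="pt")
outputs = model(**inputs)

# One contextual vector per input token, produced by the bidirectional encoder.
print(outputs.last_hidden_state.shape)  # e.g. (1, num_tokens, 768)
```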
Attention Mechanism
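The computation at the heart of the attention mechanism is scaled dot-product attention, softmax(QK^T / sqrt(d_k))·V. A small PyTorch sketch with made-up tensor shapes is given below as an illustration, not as code from the webinar:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_k). Scores compare every query with every key.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)   # attention weights sum to 1 over the keys
    return weights @ v                        # weighted sum of the value vectors

q = torch.randn(1, 5, 64)
k = torch.randn(1, 5, 64)
v = torch.randn(1, 5, 64)
out = scaled_dot_product_attention(q, k, v)   # (1, 5, 64)
```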
BERT vs GPT
Q&A Session
Hirdey Vikram
Hirdey.vikram@netwebindia.com
India (North)
Niraj
niraj@netwebindia.com
India (South)
Vivek
vivek@netwebindia.com
India (East)
Navin
navin@netwebindia.com
India (West)
Anupriya
anupriya@netwebtech.com
Singapore
Arun
arun@netwebtech.com
UAE
Agam
agam@netwebtech.com
Indonesia
Contact our team if you have any further questions after this webinar
Talk to our AI Experts: ai@netwebtech.com