
Deep Learning for Various Domains, GPUs and Frameworks
Sudip Bhattacharya, BIT Durg
July 2024

Contents
1 Abstract
2 Deep learning for different domains
  2.1 Challenges faced in various domains
  2.2 Deep Learning Models for Image Processing
  2.3 Deep Learning Models for Video Analytics
  2.4 Deep Learning Models for NLP
  2.5 Deep Learning Models for Speech Processing
3 Role of GPUs in Deep Learning
  3.1 Harnessing GPUs for Deep Learning
  3.2 NVIDIA - The Game Changer
4 Deep Learning Frameworks
  4.1 TensorFlow versus PyTorch
  4.2 PyTorch
  4.3 TensorFlow
  4.4 Keras
  4.5 Key Takeaways - PyTorch versus TensorFlow

1 Abstract
Deep learning has been found to be very effective in the areas of image/video processing, natural language processing, and speech processing. Deeper and deeper CNNs with skip connections have been meeting ever-higher benchmarks, with famous models such as ResNet and VGG. Video processing also involves the temporal dimension, so time-series models such as RNNs and their variants capable of remembering longer sequences, such as LSTMs and GRUs, come into the picture. Vision Transformers have been very successful in many tasks compared to classical deep learning models in these domains. This chapter lists some prominent models designed by academia and industry for these tasks.
NLP and speech processing have been dominated primarily by RNNs and transformer-based models such as BERT. Recent advances in GPT and the public release of GPT-based models such as ChatGPT have triggered a great deal of research and business interest in these domains. This chapter lists some prominent models for tasks in NLP and speech processing.
GPUs have been the game changers in the evolution and explosive growth of deep learning over the last two decades. The chapter highlights the suitability of the GPU, by virtue of its architecture, for deep learning workloads. It also discusses the pivotal role of NVIDIA in promoting the GPU-based ecosystem for deep learning.
Deep learning software frameworks have been a competitive market, with a plethora of libraries and frameworks evolving over the last two decades. This chapter lists some of the most successful frameworks of this period. In spite of being a competitive space, the present day has seen only two major frameworks remain: PyTorch and TensorFlow (with the Keras API). While PyTorch is becoming the first choice of the research community, TensorFlow has an edge in production environments. The section on DL frameworks provides a head-to-head comparison between PyTorch and TensorFlow.

2 Deep learning for different domains


2.1 Challenges faced in various domains
Challenges in Image and Video Processing
Deep learning models have transformed image processing, but they also face
several significant challenges:
• Large Image/Video Dataset Requirements: Deep learning models, especially complex architectures like CNNs and Transformers, require vast amounts of labeled, class-balanced data to perform well. Acquiring and labeling such data is often expensive and time-consuming.
• Handling Occlusions and Motion Blur: Fast movement or poor video qual-
ity can cause motion blur or reduced resolution, making it difficult for
models to capture details accurately.

• Computational Complexity: Training deep learning models, particularly
on large datasets, demands considerable computational resources, such as
high-performance GPUs or TPUs. This can be costly and time-consuming.
Handling high-resolution images or 3D data (like in medical imaging) re-
quires substantial memory, which can limit model size and batch process-
ing capabilities.
• Interpretability and Explainability: Deep learning models, especially deep
neural networks, are often regarded as ”black boxes.” Understanding how
they make decisions is challenging, which limits their application in critical
fields where explainability is essential, such as healthcare or autonomous
driving.

• Robustness and Adversarial Attacks: Deep learning models can be vulnerable to adversarial attacks, where small, carefully crafted perturbations to the input image can lead to incorrect classifications.
• Domain Adaptation and Transfer Learning: Models trained on specific
datasets (e.g., urban scenes) may not perform well on different domains
(e.g., rural scenes) without adaptation. Adapting models to new domains
or environments often requires fine-tuning with new data.
• Ethics and Bias: Large-scale image datasets often involve personal or sensitive data, raising privacy concerns, especially in fields like facial recognition and surveillance. Training data can also contain demographic biases.

Challenges in Deep Learning for NLP and Speech Processing


Deep learning models have achieved remarkable success in Natural Language
Processing (NLP), but they also face several unique challenges due to the com-
plexity and variability of human language. Here are some of the key challenges
in NLP:
• Data Requirements and Quality: NLP models, especially large language
models like BERT and GPT, require extensive datasets for training. Lan-
guage data is often noisy and unstructured, containing typos, slang, and
inconsistencies that can affect model performance.
• Handling Ambiguity and Context: Many words have multiple meanings
(polysemy), and their meaning often depends on context.

• Multilingual and Low-Resource Languages: Many languages are underrepresented in NLP datasets.
• Handling Long-Range Dependencies and Memory Constraints: Trans-
former models, which excel at handling context, can struggle with very
long sequences due to memory and computational constraints.

• Multi-Modal and Conversational Understanding: Many NLP tasks, such
as understanding video captions or social media posts, require integrating
information from multiple modalities (e.g., images, text, and audio), which
adds complexity.

2.2 Deep Learning Models for Image Processing


Some of the most well-known deep learning models used for image processing and computer vision are listed below; these are essentially CNNs and Vision Transformers. A short usage sketch follows the list.

• AlexNet
Pioneered deep learning in computer vision, winning the 2012 ImageNet
competition. It is known for using ReLU activations and dropout for
improved training.
• VGGNet
Known for its simplicity and depth, VGGNet stacks multiple 3x3 convo-
lutional layers, followed by fully connected layers. It is popular for tasks
where simplicity and performance are essential.
• ResNet
Introduced the concept of residual connections, allowing very deep net-
works (up to hundreds of layers) without the vanishing gradient problem.
ResNet models (e.g., ResNet-50, ResNet-101) are popular for many com-
puter vision tasks.
• Inception (GoogLeNet)
Inception uses a combination of different-sized filters and pooling layers
in parallel, enabling the model to capture features at various scales within
each layer.
• YOLO (You Only Look Once)
A fast and efficient object detection model that processes an image in
one pass, making it suitable for real-time applications. Variants include
YOLOv3, YOLOv4, and YOLOv5.

• Mask R-CNN
Extends Faster R-CNN by adding a branch for predicting segmentation
masks on each Region of Interest (RoI). It is widely used for instance
segmentation in images.
• U-Net
A convolutional neural network architecture designed for precise image segmentation, commonly used in biomedical image analysis for tasks like tumor and organ segmentation.

• Vision Transformers
Apply the Transformer architecture to image analysis, effectively capturing long-range dependencies and achieving state-of-the-art performance in tasks like image classification.
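As a hedged illustration of how such pre-trained vision models are typically used, the following is a minimal sketch with PyTorch/torchvision. It assumes torchvision 0.13+ for the weights API; the input file example.jpg is a hypothetical image, not part of the original text.

import torch
from torchvision import models
from PIL import Image

# Load a ResNet-50 pre-trained on ImageNet (weights enum requires torchvision >= 0.13)
weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights).eval()

preprocess = weights.transforms()          # preprocessing bundled with the weights
img = Image.open("example.jpg")            # hypothetical input image
batch = preprocess(img).unsqueeze(0)       # add a batch dimension

with torch.no_grad():
    logits = model(batch)
class_id = logits.argmax(dim=1).item()
print(weights.meta["categories"][class_id])  # predicted ImageNet class name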

2.3 Deep Learning Models for Video Analytics


Deep learning models commonly used in video analytics are:

• 3D Convolutional Neural Networks (3D-CNNs): Used for tasks like action recognition and gesture detection.
• Convolutional LSTM (ConvLSTM): An LSTM that incorporates convolutional layers, allowing it to capture spatiotemporal dependencies.
• I3D (Inflated 3D ConvNet): Extends 2D CNNs into 3D CNNs by inflating
filters and pooling kernels, allowing the model to capture spatiotemporal
features.
• YOLO and SSD Variants for Object Detection in Video: YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector) adapted for video, often incorporating temporal information.

2.4 Deep Learning Models for NLP


Deep learning models commonly used in Natural Language Processing (NLP) are listed below; a short usage sketch follows the list.

• Recurrent Neural Networks (RNNs)
Basic RNNs process sequences one step at a time while maintaining a hidden state.
• Long Short-Term Memory (LSTM) Networks
LSTMs are a type of RNN that can learn long-term dependencies.
• Gated Recurrent Units (GRUs)
A simplified version of LSTMs with fewer parameters.
• Word2Vec and GloVe
These are word embedding models that convert words into dense vector representations, capturing semantic relationships between words.

• Transformer
Introduced self-attention mechanisms, allowing for efficient parallel processing of sequential data.
• BERT (Bidirectional Encoder Representations from Transformers)
A pre-trained Transformer model that captures context in both directions.
• GPT (Generative Pre-trained Transformer)
Transformer-based models that excel at generating human-like text.
• XLNet
XLNet improves on BERT by combining autoregressive training with bidirectional context.
• T5 (Text-to-Text Transfer Transformer)
A Transformer that casts NLP tasks into a unified text-to-text format.
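As a small hedged illustration, a pre-trained BERT-family model can be applied to a downstream task in a few lines via the Hugging Face transformers library (an external library assumed here; the default checkpoint it downloads is not specified by this chapter).

from transformers import pipeline

# Downloads a default pre-trained Transformer fine-tuned for sentiment analysis
classifier = pipeline("sentiment-analysis")
print(classifier("Deep learning frameworks keep getting better."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]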

2.5 Deep Learning Models for Speech Processing


The field of speech processing has undergone a transformative shift with the advent of deep learning. The use of multiple processing layers has enabled the creation of models capable of extracting intricate features from speech data. This development has paved the way for unparalleled advancements in speech recognition, text-to-speech synthesis, and emotion recognition, propelling the performance of these tasks to unprecedented heights. The power of deep learning techniques has opened up new avenues for research and innovation in the field of speech processing, with far-reaching implications for a range of industries and applications.
RNNs and Transformers are two widely adopted neural network architectures employed in the domains of Natural Language Processing (NLP) and speech processing. While RNNs process input words sequentially and preserve a hidden state vector over time, Transformers analyze the entire sentence in parallel and incorporate an internal attention mechanism. This unique feature makes Transformers more efficient than RNNs. Moreover, Transformers employ an attention mechanism that evaluates the relevance of other input tokens in encoding a specific token. This is particularly advantageous in machine translation, as it allows the Transformer to incorporate contextual information, thereby enhancing translation accuracy. To achieve this, Transformers combine word vector embeddings and positional encodings, which are subsequently passed through a sequence of encoders and decoders. These fundamental differences between RNNs and Transformers establish the latter as a promising option for various natural language processing tasks. A comparative study on Transformers vs. RNNs in speech applications found that Transformer networks achieve state-of-the-art performance in neural machine translation and other natural language processing applications.
In addition to speech recognition, the Transformer model has shown promising results in TTS applications. A Transformer-based TTS model generates mel-spectrograms, followed by a WaveNet vocoder to produce the final audio. Several neural network-based TTS models, such as Tacotron 2, DeepVoice 3, and Transformer TTS, have outperformed traditional concatenative and statistical parametric approaches in terms of speech quality.
In contrast to the strengths of Transformer-based architectures in neural speech synthesis, large language models based on Transformers such as BERT, GPT, XLNet, and T5 have limitations when it comes to speech processing. One issue is that these models require discrete tokens as input, necessitating a tokenizer or a speech recognition system, which introduces errors and noise. Furthermore, pre-training on large-scale text corpora can lead to domain mismatch problems when processing speech data. To address these limitations, dedicated frameworks have been developed for learning speech representations using Transformers, including wav2vec, data2vec, Whisper, VALL-E, UniSpeech, SpeechT5, etc.

Speech Processing Models:

• Deep Speech: An end-to-end speech recognition model built on RNNs trained with the CTC loss.
• WaveNet: A deep generative model for raw audio that produces highly realistic speech synthesis.
• Jasper and QuartzNet: Speech recognition models based on convolutional architectures.
• VGGish: An audio embedding model based on the VGG architecture.
• Wav2Vec and Wav2Vec 2.0: Developed by Facebook AI, Wav2Vec models are designed for self-supervised learning of speech representations (a usage sketch follows this list).
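A rough sketch of speech recognition with a pre-trained Wav2Vec 2.0 model using the torchaudio pipeline API. The audio file speech.wav is hypothetical, and the greedy CTC decoding below is deliberately simplified.

import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H   # Wav2Vec 2.0 fine-tuned for ASR
model = bundle.get_model().eval()

waveform, sr = torchaudio.load("speech.wav")            # hypothetical input file
if sr != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.inference_mode():
    emissions, _ = model(waveform)                       # frame-level character logits

# Naive greedy CTC decode: best label per frame, collapse repeats, drop blanks
labels = bundle.get_labels()
ids = torch.unique_consecutive(emissions[0].argmax(dim=-1))
print("".join(labels[i] for i in ids if labels[i] != "-").replace("|", " "))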

3 Role of GPUs in Deep Learning


Deep learning has been extensively researched in various areas and has scaled up very fast over the last decade. It has deeply permeated our daily lives through applications such as image classification, video synthesis, autonomous driving, voice recognition, and personalized recommendation systems. The main challenge for most deep learning models, including convolutional neural networks, recurrent neural networks, and recommendation models, is their large amount of computation. Fortunately, most computations in deep learning applications are parallelizable; therefore, they can be effectively handled by throughput processors such as Graphics Processing Units (GPUs). GPUs offer high throughput, parallel processing performance, and high memory bandwidth, and have become the most widely adopted devices for deep learning. As a matter of fact, many deep learning workloads, from mobile devices to data centers, are performed on GPUs. In particular, modern GPU systems provide specialized hardware modules and software stacks for deep learning workloads. This section presents an analysis of the evolution of GPU architectures, their feasibility for DL workloads, recent hardware and software innovations, and the challenges for more efficient acceleration of deep learning computation on GPUs.

3.1 Harnessing GPUs for Deep Learning


SIMD processing
GPUs are designed with thousands of smaller cores that can execute many operations simultaneously. This parallelism makes them well suited for the matrix and tensor operations central to deep learning, which involve large-scale computations on multidimensional data arrays. This model is called Single Instruction Multiple Data (SIMD).

Figure 1: Standard Activation Function

GPU computation model suits the computational needs of CNNs

The SIMD model directly fits CNN computations. GPUs are particularly beneficial for deep learning models with many layers, such as Convolutional Neural Networks (CNNs) for image recognition.

DL Programming Frameworks leverage the GPU computation model for Tensor Computations
Deep learning frameworks like TensorFlow, PyTorch, and their wrappers like
Keras offer GPU support, making it easier to leverage GPUs without having to
write custom code for parallel computing. These frameworks use libraries like
CUDA and cuDNN, which are optimized for GPU computations, to accelerate
model training and inference.
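A minimal sketch of how a framework hides the GPU details: in PyTorch, moving the model and a batch to the GPU (when one is available) is enough for the underlying CUDA/cuDNN kernels to be used. The tiny CNN and random batch below are placeholders, not a real workload.

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A toy CNN; .to(device) moves its parameters onto the GPU
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(16 * 32 * 32, 10),
).to(device)

images = torch.randn(64, 3, 32, 32, device=device)   # random batch standing in for data
targets = torch.randint(0, 10, (64,), device=device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss = nn.functional.cross_entropy(model(images), targets)
loss.backward()            # gradients computed on the GPU
optimizer.step()
print(loss.item())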

Scalability for large Deep Learning Models


GPUs can be used in multi-GPU setups, either within a single machine or dis-
tributed across multiple machines. This allows for the training of very large
models, such as those used in natural language processing (NLP) and computer
vision, by distributing the computation across multiple GPUs.
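A hedged sketch of single-machine data parallelism in PyTorch: nn.DataParallel splits each batch across the visible GPUs. For multi-machine training, torch.nn.parallel.DistributedDataParallel is the usual choice, omitted here for brevity.

import torch
import torch.nn as nn

model = nn.Linear(1024, 10)
if torch.cuda.device_count() > 1:
    # Each forward pass scatters the batch across GPUs and gathers the outputs
    model = nn.DataParallel(model)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(256, 1024).to(next(model.parameters()).device)
out = model(x)   # a batch of 256 split across the available GPUs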

GPUs are useful in both training and inference of DL models

Besides training, GPUs are also beneficial for inference, especially for applications that require real-time processing, such as autonomous driving, video analytics, or interactive AI systems. The parallelism of GPUs can handle the rapid processing needed to make quick decisions based on incoming data.

3.2 NVIDIA - The Game Changer


NVIDIA has played a vital role in GPU architectural innovation and in fostering the growth of GPU-accelerated deep learning. NVIDIA started promoting the use of GPUs for general-purpose processing (before that, GPUs were primarily considered accelerators for graphics pipelines), coining the term GPGPU. The C-based language CUDA, developed by NVIDIA, helped programmers gain up to 100x speedups for non-graphical computations too. The computational demands of deep learning tasks (mostly training, but also inference), which require parallel processing of large matrices (tensors) of floating-point numbers, matched the computational model of GPUs and led to the quick adoption of GPUs as the accelerator of choice by deep learning practitioners. NVIDIA has also collaborated with Google on the TensorFlow project, while Google went on to design its own deep learning accelerator, the TPU (Tensor Processing Unit). Today, NVIDIA and other GPU manufacturers provide GPUs designed specially for handling deep learning workloads.
Deep learning has led to many recent breakthroughs in AI, such as Google DeepMind's AlphaGo, self-driving cars, intelligent voice assistants, and many more. With NVIDIA GPU-accelerated deep learning frameworks, researchers and data scientists can significantly speed up deep learning training, reducing what could otherwise take days or weeks to just hours or days. When models are ready for deployment, developers can rely on GPU-accelerated inference platforms for the cloud, embedded devices, or self-driving cars to deliver high-performance, low-latency inference for the most computationally intensive deep neural networks.

4 Deep Learning Frameworks


Deep learning (DL) frameworks offer building blocks for designing, training,
and validating deep neural networks through a high-level programming inter-
face. Widely-used DL frameworks, such as PyTorch, JAX, TensorFlow, PyTorch
Geometric, DGL, and others, rely on GPU-accelerated libraries, such as cuDNN,
NCCL, and DALI to deliver high-performance, multi-GPU-accelerated training.
The last decade has witnessed a rapid evolution of deep learning frameworks, shaped by advances in hardware, software, and growing research in the field. During this time, deep learning frameworks have transitioned from specialized tools used by researchers to widely accessible platforms driving industry applications across fields like computer vision and natural language processing. The popular frameworks with their prominent features are listed below. The release of PyTorch 0.1 in January 2017 marked the transition from a Cambrian-explosion-like proliferation of deep learning libraries, wrappers, and data-exchange formats into an era of consolidation and unification. In the roughly two years that followed (2017-2019), the landscape changed drastically. The community largely consolidated behind either PyTorch or TensorFlow, with the adoption of other libraries dwindling, except for those filling specific niches. The following happened during this period.

Figure 2: Popular Deep Learning Frameworks

• Theano:
One of the first deep learning frameworks, it has ceased active development.
• TensorFlow:
– Consumed Keras entirely, promoting it to a first-class API
– Provided an immediate-execution “eager mode”
– Released TF 2.0 with eager mode by default
• JAX:
A library by Google, developed independently from TensorFlow, has started gaining traction as a NumPy equivalent with GPU, autograd, and JIT capabilities.
• PyTorch:
– Consumed Caffe2 for its backend
– Replaced most of the low-level code reused from the Lua-based Torch project
– Added support for ONNX, a vendor-neutral model description and exchange format (an export sketch follows this list)
– Added a delayed-execution “graph mode” runtime called TorchScript
– Replaced CNTK and Chainer as the framework of choice by their respective corporate sponsors
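As referenced in the ONNX item above, a minimal sketch of exporting a PyTorch model to the vendor-neutral ONNX format; the ResNet-18 and the output file name are arbitrary examples, not taken from the text.

import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()   # untrained example model
dummy_input = torch.randn(1, 3, 224, 224)                  # example input fixes the graph shape

torch.onnx.export(
    model, dummy_input, "resnet18.onnx",
    input_names=["input"], output_names=["logits"],
)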

Figure 3: Popular ML Frameworks

4.1 TensorFlow versus PyTorch


As of today, the two most popular deep learning frameworks to have evolved out of this set of competing frameworks are TensorFlow and PyTorch.

4.2 PyTorch
PyTorch was first introduced in 2016-17. Before PyTorch, deep learning frame-
works often focused on either speed or usability, but not both. PyTorch has
become a popular tool in the deep learning research community by combin-
ing a focus on usability with careful performance considerations. It provides
an imperative and Pythonic programming style that supports code as a model,
makes debugging easy, and is consistent with other popular scientific computing
libraries while remaining efficient and supporting hardware accelerators such as
GPUs.
The open source deep learning framework is a Python library that performs
immediate execution of dynamic tensor computations with automatic differen-
tiation and GPU acceleration and does so while maintaining performance com-
parable to the fastest current libraries for deep learning. Today, most of its core
is written in C++, one of the primary reasons PyTorch can achieve much lower
overhead compared to other frameworks. As of today, PyTorch appears to be
best suited for drastically shortening the design, training, and testing cycle for
new neural networks for specific purposes. Hence it has become very popular in the research community.
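A minimal sketch of the immediate-execution, automatic-differentiation style described above:

import torch

x = torch.randn(3, requires_grad=True)
y = (x ** 2).sum()       # evaluated immediately, no separate graph-compilation step
y.backward()             # autograd computes dy/dx = 2x
print(x.grad)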
PyTorch 2.0 marks a major advancement in the PyTorch framework, offer-
ing enhanced performance while maintaining backward compatibility and its
Python-centric approach, which has been key to its widespread adoption in the

11
AI/ML community.
PyTorch Advantages
• PyTorch is based on Python
PyTorch is Python-centric or “pythonic”, designed for deep integration
in Python code instead of being an interface to a deep learning library
written in some other language
• Easier to learn
• Debugging
PyTorch can be debugged using one of the many widely available Python
debugging tools (for example, Python’s pdb and ipdb tools).
• Dynamic computational graphs
PyTorch supports dynamic computational graphs, which means the network behavior can be changed programmatically at runtime. This makes experimenting with and debugging models much easier (see the sketch after this list).
• Data parallelism
The data parallelism feature allows PyTorch to distribute computational
work among multiple CPU or GPU cores.
• Community
PyTorch has a very active community and forums (discuss.pytorch.org)
• Distributed Training
PyTorch offers native support for asynchronous execution of collective
operations and peer-to-peer communication, accessible from both Python
and C++.
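As mentioned in the dynamic computational graphs item above, a small sketch of runtime-dependent network behavior; the number of layer applications is an ordinary Python argument decided at call time.

import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(8, 8)

    def forward(self, x, n_repeats):
        # Ordinary Python control flow: the graph can differ on every call
        for _ in range(n_repeats):
            x = torch.relu(self.layer(x))
        return x

net = DynamicNet()
out = net(torch.randn(2, 8), n_repeats=3)   # three layer applications this call, any number next call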

4.3 TensorFlow
TensorFlow is a very popular end-to-end open-source platform for machine
learning. It was originally developed by researchers and engineers working on
the Google Brain team before it was open-sourced.
The TensorFlow software library replaced Google’s DistBelief framework and
runs on almost all available execution platforms (CPU, GPU, TPU, Mobile,
etc.). The framework provides a math library that includes basic arithmetic
operators and trigonometric functions.
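A small sketch of TensorFlow 2.x in its default eager mode, using the arithmetic and trigonometric operators mentioned above together with tf.GradientTape for automatic differentiation:

import tensorflow as tf

# Eager execution: operations run immediately, like NumPy
x = tf.constant([0.0, 1.0, 2.0])
print(tf.sin(x) + tf.square(x))

# Automatic differentiation
w = tf.Variable(3.0)
with tf.GradientTape() as tape:
    loss = tf.square(w) + tf.sin(w)
print(tape.gradient(loss, w))   # d(loss)/dw = 2w + cos(w)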
TensorFlow is currently used by various international companies, such as Google, Uber, and Microsoft, as well as by a wide range of universities.
Keras is the high-level API of the TensorFlow platform. It provides an ap-
proachable, efficient interface for solving machine learning (ML) problems, with
a focus on modern deep learning models. The TensorFlow Lite implementation is specially designed for edge-based machine learning. TF Lite is optimized to run lightweight models on resource-constrained edge devices, such as smartphones, microcontrollers, and other chips.
TensorFlow Advantages

• Support and library management
TensorFlow is backed by Google and has frequent releases with new fea-
tures. It is popularly used in production environments.
• Data visualization
TensorFlow provides a tool called TensorBoard to visualize data and training metrics graphically. It also allows easy debugging of computation-graph nodes, reducing the effort of inspecting the entire model.
• Keras compatibility
TensorFlow is compatible with Keras, which lets users write concise high-level model code while TensorFlow supplies the lower-level, system-specific functionality (pipelining, estimators, etc.).
• Very scalable
• Compatibility
TensorFlow provides APIs for many languages, such as C++, JavaScript, Python, C, Ruby, and Swift.

• Architectural Support
TensorFlow is widely used as a hardware-accelerated library thanks to its parallel execution model, and it offers different distribution strategies across GPU and CPU systems. TensorFlow also supports Google's TPU, which performs certain computations faster than GPUs and CPUs.

4.4 Keras
Keras is the high-level API of the TensorFlow platform. It provides an approach-
able, highly-productive interface for solving machine learning (ML) problems,
with a focus on modern deep learning.
Keras covers every step of the machine learning workflow, from data pro-
cessing to hyperparameter tuning to deployment. It was developed with a focus
on enabling fast experimentation. Keras is designed to reduce cognitive load by
achieving the following goals:

• Offer simple, consistent interfaces.


• Minimize the number of actions required for common use cases.

• Provide clear, actionable error messages.


• Follow the principle of progressive disclosure of complexity: It’s
easy to get started, and you can complete advanced workflows by learning
as you go.

• Help you write concise, readable code.

With Keras, you have full access to the scalability and cross-platform capa-
bilities of TensorFlow. You can run Keras on a TPU Pod or large clusters of
GPUs, and you can export Keras models to run in the browser or on mobile
devices. You can also serve Keras models via a web API. Although it is now
primarily tied to TensorFlow, Keras originally supported multiple backends, in-
cluding Theano and Microsoft Cognitive Toolkit (CNTK), providing flexibility
for different environments.
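A minimal sketch of the Keras workflow described above, with a TensorBoard callback illustrating the visualization support mentioned in the TensorFlow advantages; the random arrays stand in for a real dataset.

import numpy as np
from tensorflow import keras

# Toy data standing in for a real dataset
x_train = np.random.rand(1000, 20).astype("float32")
y_train = np.random.randint(0, 2, size=(1000,))

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

tensorboard_cb = keras.callbacks.TensorBoard(log_dir="logs")   # visualize training in TensorBoard
model.fit(x_train, y_train, epochs=3, batch_size=32, callbacks=[tensorboard_cb])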

Figure 4: Keras and TensorFlow

4.5 Key Takeaways - PyTorch versus TensorFlow


• PyTorch vs TensorFlow
PyTorch is favored for research and dynamic projects, while TensorFlow
excels in large-scale and production environments.
• Ease of Use
PyTorch offers a more intuitive, Pythonic approach, ideal for beginners
and rapid prototyping.
• Performance and Scalability
TensorFlow is optimized for performance, particularly in large-scale appli-
cations. PyTorch provides flexibility and is beneficial for dynamic model
adjustments.
• Community and Resources
TensorFlow has a broad, established community with extensive resources,
whereas PyTorch has a rapidly growing community, especially popular in
academic research.

• Real-World Applications
PyTorch is prominent in academia and research-focused industries, while
TensorFlow is widely used in industry for large-scale applications.

Figure 5: PyTorch vs TensorFlow


