Unit-5 (DL For Different Domains, Role of GPUs and DL Frameworks)
Sudip Bhattacharya, BIT Durg
July 2024
1 Abstract
Deep learning has been found to be very effective in the areas of image and video processing, natural language processing, and speech processing. Deeper and deeper CNNs with skip connections have been meeting ever-higher benchmarks, with famous models like ResNet and VGG. Video processing also involves the temporal dimension, so time-series models like RNNs, and variants capable of remembering longer sequences such as LSTMs and GRUs, come into the picture. Vision Transformers have been very successful in many tasks compared to classical deep learning models in these domains. This chapter lists some prominent models designed by academia and industry for these tasks.
NLP and speech processing have been dominated primarily by RNNs and transformer-based models like BERT. The recent advances in GPT and the release of GPT-based models into the public domain, like ChatGPT, have triggered a lot of research and business interest in these domains. This chapter lists some prominent models for tasks in the realm of NLP and speech processing.
GPUs have been the game changers in the evolution and explosive growth of deep learning over the last two decades. The chapter highlights the suitability of the GPU, by virtue of its architecture, for deep learning workloads. It also discusses the pivotal role of NVIDIA in promoting the GPU-based ecosystem for deep learning.
Deep learning software frameworks have been a competitive market, with a plethora of libraries and frameworks evolving over the last two decades. This chapter lists some of the most successful frameworks from this period. Despite being a competitive space, the present day has seen only two major frameworks remain: PyTorch and TensorFlow (with the Keras API). While PyTorch is becoming the first choice in the research community, TensorFlow has an edge in production environments. The section on DL frameworks provides a head-to-head comparison between PyTorch and TensorFlow.
• Computational Complexity: Training deep learning models, particularly
on large datasets, demands considerable computational resources, such as
high-performance GPUs or TPUs. This can be costly and time-consuming.
Handling high-resolution images or 3D data (like in medical imaging) re-
quires substantial memory, which can limit model size and batch process-
ing capabilities.
• Interpretability and Explainability: Deep learning models, especially deep
neural networks, are often regarded as “black boxes.” Understanding how
they make decisions is challenging, which limits their application in critical
fields where explainability is essential, such as healthcare or autonomous
driving.
• Multi-Modal and Conversational Understanding: Many NLP tasks, such
as understanding video captions or social media posts, require integrating
information from multiple modalities (e.g., images, text, and audio), which
adds complexity.
• AlexNet
Pioneered deep learning in computer vision, winning the 2012 ImageNet
competition. It is known for using ReLU activations and dropout for
improved training.
• VGGNet
Known for its simplicity and depth, VGGNet stacks multiple 3x3 convo-
lutional layers, followed by fully connected layers. It is popular for tasks
where simplicity and performance are essential.
• ResNet
Introduced the concept of residual connections, allowing very deep networks (up to hundreds of layers) without the vanishing gradient problem. ResNet models (e.g., ResNet-50, ResNet-101) are popular for many computer vision tasks (a minimal sketch of a residual block appears after the vision models below).
• Inception (GoogLeNet)
Inception uses a combination of different-sized filters and pooling layers
in parallel, enabling the model to capture features at various scales within
each layer.
• YOLO (You Only Look Once)
A fast and efficient object detection model that processes an image in
one pass, making it suitable for real-time applications. Variants include
YOLOv3, YOLOv4, and YOLOv5.
• Mask R-CNN
Extends Faster R-CNN by adding a branch for predicting segmentation
masks on each Region of Interest (RoI). It is widely used for instance
segmentation in images.
• U-Net
A convolutional neural network architecture designed for precise image segmentation, commonly used in biomedical image analysis for tasks like tumor and organ segmentation.
• Vision Transformers
Apply the Transformer architecture to image analysis, effectively capturing long-range dependencies and achieving state-of-the-art performance in tasks like image classification.
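To make the residual connection mentioned under ResNet concrete, here is a minimal PyTorch sketch of a residual block. It is illustrative only: the channel count and input shape are arbitrary, and real ResNet blocks also add batch normalization and downsampling variants.

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """Two 3x3 convolutions with a skip connection around them."""
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.relu = nn.ReLU()

        def forward(self, x):
            out = self.relu(self.conv1(x))
            out = self.conv2(out)
            return self.relu(out + x)  # skip connection: gradients flow directly through "+ x"

    block = ResidualBlock(64)
    y = block(torch.randn(1, 64, 32, 32))  # output shape preserved: (1, 64, 32, 32)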
• Transformer
Introduced self-attention mechanisms, allowing for efficient parallel processing of sequential data (a minimal attention sketch appears after this list).
• BERT (Bidirectional Encoder Representations from Transformers)
A pre-trained Transformer model that captures context in both directions.
• GPT (Generative Pre-trained Transformer)
Transformer-based models that excel at generating human-like text.
• XLNet
Improves on BERT by combining autoregressive modeling with bidirectional context.
• T5 (Text-to-Text Transfer Transformer)
Frames every NLP task as a text-to-text problem, allowing a single model to handle tasks such as translation, summarization, and classification.
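As a companion to the Transformer entry above, here is a minimal sketch of scaled dot-product self-attention, the core operation behind these models. It is a single head with no learned projections or masking, which real Transformers add.

    import math
    import torch

    def self_attention(x):
        # x: (batch, seq_len, d_model); queries, keys, and values are all
        # taken directly from x for simplicity.
        scores = x @ x.transpose(-2, -1) / math.sqrt(x.size(-1))
        weights = torch.softmax(scores, dim=-1)  # each row sums to 1
        return weights @ x                       # weighted sum of values

    out = self_attention(torch.randn(2, 5, 16))  # output shape: (2, 5, 16)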
Applying text-oriented Transformer models to speech typically requires first annotating the audio using a tokenizer or a speech recognition system, introducing errors and noise. Furthermore, pre-training on large-scale text corpora can lead to domain-mismatch problems when processing speech data. To address these limitations, dedicated frameworks have been developed for learning speech representations using transformers, including wav2vec, data2vec, Whisper, VALL-E, UniSpeech, SpeechT5, etc.
Figure 1: Standard Activation Function
GPUs are well suited to the matrix and tensor operations central to deep learning, which involve large-scale computations on multidimensional data arrays. This execution model is called Single Instruction Multiple Data (SIMD).
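A small sketch of why the SIMD model suits deep learning: a single vectorized tensor expression replaces an explicit Python loop, and on a CUDA device the same instruction is applied to many elements in parallel.

    import torch

    x = torch.randn(1_000_000)
    if torch.cuda.is_available():
        x = x.cuda()        # move the data to the GPU if one is present

    y = x * 2.0 + 1.0       # one instruction stream applied to all elements at once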
GPUs are also valuable in latency-sensitive settings such as real-time analytics or interactive AI systems. The parallelism of GPUs can handle the rapid processing needed to make quick decisions based on incoming data.
Figure 2: Popular Deep Learning Frameworks
• Theano:
One of the first deep learning frameworks; it has ceased active development.
• TensorFlow:
– Consumed Keras entirely, promoting it to a first-class API
– Provided an immediate-execution “eager mode”
– Released TF 2.0 with eager mode by default
• JAX:
A library by Google that was developed independently from TensorFlow; it has started gaining traction as a NumPy equivalent with GPU, autograd, and JIT capabilities.
• PyTorch:
– Consumed Caffe2 for its backend
– Replaced most of the low-level code reused from the Lua-based Torch project
– Added support for ONNX, a vendor-neutral model description and exchange format (a minimal export sketch appears after this list)
– Added a delayed-execution “graph mode” runtime called TorchScript
– Replaced CNTK and Chainer as the framework of choice of their respective corporate sponsors
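As a small illustration of the ONNX support mentioned above, the sketch below exports a toy PyTorch model; the file name "model.onnx" and the model itself are arbitrary placeholders.

    import torch
    import torch.nn as nn

    model = nn.Linear(4, 2)
    dummy = torch.randn(1, 4)                      # example input used for tracing
    torch.onnx.export(model, dummy, "model.onnx")  # save in the vendor-neutral format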
Figure 3: Popular ML Frameworks
4.2 PyTorch
PyTorch was first introduced in 2016-17. Before PyTorch, deep learning frame-
works often focused on either speed or usability, but not both. PyTorch has
become a popular tool in the deep learning research community by combin-
ing a focus on usability with careful performance considerations. It provides
an imperative and Pythonic programming style that supports code as a model,
makes debugging easy, and is consistent with other popular scientific computing
libraries while remaining efficient and supporting hardware accelerators such as
GPUs.
This open-source deep learning framework is a Python library that performs immediate execution of dynamic tensor computations with automatic differentiation and GPU acceleration, and does so while maintaining performance comparable to the fastest current libraries for deep learning. Today, most of its core is written in C++, one of the primary reasons PyTorch can achieve much lower overhead than other frameworks. PyTorch is particularly well suited to drastically shortening the design, training, and testing cycle for new neural networks, which is why it has become very popular in the research community.
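A minimal sketch of the imperative, define-by-run style described above: each operation executes immediately, and autograd differentiates through whatever was run.

    import torch

    x = torch.tensor([2.0, 3.0], requires_grad=True)
    y = (x ** 2).sum()   # evaluated right away; no separate graph-building step
    y.backward()         # automatic differentiation
    print(x.grad)        # tensor([4., 6.]), since dy/dx = 2x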
PyTorch 2.0 marks a major advancement in the PyTorch framework, offering enhanced performance while maintaining backward compatibility and its Python-centric approach, which has been key to its widespread adoption in the AI/ML community.
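A sketch of the PyTorch 2.0 torch.compile entry point: existing eager-mode code is compiled for better performance without changing how it is written.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 10), nn.ReLU())
    compiled = torch.compile(model)     # available from PyTorch 2.0 onward
    out = compiled(torch.randn(4, 10))  # used exactly like the original model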
PyTorch Advantages
• PyTorch is based on Python
PyTorch is Python-centric or “pythonic”, designed for deep integration
in Python code instead of being an interface to a deep learning library
written in some other language
• Easier to learn
• Debugging
PyTorch can be debugged using one of the many widely available Python
debugging tools (for example, Python’s pdb and ipdb tools).
• Dynamic computational graphs
PyTorch supports dynamic computational graphs, which means the network behavior can be changed programmatically at runtime. This makes experimenting with and debugging models much easier (a minimal sketch appears after this list).
• Data parallelism
The data parallelism feature allows PyTorch to distribute computational
work among multiple CPU or GPU cores.
• Community
PyTorch has a very active community and forums (discuss.pytorch.org).
• Distributed Training
PyTorch offers native support for asynchronous execution of collective
operations and peer-to-peer communication, accessible from both Python
and C++.
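The sketch below illustrates the dynamic-graph advantage from the list above: the forward pass is ordinary Python, so control flow can depend on runtime values, and the code can be stepped through with pdb.

    import torch
    import torch.nn as nn

    class DynamicNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.layer = nn.Linear(8, 8)

        def forward(self, x, depth):
            for _ in range(depth):   # loop count chosen at call time
                x = torch.relu(self.layer(x))
            return x

    net = DynamicNet()
    net(torch.randn(1, 8), depth=2)  # one graph
    net(torch.randn(1, 8), depth=5)  # a different graph, same code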
4.3 TensorFlow
TensorFlow is a very popular end-to-end open-source platform for machine
learning. It was originally developed by researchers and engineers working on
the Google Brain team before it was open-sourced.
The TensorFlow software library replaced Google’s DistBelief framework and
runs on almost all available execution platforms (CPU, GPU, TPU, Mobile,
etc.). The framework provides a math library that includes basic arithmetic
operators and trigonometric functions.
TensorFlow is currently used by various international companies, such as
Google, Uber, Microsoft, and a wide range of universities.
Keras is the high-level API of the TensorFlow platform. It provides an ap-
proachable, efficient interface for solving machine learning (ML) problems, with
a focus on modern deep learning models. The TensorFlow Lite implementation
is specially designed for edge-based machine learning. TF Lite is optimized to
run various lightweight algorithms on various resource-constrained edge devices,
such as smartphones, microcontrollers, and other chips.
TensorFlow Advantages
• Support and library management
TensorFlow is backed by Google and has frequent releases with new fea-
tures. It is popularly used in production environments.
• Data visualization
TensorFlow provides a tool called TensorBoard to visualize data and training metrics graphically. It also allows easy debugging of individual nodes, reducing the need to comb through the whole codebase to inspect the network (a minimal logging sketch appears after this list).
• Keras compatibility
TensorFlow is compatible with Keras, which lets users write high-level model code while TensorFlow supplies the system-specific functionality underneath (pipelining, estimators, etc.).
• Very scalable
• Compatibility
TensorFlow is compatible with many languages, such as C++, JavaScript,
Python, C, Ruby, and Swift.
• Architectural Support
TensorFlow works well as a hardware-accelerated library thanks to its parallel execution model, and it uses different distribution strategies on GPU and CPU systems. Google also offers the TPU, a custom accelerator architecture that performs certain computations faster than GPUs and CPUs.
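A minimal sketch of TensorBoard logging via the built-in Keras callback, as referenced in the Data visualization item above; the directory name "logs" and the random data are arbitrary.

    import numpy as np
    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.Input(shape=(4,)),
                                 tf.keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")

    x = np.random.rand(100, 4).astype("float32")
    y = np.random.rand(100, 1).astype("float32")
    tb = tf.keras.callbacks.TensorBoard(log_dir="logs")  # writes event files
    model.fit(x, y, epochs=2, callbacks=[tb])
    # Inspect afterwards with:  tensorboard --logdir logs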
4.4 Keras
Keras is the high-level API of the TensorFlow platform. It provides an approach-
able, highly-productive interface for solving machine learning (ML) problems,
with a focus on modern deep learning.
Keras covers every step of the machine learning workflow, from data pro-
cessing to hyperparameter tuning to deployment. It was developed with a focus
on enabling fast experimentation. Keras is designed to reduce cognitive load by
achieving the following goals:
• Offer simple, consistent interfaces.
• Minimize the number of actions required for common use cases.
• Provide clear and actionable error messages.
With Keras, you have full access to the scalability and cross-platform capa-
bilities of TensorFlow. You can run Keras on a TPU Pod or large clusters of
GPUs, and you can export Keras models to run in the browser or on mobile
devices. You can also serve Keras models via a web API. Although it is now
primarily tied to TensorFlow, Keras originally supported multiple backends, in-
cluding Theano and Microsoft Cognitive Toolkit (CNTK), providing flexibility
for different environments.
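A minimal sketch of the Keras workflow described in this section: define a model, compile it, train it, and save it for deployment. The data here is random and purely illustrative.

    import numpy as np
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(784,)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    x = np.random.rand(256, 784).astype("float32")
    y = np.random.randint(0, 10, size=(256,))
    model.fit(x, y, epochs=1, batch_size=32)
    model.save("model.keras")  # portable artifact for serving or export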
• Real-World Applications
PyTorch is prominent in academia and research-focused industries, while
TensorFlow is widely used in industry for large-scale applications.