Chapter 12 (Distributing TensorFlow) Fa21-Bse-036
Some operations also have multithreaded kernels that can use the intra-op
thread pool to split a single computation across multiple threads on the same
device.
Controlling Parallelism
You can control the number of threads in the inter-op and intra-op thread pools
by setting the inter_op_parallelism_threads and intra_op_parallelism_threads
options in the session configuration. This lets you fine-tune the parallelism to
match the characteristics of your hardware and workload.
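For example, here is a minimal sketch using the TensorFlow 1.x API assumed throughout this chapter; the thread counts are placeholders to tune for your machine:

```python
import tensorflow as tf

# Hypothetical thread counts; tune them to your CPU core count and workload.
config = tf.ConfigProto(
    inter_op_parallelism_threads=4,   # threads that run independent ops in parallel
    intra_op_parallelism_threads=8)   # threads used inside multithreaded kernels

with tf.Session(config=config) as sess:
    result = sess.run(tf.reduce_sum(tf.random_uniform((1000, 1000))))
```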
Multiple Devices Across Multiple Servers
Defining a Cluster
To run a TensorFlow graph across multiple servers, you first need to define a cluster. A
cluster is composed of one or more TensorFlow servers, called tasks, typically spread
across several machines. Each task belongs to a job, which is a named group of tasks
with a common role, such as storing model parameters or performing computations.
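As a sketch, a cluster definition might look like the following (the machine names, ports, and job layout are hypothetical):

```python
import tensorflow as tf

# Hypothetical cluster: one "ps" job (parameter server) and a "worker" job with two tasks.
cluster_spec = tf.train.ClusterSpec({
    "ps":     ["machine-a.example.com:2221"],
    "worker": ["machine-a.example.com:2222",
               "machine-b.example.com:2222"],
})

# Each task launches its own server, identified by its job name and task index.
server = tf.train.Server(cluster_spec, job_name="worker", task_index=0)
server.join()  # block this process and serve requests from the other tasks
```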
Placing Operations
You can use device blocks to pin operations on any device managed by any task in the
cluster, specifying the job name, task index, device type, and device index.
TensorFlow also provides the replica_device_setter() function to automatically
distribute variables across parameter servers in a round-robin fashion.
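A short sketch of both placement styles (the cluster layout is assumed, not defined here):

```python
import tensorflow as tf

# Pin an operation on a specific task's device in the cluster.
with tf.device("/job:ps/task:0/cpu:0"):
    a = tf.Variable(1.0, name="a")

# replica_device_setter() spreads variables across the "ps" tasks round-robin,
# while non-variable operations stay on the worker's default device.
with tf.device(tf.train.replica_device_setter(ps_tasks=2)):
    b = tf.Variable(2.0, name="b")   # -> /job:ps/task:0
    c = tf.Variable(3.0, name="c")   # -> /job:ps/task:1
    s = a + b + c                    # -> /job:worker (default)
```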
Sharing State
In a distributed setup, variable state is managed by resource containers located on the
cluster, not by individual sessions. This allows multiple sessions to seamlessly share
the same variables, even if they are connected to different servers. TensorFlow also
provides queues and readers that can be shared across sessions to enable asynchronous
data loading.
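The sketch below illustrates the idea, assuming two servers belonging to the same cluster (the gRPC addresses are hypothetical):

```python
import tensorflow as tf

x = tf.Variable(0.0, name="x")
increment_x = tf.assign_add(x, 1.0)

# Both sessions connect to servers of the same cluster, so they share the same
# resource container and therefore the same variable x.
with tf.Session("grpc://machine-a.example.com:2222") as sess1:
    sess1.run(x.initializer)
    sess1.run(increment_x)

with tf.Session("grpc://machine-b.example.com:2222") as sess2:
    print(sess2.run(x))  # prints 1.0: the update made by the first session is visible
```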
Efficient Data Loading
Preloading the Data
For datasets that can fit in memory, you can preload the training data into a variable and use
that variable in your graph. This ensures the data is only transferred once from the client to the
cluster, rather than being repeatedly loaded and fed through placeholders.
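A minimal sketch of this pattern (the dataset here is a random stand-in):

```python
import tensorflow as tf
import numpy as np

training_data = np.random.rand(10000, 20).astype(np.float32)  # stand-in dataset

# The variable is neither trainable nor saved (collections=[]); it simply holds
# the data on the cluster so it is transferred from the client only once.
data_init = tf.placeholder(tf.float32, shape=(10000, 20))
data_var = tf.Variable(data_init, trainable=False, collections=[], name="training_data")

with tf.Session() as sess:
    sess.run(data_var.initializer, feed_dict={data_init: training_data})
    # data_var can now be used by the rest of the graph without re-feeding it.
```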
Reading the Data Directly from the Graph
For larger datasets, reader operations let the graph read the training data directly from the
filesystem, without the data ever passing through the client. This allows you to build a data
loading pipeline that runs in parallel with the training computations.
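A small sketch using a TextLineReader on a hypothetical CSV file with two features and an integer target per line:

```python
import tensorflow as tf

filename_queue = tf.train.string_input_producer(["my_data.csv"])  # hypothetical file
reader = tf.TextLineReader(skip_header_lines=1)
key, value = reader.read(filename_queue)            # reads one line per evaluation
x1, x2, target = tf.decode_csv(value, record_defaults=[[0.], [0.], [0]])
features = tf.stack([x1, x2])

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    print(sess.run([features, target]))  # the graph reads the file itself; nothing is fed by the client
    coord.request_stop()
    coord.join(threads)
```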
Multithreaded Readers
To further improve data loading throughput, you can use TensorFlow's Coordinator and
QueueRunner classes to manage multiple threads that simultaneously read from multiple files
and push the data into a shared queue.
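A sketch of this pattern, with one reader per thread feeding a shared queue (the filenames are hypothetical):

```python
import tensorflow as tf

filenames = ["data_1.csv", "data_2.csv", "data_3.csv"]  # hypothetical files
filename_queue = tf.train.string_input_producer(filenames)

# Shared queue that every reader thread pushes raw lines into.
line_queue = tf.FIFOQueue(capacity=1000, dtypes=[tf.string])

def read_and_enqueue():
    # One reader per thread, all pulling filenames from the same filename queue.
    reader = tf.TextLineReader(skip_header_lines=1)
    key, value = reader.read(filename_queue)
    return line_queue.enqueue([value])

enqueue_ops = [read_and_enqueue() for _ in range(3)]
queue_runner = tf.train.QueueRunner(line_queue, enqueue_ops)

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    tf.train.start_queue_runners(sess=sess, coord=coord)                  # filename-queue threads
    threads = queue_runner.create_threads(sess, coord=coord, start=True)  # three reader threads
    print(sess.run(line_queue.dequeue()))
    coord.request_stop()
    coord.join(threads)
```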
Convenience Functions
TensorFlow provides several convenience functions, such as string_input_producer() and
shuffle_batch(), that create the necessary queues and queue runners for you, greatly reducing
the boilerplate needed to build an input pipeline.
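For instance, a sketch combining both helpers (the CSV file and its format are assumptions):

```python
import tensorflow as tf

# string_input_producer() builds the filename queue and its queue runner for us.
filename_queue = tf.train.string_input_producer(["my_data.csv"])  # hypothetical file
reader = tf.TextLineReader(skip_header_lines=1)
key, value = reader.read(filename_queue)
x1, x2 = tf.decode_csv(value, record_defaults=[[0.], [0.]])

# shuffle_batch() builds the shuffling queue, its enqueue ops and queue runner,
# and returns an op that dequeues shuffled mini-batches.
features_batch = tf.train.shuffle_batch([tf.stack([x1, x2])], batch_size=32,
                                        capacity=1000, min_after_dequeue=100,
                                        num_threads=4)
```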
One Neural Network per Device
The simplest way to parallelize is to run one complete neural network per
device, either on the same machine or across multiple machines in a cluster. This is
perfect for hyperparameter tuning or serving a high volume of queries.
In-Graph Replication
For parallelizing the training of a large ensemble of neural networks, you can create a
single graph containing all the networks, each placed on a different device, plus the
computations needed to aggregate the individual predictions.
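A minimal sketch of in-graph replication for a two-member ensemble (it assumes two GPUs and uses a toy network as a stand-in):

```python
import tensorflow as tf

def build_network(x, scope):
    # Stand-in for one ensemble member: a tiny two-layer network.
    with tf.variable_scope(scope):
        hidden = tf.layers.dense(x, 32, activation=tf.nn.relu)
        return tf.layers.dense(hidden, 1)

x = tf.placeholder(tf.float32, shape=(None, 10))

# One graph: each network pinned to its own device, plus an aggregation op.
predictions = []
for i in range(2):                       # assumes two GPUs are available
    with tf.device("/gpu:%d" % i):
        predictions.append(build_network(x, "network_%d" % i))

ensemble_prediction = tf.reduce_mean(tf.stack(predictions), axis=0)
```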
Between-Graph Replication
Alternatively, you can create separate graphs for each neural network and coordinate
their execution using queues, with one client handling the input distribution and
another aggregating the outputs.
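One side of such a setup might look like this sketch: a client hosts a queue on the cluster and pushes instances into it, while the other clients open the same queue through its shared_name (the device and address are hypothetical):

```python
import tensorflow as tf

# Client A (its own graph): hosts an input queue on the cluster and fills it.
# The other clients, each running a separate graph, open the very same queue
# simply by creating a queue with the same shared_name.
with tf.device("/job:ps/task:0"):
    input_queue = tf.FIFOQueue(capacity=10, dtypes=[tf.float32], shapes=[[10]],
                               name="input_queue", shared_name="shared_input_queue")

instance = tf.placeholder(tf.float32, shape=[10])
enqueue_instance = input_queue.enqueue([instance])

with tf.Session("grpc://machine-a.example.com:2222") as sess:  # hypothetical address
    sess.run(enqueue_instance, feed_dict={instance: [0.0] * 10})
```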
Scalable Performance
By leveraging the distributed capabilities of TensorFlow, you can achieve near-linear
scaling as you add more devices and servers.
Coordinating Asynchronous Computations
Queues are a natural way to coordinate asynchronous computations, such as loading data in the
background while training a model. Queues allow you to decouple the data
pipeline from the training pipeline, improving overall throughput.
Controlling Dependencies
Adding control dependencies between operations can help you postpone the execution of memory-
intensive or communication-heavy computations until they are truly needed, allowing other operations
to run in parallel and improving resource utilization.
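A tiny sketch of the idea (the ops themselves are arbitrary placeholders):

```python
import tensorflow as tf

a = tf.random_uniform((1000, 1000))
b = tf.random_uniform((1000, 1000))

cheap = a + b                      # lightweight op that can run right away

# Postpone the memory- and compute-heavy matmul until the cheap op has
# finished, leaving resources free for other operations in the meantime.
with tf.control_dependencies([cheap]):
    heavy = tf.matmul(a, b)
```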
Managing State
The distributed nature of TensorFlow's resource containers allows you to seamlessly share variables,
queues, and other stateful objects across multiple sessions, simplifying the coordination of your
distributed computations.
Leveraging Coordinators
TensorFlow's Coordinator and QueueRunner classes make it easier to manage the lifecycle of
asynchronous threads, ensuring they start and stop gracefully and avoiding deadlocks or other
concurrency issues.
Achieving Scalable Performance
Technique | Benefits
Distributing computations across multiple devices | Reduces training time for large neural networks; allows exploring larger hyperparameter spaces
Efficient data loading pipelines | Ensures data is available when needed, without becoming a bottleneck
Coordinating asynchronous computations | Improves resource utilization; enables overlapping of data loading and training
Leveraging TensorFlow's distributed capabilities | Provides a scalable and flexible framework for building high-performance machine learning applications
Conclusion
Unlocking the Power of Distributed Computing:
By mastering the techniques for distributing TensorFlow computations across
devices and servers, you can unlock the true potential of your hardware
resources and tackle much larger and more complex machine learning
problems. Whether it's speeding up the training of neural networks, exploring
a wider range of hyperparameters, or serving high volumes of queries, the
distributed capabilities of TensorFlow provide a powerful and flexible
foundation for building scalable, high-performance applications.