Tensor Processing Unit

This document discusses Google's Tensor Processing Unit (TPU), a custom ASIC designed for machine learning and neural network workloads. It gives a high-level overview of the TPU's history, architecture, and performance advantages over CPUs and GPUs. The TPU is optimized for neural network operations through low-precision arithmetic (8-bit operands in the first generation, 16-bit in the second), large on-chip memory, and a matrix multiplication unit containing 65,536 arithmetic logic units connected as a systolic array. This design lets it perform tens of trillions of operations per second, and it powers various Google services that rely on machine learning.

Uploaded by

Osama Asghar

By: Lucas Jodon and Yelman Khan
Overview
● History
● Neural Networks
● Architecture
● Performance
● Real-World Uses
● Future Development
History of TPUs
● Google began searching for a way to support neural networks in services such as voice recognition
○ With existing hardware, it would have required twice as many data centers
○ Google chose to develop a new architecture instead
● Norman Jouppi began work on a new architecture to support TensorFlow
○ FPGAs were not power-efficient enough
○ An ASIC design was selected for its power and performance benefits
○ The device would execute CISC instructions and run many kinds of networks
○ The device was made programmable, but operates on matrices rather than vectors or scalars
○ The resulting device is comparable to a GPU or a signal processor
Neural Networks
● First proposed in 1944 by Warren McCulloch and Walter Pitts
○ Modeled loosely on human learning
● Neural nets are a method of machine learning
○ Computer learns to perform a task by analyzing training examples
○ Example: pair several audio files with the words they contain; the machine then finds patterns between the audio data and the labels
○ Each incoming connection to a node is given a weight, and the node adds up its weighted inputs
○ Once the weighted sum passes a predefined threshold, the node is considered active
● Google began development on DistBelief in 2011
○ DistBelief became TensorFlow, which officially released version 1.0.0 in February 2017
○ TensorFlow is a software library with significant machine learning support
○ TensorFlow is intended to be a production-grade library for dataflow programming
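The weight-and-threshold behaviour described in the bullets above can be sketched in a few lines of Python. This is only an illustration of a single threshold node; the function name `neuron_is_active` and the sample numbers are assumptions made for this example and do not come from the slides or from TensorFlow.

```python
def neuron_is_active(inputs, weights, threshold):
    """Return True when the weighted sum of the inputs passes the threshold."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Multiply each input by the weight of its connection and add the products.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # The node is considered active only if the sum exceeds the threshold.
    return weighted_sum > threshold


# Hypothetical values: three incoming connections, two of them carrying a signal.
inputs = [1.0, 0.0, 1.0]
weights = [0.5, 0.9, 0.25]
print(neuron_is_active(inputs, weights, threshold=0.7))
```

Training a network amounts to adjusting the weights (and thresholds) until the nodes activate in a way that matches the labelled examples.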
Quantization in Neural Networks
● The precision of 32-bit or 16-bit floating point is usually not required for prediction
● Accuracy can be maintained with 8-bit integers
● Energy consumption and hardware footprint are reduced
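As a rough illustration of the idea, the sketch below maps floating-point values onto 8-bit integer codes and back again. It assumes a simple affine (min/max) scheme chosen for this example; the slides do not say which quantization scheme the TPU toolchain uses, and the helper names `quantize` and `dequantize` are hypothetical.

```python
def quantize(values, num_bits=8):
    """Map floats onto integer codes in [0, 2**num_bits - 1] (assumed affine scheme)."""
    lo, hi = min(values), max(values)
    levels = 2 ** num_bits - 1
    # One integer step corresponds to `scale` in the original float range.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate floats from the integer codes."""
    return [c * scale + lo for c in codes]


# Hypothetical weight values in the range [-1, 1].
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes)
print(max_error)
```

Each value is stored in one byte instead of four, and the round-trip error stays within about half a quantization step, which is why accuracy can often be maintained.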
Architecture Overview
● Large on-chip memory is required for fast access to weight values
● It is possible to store weights and load activations simultaneously
○ The TPU can perform about 64,000 multiply-accumulate operations per cycle
● The first generation used 8-bit operands and quantization
○ The second generation uses 16-bit operands
● The Matrix Multiplication Unit has 256 × 256 (65,536) ALUs
Architecture Overview Continued
● A minimalistic hardware design is used to reduce die area and power consumption
○ No caches, branch prediction, out-of-order execution, multiprocessing, speculative prefetching, address coalescing, multithreading, context switching, etc.
○ Minimalism is beneficial here because the TPU is required only to run neural network prediction
● The TPU chip is half the size of the other chips (the CPUs and GPUs it was compared against)
○ 28 nm process with a die size ≤ 331 mm²
○ This is partially due to the simplification of control logic
TPU Stack
● The TPU performs the actual neural network calculation
● A wide range of neural network models is supported
● The TPU stack translates API calls into TPU instructions
CPUs & GPUs
● CPUs and GPUs store values in registers
● A program tracks the read/operate/write operations
● A program tells the ALUs:
○ Which Register to read from
○ What operation to perform
○ Which Register to write to
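The read/operate/write cycle listed above can be pictured with a toy register machine. This is a deliberately simplified sketch, not a model of any real CPU or GPU; the instruction format, the register names, and the `run` helper are assumptions made for this example.

```python
import operator

# Operations the toy ALU understands (an assumption of this sketch).
OPS = {"add": operator.add, "mul": operator.mul}


def run(program, registers):
    """Execute (op, source_a, source_b, destination) instructions in order."""
    for op, src_a, src_b, dst in program:
        a = registers[src_a]            # which registers to read from
        b = registers[src_b]
        result = OPS[op](a, b)          # what operation to perform
        registers[dst] = result         # which register to write to
    return registers


# Compute r3 = r0 * r1 + r2 with two instructions.
regs = {"r0": 2, "r1": 3, "r2": 4, "r3": 0}
program = [
    ("mul", "r0", "r1", "r3"),
    ("add", "r3", "r2", "r3"),
]
run(program, regs)
print(regs["r3"])  # 10
```

Every intermediate result goes back to a register before it can be used again, which is the traffic the TPU's matrix unit is designed to avoid.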
Performance
● The TPU is built around a Matrix Multiplier Unit (MXU)
● The MXU performs hundreds of thousands of operations per clock cycle
● It reads each input value only once
● Inputs are reused many times without being stored back to a register
● Wires connect only adjacent ALUs, so they are short and energy-efficient
● Multiplications and additions are performed in a fixed order
● This design is known as a systolic array
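The data movement described above can be illustrated with a small cycle-by-cycle simulation. Note the assumptions: this sketch uses an output-stationary arrangement (each cell keeps its own running sum while inputs flow right and down), chosen because it is easy to follow; it is not claimed to match the TPU's actual dataflow, and `systolic_matmul` is a hypothetical name.

```python
def systolic_matmul(A, B):
    """Multiply A (n x k) by B (k x m) on a simulated n x m grid of cells."""
    n, k, m = len(A), len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions of A and B must match")
    acc = [[0] * m for _ in range(n)]    # running sum held in each cell
    a_reg = [[0] * m for _ in range(n)]  # value each cell passes to the right
    b_reg = [[0] * m for _ in range(n)]  # value each cell passes downward
    for t in range(n + m + k - 2):       # cycles until the last product is done
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:
                    s = t - i            # row i is fed with a delay of i cycles
                    a_in = A[i][s] if 0 <= s < k else 0
                else:
                    a_in = a_reg[i][j - 1]   # from the left neighbour only
                if i == 0:
                    s = t - j            # column j is fed with a delay of j cycles
                    b_in = B[s][j] if 0 <= s < k else 0
                else:
                    b_in = b_reg[i - 1][j]   # from the neighbour above only
                acc[i][j] += a_in * b_in     # one multiply-and-add per cycle
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Each element of A and B is read from memory once at the edge of the grid and then handed from neighbour to neighbour, so it is reused by a whole row or column of cells without being written back.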
Matrix Multiplication Unit
● Contains 256 × 256 = 65,536 ALUs
● The TPU runs at 700 MHz
● Able to compute 46 × 10^12 multiply-and-add operations per second
● Equivalent to 92 teraops per second in the matrix unit
Real-World Uses
● RankBrain algorithm used by Google search
● Google Photos
● Google Translate
● Google Cloud Platform
Future Development
● Google Cloud TPUs
○ Use TPU version 2
○ Each TPU includes a high-speed network
○ This allows building machine learning supercomputers called “TPU Pods”
○ Improved training times
○ Can be mixed and matched with other hardware, including Skylake CPUs and NVIDIA GPUs