Traditional Computing and GPU History: BTech Exam Notes
1. Review of Traditional Computing Model
The traditional computing model, based on the von Neumann architecture, has been the cornerstone of computer design since the 1940s. Named after mathematician and physicist John von Neumann, this architecture defines the structure and operation of most modern computers.

Key Components:

1. Central Processing Unit (CPU)
   - Control Unit (CU): manages and coordinates computer operations
   - Arithmetic Logic Unit (ALU): performs arithmetic and logical operations
   - Registers: small, fast storage locations within the CPU
2. Memory Unit
   - Random Access Memory (RAM): volatile memory for temporary data storage
   - Read-Only Memory (ROM): non-volatile memory for permanent data storage
3. Input/Output Devices
   - Input: keyboard, mouse, sensors, etc.
   - Output: monitor, printer, speakers, etc.
4. System Bus
   - Data Bus: transfers data between components
   - Address Bus: carries memory addresses
   - Control Bus: carries control signals

Von Neumann Architecture Diagram: (figure not included)

Characteristics and Implications:

1. Sequential Execution: instructions are fetched and executed one at a time, in order.
   - Implication: limits parallel processing capabilities
2. Stored Program Concept: both data and instructions share the same memory.
   - Advantage: flexibility in programming
   - Disadvantage: potential security risks (e.g., buffer overflow attacks)
3. Von Neumann Bottleneck: the shared bus for data and instructions limits performance.
   - Attempted solutions: cache memory, Harvard architecture (separate data and instruction memories)
4. Memory Hierarchy: different types of memory balance speed and capacity.
   - Registers → Cache → RAM → Hard Drive
5. Fetch-Decode-Execute Cycle: the basic operational cycle of the CPU.
   - Fetch: retrieve the instruction from memory
   - Decode: interpret the instruction
   - Execute: perform the operation
   - Store: save the result (if necessary)

Example: Simple Fetch-Decode-Execute Cycle

Consider a simple addition operation: A = B + C

1. Fetch: the CPU retrieves the instruction "ADD B, C" from memory
2. Decode: the CPU interprets this as an addition operation
3. Execute: the ALU performs the addition
4. Store: the result is stored in register A

This cycle forms the basis of all operations in a von Neumann architecture computer.

2. Flynn's Taxonomy

Flynn's taxonomy, proposed by Michael J. Flynn in 1966, classifies computer architectures by the number of concurrent instruction streams and data streams in the architecture.

Four Categories (Detailed):

1. SISD (Single Instruction, Single Data)
   - Characteristics: one instruction stream processed by one CPU; one data stream
   - Operation: sequential processing of a single data stream
   - Examples: early personal computers, traditional single-core processors
   - Limitations: limited parallelism; performance bottlenecks in complex computations
2. SIMD (Single Instruction, Multiple Data)
   - Characteristics: one instruction applied to multiple data points simultaneously; multiple processing elements
   - Operation: parallel processing of multiple data elements with a single instruction
   - Examples: vector processors, GPU operations, modern CPU extensions (e.g., SSE, AVX)
   - Advantages: efficient for data-parallel tasks; improved performance in multimedia and scientific applications
3. MISD (Multiple Instruction, Single Data)
   - Characteristics: multiple instructions operate on the same data; rarely implemented in practice
   - Operation: different operations applied to the same data stream
   - Examples: some fault-tolerant computer systems; theoretical use in cryptography
   - Limitations: few practical applications; inefficient use of resources in most scenarios
4. MIMD (Multiple Instruction, Multiple Data)
   - Characteristics: multiple autonomous processors, each with its own instruction stream and data stream
   - Operation: true parallel processing with independent tasks
   - Examples: modern multi-core CPUs, distributed computing systems, computer clusters
   - Advantages: highly flexible and scalable; suitable for a wide range of parallel computing tasks
   - Challenges: complexity in programming and coordination; potential for race conditions and deadlocks

Flynn's Taxonomy Diagram: (figure not included)
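The SISD-versus-MIMD distinction can be sketched in plain Python: a sequential loop is one instruction stream walking one data stream, while a worker pool gives several independent streams, each over its own chunk of the data. This is a minimal illustration of the programming model, not a benchmark; the chunk size and worker count are arbitrary, and in CPython a thread pool demonstrates the structure but CPU-bound speedups would need processes.

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(1, 1001))

# SISD-style: a single stream processes the data element by element.
sequential = [x * x for x in data]

# MIMD-style: each worker runs its own stream over a separate chunk.
def square_chunk(chunk):
    return [x * x for x in chunk]

chunks = [data[i:i + 250] for i in range(0, len(data), 250)]
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = [y for part in pool.map(square_chunk, chunks) for y in part]

assert sequential == parallel  # same result, different execution strategy
```

Both versions compute the same answer; the taxonomy describes how the work is organized, not what it produces.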
Implications and Modern Relevance:
1. Parallel Computing: SIMD and MIMD architectures form the basis of modern parallel computing systems.
2. Heterogeneous Computing: combining different categories (e.g., CPU-GPU systems) gives optimal performance across varied tasks.
3. Scalability: MIMD systems offer the best scalability for large-scale computing problems.
4. Programming Paradigms: different architectures require different programming approaches (e.g., vectorization for SIMD, multi-threading for MIMD).
5. Performance Optimization: understanding these categories helps in choosing the right architecture for a specific computational problem.

Example: Matrix Multiplication in Different Architectures

Consider multiplying two 1000x1000 matrices:
- SISD: sequential multiplication, potentially taking hours.
- SIMD: vectorized operations, significantly faster than SISD.
- MIMD: distributed across multiple cores or machines, potentially completing in seconds.

This example illustrates how the choice of architecture dramatically affects performance for specific tasks.

3. Multithreading and Concurrency

Multithreading and concurrency are fundamental concepts in modern computing, enabling efficient utilization of resources and improved performance on multi-core systems.

Multithreading (Detailed):

1. Definition: a programming and execution model that allows multiple threads of execution within a single process.
2. Characteristics:
   - Threads share the same memory space and resources
   - Lightweight compared to processes
   - Faster context switching
3. Types of Multithreading:
   - User-level threads: managed by the application
   - Kernel-level threads: managed by the operating system
4. Benefits:
   - Improved responsiveness in applications
   - Efficient utilization of multi-core processors
   - Better resource sharing
5. Challenges:
   - Increased programming complexity
   - Potential for race conditions and deadlocks
   - Overhead in thread creation and management

Concurrency (Detailed):

1. Definition: the ability of different parts of a program to execute out of order or in partial order without affecting the final outcome.
2. Characteristics:
   - Does not necessarily imply parallelism
   - Can be achieved on a single core through time-slicing
3. Models of Concurrency:
   - Interleaving model: alternating execution of concurrent tasks
   - True parallelism: simultaneous execution on multiple processors
4. Benefits:
   - Improved system responsiveness
   - Better resource utilization
   - Simplified program structure for certain problems
5. Challenges:
   - Difficulty in reasoning about program behavior
   - Potential for non-deterministic outcomes
   - Complexity in debugging

Key Concepts:

1. Thread:
   - Definition: the smallest unit of execution within a process
   - Components: program counter, registers, stack
   - States: New, Runnable, Running, Blocked, Terminated
2. Process:
   - Definition: an instance of a computer program being executed
   - Components: code, data, resources, state
   - Heavier weight than threads, with a separate memory space
3. Context Switching:
   - Definition: saving and restoring the state of a thread or process
   - Steps: save the current state, load the new state, resume execution
   - Cost: the time and resources required for switching
4. Race Condition:
   - Definition: multiple threads access shared data and try to change it simultaneously
   - Result: non-deterministic and potentially incorrect program behavior
   - Prevention: proper synchronization mechanisms (e.g., locks, semaphores)
5. Deadlock:
   - Definition: two or more threads are unable to proceed because each is waiting for the other to release a resource
   - Conditions (Coffman conditions): mutual exclusion, hold and wait, no preemption, circular wait
   - Prevention: careful resource allocation and release strategies

Advanced Concepts:

1. Thread Synchronization:
   - Mutex (mutual exclusion)
   - Semaphores
   - Condition variables
   - Monitors
2. Producer-Consumer Problem:
   - Classic synchronization scenario
   - Involves a shared buffer, producer threads, and consumer threads
3. Reader-Writer Problem:
   - Multiple readers can access data simultaneously
   - Writers need exclusive access
4. Thread Pools:
   - Reuse threads to reduce creation overhead
   - Common in web servers and database systems
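The producer-consumer scenario can be sketched with Python's standard library, where queue.Queue plays the role of the synchronized shared buffer (the function names and the sentinel convention here are illustrative choices, not a fixed recipe):

```python
import queue
import threading

buffer = queue.Queue(maxsize=5)   # bounded shared buffer with built-in locking
results = []

def producer(n_items):
    for i in range(n_items):
        buffer.put(i)             # blocks while the buffer is full
    buffer.put(None)              # sentinel value tells the consumer to stop

def consumer():
    while True:
        item = buffer.get()       # blocks while the buffer is empty
        if item is None:
            break
        results.append(item * 2)  # "process" the item

p = threading.Thread(target=producer, args=(10,))
c = threading.Thread(target=consumer)
p.start(); c.start()
p.join(); c.join()

print(results)  # every produced item consumed exactly once, doubled
```

Because queue.Queue handles the locking and the blocking put/get internally, neither thread needs an explicit mutex; the bounded size (maxsize=5) is what forces the producer to wait when the consumer falls behind.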
4. Brief History of GPU Computing
The evolution of GPU (Graphics Processing Unit) computing represents a significant shift in computational paradigms, moving from specialized graphics hardware to general-purpose computing devices.

Detailed Timeline:

1. 1970s-1980s: Early Graphics Chips
   - Purpose: basic sprite and bitmap operations
   - Examples: Atari 2600, Nintendo Entertainment System
   - Limitations: fixed-function pipelines, limited programmability
2. 1990s: Introduction of 3D Graphics Accelerators
   - Key development: hardware acceleration of 3D graphics operations
   - Notable product: 3dfx Voodoo (1996)
   - Impact: revolutionized PC gaming and 3D graphics
3. 1999: NVIDIA Introduces the Term "GPU"
   - Product: NVIDIA GeForce 256
   - Significance: first chip marketed as a "Graphics Processing Unit"
   - Capabilities: hardware transform and lighting (T&L)
4. 2001: Programmable Shaders Introduced
   - Types: pixel shaders and vertex shaders
   - Impact: increased flexibility in graphics rendering
   - Products: NVIDIA GeForce 3, ATI Radeon 8500
5. 2006: NVIDIA Introduces CUDA
   - Full name: Compute Unified Device Architecture
   - Significance: enabled general-purpose computing on GPUs
   - Features: a C-like programming model for GPU computing
6. 2008: OpenCL Introduced
   - Purpose: an open standard for heterogeneous computing
   - Advantage: cross-platform and vendor-neutral
   - Supported by: AMD, NVIDIA, Intel, and others
7. 2010s: Rise of GPGPU (General-Purpose Computing on GPUs)
   - Applications: scientific simulations, AI, cryptography
   - Developments: integration of GPU computing into major frameworks (e.g., TensorFlow, PyTorch)
8. Present: GPUs in AI and High-Performance Computing
   - Key areas: deep learning, big data analytics, scientific computing
   - Trends: specialized AI accelerators, tighter integration with CPUs
GPU vs. CPU Architecture (Detailed Comparison):
Key Differences:

1. Processing Units:
   - CPU: a few complex cores optimized for sequential processing
   - GPU: many simple cores designed for parallel processing
2. Memory Hierarchy:
   - CPU: large, multi-level cache hierarchy
   - GPU: smaller caches, with a focus on high-bandwidth memory access
3. Instruction Handling:
   - CPU: complex branch prediction and out-of-order execution
   - GPU: simpler in-order execution, optimized for throughput
4. Latency vs. Throughput:
   - CPU: optimized for low latency on single tasks
   - GPU: optimized for high throughput on parallel tasks
5. Programming Model:
   - CPU: versatile, suitable for various programming paradigms
   - GPU: specialized for data-parallel computations

Applications of GPU Computing (Detailed):

1. Computer Graphics and Video Processing
   - 3D rendering, real-time ray tracing
   - Video encoding/decoding, color grading
2. Scientific Simulations
   - Fluid dynamics, molecular dynamics
   - Climate modeling, astrophysics simulations
3. Cryptography and Cryptocurrency Mining
   - Hash calculations for blockchain
   - Encryption/decryption algorithms
4. Machine Learning and Artificial Intelligence
   - Neural network training and inference
   - Computer vision, natural language processing
5. Big Data Analytics
   - Real-time data processing
   - Large-scale graph analytics
6. Financial Modeling and Risk Analysis
   - Monte Carlo simulations
   - High-frequency trading algorithms
7. Medical Imaging
   - CT scan reconstruction
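Monte Carlo simulation, listed above under financial modeling, is a good illustration of why these workloads suit GPUs: every trial is independent, so all of them could run in parallel. A minimal CPU-only sketch (estimating pi rather than a financial quantity, purely for simplicity; the function name and sample count are illustrative):

```python
import random

def estimate_pi(n_samples, seed=42):
    """Monte Carlo estimate of pi: the fraction of random points falling
    inside the unit quarter-circle, times 4. Each sample is independent
    of the others, which is exactly the data-parallel structure a GPU
    would exploit by evaluating samples across thousands of cores."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples

print(estimate_pi(100_000))  # roughly 3.14
```

On a GPU, the sequential loop would be replaced by one thread per sample followed by a parallel reduction of the hit count, which is why such simulations scale so well on SIMD-style hardware.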