
Lecture-1-02.01.2025

CS528 is a course on High Performance Computing covering topics such as parallel processing concepts, memory hierarchy designs, cache optimization techniques, and GPU architectures. The course includes assessments like class tests and mid-semester exams, with a total weightage of 100%. Additional information includes course-related communication details and a focus on memory systems and their classifications.


CS528: HIGH PERFORMANCE COMPUTING

Monday: 17:00 – 17:55
Tuesday: 16:00 – 16:55
Wednesday: 15:00 – 15:55
Thursday: 14:00 – 14:55
Room No: 5G2

CS528 High Performance Computing 3-0-0-6
Parallel Processing Concepts; Levels and model of parallelism: instruction, transaction, task, thread, memory,
function, data flow models, demand-driven computation

Memory Hierarchy Designs: Cache Recap, Virtual Memory Review, Address Translation

Cache Optimization Techniques: Improving Hit Time, Reducing Miss Penalty, Miss Rate Reduction Techniques,
Software and Hardware Prefetching Techniques to Reduce Miss Penalty

Instruction Level Parallelism: Basics, Dependences and Hazards, Dynamic Scheduling, Branch Prediction,
Hardware Speculation

Data level Parallelism: VLIW, SIMD, Data Alignment and Reordering

Thread Level Parallelism: Software and Hardware Multithreading, Block Multithreading, Interleaved
Multithreading and Simultaneous Multithreading

Memory Centric Computing: Processing Near Memory, Emerging Memory Technology, Flash Memory, Solid State
Drives

GPU: Architectures and Programming


Texts
◦ 1. J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 5th Edition, Morgan Kaufmann, 2012.
Assessments

Assessment       Marks   Weightage
Class Test – 1   20      10%
Mid-Semester     30      30%
Class Test – 2   20      10%
End-Semester     50      50%

The assessment scheme is tentative; any changes will be announced.


Additional info
◦ Prefix the subject of course-related emails with "CS528:"
- Email ID: [email protected]
A computer has four main structural components:
◦ The CPU – processing of the data
◦ Main Memory – storage of the data
◦ Input/Output (I/O) – movement of the data
◦ System Interconnection – controlling communication among the components

Inside the CPU:
◦ Arithmetic and Logic Unit (ALU) – performs the data-processing functions of the computer
◦ Control Unit – takes data, sends it for processing, and sends the result to the output
◦ Registers – contain the data used during execution
Motherboard
Memory System

◦ A memory system is a hierarchy of storage devices with different capacities, costs, and access times.
◦ CPU registers hold the most frequently used data.
◦ Small, fast cache memories near the CPU act as staging areas for a subset of the data and instructions stored in the relatively slow main memory.
◦ The main memory stages data stored on large, slow disks, which in turn often serve as staging areas for data stored on the disks or tapes of other machines connected by networks.
Memory System
◦ Memory is one of the most important functional units of a
computer.
– Used to store both instructions and data.
– Stores as bits (0’s and 1’s), usually organized in terms of
bytes.
◦ How are the data stored in memory accessed?
◦ Every memory location has a unique address.
◦ A memory is said to be byte addressable if every byte of data has a unique
address.
◦ Some memory systems are word addressable (every addressed location consists of multiple bytes, say, 32 bits or 4 bytes).
Processor–Memory Connection

◦ Address bus provides the address of the memory location to be accessed.
◦ Unidirectional.
◦ Data bus transfers the data read from memory, or data to be written into memory.
◦ Bidirectional.
◦ Control bus provides various control and timing signals.
Memory Module

◦ Maximum number of memory locations = 2^n (for n address bits)
◦ Number of bits stored in every addressable location = m
◦ Signals
◦ RD/WR' (Read = 1, Write = 0)
◦ CS' = 0 enables the chip; otherwise the data bus is in the high-impedance state
Classification of Memory Systems
◦ Volatile v/s Non-volatile
◦ Volatile – Example- CMOS static/dynamic memory
◦ Non-volatile – Example – ROM, Magnetic Disk, CD/DVD, SSD, Flash Drive, Resistive
Memory
◦ Random access v/s Direct/Sequential access
◦ Random access – RAM and ROM – the read/write time is independent of the memory location being accessed
◦ Sequential access – magnetic tape – data is accessed sequentially in a particular order
◦ Direct or semi-random access – magnetic disk – access can be made directly to a track, after which access is sequential
◦ Read-only versus Read/Write
◦ Read-Only Memory – ROM, PROM, EPROM, EEPROM
◦ RAM – SRAM (data retained as long as power is on), DRAM (needs periodic refresh; stores bits in tiny capacitors)
Access Time, Latency and Bandwidth
◦Terminologies used to measure speed of the memory system.
◦ Memory Access Time: Time between initiation of an operation (Read
or Write) and completion of that operation.
◦ Latency: Initial delay from the initiation of an operation to the time
the first data is available.
◦ Bandwidth: Maximum speed of data transfer in bytes per second.
◦In modern memory organizations, every read request reads a
block of words into some high-speed registers (LATENCY),
from where data are supplied to the processor one by one
(ACCESS TIME).
Design Issues of the Memory System

◦ The most important issue is to bridge the processor–memory gap, which has been widening with every passing year.
◦ Advancements in memory technology are unable to cope with the faster advancements in processor technology.
Overcoming Design Issues

Using cache – increases the effective speed of the memory system
• A fast memory (possibly organized in several levels) that sits between the processor and main memory.
• Faster than main memory and relatively small.
• Frequently accessed data and instructions are stored here.
• Cache memory makes use of the fast SRAM technology.

Using virtual memory – increases the effective size of the memory system
• A technique used by the operating system to provide the illusion of a very large memory to the processor.
• Program and data are actually stored on secondary memory, which is much larger.
• Parts of the program and data are transferred from secondary memory to main memory only when needed.
Thank you
