Onur Comparch Fall2021 Lecture1 Intro Afterlecture

The document is an introductory lecture on computer architecture by Prof. Onur Mutlu from ETH Zürich, outlining his background, research interests, and current research directions. Key topics include secure and reliable architectures, energy efficiency, low-latency systems, and architectures for AI and bioinformatics. The lecture emphasizes the importance of co-design across various levels of computer architecture to achieve optimal performance and energy efficiency.

Computer Architecture

Lecture 1: Introduction and Basics

Prof. Onur Mutlu


ETH Zürich
Fall 2021
30 September 2021
Brief Self Introduction
n Onur Mutlu
q Full Professor @ ETH Zurich ITET (INFK), since September 2015
q Strecker Professor @ Carnegie Mellon University ECE/CS, 2009-2016, 2016-…
q PhD from UT-Austin, worked at Google, VMware, Microsoft Research, Intel, AMD
q https://people.inf.ethz.ch/omutlu/
q [email protected] (Best way to reach me)
q https://people.inf.ethz.ch/omutlu/projects.htm

n Research and Teaching in:


q Computer architecture, computer systems, hardware security, bioinformatics
q Memory and storage systems
q Hardware security, safety, predictability
q Fault tolerance
q Hardware/software cooperation
q Architectures for bioinformatics, health, medicine
q …
2
Current Research Mission
Computer architecture, HW/SW, systems, bioinformatics, security

Hybrid Main Memory
Heterogeneous Persistent Memory/Storage
Processors and Accelerators
Graphics and Vision Processing

Build fundamentally better architectures


Four Key Current Directions
n Fundamentally Secure/Reliable/Safe Architectures

n Fundamentally Energy-Efficient Architectures


q Memory-centric (Data-centric) Architectures

n Fundamentally Low-Latency and Predictable Architectures

n Architectures for AI/ML, Genomics, Medicine, Health

4
The Transformation Hierarchy

Problem
Algorithm
Program/Language
System Software
SW/HW Interface
Micro-architecture
Logic
Devices
Electrons

(Figure annotations: "Computer Architecture (expanded view)" and "Computer Architecture (narrow view)")

5
Axiom
To achieve the highest energy efficiency and performance:
we must take the expanded view of computer architecture

Problem
Algorithm
Program/Language
System Software
SW/HW Interface
Micro-architecture
Logic
Devices
Electrons

Co-design across the hierarchy: Algorithms to devices
Specialize as much as possible within the design goals
6
Current Research Mission & Major Topics
Build fundamentally better architectures
n Data-centric arch. for low energy & high perf.
q Proc. in Mem/DRAM, NVM, unified mem/storage
n Low-latency & predictable architectures
q Low-latency, low-energy yet low-cost memory
q QoS-aware and predictable memory systems
n Fundamentally secure/reliable/safe arch.
q Tolerating all bit flips; patchable HW; secure mem
n Architectures for ML/AI/Genomics/Graph/Med
q Algorithm/arch./logic co-design; full heterogeneity
n Data-driven and data-aware architectures
q ML/AI-driven architectural controllers and design
q Expressive memory and expressive systems

Broad research spanning apps, systems, logic, with architecture at the center

(Figure: the transformation hierarchy: Problem, Algorithm, Program/Language, System Software, SW/HW Interface, Micro-architecture, Logic, Devices, Electrons)
7
SAFARI Research Group

https://safari.ethz.ch
8
Onur Mutlu’s SAFARI Research Group
Computer architecture, HW/SW, systems, bioinformatics, security, memory
https://safari.ethz.ch/safari-newsletter-april-2020/

38+ Researchers

https://safari.ethz.ch
SAFARI Newsletter January 2021 Edition
n https://safari.ethz.ch/safari-newsletter-january-2021/

10
SAFARI PhD and Post-Doc Alumni
n https://safari.ethz.ch/safari-alumni/
n Nastaran Hajinazar (ETH Zurich)
n Gagandeep Singh (ETH Zurich)
n Amirali Boroumand (Stanford Univ)
n Jeremie Kim (ETH Zurich)
n Nandita Vijaykumar (Univ. of Toronto, Assistant Professor)
n Kevin Hsieh (Microsoft Research, Senior Researcher)
n Justin Meza (Facebook)
n Mohammed Alser (ETH Zurich)
n Yixin Luo (Google)
n Kevin Chang (Facebook)
n Rachata Ausavarungnirun (KMUTNB, Assistant Professor)
n Gennady Pekhimenko (Univ. of Toronto, Assistant Professor)
n Vivek Seshadri (Microsoft Research)
n Donghyuk Lee (NVIDIA Research, Senior Researcher)
n Yoongu Kim (Google)
n Lavanya Subramanian (Intel Labs → Facebook)
n Samira Khan (Univ. of Virginia, Assistant Professor)
n Saugata Ghose (Univ. of Illinois, Assistant Professor)
n Jawad Haj-Yahya (Huawei Research Zurich, Principal Researcher)

11
SAFARI Research Group: Introduction and Research

n Onur Mutlu,
"SAFARI Research Group: Introduction & Research"
Talk at ETH Future Computing Laboratory Welcome
Workshop (EFCL), Virtual, 6 July 2021.
[Slides (pptx) (pdf)]

12
A Talk on Impactful Research & Teaching

13
https://www.youtube.com/watch?v=83tlorht7Mc&list=PL5Q2soXY2Zi8D_5MGV6EnXEJHnV2YFBJl&index=54
Principle: Teaching and Research


Teaching drives Research
Research drives Teaching

14
15
https://www.youtube.com/onurmutlulectures
Online Courses & Lectures
n First Computer Architecture & Digital Design Course
q Digital Design and Computer Architecture
q Spring 2021 Livestream Edition:
https://www.youtube.com/watch?v=LbC0EZY8yw4&list=PL5Q2soXY2Zi_uej3aY39YB5pfW4SJ7LlN

n Advanced Computer Architecture Course
q Computer Architecture
q Fall 2020 Edition:
https://www.youtube.com/watch?v=c3mPdZA-Fmc&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN

https://www.youtube.com/onurmutlulectures
16
DDCA (Spring 2021)

n https://safari.ethz.ch/digitaltechnik/spring2021/doku.php?id=schedule

n https://www.youtube.com/watch?v=LbC0EZY8yw4&list=PL5Q2soXY2Zi_uej3aY39YB5pfW4SJ7LlN

n Bachelor’s course
q 2nd semester at ETH Zurich
q Rigorous introduction into “How Computers Work”
q Digital Design/Logic
q Computer Architecture
q 10 FPGA Lab Assignments

17
Comp Arch (Fall 2020)

n https://safari.ethz.ch/architecture/fall2020/doku.php?id=schedule

n https://www.youtube.com/watch?v=c3mPdZA-Fmc&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN

n Master’s level course
q Taken by Bachelor’s/Master’s/PhD students
q Cutting-edge research topics + fundamentals in Computer Architecture
q 5 Simulator-based Lab Assignments
q Potential research exploration
q Many research readings

18
Seminar (Spring’21)
n https://safari.ethz.ch/architecture_seminar/spring2021/doku.php?id=schedule

n https://www.youtube.com/watch?v=t3m93ZpLOyw&list=PL5Q2soXY2Zi_awYdjmWVIUegsbY7TPGW4

n Critical analysis course
q Taken by Bachelor’s/Master’s/PhD students
q Cutting-edge research topics + fundamentals in Computer Architecture
q 20+ research papers, presentations, analyses

19
Hands-On Projects & Seminars Courses
n https://safari.ethz.ch/projects_and_seminars/doku.php

20
Principle: Insight and Ideas

Focus on Insight
Encourage New Ideas

21
Principle: Learning and Scholarship

Focus on
learning and scholarship

22
SAFARI Live Seminars (Past Talks)

https://safari.ethz.ch/safari-seminar-series/
SAFARI Live Seminars (Upcoming Talk)

https://safari.ethz.ch/safari-seminar-series/
Open-Source Artifacts

https://github.com/CMU-SAFARI

25
Open Source Tools: SAFARI GitHub

https://github.com/CMU-SAFARI/
26
Some Open Source Tools (I)
n Rowhammer – Program to Induce RowHammer Errors
q https://github.com/CMU-SAFARI/rowhammer
n Ramulator – Fast and Extensible DRAM Simulator
q https://github.com/CMU-SAFARI/ramulator
n MemSim – Simple Memory Simulator
q https://github.com/CMU-SAFARI/memsim
n NOCulator – Flexible Network-on-Chip Simulator
q https://github.com/CMU-SAFARI/NOCulator
n SoftMC – FPGA-Based DRAM Testing Infrastructure
q https://github.com/CMU-SAFARI/SoftMC

n Other open-source software from my group
q https://github.com/CMU-SAFARI/

27
Some Open Source Tools (II)
n MQSim – A Fast Modern SSD Simulator
q https://github.com/CMU-SAFARI/MQSim
n Mosaic – GPU Simulator Supporting Concurrent Applications
q https://github.com/CMU-SAFARI/Mosaic
n IMPICA – Processing in 3D-Stacked Memory Simulator
q https://github.com/CMU-SAFARI/IMPICA
n SMLA – Detailed 3D-Stacked Memory Simulator
q https://github.com/CMU-SAFARI/SMLA
n HWASim – Simulator for Heterogeneous CPU-HWA Systems
q https://github.com/CMU-SAFARI/HWASim

n Other open-source software from my group
q https://github.com/CMU-SAFARI/

28
More Open Source Tools (III)

https://github.com/CMU-SAFARI/
29
30
Papers, Talks, Videos, Artifacts

n All are openly available at

https://people.inf.ethz.ch/omutlu/projects.htm

http://scholar.google.com/citations?user=7XyGUGkAAAAJ&hl=en

https://www.youtube.com/onurmutlulectures

https://github.com/CMU-SAFARI/

31
Principle: Environment of Freedom

Create an environment that values free exploration, openness, collaboration, hard work, creativity
32
My Suggestions to You
Suggestion to Researchers: Principle: Passion

Follow Your Passion
(Do not get derailed by naysayers)
Principle: Build Infrastructure

Build Infrastructure to Enable Your Passion
Principle: Work Hard

Work Hard to Enable Your Passion
Suggestion to Researchers: Principle: Resilience

Be Resilient
Principle: Learning and Scholarship

Focus on learning and scholarship
Principle: Learning and Scholarship

The quality of your work defines your impact
Principle: Good Mindset, Goals & Focus

You can make a good impact on the world
Research & Teaching: Some Overview Talks
https://www.youtube.com/onurmutlulectures

n Future Computing Architectures
q https://www.youtube.com/watch?v=kgiZlSOcGFM&list=PL5Q2soXY2Zi8D_5MGV6EnXEJHnV2YFBJl&index=1

n Enabling In-Memory Computation
q https://www.youtube.com/watch?v=njX_14584Jw&list=PL5Q2soXY2Zi8D_5MGV6EnXEJHnV2YFBJl&index=16

n Accelerating Genome Analysis
q https://www.youtube.com/watch?v=r7sn41lH-4A&list=PL5Q2soXY2Zi8D_5MGV6EnXEJHnV2YFBJl&index=41

n Rethinking Memory System Design
q https://www.youtube.com/watch?v=F7xZLNMIY1E&list=PL5Q2soXY2Zi8D_5MGV6EnXEJHnV2YFBJl&index=3

n Intelligent Architectures for Intelligent Machines
q https://www.youtube.com/watch?v=c6_LgzuNdkw&list=PL5Q2soXY2Zi8D_5MGV6EnXEJHnV2YFBJl&index=25

n The Story of RowHammer
q https://www.youtube.com/watch?v=sgd7PHQQ1AI&list=PL5Q2soXY2Zi8D_5MGV6EnXEJHnV2YFBJl&index=39

41
An Interview on Research and Education

n Computing Research and Education (@ ISCA 2019)
q https://www.youtube.com/watch?v=8ffSEKZhmvo&list=PL5Q2soXY2Zi_4oP9LdL3cc8G6NIjD2Ydz

n Maurice Wilkes Award Speech (10 minutes)
q https://www.youtube.com/watch?v=tcQ3zZ3JpuA&list=PL5Q2soXY2Zi8D_5MGV6EnXEJHnV2YFBJl&index=15

42
More Thoughts and Suggestions
n Onur Mutlu,
"Some Reflections (on DRAM)"
Award Speech for ACM SIGARCH Maurice Wilkes Award, at the ISCA Awards
Ceremony, Phoenix, AZ, USA, 25 June 2019.
[Slides (pptx) (pdf)]
[Video of Award Acceptance Speech (Youtube; 10 minutes) (Youku; 13 minutes)]
[Video of Interview after Award Acceptance (Youtube; 1 hour 6 minutes) (Youku;
1 hour 6 minutes)]
[News Article on "ACM SIGARCH Maurice Wilkes Award goes to Prof. Onur Mutlu"]

n Onur Mutlu,
"How to Build an Impactful Research Group"
57th Design Automation Conference Early Career Workshop (DAC), Virtual,
19 July 2020.
[Slides (pptx) (pdf)]
More Thoughts and Suggestions (II)
n Onur Mutlu,
"Computer Architecture: Why Is It So Important and Exciting Today?"
Invited Lecture at Izmir Institute of Technology (IYTE), Virtual, 16 October
2020.
[Slides (pptx) (pdf)]
[Talk Video (2 hours 12 minutes)]

n Onur Mutlu,
"Applying to Graduate School & Doing Impactful Research"
Invited Panel Talk at the 3rd Undergraduate Mentoring Workshop, held with the
48th International Symposium on Computer Architecture (ISCA), Virtual, 18 June
2021.
[Slides (pptx) (pdf)]
[Talk Video (50 minutes)]

44
A Talk on Impactful Research & Teaching

45
https://www.youtube.com/watch?v=83tlorht7Mc&list=PL5Q2soXY2Zi8D_5MGV6EnXEJHnV2YFBJl&index=54
Required Reading

https://safari.ethz.ch/architecture/fall2021/lib/exe/fetch.php?media=youandyourresearch.pdf

46
How to Approach This Course

“Formative Experience”

47
How to Approach This Course

“High investment, high return”

48
How to Approach This Course

“Recorded lectures allowed me to go over the lectures when necessary”
49
How to Approach This Course

“YouTube allows me to watch the lectures on my TV”

50
How to Approach This Course

“The lecturer is very responsive to questions and remarks from students”
51
How to Approach This Course

“Perhaps even better than in-person classes as questions can be asked asynchronously”
52
How to Approach This Course

“Easy to understand course format with homework, labs, and lectures”
53
How to Approach This Course

“Paper reviews + assignments + labs, a really great plan to learn in a comprehensive way”
54
How to Approach This Course

“the course was fantastic and I would do it again at any time”
55
How to Approach This Course

Learning experience
Long-term tradeoff analysis
Critical thinking & decision making
56
How to Approach This Course

Concepts & Ideas
Fundamentals
Cutting-edge
Hands-on learning
57
How to Approach This Course

Your mindset will determine what you get out of the course
58
Required Reading on Mindset & More

https://safari.ethz.ch/architecture/fall2021/lib/exe/fetch.php?media=youandyourresearch.pdf

59
Required Reading on Mindset & More

https://safari.ethz.ch/architecture/fall2021/lib/exe/fetch.php?media=youandyourresearch.pdf
60
Why Study Computer Architecture?

61
Computer Architecture
n is the science and art of designing computing platforms (hardware, interface, system SW, and programming model)

n to achieve a set of design goals
q E.g., highest performance on earth on workloads X, Y, Z
q E.g., longest battery life at a form factor that fits in your pocket with cost < $$$ CHF
q E.g., best average performance across all known workloads at the best performance/cost ratio
q …

q Designing a supercomputer is different from designing a smartphone → But, many fundamental principles are similar
62
Different Platforms, Different Goals

63
Source: http://www.sia-online.org (semiconductor industry association)
Different Platforms, Different Goals

Source: https://iq.intel.com/5-awesome-uses-for-drone-technology/
64
Different Platforms, Different Goals

Source: https://taxistartup.com/wp-content/uploads/2015/03/UK-Self-Driving-Cars.jpg
65
Different Platforms, Different Goals

Source: http://sm.pcmag.com/pcmag_uk/photo/g/google-self-driving-car-the-guts/google-self-driving-car-the-guts_dwx8.jpg
66
Different Platforms, Different Goals

Source: http://datacentervoice.com/wp-content/uploads/2015/10/data-center.jpg
67
Different Platforms, Different Goals

Source: https://fossbytes.com/wp-content/uploads/2015/06/Supercomputer-TIANHE2-china.jpg
68
Different Platforms, Different Goals

Jouppi et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit”, ISCA 2017.

69
Different Platforms, Different Goals

250 TFLOPS per chip in 2021 vs 90 TFLOPS in TPU3
1 ExaFLOPS per board

New ML applications (vs. TPU3):
• Computer vision
• Natural Language Processing (NLP)
• Recommender system
• Reinforcement learning that plays Go

https://spectrum.ieee.org/tech-talk/computing/hardware/heres-how-googles-tpu-v4-ai-chip-stacked-up-in-training-tests
70
Different Platforms, Different Goals
n ML accelerator: 260 mm², 6 billion transistors, 600 GFLOPS GPU, 12 ARM 2.2 GHz CPUs.
n Two redundant chips for better safety.

https://youtu.be/Ucp0TTmvqOE?t=4236
71
Different Platforms, Different Goals

n The largest ML accelerator chip (2021)
n 850,000 cores

Cerebras WSE-2: 2.6 trillion transistors, 46,225 mm²
Largest GPU (NVIDIA Ampere GA100): 54.2 billion transistors, 826 mm²

https://www.anandtech.com/show/14758/hot-chips-31-live-blogs-cerebras-wafer-scale-deep-learning
https://www.cerebras.net/cerebras-wafer-scale-engine-why-we-need-big-chips-for-deep-learning/
72
Different Platforms, Different Goals
Mohammed Alser, Zülal Bingöl, Damla Senol Cali, Jeremie Kim, Saugata Ghose, Can Alkan, Onur Mutlu,
“Accelerating Genome Analysis: A Primer on an Ongoing Journey”, IEEE Micro, August 2020.

MinION from ONT

SmidgION from ONT

73
Different Platforms, Different Goals
[Figure: a host with two CPUs (CPU 0, CPU 1) connected to main memory (DRAM chips, x2) and PIM-enabled memory (PIM chips, x10)]

74
https://arxiv.org/pdf/2105.03814.pdf
What is Computer Architecture?

n The science and art of designing, selecting, and interconnecting hardware components and designing the hardware/software interface to create a computing system that meets functional, performance, energy consumption, cost, and other specific goals.

75
The Transformation Hierarchy

Problem
Algorithm
Program/Language
System Software
SW/HW Interface
Micro-architecture
Logic
Devices
Electrons

(Figure annotations: "Computer Architecture (expanded view)" and "Computer Architecture (narrow view)")

76
Why Study Computer Architecture?
n Enable better systems: make computers faster, cheaper,
smaller, more reliable, …
q By exploiting advances and changes in underlying technology/circuits

n Enable new applications


q Life-like 3D visualization 20 years ago? Virtual reality?
q Self-driving cars?
q Personalized genomics? Personalized medicine?

n Enable better solutions to problems


q Software innovation is built on trends and changes in computer architecture
n > 50% performance improvement per year has enabled this innovation

n Understand why computers work the way they do
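
The "> 50% performance improvement per year" figure above compounds quickly. The sketch below is illustrative arithmetic only: the function name `compound_speedup` and the chosen horizons are assumptions made for this example, not numbers from the lecture.

```python
def compound_speedup(annual_improvement: float, years: int) -> float:
    """Cumulative speedup after `years` of a constant annual improvement.

    annual_improvement is a fraction: 0.5 means 50% faster every year.
    """
    return (1.0 + annual_improvement) ** years


if __name__ == "__main__":
    # 50% per year is the rate quoted on the slide; the horizons below are
    # hypothetical and chosen only to show how the growth compounds.
    for years in (1, 5, 10):
        print(f"{years:2d} years -> {compound_speedup(0.5, years):.1f}x")
```

At a steady 50% per year, a decade of improvement multiplies performance by a factor of roughly 58.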


77
Computer Architecture Today (I)
n Today is a very exciting time to study computer architecture

n Industry is in a large paradigm shift (to novel architectures)


– many different potential system designs possible

n Many difficult problems motivating and caused by the shift


q Huge hunger for data and new data-intensive applications
q Power/energy/thermal constraints
q Complexity of design
q Difficulties in technology scaling
q Memory bottleneck
q Reliability problems
q Programmability problems
q Security and privacy issues

n No clear, definitive answers to these problems


78
Computer Architecture Today (II)
n These problems affect all parts of the computing stack – if we do not change the way we design systems

[Figure: the computing stack (Problem, Algorithm, Program/Language, Runtime System (VM, OS, MM), ISA, Microarchitecture, Logic, Circuits, Electrons) with a "User" label, annotated with "Many new demands from the top (Look Up)", "Fast changing demands and personalities of users (Look Up)", and "Many new issues at the bottom (Look Down)"]

n No clear, definitive answers to these problems


79
Computer Architecture Today (III)
n Computing landscape is very different from 10-20 years ago
n Both UP (software and humanity trends) and DOWN (technologies and their issues), FORWARD and BACKWARD, and the resulting requirements and constraints

Hybrid Main Memory
Heterogeneous Persistent Memory/Storage
Processors and Accelerators
General Purpose GPUs

Every component and its interfaces, as well as entire system designs, are being re-examined

80
Axiom
To achieve the highest energy efficiency and performance:
we must take the expanded view of computer architecture

Problem
Algorithm
Program/Language
System Software
SW/HW Interface
Micro-architecture
Logic
Devices
Electrons

Co-design across the hierarchy: Algorithms to devices
Specialize as much as possible within the design goals
81
Historical: Opportunities at the Bottom

https://en.wikipedia.org/wiki/There%27s_Plenty_of_Room_at_the_Bottom
82
Historical: Opportunities at the Bottom (II)

https://en.wikipedia.org/wiki/There%27s_Plenty_of_Room_at_the_Bottom
83
Historical: Opportunities at the Top

https://science.sciencemag.org/content/368/6495/eaam9744
84
Axiom, Revisited

There is plenty of room both at the top and at the bottom

but much more so

when you

communicate well between and optimize across

the top and the bottom

85
Hence the Expanded View

Problem
Algorithm
Program/Language
System Software
SW/HW Interface
Micro-architecture
Logic
Devices
Electrons

(Figure annotation: "Computer Architecture (expanded view)")

86
Some Cross-Layer Design Examples
(Foreshadowing)

87
EDEN: Data-Aware Efficient DNN Inference
n Skanda Koppula, Lois Orosa, A. Giray Yaglikci, Roknoddin Azizi, Taha Shahroodi,
Konstantinos Kanellopoulos, and Onur Mutlu,
"EDEN: Enabling Energy-Efficient, High-Performance Deep Neural Network
Inference Using Approximate DRAM"
Proceedings of the 52nd International Symposium on Microarchitecture (MICRO),
Columbus, OH, USA, October 2019.
[Slides (pptx) (pdf)]
[Lightning Talk Slides (pptx) (pdf)]
[Poster (pptx) (pdf)]
[Lightning Talk Video (90 seconds)]
[Full Talk Lecture (38 minutes)]

88
SMASH: SW/HW Indexing Acceleration
n Konstantinos Kanellopoulos, Nandita Vijaykumar, Christina Giannoula,
Roknoddin Azizi, Skanda Koppula, Nika Mansouri Ghiasi, Taha Shahroodi, Juan
Gomez-Luna, and Onur Mutlu,
"SMASH: Co-designing Software Compression and Hardware-
Accelerated Indexing for Efficient Sparse Matrix Operations"
Proceedings of the 52nd International Symposium on
Microarchitecture (MICRO), Columbus, OH, USA, October 2019.
[Slides (pptx) (pdf)]
[Lightning Talk Slides (pptx) (pdf)]
[Poster (pptx) (pdf)]
[Lightning Talk Video (90 seconds)]
[Full Talk Lecture (30 minutes)]

89
GenASM: HW/SW Approximate String Matching Accelerator
n Damla Senol Cali, Gurpreet S. Kalsi, Zulal Bingol, Can Firtina, Lavanya Subramanian, Jeremie S.
Kim, Rachata Ausavarungnirun, Mohammed Alser, Juan Gomez-Luna, Amirali Boroumand,
Anant Nori, Allison Scibisz, Sreenivas Subramoney, Can Alkan, Saugata Ghose, and Onur Mutlu,
"GenASM: A High-Performance, Low-Power Approximate String Matching
Acceleration Framework for Genome Sequence Analysis"
Proceedings of the 53rd International Symposium on Microarchitecture (MICRO), Virtual,
October 2020.
[Lightning Talk Video (1.5 minutes)]
[Lightning Talk Slides (pptx) (pdf)]
[Talk Video (18 minutes)]
[Slides (pptx) (pdf)]

90
SW/HW Climate Modeling Accelerator
n Gagandeep Singh, Dionysios Diamantopoulos, Christoph Hagleitner, Juan
Gómez-Luna, Sander Stuijk, Onur Mutlu, and Henk Corporaal,
"NERO: A Near High-Bandwidth Memory Stencil Accelerator for
Weather Prediction Modeling"
Proceedings of the 30th International Conference on Field-Programmable Logic
and Applications (FPL), Gothenburg, Sweden, September 2020.
[Slides (pptx) (pdf)]
[Lightning Talk Slides (pptx) (pdf)]
[Talk Video (23 minutes)]
Nominated for the Stamatis Vassiliadis Memorial Award.

91
HW/SW Time Series Analysis Accelerator
n Ivan Fernandez, Ricardo Quislant, Christina Giannoula, Mohammed Alser, Juan
Gómez-Luna, Eladio Gutiérrez, Oscar Plata, and Onur Mutlu,
"NATSA: A Near-Data Processing Accelerator for Time Series Analysis"
Proceedings of the 38th IEEE International Conference on Computer
Design (ICCD), Virtual, October 2020.
[Slides (pptx) (pdf)]
[Talk Video (10 minutes)]
[Source Code]

92
FPGA-based Processing Near Memory
n Gagandeep Singh, Mohammed Alser, Damla Senol Cali, Dionysios
Diamantopoulos, Juan Gómez-Luna, Henk Corporaal, and Onur Mutlu,
"FPGA-based Near-Memory Acceleration of Modern Data-Intensive
Applications"
IEEE Micro (IEEE MICRO), 2021.

93
Accelerating Genome Analysis
n Mohammed Alser, Zulal Bingol, Damla Senol Cali, Jeremie Kim, Saugata Ghose, Can
Alkan, and Onur Mutlu,
"Accelerating Genome Analysis: A Primer on an Ongoing Journey"
IEEE Micro (IEEE MICRO), Vol. 40, No. 5, pages 65-75, September/October 2020.
[Slides (pptx)(pdf)]
[Talk Video (1 hour 2 minutes)]

94
Graph Processing Accelerator w/ PIM
n Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu,
and Kiyoung Choi,
"A Scalable Processing-in-Memory Accelerator for
Parallel Graph Processing"
Proceedings of the 42nd International Symposium on
Computer Architecture (ISCA), Portland, OR, June 2015.
[Slides (pdf)] [Lightning Session Slides (pdf)]

95
Processing in Memory for Mobile Workloads
n Amirali Boroumand, Saugata Ghose, Youngsok Kim, Rachata
Ausavarungnirun, Eric Shiu, Rahul Thakur, Daehyun Kim, Aki
Kuusela, Allan Knies, Parthasarathy Ranganathan, and Onur Mutlu,
"Google Workloads for Consumer Devices: Mitigating Data
Movement Bottlenecks"
Proceedings of the 23rd International Conference on Architectural
Support for Programming Languages and Operating
Systems (ASPLOS), Williamsburg, VA, USA, March 2018.

96
Accelerating Linked Data Structures
n Kevin Hsieh, Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali
Boroumand, Saugata Ghose, and Onur Mutlu,
"Accelerating Pointer Chasing in 3D-Stacked Memory:
Challenges, Mechanisms, Evaluation"
Proceedings of the 34th IEEE International Conference on Computer
Design (ICCD), Phoenix, AZ, USA, October 2016.

97
Expressive (Memory) Interfaces
n Nandita Vijaykumar, Abhilasha Jain, Diptesh Majumdar, Kevin Hsieh, Gennady
Pekhimenko, Eiman Ebrahimi, Nastaran Hajinazar, Phillip B. Gibbons and Onur Mutlu,
"A Case for Richer Cross-layer Abstractions: Bridging the Semantic Gap
with Expressive Memory"
Proceedings of the 45th International Symposium on Computer Architecture (ISCA),
Los Angeles, CA, USA, June 2018.
[Slides (pptx) (pdf)] [Lightning Talk Slides (pptx) (pdf)]
[Lightning Talk Video]

98
One Problem: Limited SW/HW Communication

99
A Solution: More Expressive Interfaces

100
X-MeM Aids Many Optimizations
Expressive (Memory) Interfaces for GPUs
n Nandita Vijaykumar, Eiman Ebrahimi, Kevin Hsieh, Phillip B. Gibbons and Onur Mutlu,
"The Locality Descriptor: A Holistic Cross-Layer Abstraction to Express
Data Locality in GPUs"
Proceedings of the 45th International Symposium on Computer Architecture (ISCA),
Los Angeles, CA, USA, June 2018.
[Slides (pptx) (pdf)] [Lightning Talk Slides (pptx) (pdf)]
[Lightning Talk Video]

102
Heterogeneous-Reliability Memory
n Yixin Luo, Sriram Govindan, Bikash Sharma, Mark Santaniello, Justin Meza, Aman
Kansal, Jie Liu, Badriddine Khessib, Kushagra Vaid, and Onur Mutlu,
"Characterizing Application Memory Error Vulnerability to Optimize
Data Center Cost via Heterogeneous-Reliability Memory"
Proceedings of the 44th Annual IEEE/IFIP International Conference on
Dependable Systems and Networks (DSN), Atlanta, GA, June 2014. [Summary]
[Slides (pptx) (pdf)] [Coverage on ZDNet]

103
Exploiting Memory Error Tolerance with Hybrid Memory Systems

Memory error vulnerability: Vulnerable data vs. Tolerant data (App/Data A, App/Data B, App/Data C)

Reliable memory (for vulnerable data)
• ECC protected
• Well-tested chips

Low-cost memory (for tolerant data)
• NoECC or Parity
• Less-tested chips

On Microsoft’s Web Search workload:
Reduces server hardware cost by 4.7 %
Achieves single server availability target of 99.90 %

Heterogeneous-Reliability Memory [DSN 2014]
104
Heterogeneous-Reliability Memory
[Figure: App 1, App 2, and App 3, each with data A and data B]

Step 1: Characterize and classify application memory error tolerance
[Figure: each application's data A and data B placed on a scale from Vulnerable to Tolerant]

Step 2: Map application data to the HRM system enabled by SW/HW cooperative solutions
[Figure: memory options on a scale from Reliable to Unreliable: Reliable memory, Parity memory + software recovery (Par+R), Low-cost memory]
105
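
The two HRM steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the mechanism from the DSN 2014 paper: the `DataRegion` type, the numeric `vulnerability` score, both thresholds, and the sample regions are hypothetical. Only the three memory tier names come from the slide.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    RELIABLE = "Reliable memory"
    PARITY_RECOVERY = "Parity memory + software recovery (Par+R)"
    LOW_COST = "Low-cost memory"


@dataclass(frozen=True)
class DataRegion:
    app: str
    name: str
    # Hypothetical Step 1 result: fraction of injected memory errors that
    # caused a crash or incorrect output for this region (0.0 to 1.0).
    vulnerability: float


def map_to_tier(region: DataRegion,
                vulnerable_above: float = 0.10,
                tolerant_below: float = 0.01) -> Tier:
    """Step 2: pick a memory tier from the Step 1 characterization.

    Both thresholds are illustrative assumptions, not measured values.
    """
    if region.vulnerability >= vulnerable_above:
        return Tier.RELIABLE
    if region.vulnerability < tolerant_below:
        return Tier.LOW_COST
    return Tier.PARITY_RECOVERY


if __name__ == "__main__":
    # Made-up characterization results for illustration only.
    regions = [
        DataRegion("App 1", "data A", 0.30),
        DataRegion("App 1", "data B", 0.05),
        DataRegion("App 2", "data A", 0.002),
    ]
    for region in regions:
        print(f"{region.app} {region.name} -> {map_to_tier(region).value}")
```

The point of the sketch is the split of responsibilities: software supplies the per-region tolerance information, and the placement decision uses it to put only vulnerable data in the expensive tier.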
Rethinking Virtual Memory
n Nastaran Hajinazar, Pratyush Patel, Minesh Patel, Konstantinos Kanellopoulos, Saugata
Ghose, Rachata Ausavarungnirun, Geraldo Francisco de Oliveira Jr., Jonathan Appavoo,
Vivek Seshadri, and Onur Mutlu,
"The Virtual Block Interface: A Flexible Alternative to the Conventional Virtual
Memory Framework"
Proceedings of the 47th International Symposium on Computer Architecture (ISCA),
Valencia, Spain, June 2020.
[Slides (pptx) (pdf)]
[Lightning Talk Slides (pptx) (pdf)]
[ARM Research Summit Poster (pptx) (pdf)]
[Talk Video (26 minutes)]
[Lightning Talk Video (3 minutes)]

106
Many Interesting Things
Are Happening Today
in Computer Architecture

107
Many Interesting Things
Are Happening Today
in Computer Architecture

Performance
and
Energy Efficiency
108
Intel Optane Persistent Memory (2019)

n Non-volatile main memory


n Based on 3D-XPoint Technology

109
https://www.storagereview.com/intel_optane_dc_persistent_memory_module_pmm
PCM as Main Memory: Idea in 2009
n Benjamin C. Lee, Engin Ipek, Onur Mutlu, and Doug Burger,
"Architecting Phase Change Memory as a Scalable DRAM
Alternative"
Proceedings of the 36th International Symposium on Computer
Architecture (ISCA), pages 2-13, Austin, TX, June 2009.
[Slides (pdf)]

110
PCM as Main Memory: Idea in 2009
n Benjamin C. Lee, Ping Zhou, Jun Yang, Youtao Zhang, Bo Zhao,
Engin Ipek, Onur Mutlu, and Doug Burger,
"Phase Change Technology and the Future of Main Memory"
IEEE Micro, Special Issue: Micro's Top Picks from 2009 Computer
Architecture Conferences (MICRO TOP PICKS), Vol. 30, No. 1,
pages 60-70, January/February 2010.

111
Cerebras’s Wafer Scale Engine (2019)

n The largest ML accelerator chip

n 400,000 cores

               Cerebras WSE      Largest GPU (NVIDIA TITAN V)
Transistors    1.2 trillion      21.1 billion
Die area       46,225 mm^2       815 mm^2

https://www.anandtech.com/show/14758/hot-chips-31-live-blogs-cerebras-wafer-scale-deep-learning
https://www.cerebras.net/cerebras-wafer-scale-engine-why-we-need-big-chips-for-deep-learning/
112
Cerebras’s Wafer Scale Engine-2 (2021)

n The largest ML accelerator chip (2021)

n 850,000 cores

               Cerebras WSE-2    Largest GPU (NVIDIA Ampere GA100)
Transistors    2.6 trillion      54.2 billion
Die area       46,225 mm^2       826 mm^2

https://www.anandtech.com/show/14758/hot-chips-31-live-blogs-cerebras-wafer-scale-deep-learning
https://www.cerebras.net/cerebras-wafer-scale-engine-why-we-need-big-chips-for-deep-learning/
113
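To put the two Cerebras comparisons in proportion, the short sketch below computes the transistor and area ratios from the figures quoted on the slides. The dictionary layout and function names are illustrative choices; the numbers are the ones given above.

```python
# (transistor count, die area in mm^2), as quoted on the two slides above.
chips = {
    "Cerebras WSE (2019)": (1.2e12, 46_225),
    "NVIDIA TITAN V": (21.1e9, 815),
    "Cerebras WSE-2 (2021)": (2.6e12, 46_225),
    "NVIDIA Ampere GA100": (54.2e9, 826),
}


def ratio(big: str, small: str):
    """Return (transistor ratio, area ratio) of `big` relative to `small`."""
    (t_big, a_big), (t_small, a_small) = chips[big], chips[small]
    return t_big / t_small, a_big / a_small


def density(name: str) -> float:
    """Transistors per mm^2."""
    transistors, area = chips[name]
    return transistors / area


wse_vs_gpu = ratio("Cerebras WSE (2019)", "NVIDIA TITAN V")
wse2_vs_gpu = ratio("Cerebras WSE-2 (2021)", "NVIDIA Ampere GA100")

print("WSE vs TITAN V    (transistors, area):", wse_vs_gpu)
print("WSE-2 vs GA100    (transistors, area):", wse2_vs_gpu)
for name in chips:
    print(f"{name}: {density(name) / 1e6:.1f} M transistors/mm^2")
```

For the first generation, the transistor ratio and the area ratio come out nearly equal (both roughly 57x), which suggests the wafer-scale chip gains its transistor count mainly from area rather than from a denser process.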
UPMEM Processing-in-DRAM Engine (2019)
n Processing in DRAM Engine
n Includes standard DIMM modules, with a large
number of DPU processors combined with DRAM chips.

n Replaces standard DIMMs


q DDR4 R-DIMM modules
n 8GB+128 DPUs (16 PIM chips)
n Standard 2x-nm DRAM process
q Large amounts of compute & memory bandwidth

https://www.anandtech.com/show/14750/hot-chips-31-analysis-inmemory-processing-by-upmem
https://www.upmem.com/video-upmem-presenting-its-true-processing-in-memory-solution-hot-chips-2019/
114
UPMEM Memory Modules
• E19: 8 chips DIMM (1 rank). DPUs @ 267 MHz
• P21: 16 chips DIMM (2 ranks). DPUs @ 350 MHz

www.upmem.com 115
2,560-DPU Processing-in-Memory System
[Figure: host with two CPUs (CPU 0 and CPU 1). Each CPU is attached to conventional main memory (DIMMs of DRAM chips, x2) and to PIM-enabled memory (DIMMs of PIM chips, x10).]

116
https://arxiv.org/pdf/2105.03814.pdf
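The 2,560-DPU count in the slide title can be cross-checked against the module parameters quoted earlier (8 GB, 128 DPUs, and 16 PIM chips per DIMM). The sketch below assumes that the "x10" label in the figure means 10 PIM DIMMs per CPU and that there are 2 CPUs; the variable names are illustrative.

```python
CPUS = 2                  # CPU 0 and CPU 1 in the figure
PIM_DIMMS_PER_CPU = 10    # assumption: the "x10" label applies per CPU
DPUS_PER_DIMM = 128       # from the UPMEM module slide
PIM_CHIPS_PER_DIMM = 16   # from the UPMEM module slide
GB_PER_DIMM = 8           # from the UPMEM module slide

total_dimms = CPUS * PIM_DIMMS_PER_CPU
total_dpus = total_dimms * DPUS_PER_DIMM
dpus_per_chip = DPUS_PER_DIMM // PIM_CHIPS_PER_DIMM
total_pim_memory_gb = total_dimms * GB_PER_DIMM
mb_per_dpu = GB_PER_DIMM * 1024 // DPUS_PER_DIMM

print("PIM DIMMs:", total_dimms)
print("DPUs:", total_dpus)
print("DPUs per PIM chip:", dpus_per_chip)
print("PIM-enabled memory (GB):", total_pim_memory_gb)
print("DRAM per DPU (MB):", mb_per_dpu)
```

Under these assumptions the total matches the 2,560 DPUs in the title, which supports reading "x10" as a per-CPU count.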
More on the UPMEM PIM System

https://www.youtube.com/watch?v=Sscy1Wrr22A&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=26
Experimental Analysis of the UPMEM PIM Engine

https://arxiv.org/pdf/2105.03814.pdf
Understanding a Modern PIM Architecture

https://www.youtube.com/watch?v=D8Hjy2iU9l4&list=PL5Q2soXY2Zi_tOTAYm--dYByNPL7JhwR9
119
More on Analysis of the UPMEM PIM Engine

https://www.youtube.com/watch?v=D8Hjy2iU9l4&list=PL5Q2soXY2Zi_tOTAYm--dYByNPL7JhwR9
More on Analysis of the UPMEM PIM Engine

https://www.youtube.com/watch?v=Pp9jSU2b9oM&list=PL5Q2soXY2Zi8_VVChACnON4sfh2bJ5IrD&index=159
FPGA-based Processing Near Memory
n Gagandeep Singh, Mohammed Alser, Damla Senol Cali, Dionysios
Diamantopoulos, Juan Gómez-Luna, Henk Corporaal, and Onur Mutlu,
"FPGA-based Near-Memory Acceleration of Modern Data-Intensive
Applications"
IEEE Micro (IEEE MICRO), to appear, 2021.

122
Samsung Function-in-Memory DRAM (2021)

https://news.samsung.com/global/samsung-develops-industrys-first-high-bandwidth-memory-with-ai-processing-power
123
Samsung AxDIMM (2021)
n DDR5-PIM

q DLRM recommendation system

[Figure: baseline system vs. AxDIMM system]

Ke et al. "Near-Memory Processing in Action: Accelerating Personalized Recommendation with AxDIMM", IEEE Micro (2021)
128
Processing in Memory:
Two Approaches

1. Processing near Memory


2. Processing using Memory

129
Specialized Processing in Memory (2015)
n Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu,
and Kiyoung Choi,
"A Scalable Processing-in-Memory Accelerator for
Parallel Graph Processing"
Proceedings of the 42nd International Symposium on
Computer Architecture (ISCA), Portland, OR, June 2015.
[Slides (pdf)] [Lightning Session Slides (pdf)]

130
Simple Processing in Memory (2015)
n Junwhan Ahn, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi,
"PIM-Enabled Instructions: A Low-Overhead,
Locality-Aware Processing-in-Memory Architecture"
Proceedings of the 42nd International Symposium on
Computer Architecture (ISCA), Portland, OR, June 2015.
[Slides (pdf)] [Lightning Session Slides (pdf)]
Processing in Memory on Mobile Devices
n Amirali Boroumand, Saugata Ghose, Youngsok Kim, Rachata
Ausavarungnirun, Eric Shiu, Rahul Thakur, Daehyun Kim, Aki
Kuusela, Allan Knies, Parthasarathy Ranganathan, and Onur Mutlu,
"Google Workloads for Consumer Devices: Mitigating Data
Movement Bottlenecks"
Proceedings of the 23rd International Conference on Architectural
Support for Programming Languages and Operating
Systems (ASPLOS), Williamsburg, VA, USA, March 2018.

132
Efficient Synchronization for NDP
n Christina Giannoula, Nandita Vijaykumar, Nikela Papadopoulou,
Vasileios Karakostas, Ivan Fernandez, Juan Gómez-Luna, Lois Orosa,
Nectarios Koziris, Georgios Goumas, and Onur Mutlu,
"SynCron: Efficient Synchronization Support for Near-Data-
Processing Architectures"
Proceedings of the 27th International Symposium on High-Performance
Computer Architecture (HPCA), Virtual, February-March 2021.

133
Accelerating GPU Execution with PIM (I)
n Kevin Hsieh, Eiman Ebrahimi, Gwangsun Kim, Niladrish Chatterjee, Mike
O'Connor, Nandita Vijaykumar, Onur Mutlu, and Stephen W. Keckler,
"Transparent Offloading and Mapping (TOM): Enabling
Programmer-Transparent Near-Data Processing in GPU
Systems"
Proceedings of the 43rd International Symposium on Computer
Architecture (ISCA), Seoul, South Korea, June 2016.
[Slides (pptx) (pdf)]
[Lightning Session Slides (pptx) (pdf)]

134
Accelerating Linked Data Structures
n Kevin Hsieh, Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali
Boroumand, Saugata Ghose, and Onur Mutlu,
"Accelerating Pointer Chasing in 3D-Stacked Memory:
Challenges, Mechanisms, Evaluation"
Proceedings of the 34th IEEE International Conference on Computer
Design (ICCD), Phoenix, AZ, USA, October 2016.

135
DAMOV Analysis Methodology & Workloads

https://arxiv.org/pdf/2105.03725.pdf
More on DAMOV Analysis Methodology & Workloads

https://www.youtube.com/watch?v=GWideVyo0nM&list=PL5Q2soXY2Zi_tOTAYm--dYByNPL7JhwR9&index=3
DAMOV is Open-Source
• We open-source our benchmark suite (DAMOV Benchmarks) and our toolchain (DAMOV-SIM)
More on DAMOV

n Geraldo F. Oliveira, Juan Gomez-Luna, Lois Orosa, Saugata
Ghose, Nandita Vijaykumar, Ivan Fernandez, Mohammad
Sadrosadati, and Onur Mutlu,
"DAMOV: A New Methodology and Benchmark Suite for
Evaluating Data Movement Bottlenecks"
Preprint in arXiv, 8 May 2021.
[arXiv preprint]
[DAMOV Suite and Simulator Source Code]

139
Processing in Memory:
Two Approaches

1. Processing near Memory


2. Processing using Memory

140
In-DRAM Processing (2013)
n Vivek Seshadri et al., “Ambit: In-Memory Accelerator
for Bulk Bitwise Operations Using Commodity DRAM
Technology,” MICRO 2017.
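As a rough functional illustration of what a bulk bitwise operation in DRAM computes: the Ambit paper describes simultaneously activating three rows, which leaves the bitwise majority of the three in the cells, so fixing one row to all zeros or all ones yields AND or OR of the other two. The sketch below models rows as Python integers; the 16-bit row width and all function names are assumptions for illustration, and it says nothing about the circuit-level mechanism or the paper's NOT operation.

```python
ROW_BITS = 16                      # toy row width (real DRAM rows are kilobytes)
ALL_ONES = (1 << ROW_BITS) - 1
ALL_ZEROS = 0


def triple_row_activate(a: int, b: int, c: int) -> int:
    """Functional model: bitwise majority of three rows."""
    return ((a & b) | (b & c) | (a & c)) & ALL_ONES


def bulk_and(a: int, b: int) -> int:
    """Majority with a control row of all zeros reduces to AND."""
    return triple_row_activate(a, b, ALL_ZEROS)


def bulk_or(a: int, b: int) -> int:
    """Majority with a control row of all ones reduces to OR."""
    return triple_row_activate(a, b, ALL_ONES)


row_a = 0b1100_1010_1111_0000
row_b = 0b1010_0110_0101_1100

print(f"AND: {bulk_and(row_a, row_b):016b}")
print(f"OR : {bulk_or(row_a, row_b):016b}")
```

Every bit position is computed at once, which is where the "bulk" in bulk bitwise operations comes from: one activation processes an entire row.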

141
In-DRAM Bulk Bitwise Execution (2017)
n Vivek Seshadri and Onur Mutlu,
"In-DRAM Bulk Bitwise Execution Engine"
Invited Book Chapter in Advances in Computers, to appear
in 2020.
[Preliminary arXiv version]

142
SIMDRAM Framework
n Nastaran Hajinazar, Geraldo F. Oliveira, Sven Gregorio, Joao Dinis Ferreira, Nika Mansouri
Ghiasi, Minesh Patel, Mohammed Alser, Saugata Ghose, Juan Gomez-Luna, and Onur Mutlu,
"SIMDRAM: An End-to-End Framework for Bit-Serial SIMD Computing in DRAM"
Proceedings of the 26th International Conference on Architectural Support for Programming
Languages and Operating Systems (ASPLOS), Virtual, March-April 2021.
[2-page Extended Abstract]
[Short Talk Slides (pptx) (pdf)]
[Talk Slides (pptx) (pdf)]
[Short Talk Video (5 mins)]
[Full Talk Video (27 mins)]

143
Bulk Data Copy and Initialization in DRAM
n Vivek Seshadri, Yoongu Kim, Chris Fallin, Donghyuk Lee, Rachata
Ausavarungnirun, Gennady Pekhimenko, Yixin Luo, Onur Mutlu, Michael A.
Kozuch, Phillip B. Gibbons, and Todd C. Mowry,
"RowClone: Fast and Energy-Efficient In-DRAM Bulk Data Copy and
Initialization"
Proceedings of the 46th International Symposium on Microarchitecture
(MICRO), Davis, CA, December 2013. [Slides (pptx) (pdf)] [Lightning Session
Slides (pptx) (pdf)] [Poster (pptx) (pdf)]

144
LISA: Increasing Connectivity in DRAM
n Kevin K. Chang, Prashant J. Nair, Saugata Ghose, Donghyuk Lee,
Moinuddin K. Qureshi, and Onur Mutlu,
"Low-Cost Inter-Linked Subarrays (LISA): Enabling Fast
Inter-Subarray Data Movement in DRAM"
Proceedings of the 22nd International Symposium on High-
Performance Computer Architecture (HPCA), Barcelona, Spain,
March 2016.
[Slides (pptx) (pdf)]
[Source Code]

145
FIGARO: Fine-Grained In-DRAM Copy
n Yaohua Wang, Lois Orosa, Xiangjun Peng, Yang Guo, Saugata Ghose,
Minesh Patel, Jeremie S. Kim, Juan Gómez-Luna, Mohammad
Sadrosadati, Nika Mansouri Ghiasi, and Onur Mutlu,
"FIGARO: Improving System Performance via Fine-Grained In-
DRAM Data Relocation and Caching"
Proceedings of the 53rd International Symposium on
Microarchitecture (MICRO), Virtual, October 2020.

146
Network-On-Memory: Fast Inter-Bank Copy

n Seyyed Hossein SeyyedAghaei Rezaei, Mehdi Modarressi, Rachata
Ausavarungnirun, Mohammad Sadrosadati, Onur Mutlu, and Masoud
Daneshtalab,
"NoM: Network-on-Memory for Inter-Bank Data Transfer in
Highly-Banked Memories"
IEEE Computer Architecture Letters (CAL), to appear in 2020.

147
In-DRAM Physical Unclonable Functions
n Jeremie S. Kim, Minesh Patel, Hasan Hassan, and Onur Mutlu,
"The DRAM Latency PUF: Quickly Evaluating Physical Unclonable
Functions by Exploiting the Latency-Reliability Tradeoff in Modern DRAM
Devices"
Proceedings of the 24th International Symposium on High-Performance Computer
Architecture (HPCA), Vienna, Austria, February 2018.
[Lightning Talk Video]
[Slides (pptx) (pdf)] [Lightning Session Slides (pptx) (pdf)]
[Full Talk Lecture Video (28 minutes)]

148
In-DRAM True Random Number Generation
n Jeremie S. Kim, Minesh Patel, Hasan Hassan, Lois Orosa, and Onur Mutlu,
"D-RaNGe: Using Commodity DRAM Devices to Generate True Random
Numbers with Low Latency and High Throughput"
Proceedings of the 25th International Symposium on High-Performance Computer
Architecture (HPCA), Washington, DC, USA, February 2019.
[Slides (pptx) (pdf)]
[Full Talk Video (21 minutes)]
[Full Talk Lecture Video (27 minutes)]
Top Picks Honorable Mention by IEEE Micro.

149
Processing in Memory:
Two Approaches

1. Processing near Memory


2. Processing using Memory

150
PIM Review and Open Problems

Onur Mutlu, Saugata Ghose, Juan Gomez-Luna, and Rachata Ausavarungnirun,
"A Modern Primer on Processing in Memory"
Invited Book Chapter in Emerging Computing: From Devices to Systems -
Looking Beyond Moore and Von Neumann, Springer, to be published in 2021.

151
https://arxiv.org/pdf/1903.03988.pdf
152
153
PIM Review and Open Problems (II)

Saugata Ghose, Amirali Boroumand, Jeremie S. Kim, Juan Gomez-Luna, and Onur Mutlu,
"Processing-in-Memory: A Workload-Driven Perspective"
Invited Article in IBM Journal of Research & Development, Special Issue on
Hardware for Artificial Intelligence, to appear in November 2019.
[Preliminary arXiv version]
154
https://arxiv.org/pdf/1907.12947.pdf
A Tutorial on PIM
n Onur Mutlu,
"Memory-Centric Computing Systems"
Invited Tutorial at 66th International Electron Devices
Meeting (IEDM), Virtual, 12 December 2020.
[Slides (pptx) (pdf)]
[Executive Summary Slides (pptx) (pdf)]
[Tutorial Video (1 hour 51 minutes)]
[Executive Summary Video (2 minutes)]
[Abstract and Bio]
[Related Keynote Paper from VLSI-DAT 2020]
[Related Review Paper on Processing in Memory]

https://www.youtube.com/watch?v=H3sEaINPBOE

https://www.youtube.com/onurmutlulectures
155
Detailed Lectures on PIM (I)
n Computer Architecture, Fall 2020, Lecture 6
q Computation in Memory (ETH Zürich, Fall 2020)
q https://www.youtube.com/watch?v=oGcZAGwfEUE&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=12

n Computer Architecture, Fall 2020, Lecture 7
q Near-Data Processing (ETH Zürich, Fall 2020)
q https://www.youtube.com/watch?v=j2GIigqn1Qw&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=13

n Computer Architecture, Fall 2020, Lecture 11a
q Memory Controllers (ETH Zürich, Fall 2020)
q https://www.youtube.com/watch?v=TeG773OgiMQ&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=20

n Computer Architecture, Fall 2020, Lecture 12d
q Real Processing-in-DRAM with UPMEM (ETH Zürich, Fall 2020)
q https://www.youtube.com/watch?v=Sscy1Wrr22A&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=25

https://www.youtube.com/onurmutlulectures
157
Detailed Lectures on PIM (II)
n Computer Architecture, Fall 2020, Lecture 15
q Emerging Memory Technologies (ETH Zürich, Fall 2020)
q https://www.youtube.com/watch?v=AlE1rD9G_YU&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=28

n Computer Architecture, Fall 2020, Lecture 16a
q Opportunities & Challenges of Emerging Memory Technologies (ETH Zürich, Fall 2020)
q https://www.youtube.com/watch?v=pmLszWGmMGQ&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=29

n Computer Architecture, Fall 2020, Guest Lecture
q In-Memory Computing: Memory Devices & Applications (ETH Zürich, Fall 2020)
q https://www.youtube.com/watch?v=wNmqQHiEZNk&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=41

https://www.youtube.com/onurmutlulectures
158
Many Interesting Things
Are Happening Today
in Computer Architecture

Performance
and
Energy Efficiency
159
TESLA Full Self-Driving Computer (2019)
n ML accelerator: 260 mm^2, 6 billion transistors,
600 GFLOPS GPU, 12 ARM 2.2 GHz CPUs.
n Two redundant chips for better safety.

https://youtu.be/Ucp0TTmvqOE?t=4236
160
Google TPU Generation I (~2016)

Jouppi et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit”, ISCA 2017.

161
Google TPU Generation II (2017)

4 TPU chips vs 1 chip in TPU1
High Bandwidth Memory vs DDR3
Floating point operations vs FP16
45 TFLOPS per chip vs 23 TOPS
Designed for training and inference vs only inference

https://www.nextplatform.com/2017/05/17/first-depth-look-googles-new-second-generation-tpu/

162
Google TPU Generation III (2019)

32GB HBM per chip vs 16GB HBM in TPU2
4 Matrix Units per chip vs 2 Matrix Units in TPU2
90 TFLOPS per chip vs 45 TFLOPS in TPU2
https://cloud.google.com/tpu/docs/system-architecture
163
Google TPU Generation IV (2021)

250 TFLOPS per chip in 2021 vs 90 TFLOPS in TPU3
1 ExaFLOPS per board

New ML applications (vs. TPU3):
• Computer vision
• Natural Language Processing (NLP)
• Recommender system
• Reinforcement learning that plays Go
https://spectrum.ieee.org/tech-talk/computing/hardware/heres-how-googles-tpu-v4-ai-chip-stacked-up-in-training-tests
164
An Example Modern Systolic Array: TPU (II)

Jouppi et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit”, ISCA 2017.
165
An Example Modern Systolic Array: TPU (III)

166
Many (Other) AI/ML Chips
n Alibaba
n Amazon
n Facebook
n Google
n Huawei
n Intel
n Microsoft
n NVIDIA
n Tesla
n Many Others and Many Startups…

n Many More to Come…


167
Many (Other) AI/ML Chips (2019)
n Alibaba
n Amazon
n Facebook
n Google
n Huawei
n Microsoft
n NVIDIA
n Tesla
n Many Startups…

n Many More to Come…

168
https://basicmi.github.io/AI-Chip/
Many (Other) AI/ML Chips (2021)
n Alibaba
n Amazon
n Facebook
n Google
n Huawei
n Microsoft
n NVIDIA
n Tesla
n Many Startups…

n Many More to Come…

169
https://basicmi.github.io/AI-Chip/
Further Slides for Your Own Study
(May Be Covered in Future Lectures)

171
Many Interesting Things
Are Happening Today
in Computer Architecture

Reliability
Security
Safety
173
Security: RowHammer (2014)

174
The Story of RowHammer
n One can predictably induce bit flips in commodity DRAM chips
q >80% of the tested DRAM chips are vulnerable

n First example of how a simple hardware failure mechanism
can create a widespread system security vulnerability

175
Modern DRAM is Prone to Disturbance Errors

[Figure: DRAM array of rows of cells, each driven by a wordline. The hammered row is repeatedly opened (wordline at V_HIGH) and closed (wordline at V_LOW), disturbing the adjacent victim rows above and below it.]

Repeatedly reading a row enough times (before memory gets
refreshed) induces disturbance errors in adjacent rows in
most real DRAM chips you can buy today

Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM
Disturbance Errors, (Kim et al., ISCA 2014)
176
Most DRAM Modules Are Vulnerable
Manufacturer   Vulnerable modules   Errors observed
A company      86% (37/43)          up to 1.0×10^7
B company      83% (45/54)          up to 2.7×10^6
C company      88% (28/32)          up to 3.3×10^5

Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM
Disturbance Errors, (Kim et al., ISCA 2014)
177
One Can Take Over an Otherwise-Secure System

Flipping Bits in Memory Without Accessing Them:
An Experimental Study of DRAM Disturbance Errors
(Kim et al., ISCA 2014)

Exploiting the DRAM rowhammer bug to
gain kernel privileges (Seaborn, 2015)

178
Security: RowHammer (2014)

179
More Security Implications (I)
“We can gain unrestricted access to systems of website visitors.”

Rowhammer.js: A Remote Software-Induced Fault Attack in JavaScript (DIMVA’16)


Source: https://lab.dsst.io/32c3-slides/7197.html
180
More Security Implications (II)
“Can gain control of a smart phone deterministically”

Drammer: Deterministic Rowhammer Attacks on Mobile Platforms, CCS’16

Source: https://fossbytes.com/drammer-rowhammer-attack-android-root-devices/
181
More Security Implications (III)
n Using an integrated GPU in a mobile system to remotely
escalate privilege via the WebGL interface

182
More Security Implications (IV)
n Rowhammer over RDMA (I)

183
More Security Implications (V)
n Rowhammer over RDMA (II)

184
More Security Implications (VI)
n IEEE S&P 2020
More Security Implications (VII)
n USENIX Security 2019
More Security Implications (VIII)
n USENIX Security 2020
RowHammer: Seven Years Ago…
n Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk
Lee, Chris Wilkerson, Konrad Lai, and Onur Mutlu,
"Flipping Bits in Memory Without Accessing Them: An
Experimental Study of DRAM Disturbance Errors"
Proceedings of the 41st International Symposium on Computer
Architecture (ISCA), Minneapolis, MN, June 2014.
[Slides (pptx) (pdf)] [Lightning Session Slides (pptx) (pdf)] [Source Code
and Data]

188
RowHammer: 2019 and Beyond…
n Onur Mutlu and Jeremie Kim,
"RowHammer: A Retrospective"
IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems (TCAD) Special Issue on Top Picks in Hardware and
Embedded Security, 2019.
[Preliminary arXiv version]
[Slides from COSADE 2019 (pptx)]
[Slides from VLSI-SOC 2020 (pptx) (pdf)]
[Talk Video (1 hr 15 minutes, with Q&A)]

189
RowHammer in 2020
RowHammer in 2020 (I)
n Jeremie S. Kim, Minesh Patel, A. Giray Yaglikci, Hasan Hassan,
Roknoddin Azizi, Lois Orosa, and Onur Mutlu,
"Revisiting RowHammer: An Experimental Analysis of Modern
Devices and Mitigation Techniques"
Proceedings of the 47th International Symposium on Computer
Architecture (ISCA), Valencia, Spain, June 2020.
[Slides (pptx) (pdf)]
[Lightning Talk Slides (pptx) (pdf)]
[Talk Video (20 minutes)]
[Lightning Talk Video (3 minutes)]

191
Key Takeaways from 1580 Chips
• Newer DRAM chips are more vulnerable to
RowHammer

• There are chips today whose weakest cells fail after only
4800 hammers

• Chips of newer DRAM technology nodes can exhibit
RowHammer bit flips 1) in more rows and 2) farther
away from the victim row.

• Existing mitigation mechanisms are NOT effective
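
To give a sense of scale for the 4800-hammer figure above, the back-of-the-envelope sketch below compares it with the number of row activations that fit in one refresh window. The 64 ms refresh window and the 50 ns row cycle time are assumed round numbers, not values from the paper, and the sketch simplifies by treating one hammer as one row activation.

```python
REFRESH_WINDOW_NS = 64_000_000  # assumption: 64 ms refresh window
ROW_CYCLE_NS = 50               # assumption: ~50 ns between activations to a bank
HAMMER_THRESHOLD = 4_800        # weakest cells, from the takeaways above

# Upper bound on activations an attacker can issue before the victim is refreshed.
max_activations = REFRESH_WINDOW_NS // ROW_CYCLE_NS

# Time needed to reach the threshold at the maximum activation rate.
time_to_threshold_us = HAMMER_THRESHOLD * ROW_CYCLE_NS // 1_000

# How many times over the threshold can be reached within one refresh window.
headroom = max_activations / HAMMER_THRESHOLD

print("activations per refresh window:", max_activations)
print("time to reach threshold (us):", time_to_threshold_us)
print(f"headroom: {headroom:.0f}x")
```

Under these assumptions the threshold is reached in a small fraction of a refresh window, which is consistent with the takeaway that newer chips are more vulnerable.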

192
RowHammer in 2020 (II)
n Pietro Frigo, Emanuele Vannacci, Hasan Hassan, Victor van der Veen, Onur Mutlu,
Cristiano Giuffrida, Herbert Bos, and Kaveh Razavi,
"TRRespass: Exploiting the Many Sides of Target Row Refresh"
Proceedings of the 41st IEEE Symposium on Security and Privacy (S&P), San Francisco,
CA, USA, May 2020.
[Slides (pptx) (pdf)]
[Lecture Slides (pptx) (pdf)]
[Talk Video (17 minutes)]
[Lecture Video (59 minutes)]
[Source Code]
[Web Article]
Best paper award.
Pwnie Award 2020 for Most Innovative Research. Pwnie Awards 2020

193
RowHammer in 2020 (III)
n Lucian Cojocar, Jeremie Kim, Minesh Patel, Lillian Tsai, Stefan Saroiu,
Alec Wolman, and Onur Mutlu,
"Are We Susceptible to Rowhammer? An End-to-End
Methodology for Cloud Providers"
Proceedings of the 41st IEEE Symposium on Security and
Privacy (S&P), San Francisco, CA, USA, May 2020.
[Slides (pptx) (pdf)]
[Talk Video (17 minutes)]

194
BlockHammer Solution in 2021
n A. Giray Yaglikci, Minesh Patel, Jeremie S. Kim, Roknoddin Azizi, Ataberk Olgun,
Lois Orosa, Hasan Hassan, Jisung Park, Konstantinos Kanellopoulos, Taha
Shahroodi, Saugata Ghose, and Onur Mutlu,
"BlockHammer: Preventing RowHammer at Low Cost by Blacklisting
Rapidly-Accessed DRAM Rows"
Proceedings of the 27th International Symposium on High-Performance
Computer Architecture (HPCA), Virtual, February-March 2021.
[Slides (pptx) (pdf)]
[Short Talk Slides (pptx) (pdf)]
[Talk Video (22 minutes)]
[Short Talk Video (7 minutes)]

195
Detailed Lectures on RowHammer
n Computer Architecture, Fall 2020, Lecture 4b
q RowHammer (ETH Zürich, Fall 2020)
q https://www.youtube.com/watch?v=KDy632z23UE&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=8

n Computer Architecture, Fall 2020, Lecture 5a
q RowHammer in 2020: TRRespass (ETH Zürich, Fall 2020)
q https://www.youtube.com/watch?v=pwRw7QqK_qA&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=9

n Computer Architecture, Fall 2020, Lecture 5b
q RowHammer in 2020: Revisiting RowHammer (ETH Zürich, Fall 2020)
q https://www.youtube.com/watch?v=gR7XR-Eepcg&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=10

n Computer Architecture, Fall 2020, Lecture 5c
q Secure and Reliable Memory (ETH Zürich, Fall 2020)
q https://www.youtube.com/watch?v=HvswnsfG3oQ&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=11

https://www.youtube.com/onurmutlulectures
196
The Story of RowHammer Lecture …
n Onur Mutlu,
"The Story of RowHammer"
Keynote Talk at Secure Hardware, Architectures, and Operating Systems
Workshop (SeHAS), held with HiPEAC 2021 Conference, Virtual, 19 January 2021.
[Slides (pptx) (pdf)]
[Talk Video (1 hr 15 minutes, with Q&A)]

197
Two Upcoming RowHammer Papers at MICRO 2021

n Lois Orosa, Abdullah Giray Yaglikci, Haocong Luo, Ataberk Olgun, Jisung
Park, Hasan Hassan, Minesh Patel, Jeremie S. Kim, Onur Mutlu,
"A Deeper Look into RowHammer's Sensitivities: Experimental
Analysis of Real DRAM Chips and Implications on Future Attacks
and Defenses"
MICRO 2021

199
Two Upcoming RowHammer Papers at MICRO 2021

n Hasan Hassan, Yahya Can Tugrul, Jeremie S. Kim, Victor van der Veen,
Kaveh Razavi, Onur Mutlu,
"Uncovering In-DRAM RowHammer Protection Mechanisms: A
New Methodology, Custom RowHammer Patterns, and
Implications"
MICRO 2021

200
TRRespass Key Takeaways

RowHammer is still
an open problem

Security by obscurity
is likely not a good solution
201
Security: Meltdown and Spectre (2018)

Source: J. Masters, Red Hat, FOSDEM 2018 keynote talk.
202


Meltdown and Spectre
n Someone can steal secret data from the system even though
q your program and data are perfectly correct and
q your hardware behaves according to the specification and
q there are no software vulnerabilities/bugs

n Why?
q Speculative execution leaves traces of secret data in the
processor’s cache (internal storage)
n It brings in data that would not have been brought in or accessed
if there were no speculative execution
q A malicious program can inspect the contents of the cache to
“infer” secret data that it is not supposed to access
q A malicious program can actually force another program to
speculatively execute code that leaves traces of secret data
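
The mechanism in the bullets above can be mimicked with a toy model. This is not an exploit and does not reproduce real hardware behavior: the `ToyCache` class, the `victim` and `attacker_infer` functions, and the use of a boolean "hit" in place of an access-time measurement are all assumptions made for illustration.

```python
from typing import Optional


class ToyCache:
    """Set of cached line tags; a 'hit' stands in for a fast (timed) access."""

    def __init__(self) -> None:
        self.lines = set()

    def flush(self) -> None:
        self.lines.clear()

    def access(self, tag) -> bool:
        hit = tag in self.lines
        self.lines.add(tag)
        return hit


def victim(cache: ToyCache, secret: int, index: int, bound: int,
           speculate: bool = True) -> Optional[int]:
    """Bounds-checked access whose speculative path leaves a cache trace."""
    in_bounds = index < bound
    if speculate and not in_bounds:
        # Transient (later squashed) work: the secret selects a cache line.
        cache.access(("probe", secret))
    # Architectural result: the out-of-bounds access is never committed.
    return 0 if in_bounds else None


def attacker_infer(cache: ToyCache) -> Optional[int]:
    """Probe all 256 candidate lines; the one that hits reveals the secret."""
    for guess in range(256):
        if cache.access(("probe", guess)):
            return guess
    return None


cache = ToyCache()
cache.flush()
result = victim(cache, secret=0x2A, index=100, bound=16)
leaked = attacker_infer(cache)
print("architectural result:", result)   # the program itself sees nothing
print("inferred secret:", leaked)
```

The victim's architectural behavior is correct in both cases; only the cache state differs, which is why the slide stresses that neither the program nor the specification is at fault.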
203
More on Meltdown/Spectre Vulnerabilities

Source: https://googleprojectzero.blogspot.ch/2018/01/reading-privileged-memory-with-side.html
204
Many Interesting Things
Are Happening Today
in Computer Architecture

More Demanding Workloads

206
Increasingly Demanding Applications

Dream, and they will come


As applications push boundaries, computing platforms will become increasingly strained.

207
New Genome Sequencing Technologies

Oxford Nanopore MinION

Data → performance & energy bottleneck

Senol Cali+, “Nanopore Sequencing Technology and Tools for Genome
Assembly: Computational Analysis of the Current State, Bottlenecks
and Future Directions,” Briefings in Bioinformatics, 2018.
[Preliminary arxiv.org version]

208
Why Do We Care? An Example

209
Source: https://nanoporetech.com/about-us/news/200-oxford-nanopore-sequencers-have-left-uk-china-support-rapid-near-sample
Population-Scale Microbiome Profiling

https://blog.wego.com/7-crowded-places-and-events-that-you-will-love/
210
City-Scale Microbiome Profiling

Afshinnekoo+, "Geospatial Resolution of Human and
Bacterial Diversity with City-Scale Metagenomics", Cell
Systems, 2015
211
Example: Rapid Surveillance of Ebola Outbreak

Quick+, “Real-time, portable genome sequencing for Ebola surveillance”, Nature, 2016
212
High-Throughput Genome Sequencers
Oxford Nanopore PromethION
Pacific Biosciences Sequel II
Illumina MiSeq
Oxford Nanopore MinION
Oxford Nanopore SmidgION
Illumina NovaSeq 6000
Pacific Biosciences RS II
… and more! All produce data with different properties.
213
High-Throughput Genome Sequencers
Mohammed Alser, Zülal Bingöl, Damla Senol Cali, Jeremie Kim, Saugata Ghose, Can Alkan, Onur Mutlu
“Accelerating Genome Analysis: A Primer on an Ongoing Journey” IEEE Micro, August 2020.

MinION from ONT

SmidgION from ONT


214
The Genomic Era
development of high-throughput
sequencing (HTS) technologies

Number of Genomes
Sequenced

http://www.economist.com/news/21631808-so-much-genetic-data-so-many-uses-genes-unzipped
215
[Figure: genome analysis pipeline. (1) Sequencing produces billions of short reads. (2) Read mapping aligns each short read against the reference genome (read alignment, illustrated with an alignment score matrix). (3) Variant calling. (4) Scientific discovery.]

Data → performance & energy bottleneck

Software Acceleration: Eliminate Useless Work
n Download the source code and try for yourself
q Download link to FastHASH

217
Shifted Hamming Distance: SIMD Acceleration
https://github.com/CMU-SAFARI/Shifted-Hamming-Distance

Xin+, "Shifted Hamming Distance: A Fast and Accurate SIMD-friendly Filter
to Accelerate Alignment Verification in Read Mapping", Bioinformatics 2015.
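
A much simplified sketch of the shifted-Hamming idea follows. It compares the read against the reference at every shift within the edit budget and keeps only the positions that mismatch under all shifts. It is an assumption-laden illustration: it omits the amending step and the SIMD bit-vector implementation of the actual SHD filter, and the function name is hypothetical.

```python
def shifted_hamming_filter(read: str, ref: str, max_edits: int) -> bool:
    """Return True if the pair may be within `max_edits` edits (send it to
    full alignment), False if it can be rejected without alignment."""
    # surviving[i] stays True only if read[i] mismatches under every shift.
    surviving = [True] * len(read)
    for shift in range(-max_edits, max_edits + 1):
        for i, base in enumerate(read):
            j = i + shift
            if 0 <= j < len(ref) and base == ref[j]:
                surviving[i] = False  # explained by some shift: clear the bit
    return sum(surviving) <= max_edits


ref = "ACGTACGTAC"
print(shifted_hamming_filter("ACGTACGTAC", ref, 0))  # identical pair
print(shifted_hamming_filter("ACGAACGTAC", ref, 1))  # one substitution
print(shifted_hamming_filter("TTTTTTTTTT", ref, 1))  # dissimilar pair
```

Any base that a true alignment with at most `max_edits` edits matches lies within that many positions of its partner, so this simplified filter never rejects a valid pair; it can still accept pairs that full alignment later rejects, which is the filter-then-verify trade-off.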

218
GateKeeper: FPGA-Based Alignment Filtering

1st FPGA-based Alignment Filter.

[Figure: read mapping pipeline. (1) High-throughput DNA sequencing (HTS) technologies produce billions of short reads. (2) Read pre-alignment filtering, fast with a low false positive rate, runs on a hardware accelerator. (3) Read alignment, slow with zero false positives. Candidate mapping counts are annotated as x10^12 and x10^3. Design points shown: low speed & high accuracy; medium speed, medium accuracy; high speed, low accuracy.]
219
GateKeeper: FPGA-Based Alignment Filtering
n Mohammed Alser, Hasan Hassan, Hongyi Xin, Oguz Ergin, Onur
Mutlu, and Can Alkan
"GateKeeper: A New Hardware Architecture for
Accelerating Pre-Alignment in DNA Short Read Mapping"
Bioinformatics, [published online, May 31], 2017.
[Source Code]
[Online link at Bioinformatics Journal]

220
In-Memory DNA Sequence Analysis
n Jeremie S. Kim, Damla Senol Cali, Hongyi Xin, Donghyuk Lee, Saugata Ghose, Mohammed Alser, Hasan
Hassan, Oguz Ergin, Can Alkan, and Onur Mutlu,
"GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping Using Processing-in-
Memory Technologies"
BMC Genomics, 2018.
Proceedings of the 16th Asia Pacific Bioinformatics Conference (APBC), Yokohama, Japan, January
2018.
[Slides (pptx) (pdf)]
[Source Code]
[arxiv.org Version (pdf)]
[Talk Video at AACBB 2019]

221
Shouji (障子) [Alser+, Bioinformatics 2019]
Mohammed Alser, Hasan Hassan, Akash Kumar, Onur Mutlu, and Can Alkan,
"Shouji: A Fast and Efficient Pre-Alignment Filter for Sequence Alignment"
Bioinformatics, [published online, March 28], 2019.
[Source Code]
[Online link at Bioinformatics Journal]

222
SneakySnake [Alser+, Bioinformatics 2020]
Mohammed Alser, Taha Shahroodi, Juan Gomez-Luna, Can Alkan, and Onur Mutlu,
"SneakySnake: A Fast and Accurate Universal Genome Pre-Alignment
Filter for CPUs, GPUs, and FPGAs"
Bioinformatics, to appear in 2020.
[Source Code]
[Online link at Bioinformatics Journal]

223
GenASM Framework [MICRO 2020]
n Damla Senol Cali, Gurpreet S. Kalsi, Zulal Bingol, Can Firtina, Lavanya Subramanian, Jeremie S.
Kim, Rachata Ausavarungnirun, Mohammed Alser, Juan Gomez-Luna, Amirali Boroumand,
Anant Nori, Allison Scibisz, Sreenivas Subramoney, Can Alkan, Saugata Ghose, and Onur Mutlu,
"GenASM: A High-Performance, Low-Power Approximate String Matching
Acceleration Framework for Genome Sequence Analysis"
Proceedings of the 53rd International Symposium on Microarchitecture (MICRO), Virtual,
October 2020.
[Lightning Talk Video (1.5 minutes)]
[Lightning Talk Slides (pptx) (pdf)]
[Talk Video (18 minutes)]
[Slides (pptx) (pdf)]

224
Future of Genome Sequencing & Analysis
Mohammed Alser, Zülal Bingöl, Damla Senol Cali, Jeremie Kim, Saugata Ghose, Can Alkan, Onur Mutlu
“Accelerating Genome Analysis: A Primer on an Ongoing Journey” IEEE Micro, August 2020.

MinION from ONT

SmidgION from ONT


225
COVID-19 Nanopore Sequencing (I)

• From ONT (https://nanoporetech.com/covid-19/overview)


226
COVID-19 Nanopore Sequencing (II)

• From ONT (https://nanoporetech.com/covid-19/overview)


227
Accelerating Genome Analysis: Overview
n Mohammed Alser, Zulal Bingol, Damla Senol Cali, Jeremie Kim, Saugata Ghose, Can
Alkan, and Onur Mutlu,
"Accelerating Genome Analysis: A Primer on an Ongoing Journey"
IEEE Micro (IEEE MICRO), Vol. 40, No. 5, pages 65-75, September/October 2020.
[Slides (pptx)(pdf)]
[Talk Video (1 hour 2 minutes)]

228
More on Fast Genome Analysis …
n Onur Mutlu,
"Accelerating Genome Analysis: A Primer on an Ongoing Journey"
Invited Lecture at Technion, Virtual, 26 January 2021.
[Slides (pptx) (pdf)]
[Talk Video (1 hour 37 minutes, including Q&A)]
[Related Invited Paper (at IEEE Micro, 2020)]

229
Detailed Lectures on Genome Analysis
n Computer Architecture, Fall 2020, Lecture 3a
q Introduction to Genome Sequence Analysis (ETH Zürich, Fall 2020)
q https://www.youtube.com/watch?v=CrRb32v7SJc&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=5

n Computer Architecture, Fall 2020, Lecture 8
q Intelligent Genome Analysis (ETH Zürich, Fall 2020)
q https://www.youtube.com/watch?v=ygmQpdDTL7o&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=14

n Computer Architecture, Fall 2020, Lecture 9a
q GenASM: Approx. String Matching Accelerator (ETH Zürich, Fall 2020)
q https://www.youtube.com/watch?v=XoLpzmN-Pas&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=15

n Accelerating Genomics Project Course, Fall 2020, Lecture 1
q Accelerating Genomics (ETH Zürich, Fall 2020)
q https://www.youtube.com/watch?v=rgjl8ZyLsAg&list=PL5Q2soXY2Zi9E2bBVAgCqLgwiDRQDTyId

https://www.youtube.com/onurmutlulectures 230
Many Interesting Things
Are Happening Today
in Computer Architecture

More Demanding Workloads

231
The Problem

Computing
is Bottlenecked by Data

232
Data is Key for AI, ML, Genomics, …

n Important workloads are all data intensive

n They require rapid and efficient processing of large amounts of data

n Data is increasing
q We can generate more than we can process

233
Data is Key for Future Workloads

In-memory Databases [Mao+, EuroSys’12; Clapp+ (Intel), IISWC’15]
Graph/Tree Processing [Xu+, IISWC’12; Umuroglu+, FPL’15]
In-Memory Data Analytics [Clapp+ (Intel), IISWC’15; Awan+, BDCloud’15]
Datacenter Workloads [Kanev+ (Google), ISCA’15]
Data Overwhelms Modern Machines

In-memory Databases [Mao+, EuroSys’12; Clapp+ (Intel), IISWC’15]
Graph/Tree Processing [Xu+, IISWC’12; Umuroglu+, FPL’15]
In-Memory Data Analytics [Clapp+ (Intel), IISWC’15; Awan+, BDCloud’15]
Datacenter Workloads [Kanev+ (Google), ISCA’15]

Data → performance & energy bottleneck
Data is Key for Future Workloads

Chrome: Google’s web browser
TensorFlow Mobile: Google’s machine learning framework
Video Playback: Google’s video codec
Video Capture: Google’s video codec
Data Overwhelms Modern Machines

Chrome: Google’s web browser
TensorFlow Mobile: Google’s machine learning framework
Video Playback: Google’s video codec
Video Capture: Google’s video codec

Data → performance & energy bottleneck
Data Movement Overwhelms Modern Machines
n Amirali Boroumand, Saugata Ghose, Youngsok Kim, Rachata Ausavarungnirun, Eric Shiu, Rahul
Thakur, Daehyun Kim, Aki Kuusela, Allan Knies, Parthasarathy Ranganathan, and Onur Mutlu,
"Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks"
Proceedings of the 23rd International Conference on Architectural Support for Programming
Languages and Operating Systems (ASPLOS), Williamsburg, VA, USA, March 2018.

62.7% of the total system energy is spent on data movement

238
Data Movement vs. Computation Energy

Dally, HiPEAC 2015

A memory access consumes ~100-1000X the energy of a complex addition
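
To get a feel for what a 100-1000X energy gap can mean, here is a minimal back-of-the-envelope sketch. It measures energy in units of one addition and takes only the ratio from the slide; the workload mix (one memory access per 100 additions) and the function name `data_movement_fraction` are hypothetical, chosen purely for illustration, and are not measurements from the cited work.

```python
def data_movement_fraction(num_adds: int, num_mem_accesses: int,
                           access_to_add_ratio: float) -> float:
    """Fraction of total energy spent on memory accesses.

    Energy is measured in units of one addition's energy, so an addition
    costs 1 and a memory access costs `access_to_add_ratio`.
    """
    compute_energy = num_adds * 1.0
    movement_energy = num_mem_accesses * access_to_add_ratio
    total = compute_energy + movement_energy
    if total == 0:
        raise ValueError("workload performs no operations")
    return movement_energy / total


# Hypothetical workload: one memory access for every 100 additions.
for ratio in (100, 1000):
    print(ratio, round(data_movement_fraction(100, 1, ratio), 3))
# prints:
# 100 0.5
# 1000 0.909
```

Even with a memory access only once per 100 additions, this simple model attributes half or more of the energy to data movement, which is the intuition behind treating data movement as the bottleneck.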
239
Many Interesting Things
Are Happening Today
in Computer Architecture

240
Many Novel Concepts Investigated Today
n New Computing Paradigms (Rethinking the Full Stack)
q Processing in Memory, Processing Near Data
q Neuromorphic Computing
q Fundamentally Secure and Dependable Computers

n New Accelerators (Algorithm-Hardware Co-Designs)
q Artificial Intelligence & Machine Learning
q Graph Analytics
q Genome Analysis

n New Memories and Storage Systems
q Non-Volatile Main Memory
q Intelligent Memory
241
Increasingly Demanding Applications

Dream

and, they will come


As applications push boundaries, computing platforms will become increasingly strained.

242
Increasingly Diverging/Complex Tradeoffs

Dally, HiPEAC 2015

243
Increasingly Diverging/Complex Tradeoffs

Dally, HiPEAC 2015

A memory access consumes ~1000X the energy of a complex addition
244
Increasingly Complex Systems

Past systems

Microprocessor Main Memory Storage (SSD/HDD)

245
Increasingly Complex Systems
Modern systems

Heterogeneous Processors and Accelerators
FPGAs
(General Purpose) GPUs
Hybrid Main Memory
Persistent Memory/Storage

246
Computer Architecture Today
n Computing landscape is very different from 10-20 years ago

n Applications and technology both demand novel architectures

Heterogeneous Processors and Accelerators
General Purpose GPUs
Hybrid Main Memory
Persistent Memory/Storage

Every component and its interfaces, as well as entire system designs, are being re-examined

247
Computer Architecture Today (II)
n You can revolutionize the way computers are built, if you
understand both the hardware and the software (and
change each accordingly)

n You can invent new paradigms for computation, communication, and storage

n Recommended book: Thomas Kuhn, “The Structure of Scientific Revolutions” (1962)
q Pre-paradigm science: no clear consensus in the field
q Normal science: dominant theory used to explain/improve
things (business as usual); exceptions considered anomalies
q Revolutionary science: underlying assumptions re-examined

248
Takeaways
n It is an exciting time to be understanding and designing
computing architectures

n Many challenging and exciting problems in platform design
q That no one has tackled (or thought about) before
q That can have huge impact on the world’s future

n Driven by huge hunger for data (Big Data), new applications (ML/AI, graph analytics, genomics), ever-greater realism, …
q We can easily collect more data than we can analyze/understand

n Driven by significant difficulties in keeping up with that hunger at the technology layer
q Five walls: Energy, reliability, complexity, security, scalability

250
Let’s Start with Some Fundamentals

251
Question: What Is This?

Source: By Toni_V, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=4087256 252


Answer: The First Major Piece of a Famous Architect
n Bahnhof Stadelhofen: “The train station has several of
the features that became signatures of his work; straight
lines and right angles are rare.”

n ETH Alumnus, PhD in Civil Engineering

Source: By 準建築人手札網站 Forgemind ArchiMedia - Flickr: IMG_2489.JPG, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=31493356, https://en.wikipedia.org/wiki/Santiago_Calatrava 253
Compare To This

Source: http://cookiemagik.deviantart.com/art/Train-station-207266944 254


Question 2: What Is This?

255
Source: https://www.dezeen.com/2016/08/29/santiago-calatrava-oculus-world-trade-center-transportation-hub-new-york-photographs-hufton-crow/
Answer: Masterpiece of a Famous Architect

Source: https://en.wikipedia.org/wiki/World_Trade_Center_station_(PATH) 256


Strengths and Praise

Source: https://en.wikipedia.org/wiki/World_Trade_Center_station_(PATH) 257


Design Constraints and Criticism

Source: https://en.wikipedia.org/wiki/World_Trade_Center_station_(PATH) 258


Source: https://en.wikipedia.org/wiki/Stegosaurus

Susannah Maidment et al. & Natural History Museum, London - Maidment SCR, Brassey C, Barrett PM (2015)
The Postcranial Skeleton of an Exceptionally Complete Individual of the Plated Dinosaur Stegosaurus stenops
(Dinosauria: Thyreophora) from the Upper Jurassic Morrison Formation of Wyoming, U.S.A. PLoS ONE 10(10):
e0138352. doi:10.1371/journal.pone.0138352
259
Design Constraints: No One is Immune

Source: https://en.wikipedia.org/wiki/World_Trade_Center_station_(PATH) 260


Question: What Is This?

261
262
Answer: Masterpiece of Another Famous Architect

Source: https://en.wikipedia.org/wiki/Fallingwater 263


Your First Comp Arch Assignment
n Go and visit Bahnhof Stadelhofen
q Extra credit: Repeat for Oculus
q Extra+ credit: Repeat for Fallingwater

n Appreciate the beauty & out-of-the-box and creative thinking


n Think about tradeoffs in the design of the Bahnhof
q Strengths, weaknesses, goals of design
n Derive principles on your own for good design and innovation

n Due date: Any time during this course
q Later during the course is better
q Apply what you have learned in this course
q Think out-of-the-box
264
But First, Today’s First Assignment
n Find The Differences Of This and That

265
Find The Differences of
This and That

266
This

Source: By Toni_V from Zurich, Switzerland - Stadelhofen2, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=4087256 267
That

Source: http://cookiemagik.deviantart.com/art/Train-station-207266944 268


Many Tradeoffs Between Two Designs
n You can list them after you complete the first assignment…

269
Aside: Evaluation Criteria for the Designs
n Functionality (Does it meet the specification?)
n Reliability
n Space requirement
n Cost
n Expandability
n Comfort level of users
n Happiness level of users
n Aesthetics
n …

n How to evaluate goodness of design is always a critical question.
270
A Key Question
n How was Calatrava able to design especially his key buildings?
n Can have many guesses
q (Ultra) hard work, perseverance, dedication (over decades)
q Experience
q Creativity, Out-of-the-box thinking
q A good understanding of past designs
q Good judgment and intuition
q Strong skill combination (math, architecture, art, engineering, …)
q Funding ($$$$), luck, initiative, entrepreneurialism
q Strong understanding of and commitment to fundamentals
q Principled design
q …

n (You will be exposed to and hopefully develop/enhance many of these skills in this course)
271
Principled Design
n “To me, there are two overriding principles to be found in
nature which are most appropriate for building:
q one is the optimal use of material,
q the other the capacity of organisms to change shape, to grow,
and to move.”
q Santiago Calatrava

n “Calatrava's constructions are inspired by natural forms like plants, bird wings, and the human body.”

Source: http://www.arcspace.com/exhibitions/unsorted/santiago-calatrava/ 272


Gare do Oriente, Lisbon, Revisited

Source: By Martín Gómez Tagle - Lisbon, Portugal, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=13764903 273
Source: http://www.arcspace.com/exhibitions/unsorted/santiago-calatrava/
A Principled Design

274
What Does This Remind You Of?

Source: https://www.dezeen.com/2016/08/29/santiago-calatrava-oculus-world-trade-center-transportation-hub-new-york-photographs-hufton-crow/ 275


What About This?

Source: De Galván - Puente del Alamillo.jpg on Enciclopedia.us.es, GFDL, https://commons.wikimedia.org/w/index.php?curid=15026095 276


Milwaukee Art Museum

Source: By Andrew C. from Flagstaff, USA - Flickr, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=379223 277


Athens Olympic Stadium

Source: By Spyrosdrakopoulos - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=16172519


278
City of Arts and Sciences, Valencia

Source: CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=172107 279


Florida Polytechnic University (I)

280
Source: http://www.architectmagazine.com/design/buildings/florida-polytechnic-university-designed-by-santiago-calatrava_o
Oculus, New York City

281
Source: https://www.dezeen.com/2016/08/29/santiago-calatrava-oculus-world-trade-center-transportation-hub-new-york-photographs-hufton-crow/
A Quote from The Other Famous Architect
n “architecture […] based upon principle, and not upon
precedent” (Frank Lloyd Wright)

Source: http://www.fallingwater.org/ 282


A Principled Design

283
Another View

Source: https://roadtrippers.com/stories/falling-water 284


Yet Another View

Source: By Carol M. Highsmith - http://www.loc.gov/pictures/collection/highsm/item/2010630255/, Public Domain, https://commons.wikimedia.org/w/index.php?curid=29385254 285


286
Major High-Level Goals of This Course
n Understand the principles
n Understand the precedents

n Based on such understanding:
q Enable you to evaluate tradeoffs of different designs and ideas
q Enable you to develop principled designs
q Enable you to develop novel, out-of-the-box designs

n The focus is on:
q Principles, precedents, and how to use them for new designs

n In Computer Architecture

287
Role of the (Computer) Architect

from Yale Patt’s lecture notes


Role of The (Computer) Architect
n Look backward (to the past)
q Understand tradeoffs and designs, upsides/downsides, past
workloads. Analyze and evaluate the past.
n Look forward (to the future)
q Be the dreamer and create new designs. Listen to dreamers.
q Push the state of the art. Evaluate new design choices.
n Look up (towards problems in the computing stack)
q Understand important problems and their nature.
q Develop architectures and ideas to solve important problems.
n Look down (towards device/circuit technology)
q Understand the capabilities of the underlying technology.
q Predict and adapt to the future of technology (you are
designing for N years ahead). Enable the future technology.
289
Takeaways
n Being an architect is not easy
n You need to consider many things in designing a new
system + have good intuition/insight into ideas/tradeoffs

n But, it is fun and can be very rewarding
n And, enables a great future
q E.g., many scientific and everyday-life innovations would not
have been possible without architectural innovation that
enabled very high performance systems
q E.g., your mobile phones
q E.g., self-driving vehicles

n This course will enable you to become a good computer architect
290
So, I Hope You Are Here for This
Comp. Systems: “C” as a model of computation
Programmer’s view of how a computer system works

n How does an assembly program end up executing as digital logic?
n What happens in-between?
n How is a computer designed using logic gates and wires to satisfy specific goals?

Architect/microarchitect’s view: How to design a computer that meets system design goals.
Choices critically affect both the SW programmer and the HW designer

Digital Design: Digital logic as a model of computation
HW designer’s view of how a computer system works
291
Levels of Transformation
“The purpose of computing is [to gain] insight” (Richard Hamming)
We gain and generate insight by solving problems
How do we ensure problems are solved by electrons?

Problem
Algorithm
Program/Language
Runtime System (VM, OS, MM)
ISA (Architecture)
Microarchitecture
Logic
Devices
Electrons

Algorithm
Step-by-step procedure that is guaranteed to terminate where each step is precisely stated and can be carried out by a computer
- Finiteness
- Definiteness
- Effective computability
Many algorithms for the same problem

ISA (Instruction Set Architecture)
Interface/contract between SW and HW.
What the programmer assumes hardware will satisfy.

Microarchitecture
An implementation of the ISA

Digital logic circuits
Building blocks of micro-arch (e.g., gates)
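
As a small illustration of the algorithm properties listed on this slide, here is a sketch that uses greatest common divisor as an example problem. The choice of problem and the function names `gcd_euclid` and `gcd_subtract` are assumptions made for illustration; they do not come from the lecture. Two different procedures solve the same problem, each step is precisely stated, and each loop is guaranteed to terminate for positive inputs.

```python
def gcd_euclid(a: int, b: int) -> int:
    """Euclid's algorithm using remainders (inputs assumed positive)."""
    # Finiteness: b strictly decreases toward 0 on every iteration.
    while b != 0:
        a, b = b, a % b
    return a


def gcd_subtract(a: int, b: int) -> int:
    """A different algorithm for the same problem, using repeated subtraction."""
    if a <= 0 or b <= 0:
        raise ValueError("inputs must be positive")
    # Finiteness: the larger value shrinks every iteration and both stay positive.
    while a != b:
        if a > b:
            a -= b
        else:
            b -= a
    return a


print(gcd_euclid(48, 18), gcd_subtract(48, 18))  # prints: 6 6
```

Both functions compute the same result, but the remainder-based version typically needs far fewer steps, which is one reason the choice of algorithm matters before any lower level of the stack is considered.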
292
Aside: An Important Work By Hamming
n Hamming, “Error Detecting and Error Correcting Codes,”
Bell System Technical Journal 1950.

n Introduced the concept of Hamming distance
q number of positions at which the corresponding symbols of two equal-length strings are different
n Developed a theory of codes used for error detection and
correction

n Also see:
q Hamming, “You and Your Research,” Talk at Bell Labs, 1986.
q http://www.cs.virginia.edu/~robins/YouAndYourResearch.html
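
The Hamming distance defined on this slide can be computed in a few lines. The following is a minimal sketch; the function name `hamming_distance` and the example strings are chosen here for illustration and are not taken from Hamming's paper.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is defined only for equal-length strings")
    return sum(x != y for x, y in zip(a, b))


# The two bit strings differ at positions 2 and 4 (counting from 0).
print(hamming_distance("1011101", "1001001"))  # prints: 2
```

In an error-detecting code, the minimum Hamming distance between any two valid codewords determines how many bit flips can be detected or corrected.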

293
Levels of Transformation, Revisited
n A user-centric view: computer designed for users
User
Problem
Algorithm
Program/Language
Runtime System (VM, OS, MM)
ISA
Microarchitecture
Logic
Devices
Electrons

n The entire stack should be optimized for the user


294
The Power of Abstraction
n Levels of transformation create abstractions
q Abstraction: A higher level only needs to know about the
interface to the lower level, not how the lower level is
implemented
q E.g., high-level language programmer does not really need to
know what the ISA is and how a computer executes instructions

n Abstraction improves productivity
q No need to worry about decisions made in underlying levels
q E.g., programming in Java vs. C vs. assembly vs. binary vs. by
specifying control signals of each transistor every cycle

n Then, why would you want to know what goes on underneath or above?

295
Crossing the Abstraction Layers
n As long as everything goes well, not knowing what happens
underneath (or above) is not a problem.
n What if
q The program you wrote is running slow?
q The program you wrote does not run correctly?
q The program you wrote consumes too much energy?
q Your system just shut down and you have no idea why?
q Someone just compromised your system and you have no idea how?

n What if
q The hardware you designed is too hard to program?
q The hardware you designed is too slow because it does not provide the
right primitives to the software?

n What if
q You want to design a much more efficient and higher performance system?
296
Crossing the Abstraction Layers
n Two key goals of this course are

q to understand how a processor works underneath the software layer and how decisions made in hardware affect the software/programmer

q to enable you to be comfortable in making design and optimization decisions that cross the boundaries of different layers and system components

297
An Example: Multi-Core Systems
[Figure: die photo of a multi-core chip with four cores (CORE 0-3), per-core L2 caches (L2 CACHE 0-3), a shared L3 cache, a DRAM memory controller, and a DRAM interface connecting to DRAM banks]

*Die photo credit: AMD Barcelona


298
Another Example: Memory Refresh

47%

15%

Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012. 299
Computer Architecture
Lecture 1: Introduction and Basics

Prof. Onur Mutlu


ETH Zürich
Fall 2021
30 September 2021
