Onur Comparch Fall2021 Lecture1 Intro Afterlecture
The Transformation Hierarchy
Problem
Algorithm
Program/Language
System Software
SW/HW Interface
Micro-architecture
Logic
Devices
Electrons
[Figure labels: Computer Architecture (expanded view); Computer Architecture (narrow view)]
Axiom
To achieve the highest energy efficiency and performance:
Co-design across the hierarchy: algorithms to devices
Specialize as much as possible within the design goals
[Figure: the transformation hierarchy: Problem, Algorithm, Program/Language, System Software, SW/HW Interface, Micro-architecture, Logic, Devices, Electrons]
Current Research Mission & Major Topics
Build fundamentally better architectures
• Data-centric arch. for low energy & high perf.
  ◦ Proc. in Mem/DRAM, NVM, unified mem/storage
• Low-latency & predictable architectures
  ◦ Low-latency, low-energy yet low-cost memory
  ◦ QoS-aware and predictable memory systems
• Fundamentally secure/reliable/safe arch.
  ◦ Tolerating all bit flips; patchable HW; secure mem
• Architectures for ML/AI/Genomics/Graph/Med
  ◦ Algorithm/arch./logic co-design; full heterogeneity
• Data-driven and data-aware architectures
  ◦ ML/AI-driven architectural controllers and design
  ◦ Expressive memory and expressive systems
Broad research spanning apps, systems, logic, with architecture at the center
[Figure: the transformation hierarchy: Problem, Algorithm, Program/Language, System Software, SW/HW Interface, Micro-architecture, Logic, Devices, Electrons]
SAFARI Research Group
https://safari.ethz.ch
Onur Mutlu’s SAFARI Research Group
Computer architecture, HW/SW, systems, bioinformatics, security, memory
https://safari.ethz.ch/safari-newsletter-april-2020/
38+ Researchers
https://safari.ethz.ch
SAFARI Newsletter January 2021 Edition
• https://safari.ethz.ch/safari-newsletter-january-2021/
SAFARI PhD and Post-Doc Alumni
• https://safari.ethz.ch/safari-alumni/
• Nastaran Hajinazar (ETH Zurich)
• Gagandeep Singh (ETH Zurich)
• Amirali Boroumand (Stanford Univ)
• Jeremie Kim (ETH Zurich)
• Nandita Vijaykumar (Univ. of Toronto, Assistant Professor)
• Kevin Hsieh (Microsoft Research, Senior Researcher)
• Justin Meza (Facebook)
• Mohammed Alser (ETH Zurich)
• Yixin Luo (Google)
• Kevin Chang (Facebook)
• Rachata Ausavarungnirun (KMUTNB, Assistant Professor)
• Gennady Pekhimenko (Univ. of Toronto, Assistant Professor)
• Vivek Seshadri (Microsoft Research)
• Donghyuk Lee (NVIDIA Research, Senior Researcher)
• Yoongu Kim (Google)
• Lavanya Subramanian (Intel Labs → Facebook)
SAFARI Research Group: Introduction and Research
• Onur Mutlu,
"SAFARI Research Group: Introduction & Research"
Talk at ETH Future Computing Laboratory Welcome
Workshop (EFCL), Virtual, 6 July 2021.
[Slides (pptx) (pdf)]
A Talk on Impactful Research & Teaching
https://www.youtube.com/watch?v=83tlorht7Mc&list=PL5Q2soXY2Zi8D_5MGV6EnXEJHnV2YFBJl&index=54
Principle: Teaching and Research
…
Teaching drives Research
Research drives Teaching
…
https://www.youtube.com/onurmutlulectures
Online Courses & Lectures
• First Computer Architecture & Digital Design Course
  ◦ Digital Design and Computer Architecture
  ◦ Spring 2021 Livestream Edition:
    https://www.youtube.com/watch?v=LbC0EZY8yw4&list=PL5Q2soXY2Zi_uej3aY39YB5pfW4SJ7LlN
https://www.youtube.com/onurmutlulectures
DDCA (Spring 2021)
• https://safari.ethz.ch/digitaltechnik/spring2021/doku.php?id=schedule
• https://www.youtube.com/watch?v=LbC0EZY8yw4&list=PL5Q2soXY2Zi_uej3aY39YB5pfW4SJ7LlN
• Bachelor’s course
  ◦ 2nd semester at ETH Zurich
  ◦ Rigorous introduction to “How Computers Work”
  ◦ Digital Design/Logic
  ◦ Computer Architecture
  ◦ 10 FPGA Lab Assignments
Comp Arch (Fall 2020)
• https://safari.ethz.ch/architecture/fall2020/doku.php?id=schedule
• https://www.youtube.com/watch?v=c3mPdZA-Fmc&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN
Seminar (Spring’21)
• https://safari.ethz.ch/architecture_seminar/spring2021/doku.php?id=schedule
• https://www.youtube.com/watch?v=t3m93ZpLOyw&list=PL5Q2soXY2Zi_awYdjmWVIUegsbY7TPGW4
Hands-On Projects & Seminars Courses
• https://safari.ethz.ch/projects_and_seminars/doku.php
Principle: Insight and Ideas
Focus on Insight
Encourage New Ideas
Principle: Learning and Scholarship
Focus on
learning and scholarship
SAFARI Live Seminars (Past Talks)
https://safari.ethz.ch/safari-seminar-series/
SAFARI Live Seminars (Upcoming Talk)
https://safari.ethz.ch/safari-seminar-series/
Open-Source Artifacts
https://github.com/CMU-SAFARI
Open Source Tools: SAFARI GitHub
https://github.com/CMU-SAFARI/
Some Open Source Tools (I)
• Rowhammer: Program to Induce RowHammer Errors
  ◦ https://github.com/CMU-SAFARI/rowhammer
• Ramulator: Fast and Extensible DRAM Simulator
  ◦ https://github.com/CMU-SAFARI/ramulator
• MemSim: Simple Memory Simulator
  ◦ https://github.com/CMU-SAFARI/memsim
• NOCulator: Flexible Network-on-Chip Simulator
  ◦ https://github.com/CMU-SAFARI/NOCulator
• SoftMC: FPGA-Based DRAM Testing Infrastructure
  ◦ https://github.com/CMU-SAFARI/SoftMC
Some Open Source Tools (II)
• MQSim: A Fast Modern SSD Simulator
  ◦ https://github.com/CMU-SAFARI/MQSim
• Mosaic: GPU Simulator Supporting Concurrent Applications
  ◦ https://github.com/CMU-SAFARI/Mosaic
• IMPICA: Processing in 3D-Stacked Memory Simulator
  ◦ https://github.com/CMU-SAFARI/IMPICA
• SMLA: Detailed 3D-Stacked Memory Simulator
  ◦ https://github.com/CMU-SAFARI/SMLA
• HWASim: Simulator for Heterogeneous CPU-HWA Systems
  ◦ https://github.com/CMU-SAFARI/HWASim
More Open Source Tools (III)
https://github.com/CMU-SAFARI/
Papers, Talks, Videos, Artifacts
https://people.inf.ethz.ch/omutlu/projects.htm
http://scholar.google.com/citations?user=7XyGUGkAAAAJ&hl=en
https://www.youtube.com/onurmutlulectures
https://github.com/CMU-SAFARI/
Principle: Environment of Freedom
Create an environment that values free exploration, openness, collaboration, hard work, creativity
My Suggestions to You
Suggestion to Researchers: Principle: Passion
Build Infrastructure to
Enable Your Passion
Principle: Work Hard
Work Hard to
Enable Your Passion
Suggestion to Researchers: Principle: Resilience
Be Resilient
Principle: Learning and Scholarship
Focus on
learning and scholarship
An Interview on Research and Education
More Thoughts and Suggestions
• Onur Mutlu,
"Some Reflections (on DRAM)"
Award Speech for ACM SIGARCH Maurice Wilkes Award, at the ISCA Awards
Ceremony, Phoenix, AZ, USA, 25 June 2019.
[Slides (pptx) (pdf)]
[Video of Award Acceptance Speech (Youtube; 10 minutes) (Youku; 13 minutes)]
[Video of Interview after Award Acceptance (Youtube; 1 hour 6 minutes) (Youku;
1 hour 6 minutes)]
[News Article on "ACM SIGARCH Maurice Wilkes Award goes to Prof. Onur Mutlu"]
• Onur Mutlu,
"How to Build an Impactful Research Group"
57th Design Automation Conference Early Career Workshop (DAC), Virtual,
19 July 2020.
[Slides (pptx) (pdf)]
More Thoughts and Suggestions (II)
• Onur Mutlu,
"Computer Architecture: Why Is It So Important and Exciting Today?"
Invited Lecture at Izmir Institute of Technology (IYTE), Virtual, 16 October
2020.
[Slides (pptx) (pdf)]
[Talk Video (2 hours 12 minutes)]
• Onur Mutlu,
"Applying to Graduate School & Doing Impactful Research"
Invited Panel Talk at the 3rd Undergraduate Mentoring Workshop, held with the
48th International Symposium on Computer Architecture (ISCA), Virtual, 18 June
2021.
[Slides (pptx) (pdf)]
[Talk Video (50 minutes)]
Required Reading
https://safari.ethz.ch/architecture/fall2021/lib/exe/fetch.php?media=youandyourresearch.pdf
How to Approach This Course
“Formative Experience”
“High investment, high return”
“Recorded lectures allowed me to go over the lectures when necessary”
“YouTube allows me to watch the lectures on my TV”
“Easy to understand course format with homework, labs, and lectures”
“Paper reviews + assignments + labs, a really great plan to learn in a comprehensive way”
Learning experience
Long-term tradeoff analysis
Critical thinking & decision making
Your mindset will determine what you get out of the course
Required Reading on Mindset & More
https://safari.ethz.ch/architecture/fall2021/lib/exe/fetch.php?media=youandyourresearch.pdf
Why Study Computer Architecture?
Computer Architecture
• is the science and art of designing computing platforms (hardware, interface, system SW, and programming model)
Source: http://www.sia-online.org (semiconductor industry association)
Different Platforms, Different Goals
Source: https://iq.intel.com/5-awesome-uses-for-drone-technology/
Source: https://taxistartup.com/wp-content/uploads/2015/03/UK-Self-Driving-Cars.jpg
Source: http://sm.pcmag.com/pcmag_uk/photo/g/google-self-driving-car-the-guts/google-self-driving-car-the-guts_dwx8.jpg
Source: http://datacentervoice.com/wp-content/uploads/2015/10/data-center.jpg
Source: https://fossbytes.com/wp-content/uploads/2015/06/Supercomputer-TIANHE2-china.jpg
Jouppi et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit”, ISCA 2017.
https://youtu.be/Ucp0TTmvqOE?t=4236
• The largest ML accelerator chip (2021)
• 850,000 cores
[Figure: PIM system diagram. Labels: Host (CPU 0, CPU 1); Main Memory (DRAM chips, x2); PIM-enabled Memory (PIM chips, x10)]
https://arxiv.org/pdf/2105.03814.pdf
What is Computer Architecture?
The Transformation Hierarchy
Problem
Algorithm
Program/Language
System Software
SW/HW Interface
Micro-architecture
Logic
Devices
Electrons
[Figure labels: Computer Architecture (expanded view); Computer Architecture (narrow view)]
Why Study Computer Architecture?
• Enable better systems: make computers faster, cheaper, smaller, more reliable, …
  ◦ By exploiting advances and changes in underlying technology/circuits
Axiom
To achieve the highest energy efficiency and performance:
Co-design across the hierarchy: algorithms to devices
Specialize as much as possible within the design goals
[Figure: the transformation hierarchy: Problem, Algorithm, Program/Language, System Software, SW/HW Interface, Micro-architecture, Logic, Devices, Electrons]
Historical: Opportunities at the Bottom
https://en.wikipedia.org/wiki/There%27s_Plenty_of_Room_at_the_Bottom
Historical: Opportunities at the Bottom (II)
https://en.wikipedia.org/wiki/There%27s_Plenty_of_Room_at_the_Bottom
Historical: Opportunities at the Top
https://science.sciencemag.org/content/368/6495/eaam9744
Axiom, Revisited
when you
Hence the Expanded View
Problem
Algorithm
Program/Language
System Software
SW/HW Interface
Micro-architecture
Logic
Devices
Electrons
[Figure label: Computer Architecture (expanded view)]
Some Cross-Layer Design Examples
(Foreshadowing)
EDEN: Data-Aware Efficient DNN Inference
• Skanda Koppula, Lois Orosa, A. Giray Yaglikci, Roknoddin Azizi, Taha Shahroodi,
Konstantinos Kanellopoulos, and Onur Mutlu,
"EDEN: Enabling Energy-Efficient, High-Performance Deep Neural Network
Inference Using Approximate DRAM"
Proceedings of the 52nd International Symposium on Microarchitecture (MICRO),
Columbus, OH, USA, October 2019.
[Slides (pptx) (pdf)]
[Lightning Talk Slides (pptx) (pdf)]
[Poster (pptx) (pdf)]
[Lightning Talk Video (90 seconds)]
[Full Talk Lecture (38 minutes)]
SMASH: SW/HW Indexing Acceleration
• Konstantinos Kanellopoulos, Nandita Vijaykumar, Christina Giannoula,
Roknoddin Azizi, Skanda Koppula, Nika Mansouri Ghiasi, Taha Shahroodi, Juan
Gomez-Luna, and Onur Mutlu,
"SMASH: Co-designing Software Compression and Hardware-
Accelerated Indexing for Efficient Sparse Matrix Operations"
Proceedings of the 52nd International Symposium on
Microarchitecture (MICRO), Columbus, OH, USA, October 2019.
[Slides (pptx) (pdf)]
[Lightning Talk Slides (pptx) (pdf)]
[Poster (pptx) (pdf)]
[Lightning Talk Video (90 seconds)]
[Full Talk Lecture (30 minutes)]
GenASM: HW/SW Approximate String Matching Accelerator
• Damla Senol Cali, Gurpreet S. Kalsi, Zulal Bingol, Can Firtina, Lavanya Subramanian, Jeremie S.
Kim, Rachata Ausavarungnirun, Mohammed Alser, Juan Gomez-Luna, Amirali Boroumand,
Anant Nori, Allison Scibisz, Sreenivas Subramoney, Can Alkan, Saugata Ghose, and Onur Mutlu,
"GenASM: A High-Performance, Low-Power Approximate String Matching
Acceleration Framework for Genome Sequence Analysis"
Proceedings of the 53rd International Symposium on Microarchitecture (MICRO), Virtual,
October 2020.
[Lightning Talk Video (1.5 minutes)]
[Lightning Talk Slides (pptx) (pdf)]
[Talk Video (18 minutes)]
[Slides (pptx) (pdf)]
SW/HW Climate Modeling Accelerator
• Gagandeep Singh, Dionysios Diamantopoulos, Christoph Hagleitner, Juan
Gómez-Luna, Sander Stuijk, Onur Mutlu, and Henk Corporaal,
"NERO: A Near High-Bandwidth Memory Stencil Accelerator for
Weather Prediction Modeling"
Proceedings of the 30th International Conference on Field-Programmable Logic
and Applications (FPL), Gothenburg, Sweden, September 2020.
[Slides (pptx) (pdf)]
[Lightning Talk Slides (pptx) (pdf)]
[Talk Video (23 minutes)]
Nominated for the Stamatis Vassiliadis Memorial Award.
HW/SW Time Series Analysis Accelerator
• Ivan Fernandez, Ricardo Quislant, Christina Giannoula, Mohammed Alser, Juan
Gómez-Luna, Eladio Gutiérrez, Oscar Plata, and Onur Mutlu,
"NATSA: A Near-Data Processing Accelerator for Time Series Analysis"
Proceedings of the 38th IEEE International Conference on Computer
Design (ICCD), Virtual, October 2020.
[Slides (pptx) (pdf)]
[Talk Video (10 minutes)]
[Source Code]
FPGA-based Processing Near Memory
• Gagandeep Singh, Mohammed Alser, Damla Senol Cali, Dionysios
Diamantopoulos, Juan Gómez-Luna, Henk Corporaal, and Onur Mutlu,
"FPGA-based Near-Memory Acceleration of Modern Data-Intensive
Applications"
IEEE Micro (IEEE MICRO), 2021.
Accelerating Genome Analysis
• Mohammed Alser, Zulal Bingol, Damla Senol Cali, Jeremie Kim, Saugata Ghose, Can
Alkan, and Onur Mutlu,
"Accelerating Genome Analysis: A Primer on an Ongoing Journey"
IEEE Micro (IEEE MICRO), Vol. 40, No. 5, pages 65-75, September/October 2020.
[Slides (pptx)(pdf)]
[Talk Video (1 hour 2 minutes)]
Graph Processing Accelerator w/ PIM
• Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu,
and Kiyoung Choi,
"A Scalable Processing-in-Memory Accelerator for
Parallel Graph Processing"
Proceedings of the 42nd International Symposium on
Computer Architecture (ISCA), Portland, OR, June 2015.
[Slides (pdf)] [Lightning Session Slides (pdf)]
Processing in Memory for Mobile Workloads
• Amirali Boroumand, Saugata Ghose, Youngsok Kim, Rachata
Ausavarungnirun, Eric Shiu, Rahul Thakur, Daehyun Kim, Aki
Kuusela, Allan Knies, Parthasarathy Ranganathan, and Onur Mutlu,
"Google Workloads for Consumer Devices: Mitigating Data
Movement Bottlenecks"
Proceedings of the 23rd International Conference on Architectural
Support for Programming Languages and Operating
Systems (ASPLOS), Williamsburg, VA, USA, March 2018.
Accelerating Linked Data Structures
• Kevin Hsieh, Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali
Boroumand, Saugata Ghose, and Onur Mutlu,
"Accelerating Pointer Chasing in 3D-Stacked Memory:
Challenges, Mechanisms, Evaluation"
Proceedings of the 34th IEEE International Conference on Computer
Design (ICCD), Phoenix, AZ, USA, October 2016.
Expressive (Memory) Interfaces
• Nandita Vijaykumar, Abhilasha Jain, Diptesh Majumdar, Kevin Hsieh, Gennady
Pekhimenko, Eiman Ebrahimi, Nastaran Hajinazar, Phillip B. Gibbons and Onur Mutlu,
"A Case for Richer Cross-layer Abstractions: Bridging the Semantic Gap
with Expressive Memory"
Proceedings of the 45th International Symposium on Computer Architecture (ISCA),
Los Angeles, CA, USA, June 2018.
[Slides (pptx) (pdf)] [Lightning Talk Slides (pptx) (pdf)]
[Lightning Talk Video]
One Problem: Limited SW/HW Communication
A Solution: More Expressive Interfaces
X-MeM Aids Many Optimizations
Expressive (Memory) Interfaces for GPUs
• Nandita Vijaykumar, Eiman Ebrahimi, Kevin Hsieh, Phillip B. Gibbons and Onur Mutlu,
"The Locality Descriptor: A Holistic Cross-Layer Abstraction to Express
Data Locality in GPUs"
Proceedings of the 45th International Symposium on Computer Architecture (ISCA),
Los Angeles, CA, USA, June 2018.
[Slides (pptx) (pdf)] [Lightning Talk Slides (pptx) (pdf)]
[Lightning Talk Video]
Heterogeneous-Reliability Memory
• Yixin Luo, Sriram Govindan, Bikash Sharma, Mark Santaniello, Justin Meza, Aman
Kansal, Jie Liu, Badriddine Khessib, Kushagra Vaid, and Onur Mutlu,
"Characterizing Application Memory Error Vulnerability to Optimize
Data Center Cost via Heterogeneous-Reliability Memory"
Proceedings of the 44th Annual IEEE/IFIP International Conference on
Dependable Systems and Networks (DSN), Atlanta, GA, June 2014. [Summary]
[Slides (pptx) (pdf)] [Coverage on ZDNet]
Exploiting Memory Error Tolerance with Hybrid Memory Systems
[Figure: memory error vulnerability, from vulnerable data to tolerant data]
Many Interesting Things Are Happening Today in Computer Architecture
Performance and Energy Efficiency
Intel Optane Persistent Memory (2019)
https://www.storagereview.com/intel_optane_dc_persistent_memory_module_pmm
PCM as Main Memory: Idea in 2009
• Benjamin C. Lee, Engin Ipek, Onur Mutlu, and Doug Burger,
"Architecting Phase Change Memory as a Scalable DRAM
Alternative"
Proceedings of the 36th International Symposium on Computer
Architecture (ISCA), pages 2-13, Austin, TX, June 2009.
[Slides (pdf)]
PCM as Main Memory: Idea in 2009
• Benjamin C. Lee, Ping Zhou, Jun Yang, Youtao Zhang, Bo Zhao,
Engin Ipek, Onur Mutlu, and Doug Burger,
"Phase Change Technology and the Future of Main Memory"
IEEE Micro, Special Issue: Micro's Top Picks from 2009 Computer
Architecture Conferences (MICRO TOP PICKS), Vol. 30, No. 1,
pages 60-70, January/February 2010.
Cerebras’s Wafer Scale Engine (2019)
• The largest ML accelerator chip: 400,000 cores
• The largest ML accelerator chip (2021): 850,000 cores
https://www.anandtech.com/show/14750/hot-chips-31-analysis-inmemory-processing-by-upmem
https://www.upmem.com/video-upmem-presenting-its-true-processing-in-memory-solution-hot-chips-2019/
UPMEM Memory Modules
• E19: 8-chip DIMM (1 rank), DPUs @ 267 MHz
• P21: 16-chip DIMM (2 ranks), DPUs @ 350 MHz
www.upmem.com
2,560-DPU Processing-in-Memory System
[Figure: PIM system diagram. Labels: Host (CPU 0, CPU 1); Main Memory (DRAM chips, x2); PIM-enabled Memory (PIM chips, x10)]
https://arxiv.org/pdf/2105.03814.pdf
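The 2,560-DPU figure in the slide title can be checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: it reads the diagram's "x10" label as ten PIM-enabled DIMMs per host CPU, uses the 16-chip P21 DIMM from the module slide, and assumes 8 DPUs per PIM chip, a number that does not appear on these slides. All constant and function names are hypothetical.

```python
# Back-of-the-envelope DPU count for the PIM system sketched on the slide.
HOST_CPUS = 2            # "CPU 0" and "CPU 1" in the diagram
PIM_DIMMS_PER_CPU = 10   # assumption: the "x10" label counts PIM DIMMs per CPU
CHIPS_PER_DIMM = 16      # P21 module: 16 chips per DIMM (2 ranks)
DPUS_PER_CHIP = 8        # assumption: not stated on the slides


def total_dpus(cpus, dimms_per_cpu, chips_per_dimm, dpus_per_chip):
    """Multiply out the hierarchy: CPUs -> DIMMs -> chips -> DPUs."""
    return cpus * dimms_per_cpu * chips_per_dimm * dpus_per_chip


if __name__ == "__main__":
    print(total_dpus(HOST_CPUS, PIM_DIMMS_PER_CPU, CHIPS_PER_DIMM, DPUS_PER_CHIP))
```

Under these assumptions the product matches the 2,560 DPUs named in the slide title; changing any one factor shows how the system size scales with the number of DIMMs or chips.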
More on the UPMEM PIM System
https://www.youtube.com/watch?v=Sscy1Wrr22A&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=26
Experimental Analysis of the UPMEM PIM Engine
https://arxiv.org/pdf/2105.03814.pdf
Understanding a Modern PIM Architecture
https://www.youtube.com/watch?v=D8Hjy2iU9l4&list=PL5Q2soXY2Zi_tOTAYm--dYByNPL7JhwR9
More on Analysis of the UPMEM PIM Engine
https://www.youtube.com/watch?v=D8Hjy2iU9l4&list=PL5Q2soXY2Zi_tOTAYm--dYByNPL7JhwR9
More on Analysis of the UPMEM PIM Engine
https://www.youtube.com/watch?v=Pp9jSU2b9oM&list=PL5Q2soXY2Zi8_VVChACnON4sfh2bJ5IrD&index=159
FPGA-based Processing Near Memory
• Gagandeep Singh, Mohammed Alser, Damla Senol Cali, Dionysios
Diamantopoulos, Juan Gómez-Luna, Henk Corporaal, and Onur Mutlu,
"FPGA-based Near-Memory Acceleration of Modern Data-Intensive
Applications"
IEEE Micro (IEEE MICRO), to appear, 2021.
Samsung Function-in-Memory DRAM (2021)
https://news.samsung.com/global/samsung-develops-industrys-first-high-bandwidth-memory-with-ai-processing-power
Samsung AxDIMM (2021)
• DDR5-PIM Baseline System
AxDIMM System
Ke et al., "Near-Memory Processing in Action: Accelerating Personalized Recommendation with AxDIMM", IEEE Micro (2021)
Processing in Memory: Two Approaches
Specialized Processing in Memory (2015)
• Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu,
and Kiyoung Choi,
"A Scalable Processing-in-Memory Accelerator for
Parallel Graph Processing"
Proceedings of the 42nd International Symposium on
Computer Architecture (ISCA), Portland, OR, June 2015.
[Slides (pdf)] [Lightning Session Slides (pdf)]
Simple Processing in Memory (2015)
• Junwhan Ahn, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi,
"PIM-Enabled Instructions: A Low-Overhead,
Locality-Aware Processing-in-Memory Architecture"
Proceedings of the 42nd International Symposium on
Computer Architecture (ISCA), Portland, OR, June 2015.
[Slides (pdf)] [Lightning Session Slides (pdf)]
Processing in Memory on Mobile Devices
• Amirali Boroumand, Saugata Ghose, Youngsok Kim, Rachata
Ausavarungnirun, Eric Shiu, Rahul Thakur, Daehyun Kim, Aki
Kuusela, Allan Knies, Parthasarathy Ranganathan, and Onur Mutlu,
"Google Workloads for Consumer Devices: Mitigating Data
Movement Bottlenecks"
Proceedings of the 23rd International Conference on Architectural
Support for Programming Languages and Operating
Systems (ASPLOS), Williamsburg, VA, USA, March 2018.
Efficient Synchronization for NDP
• Christina Giannoula, Nandita Vijaykumar, Nikela Papadopoulou,
Vasileios Karakostas, Ivan Fernandez, Juan Gómez-Luna, Lois Orosa,
Nectarios Koziris, Georgios Goumas, and Onur Mutlu,
"SynCron: Efficient Synchronization Support for Near-Data-
Processing Architectures"
Proceedings of the 27th International Symposium on High-Performance
Computer Architecture (HPCA), Virtual, February-March 2021.
Accelerating GPU Execution with PIM (I)
• Kevin Hsieh, Eiman Ebrahimi, Gwangsun Kim, Niladrish Chatterjee, Mike
O'Connor, Nandita Vijaykumar, Onur Mutlu, and Stephen W. Keckler,
"Transparent Offloading and Mapping (TOM): Enabling
Programmer-Transparent Near-Data Processing in GPU
Systems"
Proceedings of the 43rd International Symposium on Computer
Architecture (ISCA), Seoul, South Korea, June 2016.
[Slides (pptx) (pdf)]
[Lightning Session Slides (pptx) (pdf)]
Accelerating Linked Data Structures
• Kevin Hsieh, Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali
Boroumand, Saugata Ghose, and Onur Mutlu,
"Accelerating Pointer Chasing in 3D-Stacked Memory:
Challenges, Mechanisms, Evaluation"
Proceedings of the 34th IEEE International Conference on Computer
Design (ICCD), Phoenix, AZ, USA, October 2016.
DAMOV Analysis Methodology & Workloads
https://arxiv.org/pdf/2105.03725.pdf
More on DAMOV Analysis Methodology & Workloads
https://www.youtube.com/watch?v=GWideVyo0nM&list=PL5Q2soXY2Zi_tOTAYm--dYByNPL7JhwR9&index=3
DAMOV is Open-Source
• We open-source our benchmark suite and our toolchain
[Figure labels: DAMOV-SIM; DAMOV Benchmarks]
More on DAMOV
Processing in Memory: Two Approaches
In-DRAM Processing (2013)
• Vivek Seshadri et al., “Ambit: In-Memory Accelerator
for Bulk Bitwise Operations Using Commodity DRAM
Technology,” MICRO 2017.
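Bulk bitwise AND and OR can both be derived from a bitwise majority of three operands, which, as far as the cited Ambit paper describes it, is the core operation performed inside DRAM. The sketch below models rows as plain Python integers purely for illustration. The function names and the 16-bit row width are hypothetical, and nothing here models real DRAM timing or circuitry.

```python
ROW_BITS = 16                      # hypothetical row width for the example
ALL_ONES = (1 << ROW_BITS) - 1     # control row filled with 1s
ALL_ZEROS = 0                      # control row filled with 0s


def majority(a, b, c):
    """Bitwise majority: each output bit is 1 if at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)


def bulk_and(a, b):
    """AND falls out of majority when the third row is all zeros."""
    return majority(a, b, ALL_ZEROS)


def bulk_or(a, b):
    """OR falls out of majority when the third row is all ones."""
    return majority(a, b, ALL_ONES)


if __name__ == "__main__":
    row_a = 0b1100110011110000
    row_b = 0b1010101000111100
    print(format(bulk_and(row_a, row_b), "016b"))
    print(format(bulk_or(row_a, row_b), "016b"))
```

The design point worth noticing is that a single primitive (majority) plus a constant control row is enough to select between the two operations, so no separate AND and OR logic is needed in this model.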
In-DRAM Bulk Bitwise Execution (2017)
• Vivek Seshadri and Onur Mutlu,
"In-DRAM Bulk Bitwise Execution Engine"
Invited Book Chapter in Advances in Computers, to appear
in 2020.
[Preliminary arXiv version]
SIMDRAM Framework
• Nastaran Hajinazar, Geraldo F. Oliveira, Sven Gregorio, Joao Dinis Ferreira, Nika Mansouri
Ghiasi, Minesh Patel, Mohammed Alser, Saugata Ghose, Juan Gomez-Luna, and Onur Mutlu,
"SIMDRAM: An End-to-End Framework for Bit-Serial SIMD Computing in DRAM"
Proceedings of the 26th International Conference on Architectural Support for Programming
Languages and Operating Systems (ASPLOS), Virtual, March-April 2021.
[2-page Extended Abstract]
[Short Talk Slides (pptx) (pdf)]
[Talk Slides (pptx) (pdf)]
[Short Talk Video (5 mins)]
[Full Talk Video (27 mins)]
Bulk Data Copy and Initialization in DRAM
• Vivek Seshadri, Yoongu Kim, Chris Fallin, Donghyuk Lee, Rachata
Ausavarungnirun, Gennady Pekhimenko, Yixin Luo, Onur Mutlu, Michael A.
Kozuch, Phillip B. Gibbons, and Todd C. Mowry,
"RowClone: Fast and Energy-Efficient In-DRAM Bulk Data Copy and
Initialization"
Proceedings of the 46th International Symposium on Microarchitecture
(MICRO), Davis, CA, December 2013. [Slides (pptx) (pdf)] [Lightning Session
Slides (pptx) (pdf)] [Poster (pptx) (pdf)]
LISA: Increasing Connectivity in DRAM
• Kevin K. Chang, Prashant J. Nair, Saugata Ghose, Donghyuk Lee,
Moinuddin K. Qureshi, and Onur Mutlu,
"Low-Cost Inter-Linked Subarrays (LISA): Enabling Fast
Inter-Subarray Data Movement in DRAM"
Proceedings of the 22nd International Symposium on High-
Performance Computer Architecture (HPCA), Barcelona, Spain,
March 2016.
[Slides (pptx) (pdf)]
[Source Code]
FIGARO: Fine-Grained In-DRAM Copy
• Yaohua Wang, Lois Orosa, Xiangjun Peng, Yang Guo, Saugata Ghose,
Minesh Patel, Jeremie S. Kim, Juan Gómez Luna, Mohammad
Sadrosadati, Nika Mansouri Ghiasi, and Onur Mutlu,
"FIGARO: Improving System Performance via Fine-Grained In-
DRAM Data Relocation and Caching"
Proceedings of the 53rd International Symposium on
Microarchitecture (MICRO), Virtual, October 2020.
Network-On-Memory: Fast Inter-Bank Copy
In-DRAM Physical Unclonable Functions
• Jeremie S. Kim, Minesh Patel, Hasan Hassan, and Onur Mutlu,
"The DRAM Latency PUF: Quickly Evaluating Physical Unclonable
Functions by Exploiting the Latency-Reliability Tradeoff in Modern DRAM
Devices"
Proceedings of the 24th International Symposium on High-Performance Computer
Architecture (HPCA), Vienna, Austria, February 2018.
[Lightning Talk Video]
[Slides (pptx) (pdf)] [Lightning Session Slides (pptx) (pdf)]
[Full Talk Lecture Video (28 minutes)]
In-DRAM True Random Number Generation
• Jeremie S. Kim, Minesh Patel, Hasan Hassan, Lois Orosa, and Onur Mutlu,
"D-RaNGe: Using Commodity DRAM Devices to Generate True Random
Numbers with Low Latency and High Throughput"
Proceedings of the 25th International Symposium on High-Performance Computer
Architecture (HPCA), Washington, DC, USA, February 2019.
[Slides (pptx) (pdf)]
[Full Talk Video (21 minutes)]
[Full Talk Lecture Video (27 minutes)]
Top Picks Honorable Mention by IEEE Micro.
Processing in Memory: Two Approaches
PIM Review and Open Problems
https://arxiv.org/pdf/1903.03988.pdf
PIM Review and Open Problems (II)
Saugata Ghose, Amirali Boroumand, Jeremie S. Kim, Juan Gomez-Luna, and Onur Mutlu,
"Processing-in-Memory: A Workload-Driven Perspective"
Invited Article in IBM Journal of Research & Development, Special Issue on
Hardware for Artificial Intelligence, to appear in November 2019.
[Preliminary arXiv version]
https://arxiv.org/pdf/1907.12947.pdf
A Tutorial on PIM
• Onur Mutlu,
"Memory-Centric Computing Systems"
Invited Tutorial at 66th International Electron Devices
Meeting (IEDM), Virtual, 12 December 2020.
[Slides (pptx) (pdf)]
[Executive Summary Slides (pptx) (pdf)]
[Tutorial Video (1 hour 51 minutes)]
[Executive Summary Video (2 minutes)]
[Abstract and Bio]
[Related Keynote Paper from VLSI-DAT 2020]
[Related Review Paper on Processing in Memory]
https://www.youtube.com/watch?v=H3sEaINPBOE
https://www.youtube.com/onurmutlulectures
Detailed Lectures on PIM (I)
• Computer Architecture, Fall 2020, Lecture 6
  ◦ Computation in Memory (ETH Zürich, Fall 2020)
  ◦ https://www.youtube.com/watch?v=oGcZAGwfEUE&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=12
  ◦ https://www.youtube.com/watch?v=j2GIigqn1Qw&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=13
  ◦ https://www.youtube.com/watch?v=TeG773OgiMQ&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=20
  ◦ https://www.youtube.com/watch?v=Sscy1Wrr22A&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=25
https://www.youtube.com/onurmutlulectures
Detailed Lectures on PIM (II)
• Computer Architecture, Fall 2020, Lecture 15
  ◦ Emerging Memory Technologies (ETH Zürich, Fall 2020)
  ◦ https://www.youtube.com/watch?v=AlE1rD9G_YU&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=28
https://www.youtube.com/onurmutlulectures
Many Interesting Things Are Happening Today in Computer Architecture
Performance and Energy Efficiency
TESLA Full Self-Driving Computer (2019)
• ML accelerator: 260 mm², 6 billion transistors,
600 GFLOPS GPU, 12 ARM 2.2 GHz CPUs.
• Two redundant chips for better safety.
https://youtu.be/Ucp0TTmvqOE?t=4236
Google TPU Generation I (~2016)
Jouppi et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit”, ISCA 2017.
161
Google TPU Generation II (2017)
4 TPU chips
vs 1 chip in TPU1
162
Google TPU Generation III (2019)
32GB HBM per chip (vs 16GB HBM in TPU2)
4 Matrix Units per chip (vs 2 Matrix Units in TPU2)
90 TFLOPS per chip (vs 45 TFLOPS in TPU2)
https://cloud.google.com/tpu/docs/system-architecture 163
Google TPU Generation IV (2021)
Jouppi et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit”, ISCA 2017.
165
An Example Modern Systolic Array: TPU (III)
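As a rough illustration of what a systolic array computes, the sketch below models matrix multiplication on a grid of processing elements (PEs) in Python. It is a toy, output-stationary timing model chosen for brevity; it is not a description of the TPU's actual dataflow, and the function name `systolic_matmul` and the skewing schedule are assumptions made for this example.

```python
def systolic_matmul(A, B):
    """Toy timing model of an output-stationary systolic array.

    PE (i, j) accumulates C[i][j]. Operands are skewed so that
    A[i][s] and B[s][j] meet at PE (i, j) in cycle t = i + j + s.
    Assumption: one multiply-accumulate per PE per cycle.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    cycles = n + m + k - 2  # last operands meet at t = (n-1)+(m-1)+(k-1)
    for t in range(cycles):
        for i in range(n):
            for j in range(m):
                s = t - i - j  # index of the operand pair arriving now
                if 0 <= s < k:
                    C[i][j] += A[i][s] * B[s][j]
    return C, cycles


C, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C, cycles)
```

The point of the model is that every PE does useful work on data handed over by its neighbors, so each operand is fetched from memory once and reused many times.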
166
Many (Other) AI/ML Chips
n Alibaba
n Amazon
n Facebook
n Google
n Huawei
n Intel
n Microsoft
n NVIDIA
n Tesla
n Many Others and Many Startups…
168
https://basicmi.github.io/AI-Chip/
Many (Other) AI/ML Chips (2021)
n Alibaba
n Amazon
n Facebook
n Google
n Huawei
n Microsoft
n NVIDIA
n Tesla
n Many Startups…
169
https://basicmi.github.io/AI-Chip/
Computer Architecture
Lecture 1: Introduction and Basics
171
Many Interesting Things
Are Happening Today
in Computer Architecture
Reliability
Security
Safety
173
Security: RowHammer (2014)
174
The Story of RowHammer
n One can predictably induce bit flips in commodity DRAM chips
q >80% of the tested DRAM chips are vulnerable
175
Modern DRAM is Prone to Disturbance Errors
Up to 1.0×10^7 errors | Up to 2.7×10^6 errors | Up to 3.3×10^5 errors
Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM
Disturbance Errors, (Kim et al., ISCA 2014) 177
One Can Take Over an Otherwise-Secure System
178
Security: RowHammer (2014)
179
More Security Implications (I)
“We can gain unrestricted access to systems of website visitors.”
182
More Security Implications (IV)
n Rowhammer over RDMA (I)
183
More Security Implications (V)
n Rowhammer over RDMA (II)
184
More Security Implications (VI)
n IEEE S&P 2020
More Security Implications (VII)
n USENIX Security 2019
More Security Implications (VIII)
n USENIX Security 2020
RowHammer: Seven Years Ago…
n Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk
Lee, Chris Wilkerson, Konrad Lai, and Onur Mutlu,
"Flipping Bits in Memory Without Accessing Them: An
Experimental Study of DRAM Disturbance Errors"
Proceedings of the 41st International Symposium on Computer
Architecture (ISCA), Minneapolis, MN, June 2014.
[Slides (pptx) (pdf)] [Lightning Session Slides (pptx) (pdf)] [Source Code
and Data]
188
RowHammer: 2019 and Beyond…
n Onur Mutlu and Jeremie Kim,
"RowHammer: A Retrospective"
IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems (TCAD) Special Issue on Top Picks in Hardware and
Embedded Security, 2019.
[Preliminary arXiv version]
[Slides from COSADE 2019 (pptx)]
[Slides from VLSI-SOC 2020 (pptx) (pdf)]
[Talk Video (1 hr 15 minutes, with Q&A)]
189
RowHammer in 2020
RowHammer in 2020 (I)
n Jeremie S. Kim, Minesh Patel, A. Giray Yaglikci, Hasan Hassan,
Roknoddin Azizi, Lois Orosa, and Onur Mutlu,
"Revisiting RowHammer: An Experimental Analysis of Modern
Devices and Mitigation Techniques"
Proceedings of the 47th International Symposium on Computer
Architecture (ISCA), Valencia, Spain, June 2020.
[Slides (pptx) (pdf)]
[Lightning Talk Slides (pptx) (pdf)]
[Talk Video (20 minutes)]
[Lightning Talk Video (3 minutes)]
191
Key Takeaways from 1580 Chips
• Newer DRAM chips are more vulnerable to
RowHammer
• There are chips today whose weakest cells fail after only
4800 hammers
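To make the 4800-hammer figure concrete, here is a small Python sketch of a disturbance model. It is a toy under stated assumptions, not a model of any real chip: every activation of a row counts as one "hammer" on each physically adjacent row, all counts fall within one refresh window, and a victim row flips once its count reaches the threshold. The names `simulate_hammering` and `HAMMER_THRESHOLD` are hypothetical.

```python
HAMMER_THRESHOLD = 4800  # from the slide: weakest cells fail after ~4800 hammers


def simulate_hammering(activations, num_rows, threshold=HAMMER_THRESHOLD):
    """Return the set of victim rows that would experience bit flips.

    Assumptions (toy model): each activation disturbs both neighbors
    equally, and no refresh happens during the activation sequence.
    """
    disturb = [0] * num_rows
    for row in activations:
        for victim in (row - 1, row + 1):
            if 0 <= victim < num_rows:
                disturb[victim] += 1
    return {r for r, count in enumerate(disturb) if count >= threshold}


# Double-sided pattern: alternate between the two neighbors of row 5.
pattern = [4, 6] * 2400
print(simulate_hammering(pattern, num_rows=16))
```

In this model only row 5 reaches the threshold, because it sits between both aggressor rows; rows 3 and 7 each see half as many disturbances.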
192
RowHammer in 2020 (II)
n Pietro Frigo, Emanuele Vannacci, Hasan Hassan, Victor van der Veen, Onur Mutlu,
Cristiano Giuffrida, Herbert Bos, and Kaveh Razavi,
"TRRespass: Exploiting the Many Sides of Target Row Refresh"
Proceedings of the 41st IEEE Symposium on Security and Privacy (S&P), San Francisco,
CA, USA, May 2020.
[Slides (pptx) (pdf)]
[Lecture Slides (pptx) (pdf)]
[Talk Video (17 minutes)]
[Lecture Video (59 minutes)]
[Source Code]
[Web Article]
Best paper award.
Pwnie Award 2020 for Most Innovative Research. Pwnie Awards 2020
193
RowHammer in 2020 (III)
n Lucian Cojocar, Jeremie Kim, Minesh Patel, Lillian Tsai, Stefan Saroiu,
Alec Wolman, and Onur Mutlu,
"Are We Susceptible to Rowhammer? An End-to-End
Methodology for Cloud Providers"
Proceedings of the 41st IEEE Symposium on Security and
Privacy (S&P), San Francisco, CA, USA, May 2020.
[Slides (pptx) (pdf)]
[Talk Video (17 minutes)]
194
BlockHammer Solution in 2021
n A. Giray Yaglikci, Minesh Patel, Jeremie S. Kim, Roknoddin Azizi, Ataberk Olgun,
Lois Orosa, Hasan Hassan, Jisung Park, Konstantinos Kanellopoulos, Taha
Shahroodi, Saugata Ghose, and Onur Mutlu,
"BlockHammer: Preventing RowHammer at Low Cost by Blacklisting
Rapidly-Accessed DRAM Rows"
Proceedings of the 27th International Symposium on High-Performance
Computer Architecture (HPCA), Virtual, February-March 2021.
[Slides (pptx) (pdf)]
[Short Talk Slides (pptx) (pdf)]
[Talk Video (22 minutes)]
[Short Talk Video (7 minutes)]
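The title's idea of blacklisting rapidly-accessed rows can be sketched as a per-row activation counter. This Python example is a deliberately simplified stand-in: it uses an exact dictionary counter and simply refuses further activations, whereas the paper's mechanism is more elaborate. The class name `RowActivationThrottle` and its methods are hypothetical.

```python
from collections import Counter


class RowActivationThrottle:
    """Toy row blacklist: stop activating a row once it is accessed too often.

    Assumption: counts are exact and are reset at the end of every
    refresh window via end_of_window().
    """

    def __init__(self, blacklist_threshold):
        self.blacklist_threshold = blacklist_threshold
        self.counts = Counter()

    def request_activation(self, row):
        """Return True if the activation may proceed, False if blacklisted."""
        if self.counts[row] >= self.blacklist_threshold:
            return False
        self.counts[row] += 1
        return True

    def end_of_window(self):
        """Forget all counts when the refresh window ends."""
        self.counts.clear()


throttle = RowActivationThrottle(blacklist_threshold=3)
print([throttle.request_activation(7) for _ in range(5)])
```

Keeping the threshold below the hammer count at which cells fail means no row can be activated often enough to disturb its neighbors within one window.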
195
Detailed Lectures on RowHammer
n Computer Architecture, Fall 2020, Lecture 4b
q RowHammer (ETH Zürich, Fall 2020)
q https://www.youtube.com/watch?v=KDy632z23UE&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=8
n Computer Architecture, Fall 2020, Lecture 5a
q RowHammer in 2020: TRRespass (ETH Zürich, Fall 2020)
q https://www.youtube.com/watch?v=pwRw7QqK_qA&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=9
n Computer Architecture, Fall 2020, Lecture 5b
q RowHammer in 2020: Revisiting RowHammer (ETH Zürich, Fall 2020)
q https://www.youtube.com/watch?v=gR7XR-Eepcg&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=10
n Computer Architecture, Fall 2020, Lecture 5c
q Secure and Reliable Memory (ETH Zürich, Fall 2020)
q https://www.youtube.com/watch?v=HvswnsfG3oQ&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=11
https://www.youtube.com/onurmutlulectures 196
The Story of RowHammer Lecture …
n Onur Mutlu,
"The Story of RowHammer"
Keynote Talk at Secure Hardware, Architectures, and Operating Systems
Workshop (SeHAS), held with HiPEAC 2021 Conference, Virtual, 19 January 2021.
[Slides (pptx) (pdf)]
[Talk Video (1 hr 15 minutes, with Q&A)]
197
Two Upcoming RowHammer Papers at MICRO 2021
n Lois Orosa, Abdullah Giray Yaglikci, Haocong Luo, Ataberk Olgun, Jisung
Park, Hasan Hassan, Minesh Patel, Jeremie S. Kim, Onur Mutlu,
"A Deeper Look into RowHammer's Sensitivities: Experimental
Analysis of Real DRAM Chips and Implications on Future Attacks
and Defenses"
MICRO 2021
199
Two Upcoming RowHammer Papers at MICRO 2021
n Hasan Hassan, Yahya Can Tugrul, Jeremie S. Kim, Victor van der Veen,
Kaveh Razavi, Onur Mutlu,
"Uncovering In-DRAM RowHammer Protection Mechanisms: A
New Methodology, Custom RowHammer Patterns, and
Implications"
MICRO 2021
200
TRRespass Key Takeaways
RowHammer is still
an open problem
Security by obscurity
is likely not a good solution
201
Security: Meltdown and Spectre (2018)
n Why?
q Speculative execution leaves traces of secret data in the
processor’s cache (internal storage)
n It brings data that is not supposed to be brought/accessed if there
was no speculative execution
q A malicious program can inspect the contents of the cache to
“infer” secret data that it is not supposed to access
q A malicious program can actually force another program to
speculatively execute code that leaves traces of secret data
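The cache-trace argument above can be illustrated with a toy Python model. This is a conceptual simulation only, not exploit code: the cache is a plain set of line indices, "timing" a line is reduced to a membership check, and the names `speculative_victim`, `attacker_probe`, and `leak_byte` are hypothetical.

```python
def speculative_victim(secret_byte, cache):
    """Model a speculatively executed load whose address depends on a secret.

    The architectural result is squashed, but the touched cache line
    stays in the cache. That leftover line is the "trace".
    """
    cache.add(secret_byte)  # line index selected by the secret value


def attacker_probe(cache, num_lines=256):
    """Model timing every probe line: cached lines are the 'fast' ones."""
    return [line for line in range(num_lines) if line in cache]


def leak_byte(secret_byte):
    cache = set()                     # attacker starts from a flushed cache
    speculative_victim(secret_byte, cache)
    fast_lines = attacker_probe(cache)
    return fast_lines[0]              # the fast line reveals the secret byte


print(chr(leak_byte(ord("K"))))
```

The attacker never reads the secret directly; it only observes which line became fast, which is why such leaks are called side channels.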
203
More on Meltdown/Spectre Vulnerabilities
Source: https://googleprojectzero.blogspot.ch/2018/01/reading-privileged-memory-with-side.html
204
Many Interesting Things
Are Happening Today
in Computer Architecture
205
Increasingly Demanding Applications
Dream
207
New Genome Sequencing Technologies
208
Why Do We Care? An Example
209
Source: https://nanoporetech.com/about-us/news/200-oxford-nanopore-sequencers-have-left-uk-china-support-rapid-near-sample
Population-Scale Microbiome Profiling
https://blog.wego.com/7-crowded-places-and-events-that-you-will-love/ 210
City-Scale Microbiome Profiling
Quick+, “Real-time, portable genome sequencing for Ebola surveillance”, Nature, 2016
212
High-Throughput Genome Sequencers
Oxford Nanopore PromethION
Pacific Biosciences Sequel II
Illumina MiSeq
Oxford Nanopore SmidgION
Illumina NovaSeq 6000
Pacific Biosciences RS II
… and more! All produce data with different properties.
213
High-Throughput Genome Sequencers
Mohammed Alser, Zülal Bingöl, Damla Senol Cali, Jeremie Kim, Saugata Ghose, Can Alkan, Onur Mutlu
“Accelerating Genome Analysis: A Primer on an Ongoing Journey” IEEE Micro, August 2020.
Number of Genomes
Sequenced
http://www.economist.com/news/21631808-so-much-genetic-data-so-many-uses-genes-unzipped 215
[Figure: Billions of Short Reads, with a read-to-reference alignment matrix]
217
Shifted Hamming Distance: SIMD Acceleration
https://github.com/CMU-SAFARI/Shifted-Hamming-Distance
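As a hedged sketch of the filtering idea behind shifted Hamming comparisons, the Python function below counts read positions that fail to match the reference under every shift within the edit budget, and rejects the pair if that count exceeds the budget. It is a simplified illustration written for this example, not the code in the repository above: it omits the bit-parallel SIMD implementation and any further refinement steps, and the name `shifted_mismatch_filter` is hypothetical.

```python
def shifted_mismatch_filter(read, ref, max_edits):
    """Return True if the pair may be within max_edits edits (keep it),
    False if it can be safely discarded before full alignment.

    A read base that matches no reference base at offsets
    -max_edits..+max_edits cannot be a matched base in any alignment
    with at most max_edits edits, so each such base costs one edit.
    """
    unmatched = 0
    for i, base in enumerate(read):
        matched = False
        for shift in range(-max_edits, max_edits + 1):
            j = i + shift
            if 0 <= j < len(ref) and ref[j] == base:
                matched = True
                break
        if not matched:
            unmatched += 1
    return unmatched <= max_edits


print(shifted_mismatch_filter("ACGGTTGC", "ACGTTGCA", max_edits=1))
print(shifted_mismatch_filter("AAAAAAAA", "CCCCCCCC", max_edits=2))
```

A filter of this kind can let some dissimilar pairs through, but pairs it rejects would also fail full alignment, so the expensive alignment step runs on fewer candidates.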
218
GateKeeper: FPGA-Based Alignment Filtering
1st FPGA-based Alignment Filter.
Low Speed & High Accuracy
Medium Speed, Medium Accuracy
High Speed, Low Accuracy
[Figure: Alignment Filter stage ahead of the alignment matrix]
220
In-Memory DNA Sequence Analysis
n Jeremie S. Kim, Damla Senol Cali, Hongyi Xin, Donghyuk Lee, Saugata Ghose, Mohammed Alser, Hasan
Hassan, Oguz Ergin, Can Alkan, and Onur Mutlu,
"GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping Using Processing-in-
Memory Technologies"
BMC Genomics, 2018.
Proceedings of the 16th Asia Pacific Bioinformatics Conference (APBC), Yokohama, Japan, January
2018.
[Slides (pptx) (pdf)]
[Source Code]
[arxiv.org Version (pdf)]
[Talk Video at AACBB 2019]
221
Shouji (障子) [Alser+, Bioinformatics 2019]
Mohammed Alser, Hasan Hassan, Akash Kumar, Onur Mutlu, and Can Alkan,
"Shouji: A Fast and Efficient Pre-Alignment Filter for Sequence Alignment"
Bioinformatics, [published online, March 28], 2019.
[Source Code]
[Online link at Bioinformatics Journal]
222
SneakySnake [Alser+, Bioinformatics 2020]
Mohammed Alser, Taha Shahroodi, Juan Gomez-Luna, Can Alkan, and Onur Mutlu,
"SneakySnake: A Fast and Accurate Universal Genome Pre-Alignment
Filter for CPUs, GPUs, and FPGAs"
Bioinformatics, to appear in 2020.
[Source Code]
[Online link at Bioinformatics Journal]
223
GenASM Framework [MICRO 2020]
n Damla Senol Cali, Gurpreet S. Kalsi, Zulal Bingol, Can Firtina, Lavanya Subramanian, Jeremie S.
Kim, Rachata Ausavarungnirun, Mohammed Alser, Juan Gomez-Luna, Amirali Boroumand,
Anant Nori, Allison Scibisz, Sreenivas Subramoney, Can Alkan, Saugata Ghose, and Onur Mutlu,
"GenASM: A High-Performance, Low-Power Approximate String Matching
Acceleration Framework for Genome Sequence Analysis"
Proceedings of the 53rd International Symposium on Microarchitecture (MICRO), Virtual,
October 2020.
[Lightning Talk Video (1.5 minutes)]
[Lightning Talk Slides (pptx) (pdf)]
[Talk Video (18 minutes)]
[Slides (pptx) (pdf)]
224
Future of Genome Sequencing & Analysis
Mohammed Alser, Zülal Bingöl, Damla Senol Cali, Jeremie Kim, Saugata Ghose, Can Alkan, Onur Mutlu
“Accelerating Genome Analysis: A Primer on an Ongoing Journey” IEEE Micro, August 2020.
228
More on Fast Genome Analysis …
n Onur Mutlu,
"Accelerating Genome Analysis: A Primer on an Ongoing Journey"
Invited Lecture at Technion, Virtual, 26 January 2021.
[Slides (pptx) (pdf)]
[Talk Video (1 hour 37 minutes, including Q&A)]
[Related Invited Paper (at IEEE Micro, 2020)]
229
Detailed Lectures on Genome Analysis
n Computer Architecture, Fall 2020, Lecture 3a
q Introduction to Genome Sequence Analysis (ETH Zürich, Fall 2020)
q https://www.youtube.com/watch?v=CrRb32v7SJc&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=5
q https://www.youtube.com/watch?v=rgjl8ZyLsAg&list=PL5Q2soXY2Zi9E2bBVAgCqLgwiDRQDTyId
https://www.youtube.com/onurmutlulectures 230
Many Interesting Things
Are Happening Today
in Computer Architecture
231
The Problem
Computing
is Bottlenecked by Data
232
Data is Key for AI, ML, Genomics, …
n Data is increasing
q We can generate more than we can process
233
Data is Key for Future Workloads
238
Data Movement vs. Computation Energy
240
Many Novel Concepts Investigated Today
n New Computing Paradigms (Rethinking the Full Stack)
q Processing in Memory, Processing Near Data
q Neuromorphic Computing
q Fundamentally Secure and Dependable Computers
Dream
242
Increasingly Diverging/Complex Tradeoffs
243
Increasingly Diverging/Complex Tradeoffs
Past systems
245
Increasingly Complex Systems
FPGAs
Modern systems
246
Computer Architecture Today
n Computing landscape is very different from 10-20 years ago
247
Computer Architecture Today (II)
n You can revolutionize the way computers are built, if you
understand both the hardware and the software (and
change each accordingly)
248
Takeaways
n It is an exciting time to be understanding and designing
computing architectures
250
Let’s Start with Some Fundamentals
251
Question: What Is This?
255
Source: https://www.dezeen.com/2016/08/29/santiago-calatrava-oculus-world-trade-center-transportation-hub-new-york-photographs-hufton-crow/
Answer: Masterpiece of a Famous Architect
Susannah Maidment et al. & Natural History Museum, London - Maidment SCR, Brassey C, Barrett PM (2015)
The Postcranial Skeleton of an Exceptionally Complete Individual of the Plated Dinosaur Stegosaurus stenops
(Dinosauria: Thyreophora) from the Upper Jurassic Morrison Formation of Wyoming, U.S.A. PLoS ONE 10(10):
e0138352. doi:10.1371/journal.pone.0138352
259
Design Constraints: No One is Immune
261
262
Answer: Masterpiece of Another Famous Architect
265
Find The Differences of
This and That
266
This
Source: By Toni_V from Zurich, Switzerland - Stadelhofen2, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=4087256 267
That
269
Aside: Evaluation Criteria for the Designs
n Functionality (Does it meet the specification?)
n Reliability
n Space requirement
n Cost
n Expandability
n Comfort level of users
n Happiness level of users
n Aesthetics
n …
Source: By Martín Gómez Tagle - Lisbon, Portugal, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=13764903 273
Source: http://www.arcspace.com/exhibitions/unsorted/santiago-calatrava/
A Principled Design
274
What Does This Remind You Of?
280
Source: http://www.architectmagazine.com/design/buildings/florida-polytechnic-university-designed-by-santiago-calatrava_o
Oculus, New York City
281
Source: https://www.dezeen.com/2016/08/29/santiago-calatrava-oculus-world-trade-center-transportation-hub-new-york-photographs-hufton-crow/
A Quote from The Other Famous Architect
n “architecture […] based upon principle, and not upon
precedent” (Frank Lloyd Wright)
283
Another View
n In Computer Architecture
287
Role of the (Computer) Architect
Algorithm Problem
n Also see:
q Hamming, “You and Your Research,” Talk at Bell Labs, 1986.
q http://www.cs.virginia.edu/~robins/YouAndYourResearch.html
293
Levels of Transformation, Revisited
n A user-centric view: computer designed for users
Problem
Algorithm
Program/Language User
Runtime System
(VM, OS, MM)
ISA
Microarchitecture
Logic
Devices
Electrons
295
Crossing the Abstraction Layers
n As long as everything goes well, not knowing what happens
underneath (or above) is not a problem.
n What if
q The program you wrote is running slow?
q The program you wrote does not run correctly?
q The program you wrote consumes too much energy?
q Your system just shut down and you have no idea why?
q Someone just compromised your system and you have no idea how?
n What if
q The hardware you designed is too hard to program?
q The hardware you designed is too slow because it does not provide the
right primitives to the software?
n What if
q You want to design a much more efficient and higher performance system?
296
Crossing the Abstraction Layers
n Two key goals of this course are
297
An Example: Multi-Core Systems
[Figure: Multi-Core Chip with CORE 0 to CORE 3, L2 CACHE 0 to L2 CACHE 3, a SHARED L3 CACHE, a DRAM MEMORY CONTROLLER, a DRAM INTERFACE, and DRAM BANKS]
47%
15%
Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012. 299
Computer Architecture
Lecture 1: Introduction and Basics