SlideShare a Scribd company logo
How shit works:
the CPU
Tomer Gabel
BuildStuff 2016 Lithuania
Image: Telecarlos (CC BY-SA 3.0)
Full Disclosure
Bullshit ahead!
• I’m not an expert
• Explanations may be:
– Simplified
– Inaccurate
– Wrong :-)
• We’ll barely scratch the
surface
Image: Public Domain
A CONUNDRUM?
Are you ready for…
Image: Louis Reed (CC BY-SA 4.0)
Setting the Stage
// Generate a bunch of bytes
byte[] data = new byte[32768];
new Random().nextBytes(data);
Arrays.sort(data);
// Sum positive elements
long sum = 0;
for (int i = 0; i < data.length; i++)
if (data[i] >= 0)
sum += data[i];
1. Which is faster?
2. By how much?
3. And crucially…
why?!
# Run complete. Total time: 00:00:32
Benchmark Mode Cnt Score Error Units
Baseline.sum avgt 6 115.666 ± 3.137 us/op
Presorted.sum avgt 6 13.741 ± 0.524 us/op
Surprise, Terror and Ruthless Efficiency
# Run complete. Total time: 00:00:32
Benchmark Mode Cnt Error Units
Baseline.sum avgt 6 ± 3.137 us/op
Presorted.sum avgt 6 ± 0.524 us/op
* Ignoring setup cost
CPUS ARE
COMPLEX
BEASTS.
Image: Pauli Rautakorpi (CC BY 3.0)
It Is Known
• Your high-level code…
long sum = 0;
for (i = 0; i < length; i++)
if (data[i] >= 0)
sum += data[i];
• Gets compiled down to…
movsx eax,BYTE PTR [rax+rdx*1+0x10]
cmp eax,0x0
movabs rdx,0x11f3a9f60
movabs rcx,0x128
jl 0x000000010679e077
movabs rcx,0x138
mov r8,QWORD PTR [rdx+rcx*1]
lea r8,[r8+0x1]
mov QWORD PTR [rdx+rcx*1],r8
jl 0x000000010679e092
movsxd rax,eax
add rax,rbx
mov rbx,rax
inc edi
It Is Less Known
• What happens then?
• The instruction goes through phases…
Fetch Decode Execute
Memory
Access
Write-
back
Instruction
Stream
CPU Architecture 101
Image: Appaloosa (CC BY-SA 3.0)
CPU Architecture 101
• What does a CPU do?
– Reads the program
CPU Architecture 101
• What does a CPU do?
– Reads the program
– Figures it out
CPU Architecture 101
• What does a CPU do?
– Reads the program
– Figures it out
– Executes it
CPU Architecture 101
• What does a CPU do?
– Reads the program
– Figures it out
– Executes it
– Talks to memory
CPU Architecture 101
• What does a CPU do?
– Reads the program
– Figures it out
– Executes it
– Talks to memory
– Performs I/O
CPU Architecture 101
• What does a CPU do?
– Reads the program
– Figures it out
– Executes it
– Talks to memory
– Performs I/O
• Immense complexity!
Execution Units
• Arithmetic-Logic Unit (ALU)
– Boolean algebra
– Arithmetic
– Memory accesses
– Flow control
• Floating Point Unit (FPU)
• Memory Management Unit (MMU)
– Memory mapping
– Paging
– Access control
Images: ALU by Dirk Oppelt (CC BY-SA 3.0), FPU by Konstantin Lanzet (CC BY-SA 3.0), MMU from unknown source
DESIGN
CONSIDERATIONS
Image: William M. Plate Jr. (Public Domain)
Fetch Decode Execute
Memory
Access
Write-
back
Fetch Decode Execute
Memory
Access
Write-
back
Fetch Decode Execute
Memory
Access
Write-
back
I1
I0
I2
Pipelining
Sequential Execution
Latency = 5 cycles
Throughput= 0.2 ops / cycle
Fetch Decode Execute
Memory
Access
Write-
back
I1
I0
I2
Fetch Decode Execute
Memory
Access
Fetch Decode Execute
Pipelining
Sequential Execution Pipelined Execution
Latency = 5 cycles
Throughput= 0.2 ops / cycle
Latency = 5 cycles
Throughput= 1 ops / cycle
Fetch Decode Execute
Memory
Access
Write-
back
Fetch Decode Execute
Memory
Access
Write-
back
Fetch Decode Execute
Memory
Access
Write-
back
I1
I0
I2
Pipelining
• A pipeline can stall
• This happens with:
– Branches
if (i < 0) i++ else i--;
F D E M WMemory Load
F D E MTest
F D EConditional
Jump
? ????
F D E M WIncrement
memory address
F D E M
F D Stall
F D
Load from
memory
Add +1
Store in
memory
Pipelining
• A pipeline can stall
• This happens with:
– Branches
– Dependent Instructions
• A.K.A pipeline bubbling
i++;
x = i + 1;
Stall
PRACTICAL
RAMIFICATIONS
Image: Hangsna (CC BY-SA 3.0)
1. Memory is Slow
• RAM access is ~60ns
• Random access on a
4GHz, 64-bit CPU:
– 250 cycles / memory access
– 130MB / second bandwidth
• Surely we can do better!
Image: Noah Wieder (Public Domain)
Source: 7-cpu.com
Enter: CPU Cache
Level Size Latency
L1 32KB + 32KB 1ns
L2 256KB 3ns
L3 4MB 11ns
Main Memory 62ns
Intel i7-6700 “Skylake” at 4 GHz
Image: Ferry24.Milan (CC BY-SA 3.0)
Source: 7-cpu.com
Enter: CPU Cache
• A unit of work is
called cache line
– 64 bytes on x86
– LRU eviction policy
• Why is sequential
access fast?
– Cache prefetching
In Real Life
• Let’s rotate an image!
for (y = 0; y < height; y++)
for (x = 0; x < width; x++) {
int from = y * width + x;
int to = x * height + y;
target[to] = source[from];
}
Image: EgoAltere (CC0 Public Domain)
In Real Life
• This is not efficient
• Reads are sequential
0 1 2 3 ... 9
0
1
2
3
…
9
In Real Life
• This is not efficient
• Reads are sequential
0 1 2 3 ... 9
0 0 1 2 3 … 9
1
2
3
…
9
In Real Life
• This is not efficient
• Reads are sequential
• Writes aren’t, though
• Different strides
– Worst case wins :-(
0 1 2 3 ... 9
0 0 1 2 3 … 9
1 10
2 20
3 30
… …
9 90
Cache-Friendly Algorithms
• Use blocking or tiling
for (y = 0; y < height; y += blockHeight)
for (x = 0; x < width; x += blockWidth)
for (by = 0; by < blockHeight; by++)
for (bx = 0; bx < blockWidth; bx++) {
int from = (y + by) * width + (x + bx);
int to = (x + bx) * height + (y + by);
target[to] = source[from];
}
Cache-Friendly Algorithms
• The results?
Benchmark Mode Cnt Score Error Units
CachingShowcase.transposeNaive avgt 10 43.851 ± 6.000 ms/op
CachingShowcase.transposeTiled8x8 avgt 10 20.641 ± 1.646 ms/op
CachingShowcase.transposeTiled16x16 avgt 10 18.515 ± 1.833 ms/op
CachingShowcase.transposeTiled48x48 avgt 10 21.941 ± 1.954 ms/op
• The results?
Benchmark Mode Cnt Error Units
CachingShowcase.transpose avgt 10 ± 6.000 ms/op
CachingShowcase.transpose avgt 10 ± 1.646 ms/op
CachingShowcase.transpose avgt 10 ± 1.833 ms/op
CachingShowcase.transpose avgt 10 ± 1.954 ms/op
x2.37 speedup!
2. Those Pesky Branches
• Do I go left or right?
• Need input!
• … but can’t wait for it
• Maybe...
– Take a guess?
– Based on historic trends?
• Sounds speculative
Image: Michael Dolan (CC BY 2.0)
Those Pesky Branches
• Enter: Branch Prediction
• Concurrently:
– Speculate branch
– Evaluate condition
• It’s now a tradeoff
– Commit is fast
– Rollback is slow
Image: Alejandro C. (CC BY-NC 2.0)
// Generate a bunch of bytes
byte[] data = new byte[32768];
new Random().nextBytes(data);
Arrays.sort(data);
// Sum positive elements
long sum = 0;
for (int i = 0; i < data.length; i++)
if (data[i] >= 0)
sum += data[i];
Back to Our Conundrum
• Can you guess?
– 3…
– 2...
– 1...
• Here it is!
// Generate a bunch of bytes
byte[] data = new byte[32768];
new Random().nextBytes(data);
Arrays.sort(data);
// Sum positive elements
long sum = 0;
for (int i = 0; i < data.length; i++)
if (data[i] >= 0)
sum += data[i];
Catharsis
54 10 -4 -2 15 41
-
37
13 0 -9 14 25
-
61
40
Original data array:
Catharsis
-
61
-
37
-9 -4 -2 0 10 13 14 15 25 40 41 54
After sorting:
0
data[i] >= 0
Always false!
data[i] >= 0
Always true!
QUESTIONS?
Thank you for listening
tomer@tomergabel.com
@tomerg
http://engineering.wix.com
Sources and Examples:
https://goo.gl/f7NfGT
This work is licensed under a Creative
Commons Attribution-ShareAlike 4.0
International License.
Further Reading
• Jason Robert Carey Patterson –
Modern Microprocessors, a 90-Minute Guide
• Igor Ostrovsky - Gallery of Processor Cache
Effects
• Piyush Kumar –
Cache Oblivious Algorithms
Ad

More Related Content

What's hot (20)

Q2.12: Debugging with GDB
Q2.12: Debugging with GDBQ2.12: Debugging with GDB
Q2.12: Debugging with GDB
Linaro
 
Linux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBLinux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKB
shimosawa
 
Heap exploitation
Heap exploitationHeap exploitation
Heap exploitation
Angel Boy
 
Understand more about C
Understand more about CUnderstand more about C
Understand more about C
Yi-Hsiu Hsu
 
U-Boot Porting on New Hardware
U-Boot Porting on New HardwareU-Boot Porting on New Hardware
U-Boot Porting on New Hardware
RuggedBoardGroup
 
What Can Compilers Do for Us?
What Can Compilers Do for Us?What Can Compilers Do for Us?
What Can Compilers Do for Us?
National Cheng Kung University
 
20分くらいでわかった気分になれるC++20コルーチン
20分くらいでわかった気分になれるC++20コルーチン20分くらいでわかった気分になれるC++20コルーチン
20分くらいでわかった気分になれるC++20コルーチン
yohhoy
 
Virtual Machine Constructions for Dummies
Virtual Machine Constructions for DummiesVirtual Machine Constructions for Dummies
Virtual Machine Constructions for Dummies
National Cheng Kung University
 
Halide による画像処理プログラミング入門
Halide による画像処理プログラミング入門Halide による画像処理プログラミング入門
Halide による画像処理プログラミング入門
Fixstars Corporation
 
Qemu JIT Code Generator and System Emulation
Qemu JIT Code Generator and System EmulationQemu JIT Code Generator and System Emulation
Qemu JIT Code Generator and System Emulation
National Cheng Kung University
 
ACPI Debugging from Linux Kernel
ACPI Debugging from Linux KernelACPI Debugging from Linux Kernel
ACPI Debugging from Linux Kernel
SUSE Labs Taipei
 
Profiling your Applications using the Linux Perf Tools
Profiling your Applications using the Linux Perf ToolsProfiling your Applications using the Linux Perf Tools
Profiling your Applications using the Linux Perf Tools
emBO_Conference
 
Linux System Programming - File I/O
Linux System Programming - File I/O Linux System Programming - File I/O
Linux System Programming - File I/O
YourHelper1
 
Linux binary Exploitation - Basic knowledge
Linux binary Exploitation - Basic knowledgeLinux binary Exploitation - Basic knowledge
Linux binary Exploitation - Basic knowledge
Angel Boy
 
Grub2 Booting Process
Grub2 Booting ProcessGrub2 Booting Process
Grub2 Booting Process
Mike Wang
 
淺談探索 Linux 系統設計之道
淺談探索 Linux 系統設計之道 淺談探索 Linux 系統設計之道
淺談探索 Linux 系統設計之道
National Cheng Kung University
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machine
Alexei Starovoitov
 
Introduction to char device driver
Introduction to char device driverIntroduction to char device driver
Introduction to char device driver
Vandana Salve
 
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
National Cheng Kung University
 
05.2 virtio introduction
05.2 virtio introduction05.2 virtio introduction
05.2 virtio introduction
zenixls2
 
Q2.12: Debugging with GDB
Q2.12: Debugging with GDBQ2.12: Debugging with GDB
Q2.12: Debugging with GDB
Linaro
 
Linux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBLinux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKB
shimosawa
 
Heap exploitation
Heap exploitationHeap exploitation
Heap exploitation
Angel Boy
 
Understand more about C
Understand more about CUnderstand more about C
Understand more about C
Yi-Hsiu Hsu
 
U-Boot Porting on New Hardware
U-Boot Porting on New HardwareU-Boot Porting on New Hardware
U-Boot Porting on New Hardware
RuggedBoardGroup
 
20分くらいでわかった気分になれるC++20コルーチン
20分くらいでわかった気分になれるC++20コルーチン20分くらいでわかった気分になれるC++20コルーチン
20分くらいでわかった気分になれるC++20コルーチン
yohhoy
 
Halide による画像処理プログラミング入門
Halide による画像処理プログラミング入門Halide による画像処理プログラミング入門
Halide による画像処理プログラミング入門
Fixstars Corporation
 
ACPI Debugging from Linux Kernel
ACPI Debugging from Linux KernelACPI Debugging from Linux Kernel
ACPI Debugging from Linux Kernel
SUSE Labs Taipei
 
Profiling your Applications using the Linux Perf Tools
Profiling your Applications using the Linux Perf ToolsProfiling your Applications using the Linux Perf Tools
Profiling your Applications using the Linux Perf Tools
emBO_Conference
 
Linux System Programming - File I/O
Linux System Programming - File I/O Linux System Programming - File I/O
Linux System Programming - File I/O
YourHelper1
 
Linux binary Exploitation - Basic knowledge
Linux binary Exploitation - Basic knowledgeLinux binary Exploitation - Basic knowledge
Linux binary Exploitation - Basic knowledge
Angel Boy
 
Grub2 Booting Process
Grub2 Booting ProcessGrub2 Booting Process
Grub2 Booting Process
Mike Wang
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machine
Alexei Starovoitov
 
Introduction to char device driver
Introduction to char device driverIntroduction to char device driver
Introduction to char device driver
Vandana Salve
 
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
National Cheng Kung University
 
05.2 virtio introduction
05.2 virtio introduction05.2 virtio introduction
05.2 virtio introduction
zenixls2
 

Viewers also liked (12)

How Shit Works: Storage
How Shit Works: StorageHow Shit Works: Storage
How Shit Works: Storage
Tomer Gabel
 
The Wix Microservice Stack
The Wix Microservice StackThe Wix Microservice Stack
The Wix Microservice Stack
Tomer Gabel
 
Financial Portfolio Management with Java on Steroids - JAX Finance 2016
Financial Portfolio Management with Java on Steroids - JAX Finance 2016Financial Portfolio Management with Java on Steroids - JAX Finance 2016
Financial Portfolio Management with Java on Steroids - JAX Finance 2016
aixigo AG
 
Onboarding at Scale
Onboarding at ScaleOnboarding at Scale
Onboarding at Scale
Tomer Gabel
 
5 Bullets to Scala Adoption
5 Bullets to Scala Adoption5 Bullets to Scala Adoption
5 Bullets to Scala Adoption
Tomer Gabel
 
Four hands
Four handsFour hands
Four hands
StuartMMills
 
Disturbios de aprendizagem
Disturbios de aprendizagemDisturbios de aprendizagem
Disturbios de aprendizagem
Beneditaarruda
 
безсмертна пам’ять
безсмертна      пам’ятьбезсмертна      пам’ять
безсмертна пам’ять
kilobajt
 
Cualidades del personal del futuro
Cualidades del personal del futuroCualidades del personal del futuro
Cualidades del personal del futuro
parc21
 
Scala Back to Basics: Type Classes
Scala Back to Basics: Type ClassesScala Back to Basics: Type Classes
Scala Back to Basics: Type Classes
Tomer Gabel
 
Put Your Thinking CAP On
Put Your Thinking CAP OnPut Your Thinking CAP On
Put Your Thinking CAP On
Tomer Gabel
 
Scala in practice
Scala in practiceScala in practice
Scala in practice
Tomer Gabel
 
How Shit Works: Storage
How Shit Works: StorageHow Shit Works: Storage
How Shit Works: Storage
Tomer Gabel
 
The Wix Microservice Stack
The Wix Microservice StackThe Wix Microservice Stack
The Wix Microservice Stack
Tomer Gabel
 
Financial Portfolio Management with Java on Steroids - JAX Finance 2016
Financial Portfolio Management with Java on Steroids - JAX Finance 2016Financial Portfolio Management with Java on Steroids - JAX Finance 2016
Financial Portfolio Management with Java on Steroids - JAX Finance 2016
aixigo AG
 
Onboarding at Scale
Onboarding at ScaleOnboarding at Scale
Onboarding at Scale
Tomer Gabel
 
5 Bullets to Scala Adoption
5 Bullets to Scala Adoption5 Bullets to Scala Adoption
5 Bullets to Scala Adoption
Tomer Gabel
 
Disturbios de aprendizagem
Disturbios de aprendizagemDisturbios de aprendizagem
Disturbios de aprendizagem
Beneditaarruda
 
безсмертна пам’ять
безсмертна      пам’ятьбезсмертна      пам’ять
безсмертна пам’ять
kilobajt
 
Cualidades del personal del futuro
Cualidades del personal del futuroCualidades del personal del futuro
Cualidades del personal del futuro
parc21
 
Scala Back to Basics: Type Classes
Scala Back to Basics: Type ClassesScala Back to Basics: Type Classes
Scala Back to Basics: Type Classes
Tomer Gabel
 
Put Your Thinking CAP On
Put Your Thinking CAP OnPut Your Thinking CAP On
Put Your Thinking CAP On
Tomer Gabel
 
Scala in practice
Scala in practiceScala in practice
Scala in practice
Tomer Gabel
 
Ad

Similar to How shit works: the CPU (20)

HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe Shockwave
HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe ShockwaveHES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe Shockwave
HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe Shockwave
Hackito Ergo Sum
 
SMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgiSMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgi
Takuya ASADA
 
The 7 Deadly Sins of Packet Processing - Venky Venkatesan and Bruce Richardson
The 7 Deadly Sins of Packet Processing - Venky Venkatesan and Bruce RichardsonThe 7 Deadly Sins of Packet Processing - Venky Venkatesan and Bruce Richardson
The 7 Deadly Sins of Packet Processing - Venky Venkatesan and Bruce Richardson
harryvanhaaren
 
Performance and predictability (1)
Performance and predictability (1)Performance and predictability (1)
Performance and predictability (1)
RichardWarburton
 
Performance and Predictability - Richard Warburton
Performance and Predictability - Richard WarburtonPerformance and Predictability - Richard Warburton
Performance and Predictability - Richard Warburton
JAXLondon2014
 
Pitfalls of Object Oriented Programming
Pitfalls of Object Oriented ProgrammingPitfalls of Object Oriented Programming
Pitfalls of Object Oriented Programming
Slide_N
 
Velocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard WayVelocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard Way
Cosimo Streppone
 
Kusto (Azure Data Explorer) Training for R&D - January 2019
Kusto (Azure Data Explorer) Training for R&D - January 2019 Kusto (Azure Data Explorer) Training for R&D - January 2019
Kusto (Azure Data Explorer) Training for R&D - January 2019
Tal Bar-Zvi
 
[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory Analysis[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory Analysis
Moabi.com
 
Windows Kernel Exploitation : This Time Font hunt you down in 4 bytes
Windows Kernel Exploitation : This Time Font hunt you down in 4 bytesWindows Kernel Exploitation : This Time Font hunt you down in 4 bytes
Windows Kernel Exploitation : This Time Font hunt you down in 4 bytes
Peter Hlavaty
 
LST Toolkit: Exfiltration Over Sound, Light, Touch
LST Toolkit: Exfiltration Over Sound, Light, TouchLST Toolkit: Exfiltration Over Sound, Light, Touch
LST Toolkit: Exfiltration Over Sound, Light, Touch
Dimitry Snezhkov
 
A New Tracer for Reverse Engineering - PacSec 2010
A New Tracer for Reverse Engineering - PacSec 2010A New Tracer for Reverse Engineering - PacSec 2010
A New Tracer for Reverse Engineering - PacSec 2010
Tsukasa Oi
 
Practical SPU Programming in God of War III
Practical SPU Programming in God of War IIIPractical SPU Programming in God of War III
Practical SPU Programming in God of War III
Slide_N
 
Steelcon 2014 - Process Injection with Python
Steelcon 2014 - Process Injection with PythonSteelcon 2014 - Process Injection with Python
Steelcon 2014 - Process Injection with Python
infodox
 
PyData Paris 2015 - Closing keynote Francesc Alted
PyData Paris 2015 - Closing keynote Francesc AltedPyData Paris 2015 - Closing keynote Francesc Alted
PyData Paris 2015 - Closing keynote Francesc Alted
Pôle Systematic Paris-Region
 
Ice Age melting down: Intel features considered usefull!
Ice Age melting down: Intel features considered usefull!Ice Age melting down: Intel features considered usefull!
Ice Age melting down: Intel features considered usefull!
Peter Hlavaty
 
Sheepdog Status Report
Sheepdog Status ReportSheepdog Status Report
Sheepdog Status Report
Liu Yuan
 
Unity - Internals: memory and performance
Unity - Internals: memory and performanceUnity - Internals: memory and performance
Unity - Internals: memory and performance
Codemotion
 
The Quantum Physics of Java
The Quantum Physics of JavaThe Quantum Physics of Java
The Quantum Physics of Java
Michael Heinrichs
 
Meltdown & Spectre
Meltdown & Spectre Meltdown & Spectre
Meltdown & Spectre
Marco Cipriano
 
HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe Shockwave
HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe ShockwaveHES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe Shockwave
HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe Shockwave
Hackito Ergo Sum
 
SMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgiSMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgi
Takuya ASADA
 
The 7 Deadly Sins of Packet Processing - Venky Venkatesan and Bruce Richardson
The 7 Deadly Sins of Packet Processing - Venky Venkatesan and Bruce RichardsonThe 7 Deadly Sins of Packet Processing - Venky Venkatesan and Bruce Richardson
The 7 Deadly Sins of Packet Processing - Venky Venkatesan and Bruce Richardson
harryvanhaaren
 
Performance and predictability (1)
Performance and predictability (1)Performance and predictability (1)
Performance and predictability (1)
RichardWarburton
 
Performance and Predictability - Richard Warburton
Performance and Predictability - Richard WarburtonPerformance and Predictability - Richard Warburton
Performance and Predictability - Richard Warburton
JAXLondon2014
 
Pitfalls of Object Oriented Programming
Pitfalls of Object Oriented ProgrammingPitfalls of Object Oriented Programming
Pitfalls of Object Oriented Programming
Slide_N
 
Velocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard WayVelocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard Way
Cosimo Streppone
 
Kusto (Azure Data Explorer) Training for R&D - January 2019
Kusto (Azure Data Explorer) Training for R&D - January 2019 Kusto (Azure Data Explorer) Training for R&D - January 2019
Kusto (Azure Data Explorer) Training for R&D - January 2019
Tal Bar-Zvi
 
[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory Analysis[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory Analysis
Moabi.com
 
Windows Kernel Exploitation : This Time Font hunt you down in 4 bytes
Windows Kernel Exploitation : This Time Font hunt you down in 4 bytesWindows Kernel Exploitation : This Time Font hunt you down in 4 bytes
Windows Kernel Exploitation : This Time Font hunt you down in 4 bytes
Peter Hlavaty
 
LST Toolkit: Exfiltration Over Sound, Light, Touch
LST Toolkit: Exfiltration Over Sound, Light, TouchLST Toolkit: Exfiltration Over Sound, Light, Touch
LST Toolkit: Exfiltration Over Sound, Light, Touch
Dimitry Snezhkov
 
A New Tracer for Reverse Engineering - PacSec 2010
A New Tracer for Reverse Engineering - PacSec 2010A New Tracer for Reverse Engineering - PacSec 2010
A New Tracer for Reverse Engineering - PacSec 2010
Tsukasa Oi
 
Practical SPU Programming in God of War III
Practical SPU Programming in God of War IIIPractical SPU Programming in God of War III
Practical SPU Programming in God of War III
Slide_N
 
Steelcon 2014 - Process Injection with Python
Steelcon 2014 - Process Injection with PythonSteelcon 2014 - Process Injection with Python
Steelcon 2014 - Process Injection with Python
infodox
 
Ice Age melting down: Intel features considered usefull!
Ice Age melting down: Intel features considered usefull!Ice Age melting down: Intel features considered usefull!
Ice Age melting down: Intel features considered usefull!
Peter Hlavaty
 
Sheepdog Status Report
Sheepdog Status ReportSheepdog Status Report
Sheepdog Status Report
Liu Yuan
 
Unity - Internals: memory and performance
Unity - Internals: memory and performanceUnity - Internals: memory and performance
Unity - Internals: memory and performance
Codemotion
 
Ad

More from Tomer Gabel (20)

How shit works: Time
How shit works: TimeHow shit works: Time
How shit works: Time
Tomer Gabel
 
Nondeterministic Software for the Rest of Us
Nondeterministic Software for the Rest of UsNondeterministic Software for the Rest of Us
Nondeterministic Software for the Rest of Us
Tomer Gabel
 
Slaying Sacred Cows: Deconstructing Dependency Injection
Slaying Sacred Cows: Deconstructing Dependency InjectionSlaying Sacred Cows: Deconstructing Dependency Injection
Slaying Sacred Cows: Deconstructing Dependency Injection
Tomer Gabel
 
An Abridged Guide to Event Sourcing
An Abridged Guide to Event SourcingAn Abridged Guide to Event Sourcing
An Abridged Guide to Event Sourcing
Tomer Gabel
 
Java 8 and Beyond, a Scala Story
Java 8 and Beyond, a Scala StoryJava 8 and Beyond, a Scala Story
Java 8 and Beyond, a Scala Story
Tomer Gabel
 
Scala Refactoring for Fun and Profit (Japanese subtitles)
Scala Refactoring for Fun and Profit (Japanese subtitles)Scala Refactoring for Fun and Profit (Japanese subtitles)
Scala Refactoring for Fun and Profit (Japanese subtitles)
Tomer Gabel
 
Scala Refactoring for Fun and Profit
Scala Refactoring for Fun and ProfitScala Refactoring for Fun and Profit
Scala Refactoring for Fun and Profit
Tomer Gabel
 
Scala in the Wild
Scala in the WildScala in the Wild
Scala in the Wild
Tomer Gabel
 
Speaking Scala: Refactoring for Fun and Profit (Workshop)
Speaking Scala: Refactoring for Fun and Profit (Workshop)Speaking Scala: Refactoring for Fun and Profit (Workshop)
Speaking Scala: Refactoring for Fun and Profit (Workshop)
Tomer Gabel
 
Leveraging Scala Macros for Better Validation
Leveraging Scala Macros for Better ValidationLeveraging Scala Macros for Better Validation
Leveraging Scala Macros for Better Validation
Tomer Gabel
 
A Field Guide to DSL Design in Scala
A Field Guide to DSL Design in ScalaA Field Guide to DSL Design in Scala
A Field Guide to DSL Design in Scala
Tomer Gabel
 
Functional Leap of Faith (Keynote at JDay Lviv 2014)
Functional Leap of Faith (Keynote at JDay Lviv 2014)Functional Leap of Faith (Keynote at JDay Lviv 2014)
Functional Leap of Faith (Keynote at JDay Lviv 2014)
Tomer Gabel
 
Nashorn: JavaScript that doesn’t suck (ILJUG)
Nashorn: JavaScript that doesn’t suck (ILJUG)Nashorn: JavaScript that doesn’t suck (ILJUG)
Nashorn: JavaScript that doesn’t suck (ILJUG)
Tomer Gabel
 
Ponies and Unicorns With Scala
Ponies and Unicorns With ScalaPonies and Unicorns With Scala
Ponies and Unicorns With Scala
Tomer Gabel
 
Lab: JVM Production Debugging 101
Lab: JVM Production Debugging 101Lab: JVM Production Debugging 101
Lab: JVM Production Debugging 101
Tomer Gabel
 
DevCon³: Scala Best Practices
DevCon³: Scala Best PracticesDevCon³: Scala Best Practices
DevCon³: Scala Best Practices
Tomer Gabel
 
Maven for Dummies
Maven for DummiesMaven for Dummies
Maven for Dummies
Tomer Gabel
 
SHC Israel: GigaSpaces Case Study
SHC Israel: GigaSpaces Case StudySHC Israel: GigaSpaces Case Study
SHC Israel: GigaSpaces Case Study
Tomer Gabel
 
The Demoscene: A cursory introduction
The Demoscene: A cursory introductionThe Demoscene: A cursory introduction
The Demoscene: A cursory introduction
Tomer Gabel
 
Video: What you never thought you might want to know
Video: What you never thought you might want to knowVideo: What you never thought you might want to know
Video: What you never thought you might want to know
Tomer Gabel
 
How shit works: Time
How shit works: TimeHow shit works: Time
How shit works: Time
Tomer Gabel
 
Nondeterministic Software for the Rest of Us
Nondeterministic Software for the Rest of UsNondeterministic Software for the Rest of Us
Nondeterministic Software for the Rest of Us
Tomer Gabel
 
Slaying Sacred Cows: Deconstructing Dependency Injection
Slaying Sacred Cows: Deconstructing Dependency InjectionSlaying Sacred Cows: Deconstructing Dependency Injection
Slaying Sacred Cows: Deconstructing Dependency Injection
Tomer Gabel
 
An Abridged Guide to Event Sourcing
An Abridged Guide to Event SourcingAn Abridged Guide to Event Sourcing
An Abridged Guide to Event Sourcing
Tomer Gabel
 
Java 8 and Beyond, a Scala Story
Java 8 and Beyond, a Scala StoryJava 8 and Beyond, a Scala Story
Java 8 and Beyond, a Scala Story
Tomer Gabel
 
Scala Refactoring for Fun and Profit (Japanese subtitles)
Scala Refactoring for Fun and Profit (Japanese subtitles)Scala Refactoring for Fun and Profit (Japanese subtitles)
Scala Refactoring for Fun and Profit (Japanese subtitles)
Tomer Gabel
 
Scala Refactoring for Fun and Profit
Scala Refactoring for Fun and ProfitScala Refactoring for Fun and Profit
Scala Refactoring for Fun and Profit
Tomer Gabel
 
Scala in the Wild
Scala in the WildScala in the Wild
Scala in the Wild
Tomer Gabel
 
Speaking Scala: Refactoring for Fun and Profit (Workshop)
Speaking Scala: Refactoring for Fun and Profit (Workshop)Speaking Scala: Refactoring for Fun and Profit (Workshop)
Speaking Scala: Refactoring for Fun and Profit (Workshop)
Tomer Gabel
 
Leveraging Scala Macros for Better Validation
Leveraging Scala Macros for Better ValidationLeveraging Scala Macros for Better Validation
Leveraging Scala Macros for Better Validation
Tomer Gabel
 
A Field Guide to DSL Design in Scala
A Field Guide to DSL Design in ScalaA Field Guide to DSL Design in Scala
A Field Guide to DSL Design in Scala
Tomer Gabel
 
Functional Leap of Faith (Keynote at JDay Lviv 2014)
Functional Leap of Faith (Keynote at JDay Lviv 2014)Functional Leap of Faith (Keynote at JDay Lviv 2014)
Functional Leap of Faith (Keynote at JDay Lviv 2014)
Tomer Gabel
 
Nashorn: JavaScript that doesn’t suck (ILJUG)
Nashorn: JavaScript that doesn’t suck (ILJUG)Nashorn: JavaScript that doesn’t suck (ILJUG)
Nashorn: JavaScript that doesn’t suck (ILJUG)
Tomer Gabel
 
Ponies and Unicorns With Scala
Ponies and Unicorns With ScalaPonies and Unicorns With Scala
Ponies and Unicorns With Scala
Tomer Gabel
 
Lab: JVM Production Debugging 101
Lab: JVM Production Debugging 101Lab: JVM Production Debugging 101
Lab: JVM Production Debugging 101
Tomer Gabel
 
DevCon³: Scala Best Practices
DevCon³: Scala Best PracticesDevCon³: Scala Best Practices
DevCon³: Scala Best Practices
Tomer Gabel
 
Maven for Dummies
Maven for DummiesMaven for Dummies
Maven for Dummies
Tomer Gabel
 
SHC Israel: GigaSpaces Case Study
SHC Israel: GigaSpaces Case StudySHC Israel: GigaSpaces Case Study
SHC Israel: GigaSpaces Case Study
Tomer Gabel
 
The Demoscene: A cursory introduction
The Demoscene: A cursory introductionThe Demoscene: A cursory introduction
The Demoscene: A cursory introduction
Tomer Gabel
 
Video: What you never thought you might want to know
Video: What you never thought you might want to knowVideo: What you never thought you might want to know
Video: What you never thought you might want to know
Tomer Gabel
 

Recently uploaded (20)

What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
Andre Hora
 
Kubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptxKubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptx
CloudScouts
 
Societal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainabilitySocietal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainability
Jordi Cabot
 
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Eric D. Schabell
 
Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025
kashifyounis067
 
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRYLEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
NidaFarooq10
 
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Dele Amefo
 
Exploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the FutureExploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the Future
ICS
 
Maxon CINEMA 4D 2025 Crack FREE Download LINK
Maxon CINEMA 4D 2025 Crack FREE Download LINKMaxon CINEMA 4D 2025 Crack FREE Download LINK
Maxon CINEMA 4D 2025 Crack FREE Download LINK
younisnoman75
 
WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)
sh607827
 
How can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptxHow can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptx
laravinson24
 
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
Andre Hora
 
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& ConsiderationsDesigning AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Dinusha Kumarasiri
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
Egor Kaleynik
 
Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)
Allon Mureinik
 
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Ranjan Baisak
 
How to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud PerformanceHow to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud Performance
ThousandEyes
 
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
Andre Hora
 
Kubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptxKubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptx
CloudScouts
 
Societal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainabilitySocietal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainability
Jordi Cabot
 
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Eric D. Schabell
 
Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025
kashifyounis067
 
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRYLEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
NidaFarooq10
 
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Dele Amefo
 
Exploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the FutureExploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the Future
ICS
 
Maxon CINEMA 4D 2025 Crack FREE Download LINK
Maxon CINEMA 4D 2025 Crack FREE Download LINKMaxon CINEMA 4D 2025 Crack FREE Download LINK
Maxon CINEMA 4D 2025 Crack FREE Download LINK
younisnoman75
 
WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)
sh607827
 
How can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptxHow can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptx
laravinson24
 
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
Andre Hora
 
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& ConsiderationsDesigning AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Dinusha Kumarasiri
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
Egor Kaleynik
 
Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)
Allon Mureinik
 
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Ranjan Baisak
 
How to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud PerformanceHow to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud Performance
ThousandEyes
 

How shit works: the CPU

  • 1. How shit works: the CPU Tomer Gabel BuildStuff 2016 Lithuania Image: Telecarlos (CC BY-SA 3.0)
  • 2. Full Disclosure Bullshit ahead! • I’m not an expert • Explanations may be: – Simplified – Inaccurate – Wrong :-) • We’ll barely scratch the surface Image: Public Domain
  • 3. A CONUNDRUM? Are you ready for… Image: Louis Reed (CC BY-SA 4.0)
  • 4. Setting the Stage // Generate a bunch of bytes byte[] data = new byte[32768]; new Random().nextBytes(data); Arrays.sort(data); // Sum positive elements long sum = 0; for (int i = 0; i < data.length; i++) if (data[i] >= 0) sum += data[i]; 1. Which is faster? 2. By how much? 3. And crucially… why?!
  • 5. # Run complete. Total time: 00:00:32 Benchmark Mode Cnt Score Error Units Baseline.sum avgt 6 115.666 ± 3.137 us/op Presorted.sum avgt 6 13.741 ± 0.524 us/op Surprise, Terror and Ruthless Efficiency # Run complete. Total time: 00:00:32 Benchmark Mode Cnt Error Units Baseline.sum avgt 6 ± 3.137 us/op Presorted.sum avgt 6 ± 0.524 us/op * Ignoring setup cost
  • 6. CPUS ARE COMPLEX BEASTS. Image: Pauli Rautakorpi (CC BY 3.0)
  • 7. It Is Known • Your high-level code… long sum = 0; for (i = 0; i < length; i++) if (data[i] >= 0) sum += data[i]; • Gets compiled down to… movsx eax,BYTE PTR [rax+rdx*1+0x10] cmp eax,0x0 movabs rdx,0x11f3a9f60 movabs rcx,0x128 jl 0x000000010679e077 movabs rcx,0x138 mov r8,QWORD PTR [rdx+rcx*1] lea r8,[r8+0x1] mov QWORD PTR [rdx+rcx*1],r8 jl 0x000000010679e092 movsxd rax,eax add rax,rbx mov rbx,rax inc edi
  • 8. It Is Less Known • What happens then? • The instruction goes through phases… Fetch Decode Execute Memory Access Write- back Instruction Stream
  • 9. CPU Architecture 101 Image: Appaloosa (CC BY-SA 3.0)
  • 10. CPU Architecture 101 • What does a CPU do? – Reads the program
  • 11. CPU Architecture 101 • What does a CPU do? – Reads the program – Figures it out
  • 12. CPU Architecture 101 • What does a CPU do? – Reads the program – Figures it out – Executes it
  • 13. CPU Architecture 101 • What does a CPU do? – Reads the program – Figures it out – Executes it – Talks to memory
  • 14. CPU Architecture 101 • What does a CPU do? – Reads the program – Figures it out – Executes it – Talks to memory – Performs I/O
  • 15. CPU Architecture 101 • What does a CPU do? – Reads the program – Figures it out – Executes it – Talks to memory – Performs I/O • Immense complexity!
  • 16. Execution Units • Arithmetic-Logic Unit (ALU) – Boolean algebra – Arithmetic – Memory accesses – Flow control • Floating Point Unit (FPU) • Memory Management Unit (MMU) – Memory mapping – Paging – Access control Images: ALU by Dirk Oppelt (CC BY-SA 3.0), FPU by Konstantin Lanzet (CC BY-SA 3.0), MMU from unknown source
  • 17. DESIGN CONSIDERATIONS Image: William M. Plate Jr. (Public Domain)
  • 18. Fetch Decode Execute Memory Access Write- back Fetch Decode Execute Memory Access Write- back Fetch Decode Execute Memory Access Write- back I1 I0 I2 Pipelining Sequential Execution Latency = 5 cycles Throughput= 0.2 ops / cycle
  • 19. Fetch Decode Execute Memory Access Write- back I1 I0 I2 Fetch Decode Execute Memory Access Fetch Decode Execute Pipelining Sequential Execution Pipelined Execution Latency = 5 cycles Throughput= 0.2 ops / cycle Latency = 5 cycles Throughput= 1 ops / cycle Fetch Decode Execute Memory Access Write- back Fetch Decode Execute Memory Access Write- back Fetch Decode Execute Memory Access Write- back I1 I0 I2
  • 20. Pipelining • A pipeline can stall • This happens with: – Branches if (i < 0) i++ else i--; F D E M WMemory Load F D E MTest F D EConditional Jump ? ????
  • 21. F D E M WIncrement memory address F D E M F D Stall F D Load from memory Add +1 Store in memory Pipelining • A pipeline can stall • This happens with: – Branches – Dependent Instructions • A.K.A pipeline bubbling i++; x = i + 1; Stall
  • 23. 1. Memory is Slow • RAM access is ~60ns • Random access on a 4GHz, 64-bit CPU: – 250 cycles / memory access – 130MB / second bandwidth • Surely we can do better! Image: Noah Wieder (Public Domain) Source: 7-cpu.com
  • 24. Enter: CPU Cache Level Size Latency L1 32KB + 32KB 1ns L2 256KB 3ns L3 4MB 11ns Main Memory 62ns Intel i7-6700 “Skylake” at 4 GHz Image: Ferry24.Milan (CC BY-SA 3.0) Source: 7-cpu.com
  • 25. Enter: CPU Cache • A unit of work is called cache line – 64 bytes on x86 – LRU eviction policy • Why is sequential access fast? – Cache prefetching
  • 26. In Real Life • Let’s rotate an image! for (y = 0; y < height; y++) for (x = 0; x < width; x++) { int from = y * width + x; int to = x * height + y; target[to] = source[from]; } Image: EgoAltere (CC0 Public Domain)
  • 27. In Real Life • This is not efficient • Reads are sequential 0 1 2 3 ... 9 0 1 2 3 … 9
  • 28. In Real Life • This is not efficient • Reads are sequential 0 1 2 3 ... 9 0 0 1 2 3 … 9 1 2 3 … 9
  • 29. In Real Life • This is not efficient • Reads are sequential • Writes aren’t, though • Different strides – Worst case wins :-( 0 1 2 3 ... 9 0 0 1 2 3 … 9 1 10 2 20 3 30 … … 9 90
  • 30. Cache-Friendly Algorithms • Use blocking or tiling for (y = 0; y < height; y += blockHeight) for (x = 0; x < width; x += blockWidth) for (by = 0; by < blockHeight; by++) for (bx = 0; bx < blockWidth; bx++) { int from = (y + by) * width + (x + bx); int to = (x + bx) * height + (y + by); target[to] = source[from]; }
  • 31. Cache-Friendly Algorithms • The results? Benchmark Mode Cnt Score Error Units CachingShowcase.transposeNaive avgt 10 43.851 ± 6.000 ms/op CachingShowcase.transposeTiled8x8 avgt 10 20.641 ± 1.646 ms/op CachingShowcase.transposeTiled16x16 avgt 10 18.515 ± 1.833 ms/op CachingShowcase.transposeTiled48x48 avgt 10 21.941 ± 1.954 ms/op • The results? Benchmark Mode Cnt Error Units CachingShowcase.transpose avgt 10 ± 6.000 ms/op CachingShowcase.transpose avgt 10 ± 1.646 ms/op CachingShowcase.transpose avgt 10 ± 1.833 ms/op CachingShowcase.transpose avgt 10 ± 1.954 ms/op x2.37 speedup!
  • 32. 2. Those Pesky Branches • Do I go left or right? • Need input! • … but can’t wait for it • Maybe... – Take a guess? – Based on historic trends? • Sounds speculative Image: Michael Dolan (CC BY 2.0)
  • 33. Those Pesky Branches • Enter: Branch Prediction • Concurrently: – Speculate branch – Evaluate condition • It’s now a tradeoff – Commit is fast – Rollback is slow Image: Alejandro C. (CC BY-NC 2.0)
  • 34. // Generate a bunch of bytes byte[] data = new byte[32768]; new Random().nextBytes(data); Arrays.sort(data); // Sum positive elements long sum = 0; for (int i = 0; i < data.length; i++) if (data[i] >= 0) sum += data[i]; Back to Our Conundrum • Can you guess? – 3… – 2... – 1... • Here it is! // Generate a bunch of bytes byte[] data = new byte[32768]; new Random().nextBytes(data); Arrays.sort(data); // Sum positive elements long sum = 0; for (int i = 0; i < data.length; i++) if (data[i] >= 0) sum += data[i];
  • 35. Catharsis 54 10 -4 -2 15 41 - 37 13 0 -9 14 25 - 61 40 Original data array:
  • 36. Catharsis - 61 - 37 -9 -4 -2 0 10 13 14 15 25 40 41 54 After sorting: 0 data[i] >= 0 Always false! data[i] >= 0 Always true!
  • 37. QUESTIONS? Thank you for listening [email protected] @tomerg http://engineering.wix.com Sources and Examples: https://goo.gl/f7NfGT This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
  • 38. Further Reading • Jason Robert Carey Patterson – Modern Microprocessors, a 90-Minute Guide • Igor Ostrovsky - Gallery of Processor Cache Effects • Piyush Kumar – Cache Oblivious Algorithms