Understanding JVM GC: advanced!
Jean-Philippe Bempel
@jpbempel
Agenda
GC basics
G1
Shenandoah
Azul C4
Z GC
GC Basics
Generations
Marking for Minor GC
Traversing references from GC Roots (static fields, thread stacks, JNI) to mark live objects
Stops when reaching the old gen
[Figure: Young and Old generations]
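Card Table
Card scanning for old-to-young refs, so a minor GC does not scan the whole old gen
[Figure: Young and Old generations with card table entries 1 1 0]
Write barrier to update the card table on each reference store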
mov QWORD PTR [rsi+0x20],rdx
shr rsi,0x9
mov rdi,0x7f2d42817000
mov BYTE PTR [rsi+rdi*1],0x0
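The card-marking barrier above can be simulated as a minimal Java sketch. It assumes 512-byte cards (from shr rsi,0x9), heap addresses modeled as long offsets from a base of 0, and the card values shown in the figure (0 = dirty, 1 = clean); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simulated card table: one byte per 512-byte card of the heap.
// Addresses are long offsets from a hypothetical heap base of 0.
class CardTableSketch {
    static final int CARD_SHIFT = 9;  // 2^9 = 512-byte cards, as in "shr rsi,0x9"
    static final byte CLEAN = 1;      // values follow the "1 1 0" figure:
    static final byte DIRTY = 0;      // the barrier stores 0 to dirty a card

    final byte[] cards;

    CardTableSketch(long heapBytes) {
        cards = new byte[(int) (heapBytes >>> CARD_SHIFT)];
        Arrays.fill(cards, CLEAN);
    }

    // Post-write barrier run after "obj.field = value"; fieldAddress = address of obj.field
    void writeBarrier(long fieldAddress) {
        cards[(int) (fieldAddress >>> CARD_SHIFT)] = DIRTY;
    }

    // At minor GC, only dirty cards are scanned for old-to-young references,
    // instead of the whole old generation
    List<Long> dirtyCardStarts() {
        List<Long> starts = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] == DIRTY) {
                starts.add((long) i << CARD_SHIFT);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(4096); // 8 cards
        table.writeBarrier(0x420); // field at offset 1056 -> card 2
        table.writeBarrier(0x430); // same card: stays dirty
        System.out.println(table.dirtyCardStarts()); // prints [1024]
    }
}
```

The barrier itself stays branch-free (a shift and a byte store); the cost of finding references is moved to GC time, when only dirty cards are scanned.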
G1
Garbage First
Region based
Pause time target (soft real-time)
-XX:MaxGCPauseMillis=n (default 200 ms)
Middle ground between throughput and latency
Default GC since JDK9
Regions
Heap divided into fixed-size regions
[Figure: Garbage First heap regions. Credit: Kirk Pepperdine]
G1 phases
Young Collection (STW)
Initial Mark (STW)
Concurrent Marking
Final Remark (STW)
Cleanup (STW/Concurrent)
Mixed Collection (STW)
Young Collection
Stop-The-World Event
Evacuates live objects to Survivor or Old regions
Only objects in young generation are considered
Remembered Sets
One remembered set per region, tracking the cards of other regions that reference it
Avoids scanning the entire heap to find and update references into evacuated regions
Remembered Sets: Post Write Barrier
For each reference assignment (X.f = Y), the barrier checks that:
● X and Y are not in the same region
● Y is not null
● If both hold => enqueue for RS processing
Refinement threads process the queue
Additional instructions are added after the assignment
if (!isInSameRegion(X, Y)
&& Y != null)
RSEnqueue(X)
mov DWORD PTR [rbp+0x74],r10d
mov r11,rbp
mov r8,r10
shl r8,0x3
xor r8,r11
shr r8,0x14
test r8,r8
je cont
test r10d,r10d
je cont
shr r11,0x9
movabs rcx,0x2965ecc3000
add rcx,r11
cmp BYTE PTR [rcx],0x20
je cont
mov r10,QWORD PTR [r15+0x70]
mov r11,QWORD PTR [r15+0x80]
lock add DWORD PTR [rsp-0x40],0x0
cmp BYTE PTR [rcx],0x0
je cont
mov BYTE PTR [rcx],0x0
test r10,r10
jne 0x000002965edc62bc
mov rdx,r15
movabs r10,0x7ffac2febc30
call r10
jmp cont
mov QWORD PTR [r11+r10*1-0x8],rcx
add r10,0xfffffffffffffff8
mov QWORD PTR [r15+0x70],r10
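The same filtering logic as a Java sketch, assuming 1MB regions (the shr r8,0x14 above), 512-byte cards, addresses as longs with 0 as null, and one global queue where G1 uses per-thread dirty card queues. Names are hypothetical; the assembly's extra comparison with 0x20 (which we understand to be the young-region card value) and its memory fence are left out.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Sketch of G1's post-write barrier for "X.f = Y", following the pseudocode above.
class G1PostBarrierSketch {
    static final int REGION_SHIFT = 20; // 2^20 = 1MB regions (assumption)
    static final int CARD_SHIFT = 9;    // 512-byte cards (assumption)
    static final byte CLEAN = 1;
    static final byte DIRTY = 0;

    final byte[] cards;
    final Deque<Integer> dirtyCardQueue = new ArrayDeque<>(); // drained by refinement threads

    G1PostBarrierSketch(long heapBytes) {
        cards = new byte[(int) (heapBytes >>> CARD_SHIFT)];
        Arrays.fill(cards, CLEAN);
    }

    // Same trick as "xor r8,r11 / shr r8,0x14": identical high bits => same region
    static boolean isInSameRegion(long x, long y) {
        return ((x ^ y) >>> REGION_SHIFT) == 0;
    }

    // x = address of object X, fieldOffset = offset of f in X, y = stored reference
    void postWriteBarrier(long x, long fieldOffset, long y) {
        if (isInSameRegion(x, y)) return; // intra-region references need no RS entry
        if (y == 0) return;               // storing null creates no reference
        int card = (int) ((x + fieldOffset) >>> CARD_SHIFT);
        if (cards[card] == DIRTY) return; // card already enqueued
        cards[card] = DIRTY;
        dirtyCardQueue.add(card);         // refinement threads will update the RS
    }

    public static void main(String[] args) {
        G1PostBarrierSketch g1 = new G1PostBarrierSketch(4L << 20); // 4 regions
        g1.postWriteBarrier(0x100, 0x10, 0x200);    // same region: filtered
        g1.postWriteBarrier(0x100040, 0x10, 0);     // null: filtered
        g1.postWriteBarrier(0x100040, 0x10, 0x200); // cross-region: enqueued
        g1.postWriteBarrier(0x100040, 0x10, 0x300); // card already dirty: filtered
        System.out.println(g1.dirtyCardQueue);     // prints [2048]
    }
}
```

Most stores exit at the first or second check, which is why the cheap region and null tests come before any memory access to the card table.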
Concurrent Marking
Triggered when heap occupancy reaches -XX:InitiatingHeapOccupancyPercent (default 45%)
Tries to mark the whole object graph concurrently, while the application keeps
running
Based on the tri-color abstraction & the Snapshot-At-The-Beginning (SATB) algorithm
Concurrent Marking: Tri-Color abstraction
[Figure sequence: tri-color marking step by step, starting from the GC root]
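The tri-color abstraction can be sketched as a worklist algorithm on a toy graph of named objects (no concurrency, hypothetical names): gray objects wait in a stack, scanning an object shades its white children gray and turns it black, and whatever is still white at the end is garbage.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Tri-color marking on a toy graph, without concurrency.
// WHITE = not reached yet, GRAY = reached but fields not scanned, BLACK = scanned.
class TriColorSketch {
    enum Color { WHITE, GRAY, BLACK }

    // graph must list every object as a key (objects without fields map to an empty list)
    static Map<String, Color> mark(Map<String, List<String>> graph, List<String> roots) {
        Map<String, Color> color = new HashMap<>();
        for (String obj : graph.keySet()) color.put(obj, Color.WHITE);
        Deque<String> grayStack = new ArrayDeque<>();
        for (String root : roots) {
            color.put(root, Color.GRAY);
            grayStack.push(root);
        }
        while (!grayStack.isEmpty()) {
            String obj = grayStack.pop();
            for (String child : graph.get(obj)) {
                if (color.get(child) == Color.WHITE) {
                    color.put(child, Color.GRAY); // shade white children gray
                    grayStack.push(child);
                }
            }
            color.put(obj, Color.BLACK);          // all fields scanned
        }
        return color;                             // objects still WHITE are garbage
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
                "root", List.of("A"), "A", List.of("B"),
                "B", List.of("C"), "C", List.of(), "D", List.of());
        System.out.println(mark(graph, List.of("root")).get("D")); // prints WHITE
    }
}
```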
Concurrent Marking: Issues
New allocations during the mark phase can be handled in 2 ways:
● Automatically marking objects at allocation
● Not considering new allocations for the current cycle
With the tri-color abstraction, an object is missed only if both conditions hold:
1. The mutator stores a reference to a white object into a black object
2. All paths from gray objects to that white object are destroyed
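Concurrent Marking: Issues
[Figure: objects A, B and C]
A.field1 = C;
B.field2 = null;
OOPS! If A is already black and B still gray, C stays white and is reclaimed while still reachable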
Concurrent Marking: Resolving misses
2 ways to ensure no marking is missed: break condition 1 (incremental update) or condition 2 (SATB)
For SATB, a pre-write barrier records the previous value of the field
for marking
The SATB barrier is active only when marking is on (global state)
if (SATB_WriteBarrier) {
if (X.f != null)
SATB_enqueue(X.f);
}
cmp BYTE PTR [r15+0x30],0x0
jne 0x000002965edc62e5
[...]
mov r11d,DWORD PTR [rbp+0x74]
test r11d,r11d
je 0x000002965edc6253
mov r10,QWORD PTR [r15+0x38]
mov rcx,r11
shl rcx,0x3
test r10,r10
je 0x000002965edc6318
mov r11,QWORD PTR [r15+0x48]
mov QWORD PTR [r11+r10*1-0x8],rcx
add r10,0xfffffffffffffff8
mov QWORD PTR [r15+0x38],r10
jmp 0x000002965edc6253
mov rdx,r15
movabs r10,0x7ffac2febc50
call r10
jmp 0x000002965edc6253
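The A/B/C scenario can be replayed in Java to see why the pre-write barrier prevents the miss. This is a single-threaded sketch with a hypothetical object model: the mutator's two stores are interleaved by hand between marking steps, and the marker drains the SATB queue along with its gray objects.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Replays the A/B/C scenario: A is black (scanned), B is gray (queued) and C is white,
// reachable only through B.field2. The mutator runs "A.field1 = C; B.field2 = null;"
// before the marker scans B.
class SatbSketch {
    static final class Obj {
        final Map<String, Obj> fields = new HashMap<>();
        boolean marked; // true once gray or black
    }

    static boolean markingActive;                      // global state, as on the slide
    static final Deque<Obj> satbQueue = new ArrayDeque<>();

    // Reference store "x.field = y", optionally with the SATB pre-write barrier
    static void store(Obj x, String field, Obj y, boolean withBarrier) {
        if (withBarrier && markingActive) {
            Obj previous = x.fields.get(field);
            if (previous != null) satbQueue.add(previous); // record the OLD value
        }
        x.fields.put(field, y);
    }

    // Returns whether C ends up marked
    static boolean run(boolean withBarrier) {
        satbQueue.clear();
        Obj a = new Obj(), b = new Obj(), c = new Obj();
        b.fields.put("field2", c);
        markingActive = true;
        a.marked = true;                               // A: black
        b.marked = true;                               // B: gray
        Deque<Obj> gray = new ArrayDeque<>(List.of(b));

        store(a, "field1", c, withBarrier);            // mutator runs "concurrently"
        store(b, "field2", null, withBarrier);

        gray.addAll(satbQueue);                        // marker drains SATB buffers too
        while (!gray.isEmpty()) {
            Obj o = gray.pop();
            o.marked = true;
            for (Obj child : o.fields.values()) {
                if (child != null && !child.marked) {
                    child.marked = true;
                    gray.push(child);
                }
            }
        }
        markingActive = false;
        return c.marked;
    }

    public static void main(String[] args) {
        System.out.println("without barrier, C marked: " + run(false)); // false
        System.out.println("with SATB barrier, C marked: " + run(true)); // true
    }
}
```

SATB breaks condition 2: the old value of B.field2 is recorded before the path is destroyed, so C is treated as live for this cycle even if it later becomes garbage (it is then reclaimed in the next cycle).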
CollectionSet
At the end of marking, G1 has per-region liveness information
Regions are sorted by liveness (ascending, so garbage first!)
Regions full of garbage are reclaimed during cleanup
The CollectionSet used during the Mixed event is built based on:
● Liveness, up to thresholds (G1HeapWastePercent and G1MixedGCLiveThresholdPercent)
● Maximum number of regions (G1OldCSetRegionThresholdPercent)
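A simplified sketch of how these thresholds could combine, with sizes in toy units (100 per region) and hypothetical names; real G1 also weighs remembered-set scanning cost. The values used in the example (85, 5, 10) are, to our knowledge, the HotSpot defaults of G1MixedGCLiveThresholdPercent, G1HeapWastePercent and G1OldCSetRegionThresholdPercent; -XX:+PrintFlagsFinal shows the values of a given JVM.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified CollectionSet selection for mixed GCs from per-region liveness.
class CollectionSetSketch {
    static final class Region {
        final int id;
        final long liveBytes;
        Region(int id, long liveBytes) { this.id = id; this.liveBytes = liveBytes; }
    }

    // Candidates: old regions whose live data does not exceed liveThresholdPercent
    // of a region, sorted by liveness ascending (garbage first)
    static List<Region> candidates(List<Region> old, long regionBytes, int liveThresholdPercent) {
        List<Region> result = new ArrayList<>();
        for (Region r : old) {
            if (r.liveBytes * 100 <= regionBytes * liveThresholdPercent) result.add(r);
        }
        result.sort(Comparator.comparingLong((Region r) -> r.liveBytes));
        return result;
    }

    // Old regions for the next mixed GC: stop when the garbage left in the candidates
    // drops below heapWastePercent of the heap, or when the per-GC region cap is reached
    static List<Region> nextMixedCSet(List<Region> candidates, long regionBytes, int heapRegions,
                                      int heapWastePercent, int oldCSetThresholdPercent) {
        long heapBytes = regionBytes * heapRegions;
        long reclaimable = 0;
        for (Region r : candidates) reclaimable += regionBytes - r.liveBytes;
        int maxRegions = Math.max(1, heapRegions * oldCSetThresholdPercent / 100);
        List<Region> cset = new ArrayList<>();
        for (Region r : candidates) {
            if (reclaimable * 100 < heapBytes * heapWastePercent) break; // not worth it anymore
            if (cset.size() >= maxRegions) break;                       // cap for this GC
            cset.add(r);
            reclaimable -= regionBytes - r.liveBytes;
        }
        return cset;
    }

    public static void main(String[] args) {
        List<Region> old = List.of(new Region(0, 90), new Region(1, 10),
                new Region(2, 50), new Region(3, 30), new Region(4, 80));
        List<Region> cset = nextMixedCSet(candidates(old, 100, 85), 100, 20, 5, 10);
        for (Region r : cset) System.out.println("region " + r.id); // regions 1 and 3
    }
}
```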
Mixed GC
Based on the CollectionSet, G1 schedules the collection of part of the old regions
When a young GC is triggered, old regions to collect are piggy-backed onto it
For each GC, a subset of old regions is chosen using a cost
model, to meet the pause goal
Several young GCs can be used to collect the old regions (mixed event)
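The pause-goal logic can be illustrated with a hypothetical linear cost model: a fixed cost per region plus a cost per live MB to copy. G1's real predictor relies on measured history, so this only shows the shape of the decision; names and numbers are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical linear cost model: old regions (sorted garbage first) are added to a
// young GC while the predicted pause stays under the pause goal (MaxGCPauseMillis).
class MixedGcBudgetSketch {
    static List<Integer> pickOldRegions(double youngCostMs, long[] liveBytesSortedAsc,
                                        double msPerRegion, double msPerLiveMB, double pauseGoalMs) {
        List<Integer> picked = new ArrayList<>();
        double predicted = youngCostMs; // the young part of the pause is mandatory
        for (int i = 0; i < liveBytesSortedAsc.length; i++) {
            double cost = msPerRegion + msPerLiveMB * liveBytesSortedAsc[i] / (1024.0 * 1024.0);
            if (predicted + cost > pauseGoalMs) break; // remaining regions wait for the next mixed GC
            predicted += cost;
            picked.add(i);
        }
        return picked;
    }

    public static void main(String[] args) {
        long[] live = {0, 1L << 20, 2L << 20, 4L << 20};     // 0, 1, 2 and 4 MB live
        System.out.println(pickOldRegions(100, live, 10, 20, 200)); // prints [0, 1, 2]
    }
}
```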
Mixed GC
Humongous
An object larger than half a region is humongous
Humongous objects are too expensive to move or relocate
Too many humongous allocations trigger GCs early
Avoid making too many humongous allocations
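A quick way to reason about the threshold, as a sketch: it takes the region size as a parameter (set with -XX:G1HeapRegionSize or chosen ergonomically) and assumes a 16-byte byte[] header with 8-byte alignment, which holds on a typical 64-bit HotSpot with compressed class pointers but varies with JVM flags. Names are hypothetical.

```java
// Is an allocation humongous, i.e. larger than half a G1 region?
class HumongousSketch {
    static final long ARRAY_HEADER_BYTES = 16; // assumption, see above
    static final long OBJECT_ALIGNMENT = 8;

    // Size of a byte[] of the given length, rounded up to the object alignment
    static long byteArraySize(long length) {
        long raw = ARRAY_HEADER_BYTES + length;
        return (raw + OBJECT_ALIGNMENT - 1) / OBJECT_ALIGNMENT * OBJECT_ALIGNMENT;
    }

    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    // Largest byte[] length that still avoids a humongous allocation
    static long maxNonHumongousByteArray(long regionBytes) {
        return regionBytes / 2 - ARRAY_HEADER_BYTES;
    }

    public static void main(String[] args) {
        long region = 1L << 20; // 1MB regions
        System.out.println(maxNonHumongousByteArray(region)); // prints 524272
    }
}
```

With 1MB regions, a buffer of 512KB is already humongous once its header is counted, so sizing buffers just below the threshold (or increasing the region size) avoids the early GCs mentioned above.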
Full GC
Fallback to Full GC (single-threaded before JDK 10)
Fragmentation can still happen (regions with lots of live objects)
Unpredictable pause times
Shenandoah
Shenandoah GC
Non-generational (still option for partial collection)
Region based
Uses a read barrier: Brooks pointer
Self-Healing:
● Cooperation between mutator threads & GC threads
● Only for concurrent compaction
Mostly based on G1
Shenandoah Phases
Initial Marking (STW)
Concurrent Marking
Final Remark (STW)
Concurrent Cleanup
Concurrent Evacuation
Init Update References (STW)
Concurrent Update References
Final Update References (STW)
Concurrent Cleanup
Concurrent Marking
SATB-style (like G1)
2 STW pauses for Initial Mark & Final Remark
Conditional Write Barrier
To deal with concurrent modifications of the object graph
Concurrent Evacuation
Same principle as G1:
Build CollectionSet with Garbage First!
Evacuate live objects to new regions to release the old regions for reuse
Concurrent Evacuation is done with the help of:
1 read barrier: Brooks pointer
4 write barriers
Barriers maintain the to-space invariant:
All writes are made to the to-space copy of an object
Brooks pointers
All objects have an additional forwarding pointer
Placed before the regular object
mov r13,QWORD PTR [r12+r14*8-0x8]
[Figure: object layout, Brooks pointer word placed before the Header]
Every access dereferences the forwarding pointer
Memory footprint overhead
Throughput overhead
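Brooks pointers can be modeled as an extra field that points to the object itself until it is evacuated, then to its to-space copy. This single-threaded Java sketch (hypothetical names) shows the unconditional extra dereference behind both overheads: one more word per object and one more load per access.

```java
// Single-threaded model of Brooks forwarding pointers.
class BrooksSketch {
    static final class Obj {
        Obj forwardee = this; // Brooks pointer: points to itself until evacuated
        int value;
        Obj(int value) { this.value = value; }
    }

    // Read barrier: one extra, unconditional dereference on every access
    static Obj resolve(Obj ref) { return ref.forwardee; }

    static int readValue(Obj ref) { return resolve(ref).value; }

    // Evacuation: copy, then redirect the from-space Brooks pointer to the copy
    static Obj evacuate(Obj fromSpace) {
        Obj toSpace = new Obj(fromSpace.value);
        fromSpace.forwardee = toSpace;
        return toSpace;
    }

    public static void main(String[] args) {
        Obj obj = new Obj(42);
        Obj copy = evacuate(obj);
        System.out.println(resolve(obj) == copy && readValue(obj) == 42); // prints true
    }
}
```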
Concurrent copy: GC thread
[Figure: the GC thread copies an object (Header, Brooks pointer) from From-Space to To-Space]
Concurrent copy: Reader thread
[Figure: reader threads follow the Brooks pointer from From-Space to To-Space]
Concurrent copy: Writer thread
[Figure: writer threads with From-Space and To-Space copies (Header, Brooks pointer)]
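These slides describe a race between the GC thread and mutator threads. One way to sketch it: whichever thread needs the to-space copy of an object in the collection set makes one speculatively and installs it with a CAS on the forwarding pointer; the loser discards its copy and uses the winner's. Names, the single int field and the collection-set flag are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

// Concurrent copy with Brooks pointers, resolved by a CAS on the forwarding pointer.
class ConcurrentCopySketch {
    static final class Obj {
        final AtomicReference<Obj> forwardee = new AtomicReference<>();
        final boolean inCollectionSet;
        volatile int value;
        Obj(int value, boolean inCollectionSet) {
            this.value = value;
            this.inCollectionSet = inCollectionSet;
            forwardee.set(this); // not forwarded yet: Brooks pointer points to itself
        }
    }

    // Read barrier: readers simply follow the Brooks pointer
    static Obj resolve(Obj ref) { return ref.forwardee.get(); }

    static Obj evacuate(Obj obj) {
        if (!obj.inCollectionSet) return obj;      // nothing to move
        Obj current = obj.forwardee.get();
        if (current != obj) return current;        // someone already copied it
        Obj copy = new Obj(obj.value, false);      // speculative copy
        if (obj.forwardee.compareAndSet(obj, copy)) return copy;
        return obj.forwardee.get();                // lost the race: use the winner's copy
    }

    // Write barrier: writes only go to the to-space copy (to-space invariant),
    // so the from-space object the GC copies from never changes
    static void write(Obj ref, int newValue) {
        evacuate(ref).value = newValue;
    }

    // Races a GC thread against a writer thread and returns the value readers see
    static int race() throws InterruptedException {
        Obj obj = new Obj(1, true);
        Thread gc = new Thread(() -> evacuate(obj));
        Thread writer = new Thread(() -> write(obj, 2));
        gc.start();
        writer.start();
        gc.join();
        writer.join();
        return resolve(obj).value;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 1000; i++) {
            if (race() != 2) throw new AssertionError("lost update");
        }
        System.out.println("no lost updates");
    }
}
```

Whoever wins the CAS, the write lands in the single installed copy, which is why writers must go through the barrier even for primitive fields.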
Write Barriers
Any write (even of a primitive) to a from-space object needs to be
protected
if (evacInProgress
&& inCollectionSet(obj)
&& notCopyYet(obj)) {
evacuateObject(obj)
}
test BYTE PTR [r15+0x3c0],0x2
jne 0x000000000281bcbc
[...]
mov r10d,DWORD PTR [r13+0xc]
test r10d,r10d
je 0x000000000281bc2b
mov r11,QWORD PTR [r15+0x360]
mov rcx,r10
shl rcx,0x3
test r11,r11
je 0x000000000281bd0d
[...]
mov rdx,r15
movabs r10,0x62d1f660
call r10
jmp 0x000000000281bc2b
Exotic barriers:
acmp (pointer comparison)
CAS
clone
Extreme cases
Late memory release
Only happens once all refs are updated (Concurrent Cleanup
phase)
Allocations can overrun the GC
Failure modes:
Pacing
Degenerated GC
Full GC
Shenandoah 2.0?
Notable changes introduced since JDK 13:
Load Reference Barrier
Baker-style barrier (relocation)
Evaluated at reference load time
Eliminating the forwarding pointer word
Forwarding information stored in the mark word
Brooks pointer removed
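A sketch of the load reference barrier idea with self-healing, under stated assumptions: a per-object collection-set flag, an AtomicReference standing in for the mark word, and a holder object with a single reference field. It is illustrative only, not Shenandoah's code.

```java
import java.util.concurrent.atomic.AtomicReference;

// Load reference barrier: checked when a reference is loaded from a field,
// and the field is healed so later loads take the fast path.
class LoadReferenceBarrierSketch {
    static final class Obj {
        // stands in for forwarding info kept in the mark word; null = not forwarded
        final AtomicReference<Obj> markWordForwardee = new AtomicReference<>();
        final boolean inCollectionSet;
        final int value;
        Obj(int value, boolean inCollectionSet) {
            this.value = value;
            this.inCollectionSet = inCollectionSet;
        }
    }

    static final class Holder {
        final AtomicReference<Obj> field = new AtomicReference<>();
    }

    static volatile boolean evacuationInProgress;

    static Obj evacuate(Obj obj) {
        Obj fwd = obj.markWordForwardee.get();
        if (fwd != null) return fwd;
        Obj copy = new Obj(obj.value, false);
        return obj.markWordForwardee.compareAndSet(null, copy) ? copy : obj.markWordForwardee.get();
    }

    static Obj loadField(Holder holder) {
        Obj ref = holder.field.get();
        if (evacuationInProgress && ref != null && ref.inCollectionSet) { // rare slow path
            Obj toSpace = evacuate(ref);
            holder.field.compareAndSet(ref, toSpace); // self-heal unless a newer value was stored
            return toSpace;
        }
        return ref;                                   // fast path
    }

    public static void main(String[] args) {
        Holder h = new Holder();
        Obj original = new Obj(7, true);
        h.field.set(original);
        evacuationInProgress = true;
        Obj loaded = loadField(h);
        System.out.println(loaded != original && h.field.get() == loaded); // prints true
    }
}
```

Unlike the Brooks read barrier, the check runs once per load rather than on every access, and objects outside the collection set pay nothing beyond the test.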
Azul’s C4
Continuously Concurrent Compacting Collector
Generational (young & old)
Region based (pages)
Uses a read barrier: Loaded Value Barrier (LVB)
Self-Healing
Cooperation between mutator threads & GC threads
Pauseless algorithm, but the implementation requires safepoints
Pauses are < 1 ms most of the time
LVB
Baker-style barrier
Moves objects through forwarding addresses stored aside (side tables)
Applied at load time, not when dereferencing
Fused marking & relocation state
Ensures the C4 invariants for each loaded reference:
Marked Through in the current cycle
Not relocated
If not => a self-healing process corrects it:
Mark the object
Relocate & correct the reference
Checked on each reference load
Benefits from JIT optimizations that cache the loaded value (in registers)
LVB
Object states are stored inside the reference address => colored
pointers
NMT bit
Remapped/Generation
Checked against a global expected value during the GC cycle
The expected value is thread-local, almost always an L1 cache hit,
or kept in a register
Unmasking reference addresses:
Linux Kernel module, aliasing
memfd, multi-mapping
test r9, rax
jne 0x3001443b
mov r10d, dword ptr [rax + 8]
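The NMT check can be sketched with references modeled as longs carrying one NMT bit. The bit position (62), the per-instance expected value and all names are hypothetical; C4's real encoding and trap handling differ in detail.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// NMT ("Not Marked Through") part of a C4-style Loaded Value Barrier.
// References are longs: low bits = address, one high bit = NMT state.
class LvbNmtSketch {
    static final long NMT_BIT = 1L << 62;     // hypothetical position
    long expectedNmt = 0;                     // thread-local or in a register in C4
    final Deque<Long> markQueue = new ArrayDeque<>(); // objects for GC threads to mark through

    static long address(long ref) { return ref & ~NMT_BIT; }

    // The GC flips the expected value, so every old reference fails the check once
    void startMarkCycle() { expectedNmt ^= NMT_BIT; }

    // Barrier applied to the loaded value, not on each dereference
    long loadRef(long[] fields, int index) {
        long ref = fields[index];
        if (ref != 0 && (ref & NMT_BIT) != expectedNmt) { // similar in spirit to "test / jne"
            markQueue.add(address(ref));                  // make sure the GC marks it
            long healed = address(ref) | expectedNmt;
            fields[index] = healed;                       // self-healing
            return healed;
        }
        return ref;
    }

    public static void main(String[] args) {
        LvbNmtSketch gc = new LvbNmtSketch();
        long[] fields = { 0x1000L };
        gc.startMarkCycle();
        gc.loadRef(fields, 0); // slow path: queued and healed
        gc.loadRef(fields, 0); // fast path
        System.out.println(gc.markQueue.size()); // prints 1
    }
}
```

Because the memory location is healed, each stale reference triggers the slow path at most once per cycle.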
Virtual Memory vs Physical Memory
[Figure: virtual address space (0 to 2^64) mapped onto physical memory (0 to 2^37)]
C4 Phases
Mark
Marking all objects in graph
Relocation
Moving objects to release pages
Remap
Fixes up references in the object graph
Folded into the next mark cycle
Mark Phase
Precise Wavefront Marking
Single pass
No final mark/remark
Self-Healing: marks objects that are not yet marked for the current
cycle
Mark Phase: Concurrent Modification
[Figure: objects A, B and C]
A.field1 = C;
B.field2 = null;
LVB: loading the reference to C traps and marks C, so it cannot be missed
Relocation Phase
Select pages with the greatest number of dead objects (garbage first!)
Protect the selected pages from being accessed by mutator threads
Move objects to newly allocated pages
Build side arrays (off-heap tables) for forwarding information
Self-Healing: as the pages are protected, the LVB triggers a trap to:
Copy the object to its new location if not done yet
Use the forwarding pointer to fix the reference
[Figure: relocation between virtual and physical pages, with the forwarding table]
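The relocation steps can be sketched with a set of protected pages standing in for revoked mappings and a concurrent map as the off-heap forwarding table. Copying the bytes is omitted; the bump allocator, the objectSize parameter and all names are hypothetical, and the 2MB page size follows the C4 description in these slides.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// C4-style relocation: references into protected pages "trap" in the LVB, which
// relocates through the forwarding table and heals the reference.
class C4RelocationSketch {
    static final long PAGE_SIZE = 2L << 20;
    final Set<Long> protectedPages = ConcurrentHashMap.newKeySet();
    final Map<Long, Long> forwardingTable = new ConcurrentHashMap<>(); // old -> new address
    final AtomicLong bumpPointer;                                        // free space in new pages

    C4RelocationSketch(long toSpaceStart) { bumpPointer = new AtomicLong(toSpaceStart); }

    static long pageOf(long address) { return address / PAGE_SIZE; }

    void protectPage(long page) { protectedPages.add(page); }

    // Relocates at most once: GC threads and trapping mutators agree on one new address
    long relocate(long oldAddress, long objectSize) {
        return forwardingTable.computeIfAbsent(oldAddress, a -> bumpPointer.getAndAdd(objectSize));
    }

    long loadRef(long[] fields, int index, long objectSize) {
        long ref = fields[index];
        if (ref != 0 && protectedPages.contains(pageOf(ref))) { // the "trap"
            long newRef = relocate(ref, objectSize);
            fields[index] = newRef;                             // self-healing
            return newRef;
        }
        return ref;
    }

    public static void main(String[] args) {
        C4RelocationSketch gc = new C4RelocationSketch(10 * PAGE_SIZE);
        long old = 3 * PAGE_SIZE + 64;
        long[] fields = { old, old };
        gc.protectPage(3);
        System.out.println(gc.loadRef(fields, 0, 32) == gc.loadRef(fields, 1, 32)); // prints true
    }
}
```

Keeping forwarding data off-heap means the old page can be released as soon as its objects are copied, while stale references are still resolvable through the table until the remap phase fixes them all.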
Remap Phase
Traverse the object graph and fix up references
Execute the LVB for each reference
Self-Healing: fix up references using forwarding information
As the graph is traversed again, objects are marked for the next cycle
Mark & Remap phases are folded!
Remap - Kernel module
The algorithm requires a sustainable rate of remapping operations
Linux limitations:
TLB invalidation
Only 4KB pages can be remapped
Single-threaded remapping (write lock)
The kernel module implements an API for the Zing JVM that significantly increases the
remapping rate
It also implements virtual address aliasing for addressing objects with metadata
C4 in real life scenario
Ullink:
Low latency order router (100 µs)
Order Management System (512GB heap)
Criteo:
NameNode of a 2000+ node Hadoop cluster
580GB heap
Hard to tune with G1 (JDK 8)
Once in production, no issues in 2 years
Z GC
Z GC
Non generational
Region based (zPages, dynamically sized)
Concurrent marking, compaction & reference processing
Uses colored pointers & a read (load) barrier
Self-Healing
Cooperation between mutator threads & GC threads
Experimental in JDK 11 (-XX:+UnlockExperimentalVMOptions -XX:+UseZGC)
mov r10,QWORD PTR [r11+0xb0]
test QWORD PTR [r15+0x20],r10
jne 0x00007f9594cc54b5
Z GC
Z GC phases:
Initial Mark (STW)
Concurrent Mark/Remap
Final Mark (STW)
Concurrent Prepare for Relocation
Start Relocate (STW)
Concurrent Relocate
Colored Pointers
Stores metadata in unused bits of the reference address
42 bits for addressing (4TB)
44 bits (16TB) since JDK 13
4 bits for metadata
Marked0
Marked1
Remapped
Finalizable
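Colored pointers boil down to bit manipulation. This sketch assumes the JDK 11 x86-64 layout as we understand it (offset in bits 0-41, Marked0 at bit 42, Marked1 at bit 43, Remapped at bit 44, Finalizable at bit 45) and a simplified bad mask. The load barrier fast path is the single test seen in the assembly above; the slow-path marking and relocation work is omitted, and all names are hypothetical.

```java
// ZGC-style colored pointers with a simplified load barrier.
class ZColoredPointerSketch {
    static final int OFFSET_BITS = 42;
    static final long OFFSET_MASK = (1L << OFFSET_BITS) - 1;
    static final long MARKED0 = 1L << 42;
    static final long MARKED1 = 1L << 43;
    static final long REMAPPED = 1L << 44;
    static final long FINALIZABLE = 1L << 45;  // not used by this sketch
    static final long COLOR_MASK = MARKED0 | MARKED1 | REMAPPED;

    long goodColor = REMAPPED;                  // current "view", flipped by the GC
    long badMask() { return COLOR_MASK & ~goodColor; }

    static long offset(long ref) { return ref & OFFSET_MASK; }

    // With multi-mapping, offset | color is a valid virtual address in each view
    static long colored(long offset, long color) { return offset | color; }

    // Load barrier: the fast path is a single test against the bad mask
    long load(long[] fields, int index) {
        long ref = fields[index];
        if ((ref & badMask()) != 0) {
            // slow path (marking/relocation omitted): recolor and self-heal the field
            long healed = colored(offset(ref), goodColor);
            fields[index] = healed;
            return healed;
        }
        return ref;
    }

    public static void main(String[] args) {
        ZColoredPointerSketch z = new ZColoredPointerSketch();
        long[] fields = { colored(0x1234, REMAPPED) };
        z.goodColor = MARKED0;                         // new mark cycle: views flip
        long healed = z.load(fields, 0);
        System.out.println(Long.toHexString(healed)); // prints 40000001234
    }
}
```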
Multi-Mapping
Colored pointers need to be unmasked before dereferencing
Some hardware supports masking (SPARC, AArch64)
On Linux/Windows (x86), unmasking with regular instructions adds
overhead
Only one view is active at any point
Plenty of Virtual Space
Multi-Mapping
[Figure: three virtual views of the heap, marked0 (001<address>), marked1 (010<address>) and remapped (100<address>), in the virtual address space (0 to 2^64), all mapped onto the same physical memory (0 to 2^37)]
Memory Usage
Difference between C4 & Z GC
Unmasking ref addresses
C4: Kernel module aliasing
Z: Multi-mapping or HW support
Pages & Relocation
C4:
Pages are fixed-size (2MB)
Large objects are relocated by remapping
Z:
zPages are dynamically sized; a zPage can be 100MB large
No relocation for large objects
How to choose a GC algorithm
Low Latency GCs
You have to run on Windows
Shenandoah
Battle-tested GC (maturity)
C4
Shenandoah
Minimizing any kind of JVM pauses
C4
Z
You don’t want to pay for it:
Shenandoah
Z
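Once a collector is chosen, it is selected with a flag (-XX:+UseG1GC; -XX:+UseShenandoahGC in builds that include Shenandoah; -XX:+UseZGC; C4 requires Azul's JVM). A minimal sketch to check which collectors a running JVM actually reports, using the standard java.lang.management API; bean names vary by GC and JDK version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Lists the collectors reported by the running JVM, with their counters.
class ShowGc {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

For example: java -XX:+UseZGC ShowGc.java (single-file launch, JDK 11+; on JDK 11 to 14, ZGC also needs -XX:+UnlockExperimentalVMOptions).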
References
References GC basics
Java Garbage Collection distilled by Martin Thompson
Java GC minibook
Oracle’s white paper on JVM memory management & GC
What differences JVM makes by Nitsan Wakart
Memory Management Reference
References G1
Garbage-First Garbage Collection
G1 One Garbage Collector to rule them all by Monica Beckwith
Tips for Tuning The G1 GC by Monica Beckwith
G1 Garbage Collector Details and Tuning by Simone Bordet
Write Barriers in Garbage-First Garbage Collector by Monica Beckwith
References Shenandoah
Shenandoah: An open-source concurrent compacting garbage collector for OpenJDK
Shenandoah: The Garbage Collector That Could by Aleksey Shipilev
Shenandoah GC Wiki
Load Reference Barriers by Roman Kennke
Eliminating forward pointer word by Roman Kennke
References C4
The Pauseless GC algorithm (2005)
C4: Continuously Concurrent Compacting Collector (2011)
Azul GC in Detail by Charles Humble
2010 version source code
References Z GC
ZGC - Low Latency GC for OpenJDK by Per Liden
Java's new Z Garbage Collector (ZGC) is very exciting by Richard Warburton
A first look into ZGC by Dominik Inführ
Architectural Comparison with C4/Pauseless
ZGC Heap Size and RSS counters
Thank You!
Jean-Philippe Bempel
@jpbempel
