Peter Lawrey
CEO of Higher Frequency Trading
J On The Beach 2016
Low Latency in Java 8
Peter Lawrey
Java Developer/Consultant for investment banks
and hedge funds for 8 years, 23 years in IT.
Most answers for Java and JVM on
stackoverflow.com
Founder of the Performance Java User’s Group.
Architect of Chronicle Software
Java Champion
Chronicle Software
Help companies migrate to high performance Java code.
Sponsor open source projects https://github.com/OpenHFT
Licensed solutions Chronicle-FIX, Chronicle-Enterprise and
Chronicle-Queue-Enterprise
Offer one week proof of concept workshops, advanced Java
training, consulting and bespoke development.
Chronicle Demo
Agenda
• What is Low Latency?
• What is an ultra low GC rate?
– Eden space and Compressed OOPs.
• How does Java 8 help?
– Non-capturing lambdas
– Serializing Lambdas
– Escape Analysis
• Optimise for Latency distribution not Throughput.
– Flow control and avoiding co-ordinated omission.
• Low latency code using Java 8.
What is low latency?
The term “low latency” can be applied to a wide range of situations.
A broad definition might be:
Low latency means you have a view on how much the response
time of a system costs your business.
In this talk I will assume:
Low latency means you care about latencies you can only
measure, as even the worst latencies are too fast to see.
Even latencies you can’t see add up
Data passing             Latency                  Light over a fibre   Throughput (one at a time)
Method call              Inlined: 0; real: 50 ns  10 meters            20,000,000/sec
Shared memory            200 ns                   40 meters            5,000,000/sec
SYSV shared memory       2 µs                     400 meters           500,000/sec
Low latency network      8 µs                     1.6 km               125,000/sec
Typical LAN network      30 µs                    6 km                 30,000/sec
Typical data grid system 800 µs                   160 km               1,250/sec
60 Hz power flickers     8 ms                     1,600 km             120/sec
4G request latency in UK 55 ms                    11,000 km            18/sec
Why consistent performance matters.
When you have high delays, even rarely, there are two consequences:
- Customers tend to remember the worst service they ever got.
You tend to lose the most money when you are unable to react to
events for the longest.
- In a busy system, a delay in processing one
event/message/request has a knock-on effect on many
subsequent requests. A single delay can impact hundreds or
even thousands of requests.
What is an ultra low GC?
In a generational collector you have multiple generations of
objects.
Small to medium sized objects are created in the Eden space.
When your Eden space fills up you trigger a Minor Collection.
So how big can your Eden space be and still have Compressed
OOPS?
Where does the Eden space fit?
[Diagram: Java 7 memory layout.]
[Diagram: Java 8 memory layout.]
What are Compressed OOPS?
64-bit applications use 64-bit pointers. These take up more space
than 32-bit pointers; for real applications this can mean up to
30% more memory.
However, the 64-bit JVM can address the heap using 32-bit
references. This saves memory and improves the efficiency of the
cache, as you can keep more objects in cache.
What are Compressed OOPS?
In Java 8, if the heap is between 32 GB and 64 GB, use:
-XX:+UseCompressedOops
-XX:ObjectAlignmentInBytes=16
The JVM can then address anywhere on the heap by multiplying the
32-bit reference by 16 and adding a relative offset.
You can increase the object alignment to 32 manually, but at that
point 64-bit references use less memory.
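For illustration, the compressed-oops arithmetic can be sketched in Java (the class and method names are my own, not part of the JVM):

```java
// A sketch of the compressed-oops arithmetic: a 32-bit reference can index
// 2^32 slots, and multiplying by the object alignment gives the maximum
// heap size the JVM can address with compressed references.
public class CompressedOopsMath {
    // Maximum addressable heap in bytes for a given object alignment.
    public static long maxHeapBytes(int objectAlignmentInBytes) {
        return (1L << 32) * objectAlignmentInBytes;
    }

    public static void main(String[] args) {
        System.out.println("8-byte alignment:  " + maxHeapBytes(8) / (1L << 30) + " GB");  // 32 GB, the default
        System.out.println("16-byte alignment: " + maxHeapBytes(16) / (1L << 30) + " GB"); // 64 GB
    }
}
```

This is why the 32 GB to 64 GB range in the slide needs 16-byte alignment: the default 8-byte alignment only reaches 32 GB.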
Ultra low GC
Let's say you have an Eden size of 48 GB.
You will only get a garbage collection once the Eden space fills, i.e.
once you have used 48 GB.
If you need to run for 24 hours without a GC, you can still produce
2 GB of garbage per hour, and GC once per day.
This is a rate of roughly 580 KB/s.
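The garbage budget above can be checked with a few lines of Java (illustrative class and method names):

```java
// A sketch of the garbage-budget arithmetic: with a 48 GB Eden space, how
// much garbage can you create per second and still only collect once a day?
public class GarbageBudget {
    // Sustainable allocation rate in bytes/second for a given Eden size and run time.
    public static long bytesPerSecond(long edenBytes, long seconds) {
        return edenBytes / seconds;
    }

    public static void main(String[] args) {
        long rate = bytesPerSecond(48L << 30, 24 * 60 * 60); // 48 GiB over 24 hours
        System.out.println(rate / 1024 + " KB/s"); // prints 582 KB/s
    }
}
```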
What is an ultra low GC?
Another reason to aim for ultra low garbage rates is that your caches are
not being filled with garbage, so they work much more efficiently
and consistently.
If you have a web server which is producing 300 MB/s of garbage,
less than 5% of your time will be spent pausing for GC.
However, at that rate you could be filling a 32 KB L1 CPU cache in
around 0.1 milliseconds, and your L2 cache fills with garbage every
millisecond.
So without a GC pause, no more pauses?
Actually, a high percentage of pauses are not from the GC. The
biggest ones are, but take away GC pauses and you will still see:
- IO delays. These can be larger than GC pauses.
- Network delays.
- Waiting for databases.
- Disk reads / writes.
- OS interrupts.
- It is not uncommon for your OS to stop your process for 5 ms or more.
- Lock contention pauses.
How does Java 8 help?
The biggest improvements in Java 8 are:
- Lambdas with no captured values and lambdas with reduced
capture of variables.
- More efficient than anonymous inner classes.
- Escape Analysis to unpack objects onto the stack.
- Short lived objects placed on the stack don’t create garbage.
How do Lambdas help?
Lambdas are like anonymous inner classes; however, a lambda which
captures nothing is created once and cached, rather than allocated on every call.
public static Runnable helloWorld() {
return () -> System.out.println("Hello World");
}
public static Consumer<String> printMe() {
// may create a new object each time = Garbage.
return System.out::println;
}
public static Consumer<String> printMe2() {
return x -> System.out.println(x);
}
How does Java 8 help? Lambdas
When you call new on an anonymous inner class, a new object
is always created. Non-capturing lambdas can be cached.
Runnable r1 = helloWorld();
Runnable r2 = helloWorld();
System.out.println(r1 == r2); // prints true
Consumer<String> c1 = printMe();
Consumer<String> c2 = printMe();
System.out.println(c1 == c2); // prints false
Consumer<String> c3 = printMe2();
Consumer<String> c4 = printMe2();
System.out.println(c3 == c4); // prints true
Serialization
Lambdas capture less scope than anonymous inner classes. A lambda
doesn't capture this unless it has to, but it can capture things you
don't expect: if you use this.a, the lambda captures this, not just a.
Note: if you use the :: notation, it will capture the left operand if it
is a variable.
interface SerializableConsumer<T> extends Consumer<T>, Serializable {
}
// throws java.io.NotSerializableException: java.io.PrintStream
public SerializableConsumer<String> printMe() {
return System.out::println;
}
public SerializableConsumer<String> printMe2() {
return x -> System.out.println(x);
}
public SerializableConsumer<String> printMe3() {
// throws java.io.NotSerializableException: A
return new SerializableConsumer<String>() {
@Override
public void accept(String s) {
System.out.println(s);
}
};
}
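Assuming a single JVM (so the class defining the lambda is on the classpath at both ends) and illustrative names, the serializable-lambda pattern above can be exercised end to end:

```java
import java.io.*;
import java.util.function.Function;

// A runnable sketch of the serializable-lambda pattern: the lambda is
// serialized to bytes and revived, as a distributed system would do
// between client and server.
public class LambdaRoundTrip {
    interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

    // Non-capturing lambda stored in a serializable functional interface.
    static final SerializableFunction<String, String> UPPER = s -> s.toUpperCase();

    public static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(o);
        }
        return baos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    public static <T> T fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) ois.readObject();
        }
    }

    // Serialize UPPER, deserialize it, and apply the revived lambda.
    public static String roundTripApply(String input) {
        try {
            SerializableFunction<String, String> revived = fromBytes(toBytes(UPPER));
            return revived.apply(input);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripApply("hello")); // prints HELLO
    }
}
```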
Why Serialize a Lambda?
Lambdas are designed to reduce boiler plate, and when you
have a distributed system, they can be a powerful addition.
The two lambdas are serialized on the client to be executed on
the server.
This example is from the RedisEmulator in Chronicle-Engine.
public static long incrby(MapView<String, Long> map, String key, long toAdd) {
return map.syncUpdateKey(key, v -> v + toAdd, v -> v);
}
Why Serialize a Lambda?
public static Set<String> keys(MapView<String, ?> map, String pattern) {
return map.applyTo(m -> {
Pattern compile = Pattern.compile(pattern);
return m.keySet().stream()
.filter(k -> compile.matcher(k).matches())
.collect(Collectors.toSet());
});
}
The lambda m -> { ... } is serialized and executed on the server.
// print userId which have a usageCounter > 10
// each time it is incremented (asynchronously)
userMap.entrySet().query()
.filter(e -> e.getValue().usageCounter > 10)
.map(e -> e.getKey())
.subscribe(System.out::println);
Why Serialize a Lambda?
The filter/map lambdas are serialized.
The subscribe lambda is executed asynchronously on the client.
How does Java 8 help? Escape Analysis
Escape Analysis can
- Determine an object doesn’t escape a method so it can be
placed on the stack.
- Determine an object doesn’t escape a method so it doesn’t
need to be synchronized.
How does Java 8 help? Escape Analysis
Escape Analysis works with inlining. After inlining, the JIT can see
all the places an object is used. If it doesn’t escape the method it
doesn’t need to be created and can be unpacked on the stack.
This works for class objects, but not arrays currently.
After “unpacking” on to the stack the object might be optimised
away. E.g. say all the fields are set using local variables anyway.
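A minimal sketch (with illustrative names) of code that escape analysis can optimise in the way described above:

```java
// The Point objects below never escape distance(): they are not stored,
// returned or passed out. After inlining, the JIT can apply scalar
// replacement, keep the fields on the stack, and create no garbage.
public class EscapeDemo {
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    public static double distance(double x1, double y1, double x2, double y2) {
        Point a = new Point(x1, y1);  // candidate for scalar replacement
        Point b = new Point(x2, y2);  // never escapes this method
        double dx = a.x - b.x, dy = a.y - b.y;
        return Math.sqrt(dx * dx + dy * dy);
    }

    public static void main(String[] args) {
        System.out.println(distance(0, 0, 3, 4)); // prints 5.0
    }
}
```

Whether the allocations are actually elided depends on the JIT and the inlining limits discussed on the next slide; the code is a shape that qualifies, not a guarantee.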
How does Java 8 help? Escape Analysis
As of Java 8 update 60, the JITed code generated is still not as
efficient as code written to not need these optimisations; however,
the JIT is getting closer to optimal.
How does Java 8 help? Escape Analysis
Two parameters which control inlining and the maximum method
size for performing escape analysis are:
-XX:MaxBCEAEstimateSize=150 -XX:FreqInlineSize=325
For our software I favour
-XX:MaxBCEAEstimateSize=450 -XX:FreqInlineSize=425
Optimising for latency instead of throughput
Measuring throughput is a great way to hide bad service or poor
latencies.
Your users, however, tend to be impacted by those poor
services/latencies.
Optimising for latency instead of throughput
Average latency is largely the inverse of your throughput and no
better. Using the standard deviation of average latency is misleading
at best, as the distribution of latencies is not a normal
distribution or anything like it.
From Little's Law:
Average latency = concurrency / throughput.
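Little's Law as stated above can be sketched directly (illustrative names; the numbers match the example on the next slide):

```java
// Little's Law: average latency = concurrency / throughput.
public class LittlesLaw {
    public static double averageLatencySeconds(double concurrency, double throughputPerSecond) {
        return concurrency / throughputPerSecond;
    }

    public static void main(String[] args) {
        // One request in flight at a time, at 2,000 requests/second.
        System.out.println(averageLatencySeconds(1, 2000) * 1000 + " ms"); // 0.5 ms
    }
}
```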
How to hide bad performance
with average latency
Say you have a service which responds to 2 requests every
millisecond, but once an hour it stops for 6 minutes.
Without the 6-minute pause, the average latency would be 0.5 ms.
With the 6-minute pause, the number of tasks performed in an
hour drops from 7.2 million to around 6 million, and the average
latency is 0.6 ms.
Conclusion: pausing for 6 minutes each hour doesn't make much
difference to our metric.
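The arithmetic above can be sketched with Little's Law at concurrency 1 (illustrative names):

```java
// A 6-minute stall barely moves the average-latency metric, even though
// it delays hundreds of thousands of requests.
public class HiddenPause {
    // Average latency in ms from hourly throughput, assuming concurrency 1.
    public static double avgLatencyMs(long tasksPerHour) {
        double throughputPerMs = tasksPerHour / 3_600_000.0;
        return 1.0 / throughputPerMs;
    }

    public static void main(String[] args) {
        System.out.println(avgLatencyMs(7_200_000)); // 0.5 ms without the pause
        System.out.println(avgLatencyMs(6_000_000)); // roughly 0.6 ms with it
    }
}
```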
How to find solvable problems
with latency distributions
Run a test many times at a given rate.
Time how long each task takes from when the task should have
started, not from when it actually started.
Sort these timings and look at the 99%, 99.9%, 99.99% and worst
numbers in your results.
The 99% (or worst 1 in 100) will be much higher than your
average latency, and is more likely to explain why your users sometimes
see bad performance.
Ideally you should reduce your 99.9% and even your worst
latency.
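The percentile report described above can be sketched as follows (illustrative names; the index calculation is the simple nearest-rank form, not a full HdrHistogram):

```java
import java.util.Arrays;

// Timings are taken from when each task *should* have started, then sorted,
// and the tail percentiles are reported rather than the average.
public class LatencyReport {
    // Latency at the given percentile, from a sorted copy of the timings (ns).
    public static long percentile(long[] timingsNs, double percentile) {
        long[] sorted = timingsNs.clone();
        Arrays.sort(sorted);
        int index = (int) Math.min(sorted.length - 1, (long) (sorted.length * percentile / 100.0));
        return sorted[index];
    }

    public static void main(String[] args) {
        // One stall hides completely in the median but dominates the worst case.
        long[] timings = {100, 120, 110, 130, 90, 5000, 115, 105, 125, 95};
        System.out.println("50%:   " + percentile(timings, 50));  // 115
        System.out.println("worst: " + percentile(timings, 100)); // 5000
    }
}
```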
Is it fair to time from when
the test should have started?
Without having a view on when your tests should have started,
you don't account for the fact that if the system stalls, it could impact
tens or even thousands of tasks; i.e. you can't know the impact of
a long delay.
Unfortunately, a lot of tools optimistically assume only 1 task
was delayed, but this is unrealistic.
Co-ordinated omission
Term coined by Gil Tene, CTO of Azul Systems.
Co-ordinated omission occurs when you have a benchmark which
accepts flow control from the solution being tested.
Flow control allows the system being tested to stop the
benchmark when it is running slow. Most older benchmark tools
allow this!
What is flow control?
Most producer/consumer systems use some form of flow control.
Flow control allows the consumer to stop the producer, to prevent
the system from becoming overloaded and failing.
TCP/IP uses flow control, for example.
UDP doesn't have flow control; if the consumer can't keep up,
messages are lost.
What is flow control?
Chronicle Queue, however, uses an open-ended persisted queue.
This avoids the need for flow control, as the consumer can be any
amount behind the producer (up to the limits of your disk capacity).
A system without flow control is easier to reproduce for testing,
debugging and performance tuning purposes.
You can test the producer or consumer in isolation as they don't
interact with one another.
Why is flow control bad
for performance measures?
Flow control helps deal with periods when the consumer can't
keep up. It does so by slowing down or stopping the producer.
For performance testing this creates a blind spot for the
producer, which is the load generator.
The load generator measures all the times the consumer can keep
up, but significantly under-reports the times when it can't.
Our example without co-ordinated omission
Say you have a service which responds to 2 requests every
millisecond, but once an hour it stops for 6 minutes.
In the 6 minutes when the process stopped, how many tasks were
delayed? The optimistic answer is 1, but the pessimistic answer is
2 every millisecond, or 720,000.
Our example without co-ordinated omission
This is why the expected rate matters; unless you see an
expected throughput used in testing, there is a good chance co-ordinated
omission occurred.
Let's consider a target throughput of 100 per second. In
the 6 minutes, at least 6 * 60 * 100 = 36,000 requests were delayed: some by 6
minutes, some by 5 minutes, ... some by 1 minute. However, as the waiting requests
are being processed, more are being added. In the roughly 18
seconds it takes to process the backlog, about 1,800 more are
added, but soon the queue is empty.
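The backlog arithmetic above can be sketched as follows (illustrative names; a simple steady-rate model, not a full queueing simulation):

```java
// At 100 requests/second, a 6-minute stall queues 36,000 requests, and
// while the 2/ms service drains them, the load generator keeps adding more.
public class BacklogModel {
    public static long delayedDuringStall(long ratePerSecond, long stallSeconds) {
        return ratePerSecond * stallSeconds;
    }

    // Seconds to drain the backlog when the service clears a net
    // (serviceRate - arrivalRate) requests per second.
    public static double drainSeconds(long backlog, long serviceRatePerSecond, long arrivalRatePerSecond) {
        return (double) backlog / (serviceRatePerSecond - arrivalRatePerSecond);
    }

    public static void main(String[] args) {
        long backlog = delayedDuringStall(100, 6 * 60);       // 36,000 delayed requests
        System.out.println(drainSeconds(backlog, 2000, 100)); // roughly 19 seconds to catch up
    }
}
```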
Co-ordinated omission
[Chart: "One big delay modelled without co-ordinated omission and with CO".
Delay in milli-seconds against percentile (50% to 100%), two series:
"Without CO" and "With CO". Visible data labels: 0.5 and 18,005.]
Optimising for latency instead of throughput
Having enough throughput is only the start. You also need to look
at the consistency of your service.
The next step is to look at your 99th percentile latency (worst 1 in
100). After that you can look at your 99.9%, 99.99% and
worst latencies tested.
A low latency API which uses Lambdas
Chronicle Wire is a single API which supports multiple formats.
You decide what data you want to read/write, and independently
you can choose the format, e.g. YAML, JSON, Binary YAML or XML.
Using lambdas helped to simplify the API.
A low latency API which uses Lambdas
Timings are in micro-seconds, measured with JMH.

Wire Format                 Bytes   99.9 %tile   99.99 %tile   99.999 %tile   worst
JSONWire                    100*    3.11         5.56          10.6           36.9
Jackson                     100     4.95         8.3           1,400          1,500
Jackson + Chronicle-Bytes   100*    2.87         10.1          1,300          1,400
BSON                        96      19.8         1,430         1,400          1,600
BSON + Chronicle-Bytes      96*     7.47         15.1          1,400          11,600
BOON Json                   100     20.7         32.5          11,000         69,000

* Data was read/written to native memory.

Example message:
"price":1234,"longInt":1234567890,"smallInt":123,"flag":true,"text":"Hello World!","side":"Sell"
A resizable buffer and a Wire format
// Bytes which wraps a ByteBuffer which is resized as needed.
Bytes<ByteBuffer> bytes = Bytes.elasticByteBuffer();
// YAML based wire format
Wire wire = new TextWire(bytes);
// or a binary YAML based wire format
Bytes<ByteBuffer> bytes2 = Bytes.elasticByteBuffer();
Wire wire2 = new BinaryWire(bytes2);
// or just data, no meta data.
Bytes<ByteBuffer> bytes3 = Bytes.elasticByteBuffer();
Wire wire3 = new RawWire(bytes3);
Low latency API using Lambdas (Wire)
message: Hello World number: 1234567890 code: SECONDS price: 10.5

To write a message:

wire.write(() -> "message").text(message)
    .write(() -> "number").int64(number)
    .write(() -> "timeUnit").asEnum(timeUnit)
    .write(() -> "price").float64(price);

To read a message:

wire.read(() -> "message").text(this, (o, s) -> o.message = s)
    .read(() -> "number").int64(this, (o, i) -> o.number = i)
    .read(() -> "timeUnit").asEnum(TimeUnit.class, this, (o, e) -> o.timeUnit = e)
    .read(() -> "price").float64(this, (o, d) -> o.price = d);
A resizable buffer and a Wire format

In the YAML based TextWire the message is:

message: Hello World
number: 1234567890
code: SECONDS
price: 10.5

In the binary YAML BinaryWire it is:

00000000 C7 6D 65 73 73 61 67 65 EB 48 65 6C 6C 6F 20 57 ·message ·Hello W
00000010 6F 72 6C 64 C6 6E 75 6D 62 65 72 A3 D2 02 96 49 orld·num ber····I
00000020 C4 63 6F 64 65 E7 53 45 43 4F 4E 44 53 C5 70 72 ·code·SE CONDS·pr
00000030 69 63 65 90 00 00 28 41 ice···(A
Lambdas and JUnit tests

To read the data:

wire.read(() -> "message").text(this, (o, s) -> o.message = s)
    .read(() -> "number").int64(this, (o, i) -> o.number = i)
    .read(() -> "timeUnit").asEnum(TimeUnit.class, this, (o, e) -> o.timeUnit = e)
    .read(() -> "price").float64(this, (o, d) -> o.price = d);

To check the data without a data structure:

wire.read(() -> "message").text("Hello World", Assert::assertEquals)
    .read(() -> "number").int64(1234567890L, Assert::assertEquals)
    .read(() -> "timeUnit").asEnum(TimeUnit.class, TimeUnit.SECONDS, Assert::assertEquals)
    .read(() -> "price").float64(10.5, (o, d) -> assertEquals(o, d, 0));
Interchanging Enums and Lambdas
Enums and lambdas can both implement an interface.
Wherever you have used a non-capturing lambda, you can also use
an enum.
enum Field implements WireKey {
message, number, timeUnit, price;
}
@Override
public void writeMarshallable(WireOut wire) {
wire.write(Field.message).text(message)
.write(Field.number).int64(number)
.write(Field.timeUnit).asEnum(timeUnit)
.write(Field.price).float64(price);
}
When to use Enums
Enums have a number of benefits:
• They are easier to debug.
• They serialize much more efficiently.
• It's easier to manage a class of pre-defined enums to implement
your code than lambdas, which could be anywhere.
Under https://github.com/OpenHFT/Chronicle-Engine search for
MapFunction and MapUpdater.
When to use Lambdas
Lambdas have a number of benefits:
• They are simpler to write.
• They support generics better.
• They can capture values.
Where can I try this out?
The source for these micro-benchmarks and tests is available at
https://github.com/OpenHFT/Chronicle-Wire
Chronicle Engine with live subscriptions
https://github.com/OpenHFT/Chronicle-Engine
Q & A
Peter Lawrey
@PeterLawrey
http://chronicle.software
http://vanillajava.blogspot.com

More Related Content

What's hot (20)

PDF
Solrで多様なランキングモデルを活用するためのプラグイン開発 #SolrJP
Yahoo!デベロッパーネットワーク
 
PDF
The Integration of Laravel with Swoole
Albert Chen
 
PPTX
Apache Spark Architecture
Alexey Grishchenko
 
PPTX
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
PDF
Redis
imalik8088
 
PDF
A24 SQL Server におけるパフォーマンスチューニング手法 - 注目すべきポイントを簡単に by 多田典史
Insight Technology, Inc.
 
PPTX
Tutorial: Using GoBGP as an IXP connecting router
Shu Sugimoto
 
PPTX
JAVA_HOME/binにあるコマンド、いくつ使っていますか?[JVM関連ツール編](JJUGナイトセミナー「Java解析ツール特集」 発表資料)
NTT DATA Technology & Innovation
 
PPTX
Optimizing Apache Spark SQL Joins
Databricks
 
PPTX
java.lang.OutOfMemoryError #渋谷java
Yuji Kubota
 
PPTX
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Josh Elser
 
PDF
Parallelization of Structured Streaming Jobs Using Delta Lake
Databricks
 
PDF
[cb22] Hayabusa Threat Hunting and Fast Forensics in Windows environments fo...
CODE BLUE
 
PDF
Oracle RAC Virtualized - In VMs, in Containers, On-premises, and in the Cloud
Markus Michalewicz
 
PDF
Apache Impalaパフォーマンスチューニング #dbts2018
Cloudera Japan
 
PDF
[Meetup] a successful migration from elastic search to clickhouse
Vianney FOUCAULT
 
PPT
Open HFT libraries in @Java
Peter Lawrey
 
PDF
카프카, 산전수전 노하우
if kakao
 
PDF
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Databricks
 
PDF
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Noritaka Sekiyama
 
Solrで多様なランキングモデルを活用するためのプラグイン開発 #SolrJP
Yahoo!デベロッパーネットワーク
 
The Integration of Laravel with Swoole
Albert Chen
 
Apache Spark Architecture
Alexey Grishchenko
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Redis
imalik8088
 
A24 SQL Server におけるパフォーマンスチューニング手法 - 注目すべきポイントを簡単に by 多田典史
Insight Technology, Inc.
 
Tutorial: Using GoBGP as an IXP connecting router
Shu Sugimoto
 
JAVA_HOME/binにあるコマンド、いくつ使っていますか?[JVM関連ツール編](JJUGナイトセミナー「Java解析ツール特集」 発表資料)
NTT DATA Technology & Innovation
 
Optimizing Apache Spark SQL Joins
Databricks
 
java.lang.OutOfMemoryError #渋谷java
Yuji Kubota
 
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Josh Elser
 
Parallelization of Structured Streaming Jobs Using Delta Lake
Databricks
 
[cb22] Hayabusa Threat Hunting and Fast Forensics in Windows environments fo...
CODE BLUE
 
Oracle RAC Virtualized - In VMs, in Containers, On-premises, and in the Cloud
Markus Michalewicz
 
Apache Impalaパフォーマンスチューニング #dbts2018
Cloudera Japan
 
[Meetup] a successful migration from elastic search to clickhouse
Vianney FOUCAULT
 
Open HFT libraries in @Java
Peter Lawrey
 
카프카, 산전수전 노하우
if kakao
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Databricks
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Noritaka Sekiyama
 

Similar to Low latency in java 8 by Peter Lawrey (20)

ODP
Writing and testing high frequency trading engines in java
Peter Lawrey
 
ODP
Low level java programming
Peter Lawrey
 
PPTX
Microservices for performance - GOTO Chicago 2016
Peter Lawrey
 
PPT
Advanced off heap ipc
Peter Lawrey
 
PPTX
Lambdas puzzler - Peter Lawrey
JAXLondon_Conference
 
PPT
Troubleshooting SQL Server
Stephen Rose
 
PPTX
Deterministic behaviour and performance in trading systems
Peter Lawrey
 
PPTX
Software architecture for data applications
Ding Li
 
PPT
High Frequency Trading and NoSQL database
Peter Lawrey
 
PPTX
Low latency microservices in java QCon New York 2016
Peter Lawrey
 
PDF
Deep learning with kafka
Nitin Kumar
 
PPTX
VISUG - Approaches for application request throttling
Maarten Balliauw
 
PDF
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
Sean Zhong
 
PPTX
Approaches to application request throttling
Maarten Balliauw
 
PDF
"Surviving highload with Node.js", Andrii Shumada
Fwdays
 
PDF
Java Performance and Profiling
WSO2
 
PPTX
Crunch Your Data in the Cloud with Elastic Map Reduce - Amazon EMR Hadoop
Adrian Cockcroft
 
PDF
Building a Database for the End of the World
jhugg
 
PDF
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Apache Apex
 
ODP
PoC: Using a Group Communication System to improve MySQL Replication HA
Ulf Wendel
 
Writing and testing high frequency trading engines in java
Peter Lawrey
 
Low level java programming
Peter Lawrey
 
Microservices for performance - GOTO Chicago 2016
Peter Lawrey
 
Advanced off heap ipc
Peter Lawrey
 
Lambdas puzzler - Peter Lawrey
JAXLondon_Conference
 
Troubleshooting SQL Server
Stephen Rose
 
Deterministic behaviour and performance in trading systems
Peter Lawrey
 
Software architecture for data applications
Ding Li
 
High Frequency Trading and NoSQL database
Peter Lawrey
 
Low latency microservices in java QCon New York 2016
Peter Lawrey
 
Deep learning with kafka
Nitin Kumar
 
VISUG - Approaches for application request throttling
Maarten Balliauw
 
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
Sean Zhong
 
Approaches to application request throttling
Maarten Balliauw
 
"Surviving highload with Node.js", Andrii Shumada
Fwdays
 
Java Performance and Profiling
WSO2
 
Crunch Your Data in the Cloud with Elastic Map Reduce - Amazon EMR Hadoop
Adrian Cockcroft
 
Building a Database for the End of the World
jhugg
 
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Apache Apex
 
PoC: Using a Group Communication System to improve MySQL Replication HA
Ulf Wendel
 
Ad

More from J On The Beach (20)

PDF
Massively scalable ETL in real world applications: the hard way
J On The Beach
 
PPTX
Big Data On Data You Don’t Have
J On The Beach
 
PPTX
Acoustic Time Series in Industry 4.0: Improved Reliability and Cyber-Security...
J On The Beach
 
PDF
Pushing it to the edge in IoT
J On The Beach
 
PDF
Drinking from the firehose, with virtual streams and virtual actors
J On The Beach
 
PDF
How do we deploy? From Punched cards to Immutable server pattern
J On The Beach
 
PDF
Java, Turbocharged
J On The Beach
 
PDF
When Cloud Native meets the Financial Sector
J On The Beach
 
PDF
The big data Universe. Literally.
J On The Beach
 
PDF
Streaming to a New Jakarta EE
J On The Beach
 
PDF
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...
J On The Beach
 
PDF
Pushing AI to the Client with WebAssembly and Blazor
J On The Beach
 
PDF
Axon Server went RAFTing
J On The Beach
 
PDF
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...
J On The Beach
 
PDF
Madaari : Ordering For The Monkeys
J On The Beach
 
PDF
Servers are doomed to fail
J On The Beach
 
PDF
Interaction Protocols: It's all about good manners
J On The Beach
 
PDF
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
J On The Beach
 
PDF
Leadership at every level
J On The Beach
 
PDF
Machine Learning: The Bare Math Behind Libraries
J On The Beach
 
Massively scalable ETL in real world applications: the hard way
J On The Beach
 
Big Data On Data You Don’t Have
J On The Beach
 
Acoustic Time Series in Industry 4.0: Improved Reliability and Cyber-Security...
J On The Beach
 
Pushing it to the edge in IoT
J On The Beach
 
Drinking from the firehose, with virtual streams and virtual actors
J On The Beach
 
How do we deploy? From Punched cards to Immutable server pattern
J On The Beach
 
Java, Turbocharged
J On The Beach
 
When Cloud Native meets the Financial Sector
J On The Beach
 
The big data Universe. Literally.
J On The Beach
 
Streaming to a New Jakarta EE
J On The Beach
 
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...
J On The Beach
 
Pushing AI to the Client with WebAssembly and Blazor
J On The Beach
 
Axon Server went RAFTing
J On The Beach
 
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...
J On The Beach
 
Madaari : Ordering For The Monkeys
J On The Beach
 
Servers are doomed to fail
J On The Beach
 
Interaction Protocols: It's all about good manners
J On The Beach
 
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
J On The Beach
 
Leadership at every level
J On The Beach
 
Machine Learning: The Bare Math Behind Libraries
J On The Beach
 
Ad

Recently uploaded (20)

PDF
Ready Layer One: Intro to the Model Context Protocol
mmckenna1
 
PDF
Latest Capcut Pro 5.9.0 Crack Version For PC {Fully 2025
utfefguu
 
PPTX
Agentic Automation: Build & Deploy Your First UiPath Agent
klpathrudu
 
PPTX
BB FlashBack Pro 5.61.0.4843 With Crack Free Download
cracked shares
 
PPTX
Prompt Like a Pro. Leveraging Salesforce Data to Power AI Workflows.pptx
Dele Amefo
 
PPTX
From spreadsheets and delays to real-time control
SatishKumar2651
 
PPTX
Get Started with Maestro: Agent, Robot, and Human in Action – Session 5 of 5
klpathrudu
 
PDF
MiniTool Partition Wizard Free Crack + Full Free Download 2025
bashirkhan333g
 
PDF
SAP Firmaya İade ABAB Kodları - ABAB ile yazılmıl hazır kod örneği
Salih Küçük
 
PDF
Everything you need to know about pricing & licensing Microsoft 365 Copilot f...
Q-Advise
 
PDF
TheFutureIsDynamic-BoxLang witch Luis Majano.pdf
Ortus Solutions, Corp
 
PDF
AI Prompts Cheat Code prompt engineering
Avijit Kumar Roy
 
PDF
AI + DevOps = Smart Automation with devseccops.ai.pdf
Devseccops.ai
 
PPTX
Milwaukee Marketo User Group - Summer Road Trip: Mapping and Personalizing Yo...
bbedford2
 
PDF
Simplify React app login with asgardeo-sdk
vaibhav289687
 
PDF
IObit Driver Booster Pro 12.4.0.585 Crack Free Download
henryc1122g
 
PPTX
Build a Custom Agent for Agentic Testing.pptx
klpathrudu
 
PDF
Wondershare PDFelement Pro Crack for MacOS New Version Latest 2025
bashirkhan333g
 
PDF
SciPy 2025 - Packaging a Scientific Python Project
Henry Schreiner
 
PPTX
Comprehensive Risk Assessment Module for Smarter Risk Management
EHA Soft Solutions
 
Ready Layer One: Intro to the Model Context Protocol
mmckenna1
 
Latest Capcut Pro 5.9.0 Crack Version For PC {Fully 2025
utfefguu
 
Agentic Automation: Build & Deploy Your First UiPath Agent
klpathrudu
 
BB FlashBack Pro 5.61.0.4843 With Crack Free Download
cracked shares
 
Prompt Like a Pro. Leveraging Salesforce Data to Power AI Workflows.pptx
Dele Amefo
 
From spreadsheets and delays to real-time control
SatishKumar2651
 
Get Started with Maestro: Agent, Robot, and Human in Action – Session 5 of 5
klpathrudu
 
MiniTool Partition Wizard Free Crack + Full Free Download 2025
bashirkhan333g
 
SAP Firmaya İade ABAB Kodları - ABAB ile yazılmıl hazır kod örneği
Salih Küçük
 
Everything you need to know about pricing & licensing Microsoft 365 Copilot f...
Q-Advise
 
TheFutureIsDynamic-BoxLang witch Luis Majano.pdf
Ortus Solutions, Corp
 
AI Prompts Cheat Code prompt engineering
Avijit Kumar Roy
 
AI + DevOps = Smart Automation with devseccops.ai.pdf
Devseccops.ai
 
Milwaukee Marketo User Group - Summer Road Trip: Mapping and Personalizing Yo...
bbedford2
 
Simplify React app login with asgardeo-sdk
vaibhav289687
 
IObit Driver Booster Pro 12.4.0.585 Crack Free Download
henryc1122g
 
Build a Custom Agent for Agentic Testing.pptx
klpathrudu
 
Wondershare PDFelement Pro Crack for MacOS New Version Latest 2025
bashirkhan333g
 
SciPy 2025 - Packaging a Scientific Python Project
Henry Schreiner
 
Comprehensive Risk Assessment Module for Smarter Risk Management
EHA Soft Solutions
 

Low latency in java 8 by Peter Lawrey

  • 1. Peter Lawrey CEO of Higher Frequency Trading J On The Beach 2016 Low Latency in Java 8
  • 2. Peter Lawrey Java Developer/Consultant for investment banks and hedge funds for 8 years, 23 years in IT. Most answers for Java and JVM on stackoverflow.com Founder of the Performance Java User’s Group. Architect of Chronicle Software Java Champion
  • 3. Chronicle Software Help companies migrate to high performance Java code. Sponsor open source projects https://github.com/OpenHFT Licensed solutions Chronicle-FIX, Chronicle-Enterprise and Chronicle-Queue-Enterprise Offer one week proof of concept workshops, advanced Java training, consulting and bespoke development.
  • 6. Agenda • What is Low Latency? • What is an ultra low GC rate? – Eden space and Compressed OOPs. • How does Java 8 help? – Non-capturing lambdas – Serializing Lambdas – Escape Analysis • Optimise for Latency distribution not Throughput. – Flow control and avoiding co-ordinated omission. • Low latency code using Java 8.
  • 7. What is low latency? The term “low latency” can applied to a wide range of situations. A broad definition might be; Low latency means you have a view on how much the response time of a system costs your business. In this talk I will assume; Low latency means you care about latencies you can only measure as even the worst latencies are too fast to see.
  • 8. Even latencies you can’t see add up Data passing Latency Light over a fibre Throughput on at a time Method call Inlined: 0 Real call: 50 ns. 10 meters 20,000,000/sec Shared memory 200 ns 40 meters 5,000,000/sec SYSV Shared memory 2 µs 400 meters 500,000/sec Low latency network 8 µs 1.6 km 125,000/sec Typical LAN network 30 µs 6 km 30,000/sec Typical data grid system 800 µs 160 km 1,250/sec 60 Hz power flickers 8 ms 1600 km 120/sec 4G request latency in UK 55 ms 11,000 km 18/sec
  • 9. Why consistent performance matters. When you have high delays, even rarely, this two consequences - Customers tend to remember the worst service they ever got. You tend to lose the most money when are unable to react to events the longest. - In a busy system, a delayed in processing an event/message/request has a knock on effect many subsequent requests. A single delay can impact hundreds or even many thousands of requests.
  • 10. What is an ultra low GC? In a generational collector you have multiple generations of objects. Small to medium sized objects are created in the Eden space. When your Eden space fills up you trigger a Minor Collection. So how big can your Eden space be and still have Compressed OOPS?
  • 11. Where does the Eden space fit in the Java 7 memory layout?
  • 12. Where does the Eden space fit in the Java 8 memory layout?
  • 13. What are Compressed OOPS? 64-bit applications use 64-bit pointers. These take up more space than 32-bit pointers; for real applications this can mean up to 30% more memory needed. However, the 64-bit JVM can address the heap using 32-bit references. This saves memory and improves the efficiency of the cache, as you can keep more objects in cache.
  • 15. What are Compressed OOPS? In Java 8, if the heap is between 32 GB and 64 GB, use -XX:+UseCompressedOops -XX:ObjectAlignmentInBytes=16 The JVM can then address anywhere on the heap by multiplying the 32-bit reference by 16 and adding a relative offset. You can increase the object alignment to 32 manually, but at this point 64-bit references use less memory.
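A quick sanity check of that arithmetic: with 16-byte alignment, each value of a 32-bit compressed reference addresses a distinct 16-byte slot, so the addressable heap is 2^32 × 16 bytes:

```java
public class CompressedOopsMath {
    public static void main(String[] args) {
        // A 32-bit compressed reference can take 2^32 distinct values.
        // With 16-byte object alignment, each value addresses a 16-byte slot,
        // so the addressable heap is 2^32 * 16 bytes = 64 GB.
        long addressableBytes = (1L << 32) * 16;
        System.out.println(addressableBytes / (1L << 30) + " GB"); // prints "64 GB"
    }
}
```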
  • 16. Ultra low GC Say you have an Eden size of 48 GB. You will only get a garbage collection once the Eden space fills, i.e. once you have used 48 GB. If you need to run for 24 hours without a GC, you can still produce 2 GB of garbage an hour and GC once per day. This is a rate of around 500 KB/s.
  • 17. What is an ultra low GC? Another reason to use ultra low garbage rates is that your caches are not being filled with garbage, so they work much more efficiently and consistently. If you have a web server which is producing 300 MB/s of garbage, less than 5% of the time will be spent pausing for GC. However, it does mean that you could be filling a 32 KB L1 CPU cache in around 0.1 milliseconds, and your L2 cache fills with garbage every millisecond.
  • 18. So without a GC pause, no more pauses? Actually, a high percentage of pauses are not from the GC. The biggest ones are, but take away GC pauses and you still see: - IO delays. These can be larger than GC pauses. - Network delays. - Waiting for databases. - Disk reads / writes. - OS interrupts. It is not uncommon for your OS to stop your process for 5 ms or more. - Lock contention pauses.
  • 19. How does Java 8 help? The biggest improvements in Java 8 are: - Lambdas with no captured values, and lambdas with reduced capture of variables. These are more efficient than anonymous inner classes. - Escape Analysis to unpack objects onto the stack. Short-lived objects placed on the stack don’t create garbage.
  • 20. How do Lambdas help? Lambdas are like anonymous inner classes; however, they can be cached in static fields if they don’t capture anything. public static Runnable helloWorld() { return () -> System.out.println("Hello World"); } public static Consumer<String> printMe() { // may create a new object each time = Garbage. return System.out::println; } public static Consumer<String> printMe2() { return x -> System.out.println(x); }
  • 21. How does Java 8 help? Lambdas When you call new on an anonymous inner class, a new object is always created. Non-capturing lambdas can be cached. Runnable r1 = helloWorld(); Runnable r2 = helloWorld(); System.out.println(r1 == r2); // prints true Consumer<String> c1 = printMe(); Consumer<String> c2 = printMe(); System.out.println(c1 == c2); // prints false Consumer<String> c3 = printMe2(); Consumer<String> c4 = printMe2(); System.out.println(c3 == c4); // prints true
  • 22. Serialization Lambdas capture less scope than anonymous inner classes. A lambda doesn’t capture this unless it has to, but it can capture things you don’t expect. If you refer to a field such as this.a, the lambda must capture this. Note: if you use the :: notation, it will capture the left operand if it is a variable.
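A small illustrative example of the :: capture rule above (the class and variable names are mine, not from the deck). A method reference captures its left operand when that operand is a variable, so later mutation of the captured object is visible through the lambda; a captured local, by contrast, has its value fixed at creation time:

```java
import java.util.function.Supplier;

public class CaptureDemo {
    public static void main(String[] args) {
        // sb::toString captures the reference sb (the left operand),
        // so the lambda sees mutations made after it was created.
        StringBuilder sb = new StringBuilder("Hello");
        Supplier<String> byReference = sb::toString;
        sb.append(" World");
        System.out.println(byReference.get()); // prints "Hello World"

        // A captured local must be effectively final; its value is
        // fixed into the lambda at creation time.
        int answer = 42;
        Supplier<Integer> byValue = () -> answer;
        System.out.println(byValue.get()); // prints 42
    }
}
```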
  • 23. interface SerializableConsumer<T> extends Consumer<T>, Serializable { } // throws java.io.NotSerializableException: java.io.PrintStream public SerializableConsumer<String> printMe() { return System.out::println; } public SerializableConsumer<String> printMe2() { return x -> System.out.println(x); } public SerializableConsumer<String> printMe3() { // throws java.io.NotSerializableException: A return new SerializableConsumer<String>() { @Override public void accept(String s) { System.out.println(s); } }; }
  • 24. Why Serialize a Lambda? Lambdas are designed to reduce boilerplate, and when you have a distributed system, they can be a powerful addition. The two lambdas are serialized on the client to be executed on the server. This example is from the RedisEmulator in Chronicle-Engine. public static long incrby(MapView<String, Long> map, String key, long toAdd) { return map.syncUpdateKey(key, v -> v + toAdd, v -> v); }
  • 25. Why Serialize a Lambda? public static Set<String> keys(MapView<String, ?> map, String pattern) { return map.applyTo(m -> { Pattern compile = Pattern.compile(pattern); return m.keySet().stream() .filter(k -> compile.matcher(k).matches()) .collect(Collectors.toSet()); }); } The lambda m -> { ... } is serialized and executed on the server.
  • 26. // print userId which have a usageCounter > 10 // each time it is incremented (asynchronously) userMap.entrySet().query() .filter(e -> e.getValue().usageCounter > 10) .map(e -> e.getKey()) .subscribe(System.out::println); Why Serialize a Lambda? The filter/map lambdas are serialized. The subscribe lambda is executed asynchronously on the client.
  • 27. How does Java 8 help? Escape Analysis Escape Analysis can - Determine an object doesn’t escape a method, so it can be placed on the stack. - Determine an object doesn’t escape a thread, so it doesn’t need to be synchronized.
  • 28. How does Java 8 help? Escape Analysis Escape Analysis works with inlining. After inlining, the JIT can see all the places an object is used. If it doesn’t escape the method, it doesn’t need to be created and can be unpacked onto the stack. This works for class objects, but not arrays currently. After “unpacking” onto the stack the object might be optimised away entirely, e.g. when all its fields are set from local variables anyway.
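As a sketch of the kind of code that can benefit (my example; whether the optimisation actually fires depends on inlining and the JIT): after inlining, the Point below never escapes the method, so its fields may be scalar-replaced onto the stack and no heap allocation happens in the hot path:

```java
public class ScalarReplacementSketch {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // The Point is created and consumed entirely within this method.
    // Once the constructor is inlined, escape analysis can see it never
    // escapes, so the JIT may keep x and y in registers instead of
    // allocating the object on the heap.
    static long distanceSquared(int x1, int y1, int x2, int y2) {
        Point d = new Point(x1 - x2, y1 - y2);
        return (long) d.x * d.x + (long) d.y * d.y;
    }

    public static void main(String[] args) {
        System.out.println(distanceSquared(0, 0, 3, 4)); // prints 25
    }
}
```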
  • 29. How does Java 8 help? Escape Analysis As of Java 8 update 60, the JITed code generated is still not as efficient as code written so as not to need these optimisations; however, the JIT is getting closer to optimal.
  • 30. How does Java 8 help? Escape Analysis Two parameters which control inlining and the maximum method size for performing escape analysis are -XX:MaxBCEAEstimateSize=150 -XX:FreqInlineSize=325 For our software I favour -XX:MaxBCEAEstimateSize=450 -XX:FreqInlineSize=425
  • 31. Optimising for latency instead of throughput Measuring throughput is a great way to hide bad service or poor latencies. Your users, however, tend to be impacted by those poor services/latencies.
  • 32. Optimising for latency instead of throughput Average latency is largely the inverse of your throughput and tells you no more. Using the standard deviation of latency is misleading at best, as the distribution of latencies is not a normal distribution or anything like it. From Little’s Law: average latency = concurrency / throughput.
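A worked instance of Little's Law (the numbers are made up for illustration): a system holding 100 requests in flight while completing 200,000 requests per second has an average latency of 0.5 ms, whatever the shape of the latency distribution:

```java
public class LittlesLawExample {
    public static void main(String[] args) {
        // Little's Law: average latency = concurrency / throughput.
        double concurrency = 100;           // requests in flight on average
        double throughputPerSec = 200_000;  // completed requests per second
        double avgLatencySec = concurrency / throughputPerSec;
        System.out.println(avgLatencySec * 1_000 + " ms"); // prints "0.5 ms"
    }
}
```

Note this says nothing about the tail: the same average is consistent with a 99.9%ile that is thousands of times worse.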
  • 33. How to hide bad performance with average latency Say you have a service which responds to 2 requests every millisecond, but once an hour it stops for 6 minutes. Without the 6 minute pause, the average latency would be 0.5 ms. With the 6 minute pause, the number of tasks performed in an hour drops from 7.2 million to 6 million and the average latency is 0.6 ms. Conclusion: pausing for 6 minutes each hour doesn’t make much difference to our metric.
  • 34. How to find solvable problems with latency distributions Run a test many times at a given rate. Time how long each task takes from when it should have started, not when you actually started it. Sort these timings and look at the 99%, 99.9%, 99.99% and worst numbers in your results. The 99% (or worst 1 in 100) will be much higher than your average latency and is more likely to explain why your users see bad performance sometimes. Ideally you should reduce your 99.9% and even your worst latency.
  • 35. Is it fair to time from when the test should have started? Without having a view on when your tests should have started, you don’t consider that if the system stalls, it could impact tens or even thousands of tasks, i.e. you can’t know the impact of a long delay. Unfortunately a lot of tools optimistically assume only 1 task was delayed, but this is unrealistic.
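A minimal sketch of a load generator that measures from the intended start time rather than the actual one (names and the dummy workload are mine). If the system stalls, the wait loop falls through immediately for every overdue operation, so the backlog is charged to those operations' latencies instead of being silently omitted:

```java
import java.util.Arrays;

public class IntendedStartBenchmark {
    private static volatile long sink; // stops the work being optimised away

    private static void doWork() {
        sink = System.nanoTime(); // stand-in for the operation under test
    }

    public static void main(String[] args) {
        int count = 10_000;
        long intervalNs = 100_000; // target rate: 10,000 operations/sec
        long[] latencies = new long[count];

        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            long intendedStart = start + i * intervalNs;
            // Wait until the operation SHOULD begin. If we are already late,
            // this falls through at once and the lateness counts as latency.
            while (System.nanoTime() < intendedStart) { /* spin */ }
            doWork();
            latencies[i] = System.nanoTime() - intendedStart;
        }

        Arrays.sort(latencies);
        System.out.printf("99%%: %d ns, 99.9%%: %d ns, worst: %d ns%n",
                latencies[count * 99 / 100],
                latencies[count * 999 / 1000],
                latencies[count - 1]);
    }
}
```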
  • 36. Co-ordinated omission Term coined by Gil Tene, CTO of Azul Systems. Co-ordinated omission occurs when you have a benchmark which accepts flow control from the solution being tested. Flow control allows the system being tested to stop the benchmark when it is running slow. Most older benchmark tools allow this!
  • 37. What is flow control? Most producer/consumer systems use some form of flow control. Flow control allows the consumer to stop the producer, to prevent the system from becoming overloaded and causing a failure of the system. TCP/IP uses flow control, for example. UDP doesn’t have flow control, and if the consumer can’t keep up, messages are lost.
  • 38. What is flow control? Chronicle Queue, however, uses an open ended persisted queue. This avoids the need for flow control, as the consumer can be any amount behind the producer (to the limits of your disk capacity). A system without flow control is easier to reproduce for testing, debugging and performance tuning purposes. You can test the producer or consumer in isolation as they don’t interact with one another.
  • 39. Why is flow control bad for performance measures? Flow control helps deal with periods when the consumer can’t keep up. It does so by slowing down or stopping the producer. For performance testing this is like creating a blind spot for the producer, which is the load generator. The load generator measures all the times the consumer can keep up, but significantly under-reports the times when the consumer can’t.
  • 40. Our example without co-ordinated omission Say you have a service which responds to 2 requests every millisecond, but once an hour it stops for 6 minutes. In the 6 minutes when the process stopped, how many tasks were delayed? The optimistic answer is 1, but the pessimistic answer is 2 every millisecond, or 720,000.
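The pessimistic count above is simply the normal arrival rate multiplied by the length of the stall:

```java
public class DelayedRequestCount {
    public static void main(String[] args) {
        long requestsPerMs = 2;        // service normally handles 2 requests/ms
        long stallMs = 6L * 60 * 1000; // a 6 minute stall, in milliseconds
        long delayedRequests = requestsPerMs * stallMs;
        System.out.println(delayedRequests); // prints 720000
    }
}
```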
  • 41. Our example without co-ordinated omission This is why the expected rate matters, and unless you see an expected throughput used in testing, there is a good chance co-ordinated omission occurred. Let’s say the target throughput was 100 per second. In the 6 minutes, at least 6 * 60 * 100 requests were delayed; some by 6 minutes, some 5 minutes, … 1 minute. However, as the waiting requests are being processed, more are being added. In the 18 seconds it takes to process the waiting requests, about 1,800 more are added, but soon the queue is empty.
  • 42. Co-ordinated omission [Chart: one big delay modelled without co-ordinated omission and with CO. Delay in milliseconds is plotted against percentile (50% to 100%); the two series diverge by orders of magnitude at the high percentiles (18,005 ms vs 0.5 ms).]
  • 43. Optimising for latency instead of throughput Having enough throughput is only the start. You also need to look at the consistency of your service. The next step is to look at your 99 percentile latencies (worst 1 in 100). After that you can look at your 99.9%ile, 99.99%ile and your worst latencies tested.
  • 44. A low latency API which uses Lambdas Chronicle Wire is a single API which supports multiple formats. You decide what data you want to read/write, and independently you can choose the format, e.g. YAML, JSON, Binary YAML, XML. Using lambdas helped to simplify the API.
  • 45. A low latency API which uses Lambdas Timings are in micro-seconds, measured with JMH. * Data was read/written to native memory.

  Wire Format               | Bytes | 99.9 %tile | 99.99 %tile | 99.999 %tile | worst
  JSONWire                  | 100*  | 3.11       | 5.56        | 10.6         | 36.9
  Jackson                   | 100   | 4.95       | 8.3         | 1,400        | 1,500
  Jackson + Chronicle-Bytes | 100*  | 2.87       | 10.1        | 1,300        | 1,400
  BSON                      | 96    | 19.8       | 1,430       | 1,400        | 1,600
  BSON + Chronicle-Bytes    | 96*   | 7.47       | 15.1        | 1,400        | 11,600
  BOON Json                 | 100   | 20.7       | 32.5        | 11,000       | 69,000

  Payload: "price":1234,"longInt":1234567890,"smallInt":123,"flag":true,"text":"Hello World!","side":"Sell"
  • 46. A resizable buffer and a Wire format // Bytes which wraps a ByteBuffer which is resized as needed. Bytes<ByteBuffer> bytes = Bytes.elasticByteBuffer(); // YAML based wire format Wire wire = new TextWire(bytes); // or a binary YAML based wire format Bytes<ByteBuffer> bytes2 = Bytes.elasticByteBuffer(); Wire wire2 = new BinaryWire(bytes2); // or just data, no meta data. Bytes<ByteBuffer> bytes3 = Bytes.elasticByteBuffer(); Wire wire3 = new RawWire(bytes3);
  • 47. Low latency API using Lambdas (Wire) message: Hello World number: 1234567890 code: SECONDS price: 10.5 To write a message: wire.write(() -> "message").text(message) .write(() -> "number").int64(number) .write(() -> "timeUnit").asEnum(timeUnit) .write(() -> "price").float64(price); To read a message: wire.read(() -> "message").text(this, (o, s) -> o.message = s) .read(() -> "number").int64(this, (o, i) -> o.number = i) .read(() -> "timeUnit").asEnum(TimeUnit.class, this, (o, e) -> o.timeUnit = e) .read(() -> "price").float64(this, (o, d) -> o.price = d);
  • 48. A resizable buffer and a Wire format In the YAML based TextWire: message: Hello World number: 1234567890 code: SECONDS price: 10.5 In the Binary YAML Wire: 00000000 C7 6D 65 73 73 61 67 65 EB 48 65 6C 6C 6F 20 57 ·message ·Hello W 00000010 6F 72 6C 64 C6 6E 75 6D 62 65 72 A3 D2 02 96 49 orld·num ber····I 00000020 C4 63 6F 64 65 E7 53 45 43 4F 4E 44 53 C5 70 72 ·code·SE CONDS·pr 00000030 69 63 65 90 00 00 28 41 ice···(A
  • 49. Lambdas and JUnit tests To read the data: wire.read(() -> "message").text(this, (o, s) -> o.message = s) .read(() -> "number").int64(this, (o, i) -> o.number = i) .read(() -> "timeUnit").asEnum(TimeUnit.class, this, (o, e) -> o.timeUnit = e) .read(() -> "price").float64(this, (o, d) -> o.price = d); To check the data without a data structure: wire.read(() -> "message").text("Hello World", Assert::assertEquals) .read(() -> "number").int64(1234567890L, Assert::assertEquals) .read(() -> "timeUnit").asEnum(TimeUnit.class, TimeUnit.SECONDS, Assert::assertEquals) .read(() -> "price").float64(10.5, (o, d) -> assertEquals(o, d, 0));
  • 50. Interchanging Enums and Lambdas Enums and lambdas can both implement an interface. Wherever you have used a non-capturing lambda you can also use an enum. enum Field implements WireKey { message, number, timeUnit, price; } @Override public void writeMarshallable(WireOut wire) { wire.write(Field.message).text(message) .write(Field.number).int64(number) .write(Field.timeUnit).asEnum(timeUnit) .write(Field.price).float64(price); }
  • 51. When to use Enums Enums have a number of benefits. • They are easier to debug. • They serialize much more efficiently. • It’s easier to manage a class of pre-defined enums to implement your code than lambdas, which could be anywhere. Under https://github.com/OpenHFT/Chronicle-Engine search for MapFunction and MapUpdater
  • 52. When to use Lambdas Lambdas have a number of benefits. • They are simpler to write. • They support generics better. • They can capture values.
  • 53. Where can I try this out? The source for these micro-benchmarks and tests is available at https://github.com/OpenHFT/Chronicle-Wire Chronicle Engine with live subscriptions: https://github.com/OpenHFT/Chronicle-Engine
  • 54. Q & A Peter Lawrey @PeterLawrey http://chronicle.software http://vanillajava.blogspot.com