Optimizing Servers for High Throughput and Low Latency
Alexey Ivanov
Software Engineer at Dapper Labs
■ Previously: Traffic, Networking, and Databases @Dropbox
■ Performance: Hardware. OS. Application. RUM.
Optimizing (web-)Servers 5 Years Later…
This is an updated version of the nginx.conf’17 talk.
Changelog:
■ New hardware features are available. AMD EPYCs and ARM64 are a thing.
■ New Linux kernel features, especially around observability.
■ Replace nginx with a generic HTTP-server/-client focus.
● (Most of the clients and servers nowadays are HTTP- or HTTP/2-based)
High-level vs Low-level Optimizations
The biggest performance gains usually come from high-level optimizations:
load-balancing, algorithms, data structures, and (especially) business logic.
A few examples from large-scale production systems:
■ The lower the variance in backend load, the better.
● Applying “Two Random Choices” load-balancing greatly reduced latencies.
■ The fastest code is “no code”.
● E.g. at Dropbox we pre-compressed static files for the web, so we spent 0% CPU on compression while
maintaining the best possible compression ratio.
■ Algorithm improvements.
● Switching from zlib to brotli saved us both CPU and storage.
■ Data locality improvements.
● Switching from B+tree to LSM-based storage improved compression efficiency and reduced
database sizes by ~2.5x.
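The “Two Random Choices” idea can be sketched as a toy simulation. This is a minimal sketch under stated assumptions, not the production implementation: the function names, the 100 backends, and the 10,000 requests are made up for illustration, and requests never complete (load only accumulates).

```python
import random


def pick_random(loads, rng):
    """Baseline: pick one backend uniformly at random."""
    return rng.randrange(len(loads))


def pick_two_choices(loads, rng):
    """Sample two distinct backends and pick the less loaded one."""
    a, b = rng.sample(range(len(loads)), 2)
    return a if loads[a] <= loads[b] else b


def simulate(picker, backends=100, requests=10_000, seed=42):
    """Assign requests to backends; return the load of the busiest backend."""
    rng = random.Random(seed)
    loads = [0] * backends
    for _ in range(requests):
        loads[picker(loads, rng)] += 1
    return max(loads)


if __name__ == "__main__":
    print("random      max load:", simulate(pick_random))
    print("two choices max load:", simulate(pick_two_choices))
```

The busiest backend under two choices should sit much closer to the mean (100 requests per backend here), which is the “lower variance in backend load” effect.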
Hardware
CPU and Memory
Generally, picking the newest processor is the best choice since it will have the
most hardware offloads:
■ AVX2, BMI, ADX, AVX-512, AES-NI, SHA-NI (x86)
● (Symmetric/Asymmetric encryption, signatures, hashing, MACs)
■ PMUL, PMULL2, SHA256H, SHA3 (ARMv8.2+)
● (finite field arithmetic, hashing, MACs)
Many things that were previously prohibitively expensive are now almost
free thanks to hardware offloads: mTLS, crypto-hashing, storage encryption.
CPU and Memory (Cont’d)
What if the budget is limited? Rules of thumb:
■ Low-latency: single NUMA-node, bigger caches, disabled SMT, more GHz,
more memory channels.
■ High-throughput: more cores, enabled SMT, more memory.
Frequently, in production, high CPU usage does not indicate a compute bottleneck but a
“CPU pipeline stall” problem, i.e. a cache, TLB, or memory-bandwidth limitation.
Top-Down Analysis (TMA)
github.com/andikleen/pmu-tools
# toplev.py -l1 --single-thread --force-events ./app
BE Backend_Bound: 60.34%
This category reflects slots where no uops are being
delivered due to a lack of required resources for
accepting more uops in the Backend of the pipeline...
# toplev.py -l3 --single-thread --force-events ./app
BE Backend_Bound: 60.42%
BE/Mem Backend_Bound.Memory_Bound: 32.23%
BE/Mem Backend_Bound.Memory_Bound.L1_Bound: 32.44%
This metric represents how often CPU was stalled without
missing the L1 data cache...
BE/Core Backend_Bound.Core_Bound: 45.93%
BE/Core Backend_Bound.Core_Bound.Ports_Utilization: 45.93%
This metric represents cycles fraction application was
stalled due to Core computation issues (non divider-
related)...
NICs
Relevant only for real hardware, not clouds.
■ 25 Gbit/s or more; older NICs will likely have miscellaneous bottlenecks.
■ Open-source drivers, small firmware, active community.
● In case (or, more likely, when) issues occur.
Pressure Stall Information (PSI)
“PSI provides for the first time a canonical way to see resource pressure increases
as they develop, with new pressure metrics for three major resources—
memory, CPU, and IO.”
Source: https://facebookmicrosites.github.io/psi/docs/overview
PSI: global and Per-cgroup (v2)
$ cat /proc/pressure/io
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
$ cat /sys/fs/cgroup/cg1/io.pressure
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
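The PSI files above are easy to consume from a monitoring script. A minimal sketch, assuming the line format shown on the slide; the function names are hypothetical, and the cgroup directory is whatever your cgroup v2 hierarchy uses.

```python
def parse_psi(text):
    """Parse PSI text into {"some": {...}, "full": {...}} with float values."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        result[kind] = {k: float(v) for k, v in (p.split("=", 1) for p in pairs)}
    return result


def read_psi(resource="io", cgroup=None):
    """Read global PSI, or the per-cgroup file when a cgroup v2 directory is given."""
    path = f"{cgroup}/{resource}.pressure" if cgroup else f"/proc/pressure/{resource}"
    with open(path) as f:
        return parse_psi(f.read())


# Sample copied from the slide above.
SAMPLE = """some avg10=0.00 avg60=0.00 avg300=0.00 total=0
full avg10=0.00 avg60=0.00 avg300=0.00 total=0"""

if __name__ == "__main__":
    print(parse_psi(SAMPLE)["some"]["avg10"])
```

For example, `read_psi("memory")` reads the global memory pressure and `read_psi("io", "/sys/fs/cgroup/cg1")` reads the per-cgroup file from the slide.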
Understanding Software Dynamics, by Richard L. Sites
Linux Kernel
Kernel Optimizations
The best Linux optimization is a recent kernel version. New kernel versions bring
improvements to networking, memory management, I/O, and the rest of the Linux
subsystems.
But most importantly, they bring improvements to observability tooling.
CPU and Memory
After you’ve picked the best CPU for your workload, you’ll need to utilize it to the
max:
■ On Intel/AMD, use the intel_pstate or amd-pstate driver.
● If you want to be more energy efficient, consider the schedutil governor. Use
performance otherwise.
■ Set NUMA affinity for your application.
■ Use transparent huge pages.
● Careful here: this may reduce performance on some workloads.
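Before changing any of these settings it helps to see what a host is currently running with. A minimal sketch that reads the standard cpufreq and transparent-hugepage sysfs files; the function names are hypothetical, and missing files (e.g. in some VMs or containers) are reported as None.

```python
from pathlib import Path


def selected_option(text):
    """Return the bracketed choice from sysfs text such as 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_text(path):
    """Return stripped file contents, or None if the file is not readable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def cpu_settings(cpu=0):
    """Collect the frequency-scaling driver, governor, and THP mode for one CPU."""
    base = f"/sys/devices/system/cpu/cpu{cpu}/cpufreq"
    thp = read_text("/sys/kernel/mm/transparent_hugepage/enabled")
    return {
        "driver": read_text(f"{base}/scaling_driver"),
        "governor": read_text(f"{base}/scaling_governor"),
        "thp": selected_option(thp) if thp else None,
    }


if __name__ == "__main__":
    print(cpu_settings())
```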
Networking
The main goal of low-level tuning is to parallelize packet processing, add affinities,
increase buffer sizes, and enable hardware offloads.
■ ethtool is your friend here: # of queues, ring buffers, offloads, coalescing.
● -L, -G, -K, -C, etc.
● -S is your friend to keep track of drops/misses/errors/overruns/etc.
■ Mellanox and Intel cards come with set_irq_affinity/mlnx_affinity.
● Do not forget to turn off irqbalance.
■ After RSS is enabled it is generally a good idea to turn on XPS and xps_rxqs.
■ Avoid RPS. RFS can also have negative consequences.
■ For low latency: try to stay within the NUMA node the PCIe NIC is attached to.
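Staying on the NIC's NUMA node can be sketched from userspace as follows. This is a minimal illustration, assuming a physical NIC that exposes `device/numa_node` in sysfs; the function names and the interface name in the usage note are hypothetical.

```python
import os
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist such as '0-3,8-11' into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def nic_numa_cpus(dev):
    """Return CPUs local to the NIC's NUMA node, or None if it is unknown."""
    try:
        node = int(Path(f"/sys/class/net/{dev}/device/numa_node").read_text())
        if node < 0:  # the kernel reports -1 when there is no NUMA information
            return None
        cpulist = Path(f"/sys/devices/system/node/node{node}/cpulist").read_text()
    except (OSError, ValueError):
        return None
    return parse_cpulist(cpulist)


def pin_to_nic(dev):
    """Pin the current process to the NIC-local CPUs; return True if pinned."""
    cpus = nic_numa_cpus(dev)
    if not cpus:
        return False
    os.sched_setaffinity(0, cpus)
    return True
```

Calling `pin_to_nic("eth0")` early in a server's startup would keep its threads on the NIC-local cores; it pins CPUs only, so memory placement still follows the kernel's default NUMA policy.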
The main goal of high-level tuning is to remove transport-level bottlenecks.
■ Enabling BBR congestion control is generally a good idea.
■ Enabling the FQ scheduler w/ pacing is always a good idea.
■ Your friends here are RUM metrics and
ss -n --extended --info or getsockopt(TCP_INFO/TCP_CC_INFO)
Networking (Cont’d)
iproute2
$ ss -tie
…
ts sack bbr rto:220 rtt:16.139/10.041 ato:40 mss:1448 pmtu:1500 rcvmss:1269
advmss:1428 cwnd:106 ssthresh:52 bytes_sent:9070462 bytes_retrans:3375
bytes_acked:9067087 bytes_received:5775 segs_out:6327 segs_in:551
data_segs_out:6315 data_segs_in:12
bbr:(bw:99.5Mbps,mrtt:1.912,pacing_gain:1,cwnd_gain:2) send 76.1Mbps
lastsnd:9896 lastrcv:10944 lastack:9864 pacing_rate 98.5Mbps delivery_rate
27.9Mbps delivered:6316 busy:3020ms rwnd_limited:2072ms(68.6%) retrans:0/5
dsack_dups:5 rcv_rtt:16.125 rcv_space:14400 rcv_ssthresh:65535 minrtt:1.907
…
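A few fields from this output are worth tracking per connection, e.g. RTT, cwnd, and the share of time spent receive-window limited. A minimal sketch that pulls them out with regular expressions; the function name and the selection of fields are assumptions, and the sample is an abbreviated copy of the output above.

```python
import re

# Abbreviated copy of the `ss -tie` output shown above.
SAMPLE = (
    "ts sack bbr rto:220 rtt:16.139/10.041 ato:40 mss:1448 cwnd:106 "
    "bytes_sent:9070462 bytes_retrans:3375 delivery_rate 27.9Mbps "
    "busy:3020ms rwnd_limited:2072ms(68.6%) rcv_rtt:16.125 minrtt:1.907"
)


def parse_ss_info(text):
    """Extract a few fields of interest from one connection's `ss -i` text."""
    info = {}
    m = re.search(r"\brtt:([\d.]+)/([\d.]+)", text)
    if m:
        info["rtt_ms"] = float(m.group(1))
        info["rttvar_ms"] = float(m.group(2))
    for key in ("cwnd", "bytes_sent", "bytes_retrans"):
        m = re.search(rf"\b{key}:(\d+)", text)
        if m:
            info[key] = int(m.group(1))
    m = re.search(r"\brwnd_limited:(\d+)ms\(([\d.]+)%\)", text)
    if m:
        info["rwnd_limited_ms"] = int(m.group(1))
        info["rwnd_limited_pct"] = float(m.group(2))
    return info


if __name__ == "__main__":
    info = parse_ss_info(SAMPLE)
    print(info)
    print("retransmitted share:", info["bytes_retrans"] / info["bytes_sent"])
```

In the sample, the connection spent 68.6% of its busy time limited by the receiver's window, which points at buffer sizing on the peer rather than at congestion control.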
Sysctl Cargo Culting
It is impossible to talk about network tuning w/o mentioning sysctls. Here are a
couple of relatively safe ones.
■ net.ipv4.tcp_slow_start_after_idle=0
● Should be safe if FQ w/ pacing is enabled.
■ net.ipv4.tcp_mtu_probing=1
● A must-have on the edge (along with a slightly reduced advmss).
■ net.ipv4.tcp_rmem, net.ipv4.tcp_wmem
● Should be big enough for connections to not be rcv/snd window limited.
■ net.ipv4.tcp_notsent_lowat=262144
● Or even lower if HTTP/2 prioritization is used.
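“Big enough” for tcp_rmem/tcp_wmem can be estimated from the bandwidth-delay product (BDP) of the paths you serve. A rough sketch, assuming the helper names below (they are hypothetical) and ignoring the kernel's per-socket buffer overhead, so treat the result as a lower bound.

```python
def sysctl_path(name):
    """Map a sysctl name to its /proc/sys path."""
    return "/proc/sys/" + name.replace(".", "/")


def read_sysctl(name):
    """Return the sysctl value as a string, or None if it cannot be read."""
    try:
        with open(sysctl_path(name)) as f:
            return f.read().strip()
    except OSError:
        return None


def bdp_bytes(bandwidth_bits_per_sec, rtt_ms):
    """Bandwidth-delay product: bytes in flight needed to keep the path full."""
    return bandwidth_bits_per_sec * rtt_ms // 8000


def max_buffer(value):
    """Return the max field of a 'min default max' tcp_rmem/tcp_wmem value."""
    return int(value.split()[2])


if __name__ == "__main__":
    need = bdp_bytes(1_000_000_000, 100)  # assumed path: 1 Gbit/s at 100 ms RTT
    for name in ("net.ipv4.tcp_rmem", "net.ipv4.tcp_wmem"):
        value = read_sysctl(name)
        if value is None:
            print(name, "is not readable on this host")
        else:
            print(name, "max", max_buffer(value), "BDP", need)
```

A 1 Gbit/s path at 100 ms RTT needs about 12.5 MB in flight, so a maximum buffer below that would leave such connections window limited.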
Systems Performance, by Brendan Gregg
BPF Performance Tools, by Brendan Gregg
Application
Compiler Flags, Toolchains, and Runtimes
Keeping your compiler/runtime up-to-date is generally a good idea.
■ Compiler upgrade, -O2, and -mtune can visibly affect performance.
● You can also try keeping -march/GOAMD64 in sync with your (cloud) hardware.
■ Link time optimization (LTO) can give a measurable perf boost.
■ Runtime upgrade can frequently give you single to double digit perf
improvements.
● For example, Go runtime upgrades frequently deliver memory/cpu usage improvements.
■ (Toolchain upgrades are also great from the security perspective)
Profile-guided Optimization and Beyond
Most compilers are capable of PGO based on `perf record` profiles.
■ Clang has AutoFDO.
■ Golang would likely have Feedback-Guided Optimization in 1.20.
You can go beyond compile-time optimization and use a post-link optimizer:
■ Facebook’s BOLT is now a part of LLVM:
https://github.com/llvm/llvm-project/tree/main/bolt
Libraries
Any modern application consists of a myriad of libraries. Most servers nowadays
have allocator, TLS, compression, and serialization libraries. These are the
main candidates for tuning. For example, in the case of C/C++ servers:
■ Keeping libraries up-to-date is important.
● It doesn’t matter whether the CPU supports AVX2 if your library can’t use it.
■ Changing the malloc implementation is an option.
● Both jemalloc and tcmalloc have excellent tuning guides.
■ BoringSSL can (mostly) be used as a drop-in replacement for OpenSSL.
● Often switching from RSA to ECDSA, or from AES to ChaCha (or back) can improve perf.
■ zlib has multiple performance-oriented forks.
● Intel, Cloudflare, zlib-ng.
● Sometimes more efficient algorithms like brotli or zstd can be used instead.
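Whether a compression library or level is a win depends on your own payloads, so it is worth measuring. A minimal harness sketch using only the standard-library zlib; the `measure` helper and the sample payload are assumptions, and other codecs (e.g. third-party brotli or zstd bindings) could be passed in as the compress/decompress callables.

```python
import time
import zlib


def measure(compress, decompress, payload):
    """Return (compression ratio, compress seconds) after a round-trip check."""
    start = time.perf_counter()
    packed = compress(payload)
    elapsed = time.perf_counter() - start
    if decompress(packed) != payload:
        raise ValueError("round-trip mismatch")
    return len(payload) / len(packed), elapsed


if __name__ == "__main__":
    # Hypothetical, highly repetitive payload; use real responses in practice.
    payload = b"GET /static/app.js HTTP/1.1\r\nHost: example.com\r\n\r\n" * 2000
    for level in (1, 6, 9):
        ratio, seconds = measure(
            lambda data: zlib.compress(data, level), zlib.decompress, payload
        )
        print(f"zlib level {level}: ratio {ratio:.1f}, {seconds * 1000:.2f} ms")
```

A single timing like this is noisy; repeat the measurement and compare distributions before switching libraries.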
Designing Data-Intensive Applications, by Martin Kleppmann
Site Reliability Engineering
Chapter 19. Load Balancing at the Frontend
Chapter 20. Load Balancing in the Datacenter
Chapter 21. Handling Overload
Chapter 22. Addressing Cascading Failures
Alexey Ivanov
rbtz@dapperlabs.com
@SaveTheRbtz
Scaling a Beast: Lessons from 400x Growth in a High-Stakes Financial System b...Scaling a Beast: Lessons from 400x Growth in a High-Stakes Financial System b...
Scaling a Beast: Lessons from 400x Growth in a High-Stakes Financial System b...
ScyllaDB
 
Object Storage in ScyllaDB by Ran Regev, ScyllaDB
Object Storage in ScyllaDB by Ran Regev, ScyllaDBObject Storage in ScyllaDB by Ran Regev, ScyllaDB
Object Storage in ScyllaDB by Ran Regev, ScyllaDB
ScyllaDB
 
Lessons Learned from Building a Serverless Notifications System by Srushith R...
Lessons Learned from Building a Serverless Notifications System by Srushith R...Lessons Learned from Building a Serverless Notifications System by Srushith R...
Lessons Learned from Building a Serverless Notifications System by Srushith R...
ScyllaDB
 
A Dist Sys Programmer's Journey into AI by Piotr Sarna
A Dist Sys Programmer's Journey into AI by Piotr SarnaA Dist Sys Programmer's Journey into AI by Piotr Sarna
A Dist Sys Programmer's Journey into AI by Piotr Sarna
ScyllaDB
 
High Availability: Lessons Learned by Paul Preuveneers
High Availability: Lessons Learned by Paul PreuveneersHigh Availability: Lessons Learned by Paul Preuveneers
High Availability: Lessons Learned by Paul Preuveneers
ScyllaDB
 
How Natura Uses ScyllaDB and ScyllaDB Connector to Create a Real-time Data Pi...
How Natura Uses ScyllaDB and ScyllaDB Connector to Create a Real-time Data Pi...How Natura Uses ScyllaDB and ScyllaDB Connector to Create a Real-time Data Pi...
How Natura Uses ScyllaDB and ScyllaDB Connector to Create a Real-time Data Pi...
ScyllaDB
 
Persistence Pipelines in a Processing Graph: Mutable Big Data at Salesforce b...
Persistence Pipelines in a Processing Graph: Mutable Big Data at Salesforce b...Persistence Pipelines in a Processing Graph: Mutable Big Data at Salesforce b...
Persistence Pipelines in a Processing Graph: Mutable Big Data at Salesforce b...
ScyllaDB
 
Ad

Recently uploaded (20)

Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 

Optimizing Servers for High-Throughput and Low-Latency at Dropbox

  • 1. Brought to you by Optimizing Servers for High Throughput and Low Latency Alexey Ivanov Software Engineer at Dapper Labs
  • 2. Alexey Ivanov Software Engineer, Dapper Labs ■ Previously: Traffic, Networking, and Databases @Dropbox ■ Performance: Hardware. OS. Application. RUM.
  • 3. Optimizing (web-)Servers 5 Years Later… This is an updated version of the nginx.conf’17 talk. Changelog: ■ New hardware features are available. AMD EPYCs and ARM64 are a thing. ■ New Linux kernel features, especially around observability. ■ Replace nginx with a generic HTTP-server/-client focus. ● (Most of the clients and servers nowadays are HTTP- or HTTP/2-based)
  • 4. High-level vs Low-level Optimizations The biggest performance gains usually come from high-level optimizations: load-balancing, algorithms, data structures, and (especially) business logic. A few examples from large-scale production systems. ■ The lower the variance in backend load, the better. ● Applying “Two Random Choices” load-balancing greatly reduced latencies. ■ The fastest code is “no code”. ● E.g. at Dropbox we pre-compressed static files for the web, so we spent 0% CPU on compressing them at request time while maintaining the best possible compression ratio. ■ Algorithm improvements. ● Switching from zlib to brotli saved us both CPU and storage. ■ Data locality improvements. ● Switching from B+tree to LSM-based storage improved compression efficiency and reduced database sizes by ~2.5x.
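As a rough illustration of the “Two Random Choices” idea from this slide, here is a minimal Python sketch. It is not the Dropbox implementation: the function names, the use of a simple per-backend request counter as the "load", and the simulation parameters are all assumptions made for the example.

```python
import random


def pick_backend(loads, rng=random):
    """Pick a backend index using the "two random choices" rule.

    loads: list of current load values (here: request counts), one per backend.
    Two distinct backends are sampled at random and the less loaded one wins.
    """
    if not loads:
        raise ValueError("need at least one backend")
    if len(loads) == 1:
        return 0
    a, b = rng.sample(range(len(loads)), 2)
    return a if loads[a] <= loads[b] else b


def simulate(num_backends=10, num_requests=10_000, two_choices=True, seed=1):
    """Return the spread (max - min) of per-backend request counts.

    Hypothetical toy model: requests never finish, so the counter only grows.
    """
    rng = random.Random(seed)
    loads = [0] * num_backends
    for _ in range(num_requests):
        if two_choices:
            i = pick_backend(loads, rng)
        else:
            i = rng.randrange(num_backends)  # plain random choice
        loads[i] += 1
    return max(loads) - min(loads)


if __name__ == "__main__":
    print("random spread:     ", simulate(two_choices=False))
    print("two-choices spread:", simulate(two_choices=True))
```

In this toy model the spread between the busiest and the least busy backend should come out noticeably smaller with two choices than with a single random choice, which is the "lower variance in backend load" effect the slide refers to.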
  • 6. CPU and Memory Generally, picking the newest processor is the best choice since it will have the most hardware offloads: ■ AVX2, BMI, ADX, AVX-512, AES-NI, SHA-NI (x86) ● (Symmetric/Asymmetric encryption, signatures, hashing, MACs) ■ PMUL, PMULL2, SHA256H, SHA3 (ARMv8.2+) ● (finite field arithmetic, hashing, MACs) Many of the things that previously were prohibitively expensive now are almost free due to hardware offloads: mTLS, crypto-hashing, storage encryption.
  • 7. CPU and Memory (Cont’d) What if budget is limited? Rules of thumb: ■ Low-latency: single NUMA-node, bigger caches, disabled SMT, more GHz, more memory channels. ■ High-throughput: more cores, enabled SMT, more memory. Frequently, in production, high CPU usage does not mean a CPU bottleneck but a “CPU pipeline stall” problem, i.e. a cache, TLB, or memory-bandwidth limitation.
  • 9. github.com/andikleen/pmu-tools # toplev.py -l1 --single-thread --force-events ./app BE Backend_Bound: 60.34% This category reflects slots where no uops are being delivered due to a lack of required resources for accepting more uops in the Backend of the pipeline...
  • 10. github.com/andikleen/pmu-tools # toplev.py -l3 --single-thread --force-events ./app BE Backend_Bound: 60.42% BE/Mem Backend_Bound.Memory_Bound: 32.23% BE/Mem Backend_Bound.Memory_Bound.L1_Bound: 32.44% This metric represents how often CPU was stalled without missing the L1 data cache... BE/Core Backend_Bound.Core_Bound: 45.93% BE/Core Backend_Bound.Core_Bound.Ports_Utilization: 45.93% This metric represents cycles fraction application was stalled due to Core computation issues (non divider- related)...
  • 11. NICs Relevant only for real hardware, not clouds. ■ 25 Gbit/s or more; older NICs would likely have miscellaneous bottlenecks. ■ Open-source drivers, small firmware, active community. ● In case (or, more likely, when) issues occur.
  • 12. Pressure Stall Information (PSI) “PSI provides for the first time a canonical way to see resource pressure increases as they develop, with new pressure metrics for three major resources— memory, CPU, and IO.” Source: https://facebookmicrosites.github.io/psi/docs/overview
  • 13. PSI: global and Per-cgroup (v2) $ cat /proc/pressure/io some avg10=0.00 avg60=0.00 avg300=0.00 total=0 full avg10=0.00 avg60=0.00 avg300=0.00 total=0 $ cat /sys/fs/cgroup/cg1/io.pressure some avg10=0.00 avg60=0.00 avg300=0.00 total=0 full avg10=0.00 avg60=0.00 avg300=0.00 total=0
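The PSI files shown on this slide are plain text and easy to consume from a monitoring script. The following Python sketch parses that format; the function names and the 10% alert threshold are assumptions for illustration, not part of the talk.

```python
def parse_psi(text):
    """Parse the contents of a PSI file into a nested dict.

    Example input line: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"
    Returns e.g. {"some": {"avg10": 0.0, ..., "total": 0}, "full": {...}}.
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values = {}
        for field in fields:
            key, _, raw = field.partition("=")
            # "total" is a cumulative counter; the avg* fields are percentages.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


def under_pressure(psi, threshold_pct=10.0):
    """True if the 10-second "some" average exceeds the (assumed) threshold."""
    return psi["some"]["avg10"] > threshold_pct


if __name__ == "__main__":
    # On a Linux host with PSI enabled, the same text can be read from
    # /proc/pressure/io or from a cgroup v2 io.pressure file.
    sample = (
        "some avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
        "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
    )
    psi = parse_psi(sample)
    print(psi["some"], under_pressure(psi))
```

The same parser should work for the global files and the per-cgroup ones, since the slide shows both using an identical layout.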
  • 16. Kernel Optimizations The best Linux optimization is a recent kernel version. New kernel versions bring improvements to networking, memory management, I/O, and the rest of the Linux subsystems. But most importantly they bring improvements to observability tooling.
  • 17. CPU and Memory After you’ve picked the best CPU for your workload, you’ll need to utilize it to the max: ■ For Intel/AMD you would want to use intel_pstate or amd-pstate driver. ● If you want to be more energy efficient you may consider using schedutil governor. Use performance otherwise. ■ Set NUMA affinity for your application. ■ Use transparent huge pages. ● Careful here, this may lead to reduction in performance on some workloads.
  • 18. Networking The main goal of low-level tuning is to parallelize packet processing, add affinities, increase buffer sizes, and enable hardware offloads. ■ ethtool is your friend here: # of queues, ring buffers, offloads, coalescing. ● -L, -G, -K, -C, etc. ● -S is your friend to keep track of drops/misses/errors/overruns/etc. ■ Mellanox and Intel cards come with set_irq_affinity/mlnx_affinity. ● Do not forget to turn off irqbalance. ■ After RSS is enabled it is generally a good idea to turn on XPS and xps_rxqs. ■ Avoid RPS. RFS can also have negative consequences. ■ For low latency: try to stay within the NUMA node PCIe NIC is attached to.
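Affinity settings such as IRQ `smp_affinity` and per-queue `xps_cpus` are written as hexadecimal CPU bitmasks, split into comma-separated 32-bit groups on machines with many CPUs. The helper below is a small sketch for building such a mask in Python; the function name is hypothetical, and the commented-out sysfs path uses a placeholder device name.

```python
def cpu_mask(cpus):
    """Build a hex CPU bitmask string from an iterable of CPU numbers.

    The mask is zero-padded and split into comma-separated 32-bit groups,
    most significant group first.
    """
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    digits = f"{mask:x}"
    width = -(-len(digits) // 8) * 8  # round up to a multiple of 8 hex digits
    digits = digits.zfill(width)
    return ",".join(digits[i:i + 8] for i in range(0, width, 8))


if __name__ == "__main__":
    print(cpu_mask([0, 1, 2, 3]))  # CPUs 0-3
    print(cpu_mask([32]))          # a CPU beyond the first 32-bit group
    # Applying it would look roughly like this (requires root; "eth0" and the
    # queue number are placeholders, not values from the talk):
    # with open("/sys/class/net/eth0/queues/tx-0/xps_cpus", "w") as f:
    #     f.write(cpu_mask([0, 1, 2, 3]))
```

In practice the vendor scripts mentioned on the slide usually handle this, so a helper like this is mainly useful for auditing or for custom layouts.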
  • 19. Networking (Cont’d) The main goal of high-level tuning is to remove transport-level bottlenecks. ■ Enabling BBR congestion control is generally a good idea. ■ Enabling FQ scheduler w/ pacing is always a good idea. ■ Your friends here are RUM metrics and ss -n --extended --info or getsockopt(TCP_INFO/TCP_CC_INFO)
  • 20. iproute2 $ ss -tie … ts sack bbr rto:220 rtt:16.139/10.041 ato:40 mss:1448 pmtu:1500 rcvmss:1269 advmss:1428 cwnd:106 ssthresh:52 bytes_sent:9070462 bytes_retrans:3375 bytes_acked:9067087 bytes_received:5775 segs_out:6327 segs_in:551 data_segs_out:6315 data_segs_in:12 bbr:(bw:99.5Mbps,mrtt:1.912,pacing_gain:1,cwnd_gain:2) send 76.1Mbps lastsnd:9896 lastrcv:10944 lastack:9864 pacing_rate 98.5Mbps delivery_rate 27.9Mbps delivered:6316 busy:3020ms rwnd_limited:2072ms(68.6%) retrans:0/5 dsack_dups:5 rcv_rtt:16.125 rcv_space:14400 rcv_ssthresh:65535 minrtt:1.907 …
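To show how the `ss` output on this slide can be turned into a signal, here is a hedged Python sketch that pulls the `key:value` tokens out of one info line and computes how much of the connection's busy time was receive-window limited. The function names are made up for the example, and the sample string is a shortened copy of the slide's output.

```python
import re


def parse_ss_info(line):
    """Collect the key:value tokens from one `ss -i` info line.

    Tokens without a colon (e.g. "bbr", "send", "76.1Mbps") are skipped.
    """
    fields = {}
    for token in line.split():
        key, sep, value = token.partition(":")
        if sep:
            fields[key] = value
    return fields


def millis(value):
    """Extract the leading millisecond count from values like '2072ms(68.6%)'."""
    match = re.match(r"(\d+)ms", value)
    return int(match.group(1)) if match else None


def rwnd_limited_fraction(fields):
    """Fraction of busy time spent limited by the receiver's window, or None."""
    busy = millis(fields.get("busy", ""))
    limited = millis(fields.get("rwnd_limited", ""))
    if not busy or limited is None:
        return None
    return limited / busy


if __name__ == "__main__":
    sample = (
        "cwnd:106 ssthresh:52 bytes_sent:9070462 "
        "busy:3020ms rwnd_limited:2072ms(68.6%) retrans:0/5"
    )
    fields = parse_ss_info(sample)
    print(fields["cwnd"], rwnd_limited_fraction(fields))
```

For the sample above the computed fraction matches the 68.6% that `ss` prints in parentheses, which suggests this sender spent most of its busy time waiting on the peer's receive window rather than on the network.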
  • 21. Sysctl Cargo Culting It is impossible to talk about network tuning w/o mentioning sysctls. Here are a couple of relatively safe ones. ■ net.ipv4.tcp_slow_start_after_idle=0 ● Should be safe if FQ w/ pacing is enabled. ■ net.ipv4.tcp_mtu_probing=1 ● Must have on the edge (along with a slightly reduced advmss) ■ net.ipv4.tcp_rmem, net.ipv4.tcp_wmem ● Should be big enough for connections to not be rcv/snd window limited. ■ net.ipv4.tcp_notsent_lowat=262144 ● Or even lower if HTTP/2 prioritization is used.
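One way to reason about "big enough" for tcp_rmem and tcp_wmem is the bandwidth-delay product (BDP): the number of bytes that must be in flight to keep a path full. The sketch below is a back-of-the-envelope calculation in Python; the function names and the 2x headroom factor are assumptions for illustration, not recommendations from the talk.

```python
def bdp_bytes(bandwidth_bits_per_sec, rtt_ms):
    """Bandwidth-delay product in bytes for a given link speed and RTT.

    Uses integer arithmetic: bits/s * ms / 1000 gives bits in flight,
    dividing by 8 converts to bytes.
    """
    return bandwidth_bits_per_sec * rtt_ms // 8000


def suggested_max_buffer(bandwidth_bits_per_sec, rtt_ms, headroom=2):
    """Rough upper bound for the max value of tcp_rmem/tcp_wmem.

    headroom is an assumed safety factor to cover buffer bookkeeping overhead.
    """
    return bdp_bytes(bandwidth_bits_per_sec, rtt_ms) * headroom


if __name__ == "__main__":
    # 1 Gbit/s path with a 100 ms round-trip time.
    print(bdp_bytes(1_000_000_000, 100))
    print(suggested_max_buffer(1_000_000_000, 100))
```

If the configured maximum is well below the BDP of the paths you care about, connections are likely to show up as window limited in `ss` output or TCP_INFO.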
  • 24. Compiler Flags, Toolchains, and Runtimes Keeping your compiler/runtime up-to-date is generally a good idea. ■ Compiler upgrade, -O2, and -mtune can visibly affect performance. ● You can also try keeping -march/GOAMD64 in sync with your (cloud) hardware. ■ Link time optimization (LTO) can give a measurable perf boost. ■ Runtime upgrade can frequently give you single to double digit perf improvements. ● For example, Go runtime upgrades frequently deliver memory/cpu usage improvements. ■ (Toolchain upgrades are also great from the security perspective)
  • 25. Profile-guided Optimization and Beyond Most compilers are capable of PGO based on `perf record` profiles. ■ Clang has AutoFDO. ■ Golang would likely have Feedback-Guided Optimization in 1.20. You can go beyond compile-time optimization and use post-link optimizer: ■ Facebook’s BOLT is now a part of LLVM: https://github.com/llvm/llvm-project/tree/main/bolt
  • 26. Libraries Any modern application consists of a myriad of libraries. Most servers nowadays would have allocator, TLS, compression, and serialization libraries. These are the main candidates for tuning. For example in case of C/C++ servers: ■ Keeping libraries up-to-date is important. ● It doesn’t matter whether CPU supports AVX2 if your library can’t use it. ■ Changing malloc implementation is an option. ● Both jemalloc and tcmalloc have excellent tuning guides. ■ BoringSSL can (mostly) be used as a drop-in replacement for OpenSSL. ● Often switching from RSA to ECDSA, or from AES to ChaCha (or back) can improve perf. ■ zlib has multiple performance-oriented forks. ● Intel, Cloudflare, zlib-ng. ● Sometimes more efficient algorithms like brotli or zstd can be used instead.
  • 28. Site Reliability Engineering Chapter 19. Load Balancing at the Frontend Chapter 20. Load Balancing in the Datacenter Chapter 21. Handling Overload Chapter 22. Addressing Cascading Failures
  • 29. Brought to you by Alexey Ivanov [email protected] @SaveTheRbtz