© 2018 NETRONOME SYSTEMS, INC.
John Hurley
Open vSwitch 2018 Fall Conference
Offloading Linux LAG Devices via Open vSwitch and TC
Offloading Flows With TC Flower
ovs-vswitchd
TC Flower OVS Datapath
Nic Driver (NFP)
(nfp_p0) (nfp_p1) (nfp_v0.0)
VM 1 VM 2
ovs-dpctl dump-flows
in_port(2),eth_type(0x0800),ipv4(proto=6,frag=no), packets:98,
bytes:14316, used:5.351s, actions:3
tc -s filter show dev nfp_v0.0 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
eth_type ipv4
ip_proto tcp
ip_flags nofrag
in_hw
action order 1: mirred (Egress Redirect to device nfp_p0) stolen
index 1 ref 1 bind 1 installed 12 sec used 11 sec
Action statistics:
Sent 14316 bytes 98 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
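A minimal sketch of checking the offload state of a filter from `tc -s filter show` text. The helper name `offload_state` is hypothetical; it only looks for the `in_hw` / `not_in_hw` flags that appear in the output on these slides and does not call tc itself.

```python
def offload_state(filter_text):
    """Return 'in_hw', 'not_in_hw', or 'unknown' for one filter's text."""
    tokens = filter_text.split()
    # Check the negative flag first; both are matched as whole tokens.
    if "not_in_hw" in tokens:
        return "not_in_hw"
    if "in_hw" in tokens:
        return "in_hw"
    return "unknown"


# Sample text copied from the tc output above.
offloaded = """filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ip_proto tcp
  ip_flags nofrag
  in_hw
  action order 1: mirred (Egress Redirect to device nfp_p0) stolen"""

print(offload_state(offloaded))  # in_hw
```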
Offload Performance on SmartNIC
● VXLAN encapsulated traffic
● Open vSwitch rules offloaded to SmartNIC via TC
● Traffic sent in on a physical port
● Forwarded to a VM and bounced back on a different port
LAG Devices and Representors
● Link Aggregation (LAG) - combine multiple ports to act as a single link
○ Load balance to increase bandwidth
○ Active/backup failover
● Open vSwitch bonds
○ Combination of OvS kernel, OvS bond, LACP, high throughput - link flapping
● Linux LAG devices
○ Linux bond, Team
○ Deployed in OpenStack environments
● What if a Linux LAG upper device has 'offloadable' lower devices and is added to an OVS bridge?
ip link show
35: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
37: nfp_p0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP
38: nfp_p1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP
Kernel
Egress Offload
● NETDEV_CHANGELOWERSTATE
○ Record active/backup port states
● NETDEV_CHANGEUPPER
○ Track new LAG upper and lower devs
● Packet hash on NIC gets egress port
tc -s filter show dev nfp_v0.0 ingress
filter protocol ip pref 49152 flower chain 0 handle 0x1
eth_type ipv4
in_hw
action order 1: mirred (Egress Redirect to device bond0) stolen
Team
NFP
TC
SmartNIC
Bond
Group 1:
nfp_p0
nfp_p1
_______
Group 2:
…...
match: nfp_v0.0, ipv4, action: Group 1
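The egress scheme on this slide can be sketched as a toy model: a group holds the LAG's lower ports, state updates mark which ports may transmit, and a packet hash picks one active port. The class and method names are hypothetical, and CRC32 is only a stand-in for whatever hash the NIC actually uses.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class LagGroup:
    """Hypothetical model of one LAG group as a NIC might store it."""
    members: list                                   # lower devs, e.g. ["nfp_p0", "nfp_p1"]
    tx_enabled: dict = field(default_factory=dict)  # port name -> bool

    def set_lower_state(self, port, tx_enabled):
        # Mirrors the idea of handling a NETDEV_CHANGELOWERSTATE event.
        self.tx_enabled[port] = tx_enabled

    def select_egress(self, flow_key):
        # Only ports currently allowed to transmit are candidates.
        active = [p for p in self.members if self.tx_enabled.get(p, False)]
        if not active:
            return None
        # Assumption: CRC32 stands in for the real hardware hash.
        return active[zlib.crc32(flow_key) % len(active)]


group1 = LagGroup(members=["nfp_p0", "nfp_p1"])
group1.set_lower_state("nfp_p0", True)
group1.set_lower_state("nfp_p1", True)

flow = b"10.0.0.1:1234->10.0.0.2:80/tcp"
first = group1.select_egress(flow)

# Simulate failover: the chosen port stops transmitting, so the
# same flow moves to the remaining active port.
group1.set_lower_state(first, False)
second = group1.select_egress(flow)
print(first, second)
```

Hashing on a per-flow key keeps packets of one flow on one port while both ports are active, which is the usual reason LAG implementations hash rather than round-robin.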
● SmartNIC is not aware of LAG devices (in-kernel representation)
● If 'bond0' contains offloadable ports there is a need to:
○ Distribute filters to all lower devices
○ Combine stats from all lower device offloads into the LAG upper device's flower rule
● No offload callback in the LAG drivers
○ Difficult for SmartNIC driver to track changes
● Are there any TC features we can make use of to solve this?
Ingress Offload
tc -s filter show dev bond0 ingress
filter protocol ip pref 49152 flower chain 0 handle 0x1
eth_type ipv4
not_in_hw
action order 1: mirred (Egress Redirect to device nfp_v0.0) stolen
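The stats requirement for ingress offload can be illustrated with a small sketch: one rule on the LAG upper device is offloaded to every lower port, and the rule's counters are the sum over those ports. The `FlowStats` type, the `combine` helper, and the counter values are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class FlowStats:
    packets: int = 0
    bytes: int = 0
    used: float = 0.0  # time of last hit, same clock on every port


def combine(per_port):
    """Fold per-lower-device counters into one set for the upper device rule."""
    total = FlowStats()
    for stats in per_port.values():
        total.packets += stats.packets
        total.bytes += stats.bytes
        # The rule was last used when any one of its ports last saw a hit.
        total.used = max(total.used, stats.used)
    return total


# Hypothetical counters for the same rule offloaded on both lower ports.
per_port = {
    "nfp_p0": FlowStats(packets=10, bytes=1500, used=12.0),
    "nfp_p1": FlowStats(packets=5, bytes=700, used=9.5),
}
total = combine(per_port)
print(total.packets, total.bytes, total.used)  # 15 2200 12.0
```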
TC Shared Blocks
● Introduced in kernel 4.16
● Each ingress qdisc has its own set of chains and filters, called a 'block'
● Shared blocks allow multiple qdiscs/netdevs to use the same chains/filters
● Avoids duplicating rules, which may not scale on, say, a TCAM-based offload device
● Each qdisc/netdev on a block reports the same filters and stats
nfp_p0
ingress qdisc
nfp_p1
ingress qdisc
Block X
tc qdisc add dev nfp_p0 ingress_block 22 ingress
tc qdisc add dev nfp_p1 ingress_block 22 ingress
tc filter add block 22 protocol ip parent ffff: flower ip_proto tcp skip_sw action drop
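As a rough mental model of the commands above (not kernel code), a shared block can be pictured as one filter list that several netdevs point at. The `Block` and `Netdev` classes are hypothetical; the point is only that a filter added once to block 22 is visible from every bound device.

```python
class Block:
    """Toy model of a TC shared block: one filter list, many bound netdevs."""

    def __init__(self, block_id):
        self.block_id = block_id
        self.filters = []
        self.netdevs = []


class Netdev:
    def __init__(self, name):
        self.name = name
        self.ingress_block = None

    def bind(self, block):
        # Comparable to: tc qdisc add dev <name> ingress_block <id> ingress
        self.ingress_block = block
        block.netdevs.append(self.name)

    def show_filters(self):
        # Every netdev on the block reports the block's filters.
        return list(self.ingress_block.filters)


block22 = Block(22)
p0, p1 = Netdev("nfp_p0"), Netdev("nfp_p1")
p0.bind(block22)
p1.bind(block22)

# One add on the block, no per-device duplication.
block22.filters.append("flower ip_proto tcp skip_sw action drop")
print(p0.show_filters() == p1.show_filters())  # True
```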
TC Shared Block as LAG Representations
● Grouping LAG lower devices in shared blocks along with their upper device
○ LAG netdev hierarchy not influenced outside the TC layer
○ All lower devices receive the same filters applied to the master
○ Effective distribution of offloaded filters - all offload ports get a callback
○ Stats correctly handled by default
[Diagram: bond0 (upper netdev) with lower netdevs nfp_p0 and nfp_p1, shown as a LAG hierarchy and grouped together in Block X]
TC Shared Block Offload and Re-offload
● TC shared blocks call the offload hook for each netdev per filter
○ Cannot (in 4.16 - 4.18) add new qdiscs/netdevs to a block if it has offloaded rules
○ Filter deletion offload hooks are only triggered on block deletion
○ Removing a netdev from a shared block may still leave offloaded rules
● LAG devices by their nature require flexible addition/removal of netdevs
● [PATCH 0/7] net: sched: support replay of filter offload when binding to block (kernel 4.19)
● Add the ability to replay offloaded filters when a new callback is registered
● Replay 'delete' filter messages for each netdev on block withdrawal
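The replay behaviour described above can be sketched as follows. This is a simplified model, not the kernel implementation: `OffloadBlock`, the `"add"` / `"delete"` command strings, and the `hw` dictionary standing in for per-port hardware tables are all assumptions made for the example.

```python
class OffloadBlock:
    """Toy shared block that replays filters to late-binding callbacks."""

    def __init__(self):
        self.filters = []    # filters already installed on the block
        self.callbacks = {}  # netdev name -> callback(cmd, flt)

    def add_filter(self, flt):
        self.filters.append(flt)
        # Offload hook is called once per bound netdev for each filter.
        for cb in self.callbacks.values():
            cb("add", flt)

    def bind(self, name, cb):
        # Replay: a newly registered callback sees every existing filter.
        for flt in self.filters:
            cb("add", flt)
        self.callbacks[name] = cb

    def unbind(self, name):
        # Replay 'delete' so no offloaded rule is left behind on this netdev.
        cb = self.callbacks.pop(name)
        for flt in self.filters:
            cb("delete", flt)


hw = {"nfp_p0": set(), "nfp_p1": set()}  # stand-in for per-port hardware state


def make_cb(name):
    def cb(cmd, flt):
        if cmd == "add":
            hw[name].add(flt)
        else:
            hw[name].discard(flt)
    return cb


block = OffloadBlock()
block.bind("nfp_p0", make_cb("nfp_p0"))
block.add_filter("flower ip_proto tcp")
block.bind("nfp_p1", make_cb("nfp_p1"))  # joins late, still gets the filter
block.unbind("nfp_p0")                   # leaves, its rule is removed
print(hw)
```

With both replays in place, ports can join or leave a LAG at any time without the block and the hardware drifting out of sync.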
Working with Open vSwitch
Open vSwitch 2.10
[Patch 0/6] offload Linux LAG devices to the TC datapath
● Add shared block ID support to the OVS-TC API
● Track Linux kernel netdevs and record LAG info
● If a LAG upper dev is added to the OVS bridge, assign its qdisc a unique block ID
● If a lower dev's related upper dev is on the OVS bridge, associate it with the upper device's block
ovs-vswitchd
TC Datapath
Upper Dev Qdisc
Lower Dev 1
Qdisc
Lower Dev 2
Qdisc
NFP Driver
NFP SmartNIC
Block X
Lower Device 1 filters/stats
Lower Device 2 filters/stats
User Space
Kernel
SmartNIC
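The block ID assignment rules on this slide can be sketched as a small bookkeeping class. This is an illustration under assumptions, not the OVS patch itself: `LagTracker` and its methods are hypothetical names, and block IDs are simply counted up from 1.

```python
import itertools


class LagTracker:
    """Toy bookkeeping for which netdev uses which shared block ID."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.block_of = {}      # netdev name -> block ID
        self.upper_of = {}      # lower dev -> LAG upper dev
        self.on_bridge = set()  # netdevs added to the OVS bridge

    def set_upper(self, lower, upper):
        # Record LAG info as netdevs are enslaved.
        self.upper_of[lower] = upper
        if upper in self.on_bridge:
            # Upper dev is already on the bridge: join its block.
            self.block_of[lower] = self.block_of[upper]

    def add_to_bridge(self, dev, is_lag_upper):
        self.on_bridge.add(dev)
        if is_lag_upper:
            # A LAG upper dev gets a unique block ID for its qdisc.
            block = next(self._next_id)
            self.block_of[dev] = block
            # Lower devs recorded earlier are associated with that block.
            for lower, upper in self.upper_of.items():
                if upper == dev:
                    self.block_of[lower] = block


tracker = LagTracker()
tracker.set_upper("nfp_p0", "bond0")       # enslaved before bond0 joins the bridge
tracker.add_to_bridge("bond0", is_lag_upper=True)
tracker.set_upper("nfp_p1", "bond0")       # enslaved afterwards
print(tracker.block_of)
```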
Thank You
Offloading Linux LAG Devices Via Open vSwitch and TC

  • 1. © 2018 NETRONOME SYSTEMS, INC.
    John Hurley
    Open vSwitch 2018 Fall Conference
    Offloading Linux LAG devices Via Open vSwitch and TC
  • 2. Offloading Flows With TC Flower
    [Diagram labels: ovs-vswitchd; TC Flower; OVS Datapath; NIC Driver (NFP); (nfp_p0) (nfp_p1) (nfp_v0.0); VM 1; VM 2]
    ovs-dpctl dump-flows
    in_port(2),eth_type(0x0800),ipv4(proto=6,frag=no), packets:98, bytes:14316, used:5.351s, actions:3
    tc -s filter show dev nfp_v0.0 ingress
    filter protocol ip pref 1 flower chain 0
    filter protocol ip pref 1 flower chain 0 handle 0x1
      eth_type ipv4
      ip_proto tcp
      ip_flags nofrag
      in_hw
      action order 1: mirred (Egress Redirect to device nfp_p0) stolen
      index 1 ref 1 bind 1 installed 12 sec used 11 sec
      Action statistics:
      Sent 14316 bytes 98 pkt (dropped 0, overlimits 0 requeues 0)
      backlog 0b 0p requeues 0
  • 3. Offload Performance on SmartNIC
    ● VXLAN encapsulated traffic
    ● Open vSwitch rules offloaded to SmartNIC via TC
    ● Traffic sent in physical port
    ● Forwarded to VM and bounced back on different port
  • 4. LAG Devices and Representors
    ● Link Aggregation (LAG) - combine multiple ports to act as a single port
      ○ Load balance to increase bandwidth
      ○ Active/backup failover
    ● Open vSwitch bonds
      ○ Combination of OvS kernel, OvS bond, LACP, high throughput - link flapping
    ● Linux LAG devices
      ○ Linux bond, Team
      ○ Deployed in OpenStack environments
    ● What if a Linux LAG upper device has 'offloadable' lower devices and is added to an OVS bridge?
    ip link show
    35: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    37: nfp_p0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP
    38: nfp_p1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP
  • 5. Egress Offload
    ● NETDEV_CHANGELOWERSTATE
      ○ Record active/backup port states
    ● NETDEV_CHANGEUPPER
      ○ Track new LAG upper and lower devs
    ● Packet hash on NIC gets egress port
    tc -s filter show dev nfp_v0.0 ingress
    filter protocol ip pref 49152 flower chain 0 handle 0x1
      eth_type ipv4
      in_hw
      action order 1: mirred (Egress Redirect to device bond0) stolen
    [Diagram labels: Kernel; Team; Bond; TC; NFP; SmartNIC]
    Group 1: nfp_p0 nfp_p1
    Group 2: …...
    match: nfp_v0.0, ipv4, action: Group 1
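The egress path on this slide (record member link state, then let a packet hash pick the egress port within a group) can be illustrated with a small toy model. This is only a sketch under stated assumptions: the `LagGroup` class, its method names, and the use of CRC32 as the hash are hypothetical stand-ins, since the slides do not say which hash the NIC uses or how the driver stores group state.

```python
import zlib


class LagGroup:
    """Toy model of one NIC-side LAG group (like 'Group 1' on the slide)."""

    def __init__(self, members):
        self.members = list(members)
        # Assumption: every member starts with its link up.
        self.link_up = {port: True for port in self.members}

    def set_link_state(self, port, up):
        # Stand-in for reacting to a NETDEV_CHANGELOWERSTATE notification.
        self.link_up[port] = up

    def egress_port(self, flow_key):
        # Hash the flow key over the members that are currently usable.
        usable = [p for p in self.members if self.link_up[p]]
        if not usable:
            return None
        # CRC32 is an assumed placeholder for the real hardware hash.
        return usable[zlib.crc32(flow_key) % len(usable)]


group1 = LagGroup(["nfp_p0", "nfp_p1"])
flow = b"10.0.0.1:1234->10.0.0.2:80/tcp"
first_choice = group1.egress_port(flow)

group1.set_link_state("nfp_p0", False)      # nfp_p0 goes down
after_failover = group1.egress_port(flow)   # only nfp_p1 is left to choose
```

Because the rule's action points at the group rather than at a port, a link going down changes only the group's usable members; the offloaded match itself stays untouched.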
  • 6. Ingress Offload
    ● SmartNIC is not aware of LAG devices (in-kernel representation)
    ● If 'bond0' contains offloadable ports there is a need to:
      ○ Distribute filters to all lower devices
      ○ Combine stats from all lower device offloads into the LAG upper device/flower rule
    ● No offload callback in the LAG drivers
      ○ Difficult for SmartNIC driver to track changes
    ● Are there any TC features we can make use of to solve this?
    tc -s filter show dev bond0 ingress
    filter protocol ip pref 49152 flower chain 0 handle 0x1
      eth_type ipv4
      not_in_hw
      action order 1: mirred (Egress Redirect to device nfp_v0.0) stolen
  • 7. TC Shared Blocks
    ● Introduced in Kernel 4.16
    ● Each ingress qdisc has its own set of chains and filters called 'blocks'
    ● Shared blocks allow multiple qdiscs/netdevs to use the same chains/filters
    ● Prevent duplicating rules which may not scale on, say, TCAM device offload
    ● Each qdisc/netdev on block reports same filters and stats
    [Diagram labels: nfp_p0 ingress qdisc; nfp_p1 ingress qdisc; Block X]
    tc qdisc add dev nfp_p0 ingress_block 22 ingress
    tc qdisc add dev nfp_p1 ingress_block 22 ingress
    tc filter add block 22 protocol ip parent ffff: flower ip_proto tcp skip_sw action drop
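The sharing semantics described on this slide can be mimicked in a few lines of Python. This is a toy model, not kernel code: the `Block` and `IngressQdisc` classes and the dictionary-based packet matching are assumptions made for illustration. It mirrors the three `tc` commands above: two qdiscs attach to block 22, and one filter is added to the block itself.

```python
class Block:
    """Toy shared block: a single set of filters and counters."""

    def __init__(self, block_id):
        self.block_id = block_id
        self.filters = []

    def add_filter(self, match, action):
        flt = {"match": match, "action": action, "packets": 0}
        self.filters.append(flt)
        return flt


class IngressQdisc:
    """Toy ingress qdisc that points at a block instead of owning filters."""

    def __init__(self, dev, block):
        self.dev = dev
        self.block = block

    def receive(self, packet):
        # First matching filter wins; its counter lives in the block.
        for flt in self.block.filters:
            if all(packet.get(k) == v for k, v in flt["match"].items()):
                flt["packets"] += 1
                return flt["action"]
        return "pass"

    def show(self):
        # Every qdisc on the block reports the same filters and stats.
        return self.block.filters


block22 = Block(22)
p0 = IngressQdisc("nfp_p0", block22)
p1 = IngressQdisc("nfp_p1", block22)
block22.add_filter({"ip_proto": "tcp"}, "drop")   # added once, on the block

p0.receive({"ip_proto": "tcp"})
p1.receive({"ip_proto": "tcp"})
```

After the two packets, `p0.show()` and `p1.show()` return the same single filter with a combined packet count, which is the property the LAG offload relies on.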
  • 8. TC Shared Block as LAG Representations
    ● Grouping LAG lower devices in shared blocks along with their upper device
      ○ LAG netdev hierarchy not influenced outside the TC layer
      ○ All lower devices receive same filters applied to master
      ○ Effective distribution of offloaded filters - all offload ports get callback
      ○ Stats correctly handled by default
    [Diagram labels: Block X; nfp_p0 lower netdev; nfp_p1 lower netdev; bond0 upper netdev]
  • 9. TC Shared Block Offload and Re-offload
    ● TC shared blocks call offload hook for each netdev per filter
      ○ Cannot (in 4.16 - 4.18) add new qdiscs/netdevs to block if it has offloaded rules
      ○ Filter deletion offload hooks are only triggered on block deletion
      ○ Removing a netdev from a shared block may still leave offloaded rules
    ● LAG devices by their nature require flexible addition/removal of netdevs
    ● [PATCH 0/7] net: sched: support replay of filter offload when binding to block (kernel 4.19)
    ● Add the ability to replay offloaded filters when a new callback is registered
    ● Replay 'delete' filter messages for each netdev on block withdrawal
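The replay behaviour described on this slide can be sketched as a toy model. The `Port` and `SharedBlock` classes below are hypothetical and greatly simplified (for instance, a failed replay is not handled); they only show the ordering of callbacks: existing rules are replayed as 'add' when a port binds late, and as 'del' when a port leaves the block.

```python
class Port:
    """Toy offload-capable port that records what sits in 'hardware'."""

    def __init__(self, name):
        self.name = name
        self.hw_rules = set()

    def offload_cb(self, command, rule):
        if command == "add":
            self.hw_rules.add(rule)
        elif command == "del":
            self.hw_rules.discard(rule)


class SharedBlock:
    """Toy shared block that replays rules on bind and unbind."""

    def __init__(self):
        self.rules = []
        self.ports = []

    def add_rule(self, rule):
        self.rules.append(rule)
        for port in self.ports:
            port.offload_cb("add", rule)

    def bind(self, port):
        # Replay every existing rule to the newly registered callback.
        for rule in self.rules:
            port.offload_cb("add", rule)
        self.ports.append(port)

    def unbind(self, port):
        self.ports.remove(port)
        # Replay 'delete' so nothing stays offloaded on the departing port.
        for rule in self.rules:
            port.offload_cb("del", rule)


block = SharedBlock()
p0, p1 = Port("nfp_p0"), Port("nfp_p1")

block.bind(p0)
block.add_rule("ipv4 -> nfp_v0.0")
block.bind(p1)      # joins after the rule exists, receives it via replay
block.unbind(p0)    # leaves the block, its offloaded rule is withdrawn
```

At the end of this sequence `p1.hw_rules` holds the rule and `p0.hw_rules` is empty, which is the flexibility a bond needs when members are added or removed.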
  • 10. Working with Open vSwitch
    Open vSwitch 2.10
    [Patch 0/6] offload Linux LAG devices to the TC datapath
    ● Add shared block ID support to OVS-TC API
    ● Track Linux kernel netdevs and record LAG info
    ● If a LAG upper dev is added to the OVS bridge, assign its qdisc a unique block ID
    ● If a lower dev's related upper dev is on the OVS bridge then associate it with the upper device's block
    [Diagram labels: ovs-vswitchd; TC Datapath; Upper Dev Qdisc; Lower Dev 1 Qdisc; Lower Dev 2 Qdisc; Block X; NFP Driver; NFP SmartNIC; Lower Device 1 filters/stats; Lower Device 2 filters/stats; layers: User Space, Kernel, SmartNIC]
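The block-assignment rules on this slide can be written down as a short bookkeeping sketch. It is not Open vSwitch code: `LagBlockTracker` and its methods are hypothetical names, and drawing block IDs from a simple counter is an assumption. The sketch only captures the two rules: an upper device on the bridge gets a unique block ID, and each of its lower devices is associated with that same block.

```python
import itertools


class LagBlockTracker:
    """Toy bookkeeping for LAG upper/lower devices and their block IDs."""

    def __init__(self):
        self._ids = itertools.count(1)   # assumed source of unique IDs
        self.lowers = {}                 # upper name -> set of lower names
        self.block_ids = {}              # upper name -> block ID (on bridge)

    def record_lag(self, upper, lower):
        # Stand-in for tracking kernel netdevs and recording LAG info.
        self.lowers.setdefault(upper, set()).add(lower)

    def add_upper_to_bridge(self, upper):
        # A LAG upper dev on the bridge gets one unique block ID.
        if upper not in self.block_ids:
            self.block_ids[upper] = next(self._ids)
        return self.block_ids[upper]

    def block_for(self, dev):
        # Upper devs use their own block; lower devs use their upper's block.
        if dev in self.block_ids:
            return self.block_ids[dev]
        for upper, lowers in self.lowers.items():
            if dev in lowers and upper in self.block_ids:
                return self.block_ids[upper]
        return None                      # not part of any bridged LAG


tracker = LagBlockTracker()
tracker.record_lag("bond0", "nfp_p0")
tracker.record_lag("bond0", "nfp_p1")
before = tracker.block_for("nfp_p0")     # bond0 is not on the bridge yet
bond_block = tracker.add_upper_to_bridge("bond0")
```

Once `bond0` is on the bridge, `block_for` returns the same ID for `bond0`, `nfp_p0` and `nfp_p1`, while a device outside the bond, such as `nfp_v0.0`, gets no shared block.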
  • 11. Thank You