Understanding DPDK
Description of techniques used to achieve high throughput on commodity hardware
How fast does software have to work?
14.88 million 64-byte packets per second on a 10G interface
1.8 GHz -> 1 cycle = 0.55 ns
1 packet -> 67.2 ns = about 120 clock cycles
Frame on the wire: 84 bytes
IFG (12) | Preamble (8) | DST MAC, SRC MAC, Type, Payload (60) | CRC (4)
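The per-packet budget follows from the wire size of a minimum frame. A minimal Python sketch of that arithmetic; the constant and function names are illustrative and not part of the slides:

```python
# Per-packet time budget for minimum-size frames on a 10G link.
LINK_BPS = 10e9   # 10 Gbit/s line rate
IFG = 12          # inter-frame gap, bytes
PREAMBLE = 8      # preamble plus start-of-frame delimiter, bytes
FRAME = 64        # minimum Ethernet frame including the 4-byte CRC, bytes
CPU_HZ = 1.8e9    # clock rate used on the slide


def packet_budget(link_bps=LINK_BPS, cpu_hz=CPU_HZ):
    """Return (bytes on the wire, packets/s, ns per packet, cycles per packet)."""
    wire_bytes = IFG + PREAMBLE + FRAME
    pps = link_bps / (wire_bytes * 8)
    ns_per_packet = 1e9 / pps
    cycles_per_packet = cpu_hz / pps
    return wire_bytes, pps, ns_per_packet, cycles_per_packet


if __name__ == "__main__":
    wire, pps, ns, cycles = packet_budget()
    print(f"{wire} B on the wire, {pps / 1e6:.2f} Mpps, "
          f"{ns:.1f} ns/packet, {cycles:.1f} cycles/packet")
```

Changing CPU_HZ shows how the cycle budget scales with clock rate while the 67.2 ns per packet stays fixed by the link.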
Comparative speed values
CPU to memory speed = 6-8 GBytes/s
PCI-Express x16 speed = 5 GBytes/s
Access to RAM = 200 ns
Access to L3 cache = 4 ns
Context switch ~= 1000 ns (3.2 GHz)
Packet processing in Linux
Figure: App in user space; Socket, ring buffers and Driver in kernel space; RX/TX queues on the NIC.
Linux kernel overhead
System calls
Context switching on blocking I/O
Data copying from kernel to user space
Interrupt handling in kernel
Expense of sendto
Function       Activity                              Time (ns)
sendto         system call                           96
sosend_dgram   lock sock_buff, alloc mbuf, copy in   137
udp_output     UDP header setup                      57
ip_output      route lookup, IP header setup         198
ether_output   MAC lookup, MAC header setup          162
ixgbe_xmit     device programming                    220
Total                                                950
Packet processing with DPDK
Figure: App, DPDK and ring buffers in user space; UIO driver in kernel space; RX/TX queues on the NIC.
Updating a register in Linux
Figure: user space ioctl() -> syscall -> VFS -> copy_from_user() -> iowrite() (kernel space) -> Register (HW)
Updating a register with DPDK
Figure: user space assign -> Register (HW)
What is used inside DPDK?
Processor affinity (separate cores)
Huge pages (no swap, fewer TLB misses)
UIO (no copying from kernel)
Polling (no interrupt overhead)
Lockless synchronization (avoid waiting)
Batch packet handling
SSE, NUMA awareness
Linux default scheduling
Figure: Core 0, Core 1, Core 2, Core 3; t1, t2, t3, t4.
How to isolate a core for a process
To diagnose, use top: run "top", press "f", press "j"
Before boot, use the isolcpus kernel parameter: "isolcpus=2,4,6"
After boot, use cpuset: "cset shield -c 1-3", "cset shield -k on"
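Once a core is isolated, a process still has to be placed on it. A minimal, Linux-only sketch using Python's os.sched_setaffinity; the helper names are illustrative, and the demo picks the lowest allowed core rather than a truly isolated one:

```python
import os


def pin_to_core(core):
    """Restrict the calling process to a single core and return the new mask."""
    os.sched_setaffinity(0, {core})  # pid 0 means the calling process
    return os.sched_getaffinity(0)


def demo():
    """Pin to the lowest allowed core, verify, then restore the old mask."""
    allowed = os.sched_getaffinity(0)
    core = min(allowed)
    try:
        return pin_to_core(core) == {core}
    finally:
        os.sched_setaffinity(0, allowed)


if __name__ == "__main__":
    print("pinned:", demo())
```

On a machine booted with isolcpus, passing one of the isolated core numbers to pin_to_core keeps the scheduler from placing other tasks next to the packet-processing thread.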
Run-to-completion model
Figure: Core 1 and Core 2 each run one RX/TX thread; Port 1 and Port 2.
Pipeline model
Figure: Core 1 and Core 2; an RX thread and a TX thread connected by a ring; Port 1 and Port 2.
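The pipeline model can be sketched as two stages joined by a queue. This is only an illustration: queue.Queue is a locking, blocking stand-in for a DPDK ring, Python threads are not pinned to cores, and all names are hypothetical:

```python
import queue
import threading

STOP = object()  # sentinel that tells the TX stage to finish


def rx_stage(packets, ring):
    """RX thread: take packets from the 'port' and hand them to the ring."""
    for pkt in packets:
        ring.put(pkt)
    ring.put(STOP)


def tx_stage(ring, sent):
    """TX thread: drain the ring and 'transmit' each packet."""
    while True:
        pkt = ring.get()
        if pkt is STOP:
            break
        sent.append(pkt)


def run_pipeline(packets, ring_size=8):
    ring = queue.Queue(maxsize=ring_size)
    sent = []
    rx = threading.Thread(target=rx_stage, args=(packets, ring))
    tx = threading.Thread(target=tx_stage, args=(ring, sent))
    rx.start()
    tx.start()
    rx.join()
    tx.join()
    return sent


if __name__ == "__main__":
    print(run_pipeline(["pkt%d" % i for i in range(5)]))
```

In the run-to-completion model the same thread would do both steps for its own port, with no ring in between.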
Linux paging model
Figure: page tables tree: cr3 -> Page Global Directory -> Page Middle Directory -> Page Table -> Page
TLB
Figure: virtual address (virtual page, offset) -> TLB or Page Table -> physical address (physical page, offset) -> RAM
TLB characteristics
$ cpuid | grep -i tlb
size: 12–4,096 entries
hit time: 0.5–1 clock cycle
miss penalty: 10–100 clock cycles
miss rate: 0.01–1%
It is a very expensive resource!
Solution - Hugepages
Benefit: optimized TLB usage, no swap
Hugepage size = 2 MB
Usage:
mount -t hugetlbfs nodev /mnt/huge
mmap
Library - libhugetlbfs
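Why hugepages help the TLB can be shown with simple arithmetic. A minimal sketch; the 64-entry TLB is an assumed example value (the real number comes from cpuid), and the function names are illustrative:

```python
PAGE_4K = 4 * 1024
PAGE_2M = 2 * 1024 * 1024
GIB = 1024 ** 3


def tlb_reach(entries, page_size):
    """Bytes of memory covered when every TLB entry maps one page."""
    return entries * page_size


def pages_needed(region_bytes, page_size):
    """Number of pages (and so translations) needed to map a region."""
    return -(-region_bytes // page_size)  # ceiling division


if __name__ == "__main__":
    entries = 64  # assumption for illustration only
    print("4K reach:", tlb_reach(entries, PAGE_4K) // 1024, "KiB")
    print("2M reach:", tlb_reach(entries, PAGE_2M) // (1024 * 1024), "MiB")
    print("pages for 1 GiB:", pages_needed(GIB, PAGE_4K), "vs",
          pages_needed(GIB, PAGE_2M))
```

With these assumed numbers the same TLB covers 256 KiB with 4 KB pages and 128 MiB with 2 MB pages, and a 1 GiB buffer pool needs 512 translations instead of 262144.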
Lockless ring design
Writer can preempt writer and reader
Reader cannot preempt writer
Reader and writer can work simultaneously on different cores
Barrier
CAS operation
Bulk enqueue/dequeue
Lockless ring (Single Producer)
Figure, steps 1-3: positions of cons_head, cons_tail, prod_head, prod_tail and the local prod_next while one object is enqueued.
Lockless ring (Single Consumer)
Figure, steps 1-3: positions of cons_head, cons_tail, prod_head, prod_tail and the local cons_next while one object is dequeued.
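The single-producer and single-consumer steps can be modeled with four indices. This is a single-threaded sketch of the index protocol only, not DPDK's implementation: the class is hypothetical, indices are unbounded Python integers, and memory barriers are omitted:

```python
class Ring:
    """Model of a single-producer / single-consumer ring's index updates."""

    def __init__(self, size):
        assert size > 0 and size & (size - 1) == 0, "size must be a power of two"
        self.size = size
        self.mask = size - 1
        self.slots = [None] * size
        self.prod_head = self.prod_tail = 0
        self.cons_head = self.cons_tail = 0

    def enqueue_bulk(self, objs):
        # Step 1: snapshot prod_head and compute the local prod_next.
        prod_head = self.prod_head
        free = self.size - (prod_head - self.cons_tail)
        if len(objs) > free:
            return False  # all-or-nothing bulk semantics
        prod_next = prod_head + len(objs)
        # Step 2: move prod_head forward, then fill the reserved slots.
        self.prod_head = prod_next
        for i, obj in enumerate(objs):
            self.slots[(prod_head + i) & self.mask] = obj
        # Step 3: move prod_tail forward so the consumer can see the objects.
        self.prod_tail = prod_next
        return True

    def dequeue_bulk(self, count):
        # Step 1: snapshot cons_head and compute the local cons_next.
        cons_head = self.cons_head
        if count > self.prod_tail - cons_head:
            return None  # not enough published objects
        cons_next = cons_head + count
        # Step 2: move cons_head forward, then copy the objects out.
        self.cons_head = cons_next
        out = [self.slots[(cons_head + i) & self.mask] for i in range(count)]
        # Step 3: move cons_tail forward so the producer can reuse the slots.
        self.cons_tail = cons_next
        return out


if __name__ == "__main__":
    ring = Ring(4)
    ring.enqueue_bulk([1, 2, 3])
    print(ring.dequeue_bulk(2))
```

The producer only reads cons_tail and the consumer only reads prod_tail, which is why the two sides can run on different cores without a lock.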
Lockless ring (Multiple Producers)
Figure, steps 1-5: positions of cons_head, cons_tail, prod_head, prod_tail and the per-producer locals prod_next1 and prod_next2 while two producers enqueue concurrently.
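With several producers, prod_head is claimed with a compare-and-swap and prod_tail is published in reservation order. A hedged sketch: Python has no CAS instruction, so the AtomicIndex class below emulates one with a lock, and MpRing and demo are hypothetical names. Real implementations also need memory barriers, which are omitted here:

```python
import threading
import time


class AtomicIndex:
    """Stand-in for an atomic integer; the lock only emulates the atomicity
    of a hardware compare-and-swap."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def store(self, value):
        self._value = value

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


class MpRing:
    def __init__(self, size):
        assert size > 0 and size & (size - 1) == 0, "size must be a power of two"
        self.size = size
        self.mask = size - 1
        self.slots = [None] * size
        self.prod_head = AtomicIndex()
        self.prod_tail = AtomicIndex()
        self.cons_tail = AtomicIndex()

    def enqueue(self, obj):
        # Reserve a slot: retry the CAS until no other producer raced us.
        while True:
            head = self.prod_head.load()
            if head - self.cons_tail.load() >= self.size:
                return False  # ring is full
            if self.prod_head.compare_and_swap(head, head + 1):
                break
        self.slots[head & self.mask] = obj
        # Publish in reservation order: wait for earlier producers to finish.
        while self.prod_tail.load() != head:
            time.sleep(0)  # yield to other threads
        self.prod_tail.store(head + 1)
        return True


def demo(producers=4, per_producer=100):
    ring = MpRing(512)

    def produce(base):
        for i in range(per_producer):
            ring.enqueue(base + i)

    threads = [threading.Thread(target=produce, args=(p * per_producer,))
               for p in range(producers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ring


if __name__ == "__main__":
    print("published:", demo().prod_tail.load())
```

A producer that loses the CAS simply retries with the new prod_head, so no producer ever blocks on a lock held by a preempted thread; the only wait is for earlier reservations to be published.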
Kernel space network driver
Figure: App in user space; IP stack and Driver in kernel space; NIC below. Labeled flows: Data, Desc, Config, Interrupts.
UIO
“The most important devices can’t be handled in user space, including, but not limited to, network interfaces and block devices.” - LDD3
UIO
Figure: App and user-space (US) driver in user space, using epoll() and mmap() on the sysfs and /dev/uioX interface; UIO framework and driver in kernel space; NIC below.
Access to device from user space
Figure: PCI configuration registers (Vendor Id, Device Id, Command, Status, Revision Id, ..., BAR0-BAR5) and the I/O and memory regions they describe, e.g. BAR0 (Mem), BAR2 (IO).
Configuration registers: /sys/bus/pci/devices
Memory regions: /sys/class/uio/uioX/maps/mapX
I/O regions: /sys/class/uio/uioX/portio/portX
Mapping: /dev/uioX -> mmap (offset)
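Once a region is mapped, a register update is a plain store into the mapping. A minimal sketch in which a temporary file stands in for /dev/uioX so it runs without hardware; the register offset and helper names are hypothetical:

```python
import mmap
import struct
import tempfile

TAIL_REG_OFFSET = 0x18  # hypothetical register offset, for illustration only


def write_reg32(region, offset, value):
    """Store a 32-bit little-endian value at a byte offset in the mapping."""
    struct.pack_into("<I", region, offset, value)


def read_reg32(region, offset):
    """Load a 32-bit little-endian value from a byte offset in the mapping."""
    return struct.unpack_from("<I", region, offset)[0]


def demo():
    # The temporary file is a stand-in for the device node; on real hardware
    # the file would be /dev/uioX and the mmap offset would select the map.
    with tempfile.TemporaryFile() as backing:
        backing.truncate(mmap.PAGESIZE)
        region = mmap.mmap(backing.fileno(), mmap.PAGESIZE)
        try:
            write_reg32(region, TAIL_REG_OFFSET, 5)
            return read_reg32(region, TAIL_REG_OFFSET)
        finally:
            region.close()


if __name__ == "__main__":
    print("register value:", demo())
```

No system call is made per register access, which is the contrast with the ioctl() path. A real driver would use a volatile 32-bit store in C, since Python does not guarantee the access width.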
DMA RX
Figure: host memory holds the RX queue (descriptor ring) and packet memory; NIC memory holds the RX FIFO and DMA descriptors. Labeled steps: Update RDT, DMA descriptor(s), DMA packet.
DMA TX
Figure: host memory holds the TX queue (descriptor ring) and packet memory; NIC memory holds the TX FIFO and DMA descriptors. Labeled steps: Update TDT, DMA descriptor(s), DMA packet.
Receive from SW side
Figure: RX descriptor ring with RDBA = 0, RDLEN = 6, RDH = 1, RDT = 5. Each descriptor holds an mbuf address (mbuf1, mbuf2, ...) and a DD (descriptor done) bit; RDH and RDT advance around the ring.
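The head/tail/DD interplay on the receive side can be modeled in a few lines. This is a software model only, under stated assumptions: the ring length is counted in descriptors, hw_receive plays the role of the NIC, and the class and method names are hypothetical:

```python
class RxRing:
    """Toy model of an RX descriptor ring with head (RDH), tail (RDT), DD bit."""

    def __init__(self, length):
        self.length = length  # ring size in descriptors (assumption)
        self.desc = [{"buf": None, "data": None, "dd": False}
                     for _ in range(length)]
        self.rdh = 0            # head: next descriptor the hardware will fill
        self.rdt = 0            # tail: written by software after refilling
        self.next_to_clean = 0  # software's own read position

    def refill(self, buf):
        """Software: give one empty buffer to the hardware and bump RDT."""
        nxt = (self.rdt + 1) % self.length
        if nxt == self.next_to_clean:
            return False  # ring full; one slot always stays unused
        self.desc[self.rdt] = {"buf": buf, "data": None, "dd": False}
        self.rdt = nxt
        return True

    def hw_receive(self, payload):
        """Hardware model: 'DMA' a packet into the head descriptor, set DD."""
        if self.rdh == self.rdt:
            return False  # no descriptors available, packet is dropped
        d = self.desc[self.rdh]
        d["data"] = payload
        d["dd"] = True
        self.rdh = (self.rdh + 1) % self.length
        return True

    def poll(self, max_burst=32):
        """Software: collect completed descriptors by polling the DD bit."""
        out = []
        while len(out) < max_burst and self.desc[self.next_to_clean]["dd"]:
            d = self.desc[self.next_to_clean]
            out.append((d["buf"], d["data"]))
            d["dd"] = False
            self.next_to_clean = (self.next_to_clean + 1) % self.length
        return out


if __name__ == "__main__":
    ring = RxRing(6)
    for i in range(5):
        ring.refill("mbuf%d" % (i + 1))
    ring.hw_receive(b"packet")
    print("RDH =", ring.rdh, "RDT =", ring.rdt, ring.poll())
```

After five refills and one received packet the model sits at RDH = 1 and RDT = 5, the state drawn on the slide. The transmit side mirrors this with TDH and TDT, except that software fills the descriptors and hardware drains them.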
Transmit from SW side
Figure: TX descriptor ring with TDBA = 0, TDLEN = 6, TDH = 1, TDT = 5. Each descriptor holds an mbuf address (mbuf1, mbuf2, ...) and a DD (descriptor done) bit; TDH and TDT advance around the ring.
NUMA
Figure: Socket 0 (CPU 0) and Socket 1 (CPU 1), each with cores, a memory controller with local memory, and an I/O controller with PCI-E links; the two sockets are connected by QPI.
RSS (Receive Side Scaling)
Figure: incoming traffic -> hash function -> indirection table -> Queue 0 ... Queue N -> CPU N
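The hash-then-indirection-table lookup can be sketched as follows. Assumptions are marked: zlib.crc32 is only a stand-in for the hash the NIC computes (typically a Toeplitz hash), and the table size, queue count and function names are illustrative:

```python
import socket
import struct
import zlib

NUM_QUEUES = 4   # assumed number of RX queues
RETA_SIZE = 128  # assumed indirection table size

# Indirection table: spreads hash buckets over the queues round-robin.
RETA = [i % NUM_QUEUES for i in range(RETA_SIZE)]


def flow_hash(src_ip, dst_ip, src_port, dst_port):
    """Stand-in hash over the flow tuple (not the NIC's real hash)."""
    key = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
           + struct.pack("!HH", src_port, dst_port))
    return zlib.crc32(key)


def select_queue(src_ip, dst_ip, src_port, dst_port):
    """Low-order hash bits index the indirection table, which names a queue."""
    return RETA[flow_hash(src_ip, dst_ip, src_port, dst_port) % RETA_SIZE]


if __name__ == "__main__":
    print(select_queue("10.0.0.1", "10.0.0.2", 1234, 80))
```

All packets of one flow hash to the same value and so land on the same queue and core; rewriting entries of RETA rebalances load without changing the hash.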
Flow director
Figure: incoming traffic -> hash function -> filter table -> Queue 0 ... Queue N -> CPU N. Also labeled: Outgoing traffic, Drop, Route.
Virtualization - SR-IOV
Figure: VM1 and VM2 each run a VF driver; the VMM runs the PF driver; the NIC exposes two VFs and a PF joined by a virtual bridge.
Slow path using bifurcated driver
Figure: Kernel and DPDK share one NIC; the NIC's filter table and virtual bridge split traffic between the PF and a VF.
Slow path using TAP
Figure: App, DPDK and ring buffers in user space; TAP device and TCP/IP stack in kernel space; RX/TX queues on the NIC.
Slow path using KNI
Figure: App, DPDK and ring buffers in user space; KNI device and TCP/IP stack in kernel space; RX/TX queues on the NIC.
Application 1 - Traffic generator
Figure: x86 HW; a streams generator and a traffic analyzer in user space, both connected to the DUT.
Application 2 - Router
Figure: x86 HW; routing table in the kernel, routing table cache in user space; traffic between DUT1 and DUT2.
Application 3 - Middlebox
Figure: x86 HW; DPI in user space between DUT1 and DUT2.
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Ranjan Baisak
 
Exploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the FutureExploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the Future
ICS
 

Understanding DPDK

  • 1. Understanding DPDK: a description of the techniques used to achieve high throughput on commodity hardware
  • 2. How fast does software have to work? A 10G interface carries 14.88 million 64-byte packets per second. At 1.8 GHz, 1 cycle = 0.55 ns, so 1 packet -> 67.2 ns = about 120 clock cycles. Frame on the wire: IFG (12 bytes), preamble (8 bytes), DST MAC + SRC MAC + Type + Payload (60 bytes), CRC (4 bytes); 84 bytes in total.
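The budget on slide 2 can be re-derived with a few lines of arithmetic. This is a minimal sketch using only the slide's own numbers (10 Gbit/s link, 84 bytes on the wire per minimum-size frame, 1.8 GHz clock); the function and constant names are made up for illustration.

```python
# Per-packet time budget for minimum-size frames on a 10G link.
LINK_BPS = 10e9     # 10 Gbit/s interface (from the slide)
WIRE_BYTES = 84     # 64-byte frame + 8-byte preamble + 12-byte IFG (from the slide)
CPU_HZ = 1.8e9      # 1.8 GHz core (from the slide)


def packet_budget(link_bps, wire_bytes, cpu_hz):
    """Return (packets per second, ns per packet, CPU cycles per packet)."""
    pps = link_bps / (wire_bytes * 8)        # each packet occupies wire_bytes * 8 bit times
    ns_per_packet = 1e9 / pps                # time available to process one packet
    cycles = ns_per_packet * cpu_hz / 1e9    # the same budget expressed in clock cycles
    return pps, ns_per_packet, cycles


pps, ns_per_packet, cycles = packet_budget(LINK_BPS, WIRE_BYTES, CPU_HZ)
print(f"{pps / 1e6:.2f} Mpps, {ns_per_packet:.1f} ns per packet, {cycles:.2f} cycles")
```

The cycle count comes out just under 121, which the slide quotes as 120. Either way, a single RAM access at the 200 ns listed on slide 3 already exceeds the whole per-packet budget.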
  • 3. Comparative speed values: CPU to memory speed = 6-8 GBytes/s; PCI-Express x16 speed = 5 GBytes/s; access to RAM = 200 ns; access to L3 cache = 4 ns; context switch ~= 1000 ns (3.2 GHz)
  • 4. Packet processing in Linux (diagram labels): User space, Kernel space, NIC, App, Socket, Driver, Ring buffers, RX/TX queues
  • 5. Linux kernel overhead: system calls; context switching on blocking I/O; data copying from kernel to user space; interrupt handling in kernel
  • 6. Expense of sendto
       Function      | Activity                             | Time (ns)
       sendto        | system call                          | 96
       sosend_dgram  | lock sock_buff, alloc mbuf, copy in  | 137
       udp_output    | UDP header setup                     | 57
       ip_output     | route lookup, IP header setup        | 198
       ether_output  | MAC lookup, MAC header setup         | 162
       ixgbe_xmit    | device programming                   | 220
       Total         |                                      | 950
  • 7. Packet processing with DPDK (diagram labels): User space, Kernel space, NIC, App, DPDK, Ring buffers, UIO driver, RX/TX queues
  • 8. Updating a register in Linux (diagram): user space -> ioctl() -> syscall -> VFS -> copy_from_user() -> iowrite() in kernel space -> register in HW
  • 9. Updating a register with DPDK (diagram): user space assigns directly to the HW register
  • 10. What is used inside DPDK? Processor affinity (separate cores); huge pages (no swap, TLB); UIO (no copying from kernel); polling (no interrupt overhead); lockless synchronization (avoid waiting); batch packet handling; SSE, NUMA awareness
  • 11. Linux default scheduling (diagram labels): Core 0, Core 1, Core 2, Core 3; threads t1, t2, t3, t4
  • 12. How to isolate a core for a process. To diagnose, use top: “top”, press “f”, press “j”. Before boot, use isolcpus: “isolcpus=2,4,6”. After boot, use cpuset: “cset shield -c 1-3”, “cset shield -k on”
  • 13. Run-to-completion model (diagram): Core 1 and Core 2 each run an RX/TX thread; ports: Port 1, Port 2
  • 14. Pipeline model (diagram): an RX thread and a TX thread on separate cores (Core 1, Core 2), connected by a Ring; ports: Port 1, Port 2
  • 15. Page tables tree, Linux paging model (diagram): cr3 -> Page Global Directory -> Page Middle Directory -> Page Table -> Page
  • 17. TLB characteristics ($ cpuid | grep -i tlb): size: 12–4,096 entries; hit time: 0.5–1 clock cycle; miss penalty: 10–100 clock cycles; miss rate: 0.01–1%. It is a very expensive resource!
  • 18. Solution: hugepages. Benefit: optimized TLB usage, no swap. Hugepage size = 2M. Usage: mount hugetlbfs /mnt/huge; mmap. Library: libhugetlbfs
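Why hugepages help the TLB (slides 17 and 18) follows from how much memory a fixed number of entries can cover. This is a rough sketch: the 2M hugepage size is from the slide, while the 4 KiB base page size, the 64-entry TLB (a value inside the slide's 12–4,096 range), and the 1 GiB buffer pool are assumptions chosen for illustration.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

BASE_PAGE = 4 * KIB    # assumed default page size
HUGE_PAGE = 2 * MIB    # hugepage size from the slide
TLB_ENTRIES = 64       # hypothetical TLB size, within the slide's range


def tlb_reach(entries, page_size):
    """Bytes of memory addressable without a TLB miss."""
    return entries * page_size


def pages_needed(region_bytes, page_size):
    """Number of pages (and therefore translations) needed to map a region."""
    return -(-region_bytes // page_size)   # ceiling division


reach_base = tlb_reach(TLB_ENTRIES, BASE_PAGE)   # 256 KiB
reach_huge = tlb_reach(TLB_ENTRIES, HUGE_PAGE)   # 128 MiB
pool = 1 * GIB                                   # hypothetical packet buffer pool
print(reach_base // KIB, "KiB vs", reach_huge // MIB, "MiB of TLB reach")
print(pages_needed(pool, BASE_PAGE), "vs", pages_needed(pool, HUGE_PAGE), "pages for the pool")
```

With these assumed numbers, the same TLB covers 512 times more memory, and the pool needs 512 translations instead of 262,144.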
  • 19. Lockless ring design: a writer can preempt a writer and a reader; a reader cannot preempt a writer; reader and writer can work simultaneously on different cores; barrier; CAS operation; bulk enqueue/dequeue
  • 20. Lockless ring (Single Producer), diagram in three steps. Step 1: cons_head, cons_tail, prod_head, prod_tail, prod_next. Step 2: cons_head, cons_tail, prod_head, prod_next, prod_tail. Step 3: cons_head, cons_tail, prod_head, prod_tail
  • 21. Lockless ring (Single Consumer), diagram in three steps. Step 1: cons_head, cons_tail, prod_head, prod_tail, cons_next. Step 2: cons_tail, prod_head, prod_tail, cons_next, cons_head. Step 3: cons_head, cons_tail, prod_head, prod_tail
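The index movement on the single-producer and single-consumer slides can be sketched as below. It is not DPDK's implementation: Python gives no memory barriers or CAS, so this only illustrates how the head and tail indices advance. The class name `SpScRing` and its methods are hypothetical; the index names follow the slides.

```python
class SpScRing:
    """Single-producer / single-consumer ring, index bookkeeping only."""

    def __init__(self, size):
        assert size > 0 and size & (size - 1) == 0, "size must be a power of two"
        self.mask = size - 1              # usable capacity is size - 1 slots
        self.slots = [None] * size
        self.prod_head = self.prod_tail = 0
        self.cons_head = self.cons_tail = 0

    def enqueue_bulk(self, objs):
        """Enqueue all objects or none; return True on success."""
        n = len(objs)
        free_entries = self.mask + self.cons_tail - self.prod_head
        if n > free_entries:
            return False
        start = self.prod_head
        prod_next = start + n
        self.prod_head = prod_next                    # step 1: reserve the slots
        for i, obj in enumerate(objs):                # step 2: copy objects in
            self.slots[(start + i) & self.mask] = obj
        self.prod_tail = prod_next                    # step 3: publish to the consumer
        return True

    def dequeue_bulk(self, n):
        """Dequeue exactly n objects, or return None if fewer are available."""
        entries = self.prod_tail - self.cons_head
        if n > entries:
            return None
        start = self.cons_head
        cons_next = start + n
        self.cons_head = cons_next                    # step 1: claim the entries
        objs = [self.slots[(start + i) & self.mask] for i in range(n)]  # step 2: copy out
        self.cons_tail = cons_next                    # step 3: release the slots to the producer
        return objs


ring = SpScRing(8)
ring.enqueue_bulk([1, 2, 3])
first_two = ring.dequeue_bulk(2)
```

The producer checks free space against `cons_tail` and the consumer checks available entries against `prod_tail`, so neither side touches slots the other has not finished with. In a multi-producer ring the head update in step 1 would be a CAS loop, as slide 19 notes.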
  • 22. Lockless ring (Multiple Producers), diagram in five steps; labels: cons_head, cons_tail, prod_head, prod_tail, prod_next1, prod_next2
  • 23. Kernel space network driver (diagram labels): User space, Kernel space, App, IP stack, Driver, NIC, Data, Desc, Config, Interrupts
  • 24. UIO “The most important devices can’t be handled in user space, including, but not limited to, network interfaces and block devices.” - LDD3
  • 25. UIO (diagram labels): User space, Kernel space, App, US driver, sysfs, /dev/uioX, Interface, epoll(), mmap(), UIO framework, driver
  • 26. Access to device from user space (diagram). NIC configuration registers: Vendor Id, Device Id, Command, Status, Revision Id, ..., BAR0 (Mem), BAR1, BAR2 (IO), BAR3, BAR4, BAR5. I/O and memory regions as seen from user space: /sys/class/uio/uioX/maps/mapX; /sys/class/uio/uioX/portio/portX; /dev/uioX -> mmap (offset); /sys/bus/pci/devices
  • 27. DMA RX (diagram labels): Host memory, NIC memory, Update RDT, DMA descriptor(s), DMA packet, RX queue, RX FIFO, Descriptor ring, Memory, DMA descriptors
  • 28. DMA TX (diagram labels): Host memory, NIC memory, Update TDT, DMA descriptor(s), DMA packet, TX queue, TX FIFO, Descriptor ring, Memory, DMA descriptors
  • 29. Receive from SW side (diagram): a descriptor ring with RDBA = 0, RDLEN = 6, RDH = 1, RDT = 5; each descriptor carries a DD bit; two descriptors hold mbuf1 addr and mbuf2 addr, pointing to mbuf1 and mbuf2
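The head/tail bookkeeping on slide 29 is modular arithmetic over the ring. This sketch assumes the usual convention that descriptors from the head up to, but not including, the tail belong to the hardware. It also reads the slide's RDLEN = 6 as a descriptor count, which is an assumption. The helper names are hypothetical.

```python
def hw_owned(head, tail, ring_len):
    """Descriptors the NIC may still fill: from head up to, not including, tail."""
    return (tail - head) % ring_len


def advance_tail(tail, n, ring_len):
    """New tail after software hands n refilled descriptors back to the NIC."""
    return (tail + n) % ring_len


RDH, RDT, RDLEN = 1, 5, 6                  # values shown on the slide
available = hw_owned(RDH, RDT, RDLEN)      # descriptors 1..4
new_rdt = advance_tail(RDT, 2, RDLEN)      # tail wraps around the 6-entry ring
print(available, new_rdt)
```

The same arithmetic applies to TDH/TDT on the transmit side. When head equals tail, the hardware has nothing left to process.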
  • 30. Transmit from SW side (diagram): a descriptor ring with TDBA = 0, TDLEN = 6, TDH = 1, TDT = 5; each descriptor carries a DD bit; two descriptors hold mbuf1 addr and mbuf2 addr, pointing to mbuf1 and mbuf2
  • 31. NUMA (diagram): Socket 0 holds CPU 0 and Socket 1 holds CPU 1; each CPU has cores, a memory controller with its memory, and an I/O controller with PCI-E links; the sockets are linked by QPI
  • 32. RSS (Receive Side Scaling) (diagram): incoming traffic -> hash function -> indirection table -> Queue 0 ... Queue N -> CPU
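The RSS path on slide 32 (hash, then indirection table, then queue) can be sketched as below. Real NICs typically compute a Toeplitz hash in hardware; this sketch substitutes `zlib.crc32` purely as a stand-in. The table size of 128, the four queues, and all function names are assumptions for illustration.

```python
import ipaddress
import struct
import zlib

NUM_QUEUES = 4      # hypothetical number of RX queues
TABLE_SIZE = 128    # assumed indirection table size

# Spread the queues round-robin over the table; software can rewrite
# entries later to rebalance load without changing the hash.
indirection_table = [i % NUM_QUEUES for i in range(TABLE_SIZE)]


def flow_hash(src_ip, dst_ip, src_port, dst_port):
    """Stand-in hash over the 4-tuple (CRC32 here, not Toeplitz)."""
    key = (
        ipaddress.ip_address(src_ip).packed
        + ipaddress.ip_address(dst_ip).packed
        + struct.pack("!HH", src_port, dst_port)
    )
    return zlib.crc32(key)


def select_queue(src_ip, dst_ip, src_port, dst_port):
    """Low bits of the hash index the table; the entry names the RX queue."""
    h = flow_hash(src_ip, dst_ip, src_port, dst_port)
    return indirection_table[h % TABLE_SIZE]


queue = select_queue("10.0.0.1", "10.0.0.2", 1234, 80)
print("flow goes to queue", queue)
```

Because the queue depends only on the flow's 4-tuple, every packet of a flow lands on the same queue and therefore the same core, which keeps per-flow state local to that core.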
  • 33. Flow director (diagram labels): Incoming traffic, Hash function, Filter table, Queue 0 ... Queue N, CPU N, Drop, Route, Outgoing traffic
  • 34. Virtualization - SR-IOV (diagram labels): VMM with PF driver; VM1 and VM2, each with a VF driver; NIC with PF, two VFs and a Virtual bridge
  • 35. Slow path using bifurcated driver (diagram labels): Kernel, DPDK, NIC, PF, VF, Virtual bridge, Filter table
  • 36. Slow path using TAP (diagram labels): User space, Kernel space, NIC, App, DPDK, Ring buffers, TAP device, RX/TX queues, TCP/IP stack
  • 37. Slow path using KNI (diagram labels): User space, Kernel space, NIC, App, DPDK, Ring buffers, KNI device, RX/TX queues, TCP/IP stack
  • 38. Application 1 - Traffic generator (diagram labels): x86 HW, User space, Streams generator, Traffic analyzer, DUT
  • 39. Application 2 - Router (diagram labels): x86 HW, Kernel, User space, Routing table, Routing table cache, DUT1, DUT2
  • 40. Application 3 - Middlebox (diagram labels): x86 HW, User space, DPI, DUT1, DUT2
  • 41. References
       Device Drivers in User Space
       Userspace I/O drivers in a realtime context
       The Userspace I/O HOWTO
       The anatomy of a PCI/PCI Express kernel driver
       From Intel® Data Plane Development Kit to Wind River Network Acceleration Platform
       DPDK Design Tips (Part 1 - RSS)
       Getting the Best of Both Worlds with Queue Splitting (Bifurcated Driver)
       Design considerations for efficient network applications with Intel® multi-core processor-based systems on Linux
       Introduction to Intel Ethernet Flow Director