Introduction to Linux Kernel
   TCP/IP protocol stack
                雕梁
      Core System Server Platform Group
      diaoliang@taobao.com
   simohayha.bobo@gmail.com
     http://www.pagefault.info
             2011/01/15
Agenda
Introduction

Networking code in the Linux kernel tree

L2 (Link Layer)

L3 (Network Layer)

L4 (Transport Layer)

Config and benchmark tools

Resource
Introduction

   Source
      http://git.kernel.org/
      net-next-2.6 and net-2.6
   Developer
      Alan Cox, David Miller, Eric Dumazet, Patrick McHardy
      etc.
   Traffic directions
      input, forward, and output
   Layer
      L2(Link Layer)/L3(Network Layer)/L4(Transport Layer)
   Device interface
      PCI/PCI-E
Networking code in the Linux kernel tree



Net-Kernel source tree
Big picture
Link layer
  Frame type
     802.3/802.2/802.2-SNAP/Ethernet
  Input
     Driver
        NAPI
             Poll + Interrupt
     Soft interrupt
        GRO
             feed packet to network stack
        RPS/RFS
             steer packets across CPUs on SMP systems
     Protocol handler
        use eth_type_trans
        Packet_type list
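
The protocol-handler step above can be pictured as a lookup from the frame's type field to a registered handler. The following is a minimal userspace sketch in Python, not kernel code: `packet_handlers`, `register_handler`, `eth_type` and `deliver` are hypothetical names that only mimic the roles of the packet_type list and eth_type_trans, and the sketch ignores the 802.3 case where the field carries a length rather than a type.

```python
import struct

ETH_HLEN = 14        # destination MAC (6) + source MAC (6) + type field (2)
ETH_P_IP = 0x0800    # EtherType for IPv4
ETH_P_ARP = 0x0806   # EtherType for ARP

# Hypothetical stand-in for the kernel's packet_type list: EtherType -> handler.
packet_handlers = {}


def register_handler(ethertype, func):
    """Register a handler for one EtherType (assumed helper, not a kernel API)."""
    packet_handlers[ethertype] = func


def eth_type(frame):
    """Return the 16-bit type field of an Ethernet frame (network byte order)."""
    if len(frame) < ETH_HLEN:
        raise ValueError("frame shorter than an Ethernet header")
    (value,) = struct.unpack("!H", frame[12:14])
    return value


def deliver(frame):
    """Hand the payload to the handler registered for the frame's type."""
    handler = packet_handlers.get(eth_type(frame))
    if handler is None:
        return "dropped"
    return handler(frame[ETH_HLEN:])


register_handler(ETH_P_IP, lambda payload: "ipv4:%d" % len(payload))
register_handler(ETH_P_ARP, lambda payload: "arp:%d" % len(payload))

# A frame with zeroed MAC addresses carrying a 20-byte IPv4-like payload.
frame = bytes(6) + bytes(6) + struct.pack("!H", ETH_P_IP) + b"\x45" + bytes(19)
print(deliver(frame))
```

A dictionary keeps the sketch short; it says nothing about how the kernel actually stores or walks its handler list.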
Link layer
  Output
      Traffic Control
      Soft interrupt
          Transmit SKB
              Scatter/Gather DMA
          Free skb
          XPS
              multiqueue
              avoid cache line bouncing
              improve locality
  Bridge
      Virtual device; must be bound to one or more real devices
      Spanning Tree Protocol
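
To make the XPS item above more concrete, here is a small Python sketch of the idea: each transmit queue has a CPU bitmask, and a sending CPU prefers a queue whose mask includes it. It assumes masks in the hex format used by /sys/class/net/<dev>/queues/tx-<n>/xps_cpus; `parse_cpu_mask`, `build_xps_map` and `pick_tx_queue` are hypothetical helper names, and the fallback hashing is an illustration rather than the kernel's selection logic.

```python
def parse_cpu_mask(text):
    """Parse a hex CPU bitmask such as 'f' or '00000000,0000000f' into CPU ids."""
    value = int(text.strip().replace(",", ""), 16)
    return {cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1}


def build_xps_map(masks):
    """Map each CPU to the transmit queues whose mask includes it.

    masks[i] is the hex mask configured for transmit queue i.
    """
    cpu_to_queues = {}
    for queue, mask in enumerate(masks):
        for cpu in sorted(parse_cpu_mask(mask)):
            cpu_to_queues.setdefault(cpu, []).append(queue)
    return cpu_to_queues


def pick_tx_queue(cpu_to_queues, cpu, flow_hash, num_queues):
    """Prefer a queue mapped to the sending CPU; otherwise hash over all queues."""
    queues = cpu_to_queues.get(cpu)
    if not queues:
        return flow_hash % num_queues
    return queues[flow_hash % len(queues)]


# Two queues: queue 0 serves CPUs 0-1 (mask 0x3), queue 1 serves CPUs 2-3 (mask 0xc).
xps = build_xps_map(["3", "c"])
print(pick_tx_queue(xps, cpu=2, flow_hash=12345, num_queues=2))
```

Keeping a CPU on its own queue is what avoids the cache line bouncing and improves the locality listed above.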
Link layer big map
Network Layer(IP)
 Input
    Protocol handler
        net_protocol array
    defragment
        Hashtable
            Fragments of each IP packet being reassembled are kept in a list,
            stored in kernel memory until reassembly is complete
 Output
    fragment
        MTU
        Scatter/Gather IO
        udp
    neighboring
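
The fragment and defragment items above can be illustrated with a short Python sketch. It assumes a 20-byte IP header with no options, represents a fragment as an (offset in 8-byte units, more-fragments flag, data) tuple, and keys the reassembly table by a caller-supplied tuple such as (source, destination, id, protocol). `fragment` and `Reassembler` are hypothetical names; overlapping or duplicate fragments and timeouts are deliberately left out.

```python
def fragment(payload, mtu, ip_header_len=20):
    """Split payload into (offset_units, more_fragments, data) tuples.

    Every fragment except the last carries a multiple of 8 bytes, because the
    IP fragment offset field counts 8-byte units.
    """
    max_data = (mtu - ip_header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    frags = []
    for start in range(0, len(payload), max_data):
        chunk = payload[start:start + max_data]
        more = start + len(chunk) < len(payload)
        frags.append((start // 8, more, chunk))
    return frags


class Reassembler:
    """Hash table of per-packet fragment lists (simplified model)."""

    def __init__(self):
        self.queues = {}  # key -> list of fragments received so far

    def add(self, key, frag):
        """Store one fragment; return the full payload once it is complete."""
        queue = self.queues.setdefault(key, [])
        queue.append(frag)
        ordered = sorted(queue, key=lambda f: f[0])
        if ordered[-1][1]:          # final fragment (MF = 0) not seen yet
            return None
        expected = 0
        for offset, _more, data in ordered:
            if offset * 8 != expected:
                return None         # a hole remains
            expected += len(data)
        del self.queues[key]        # release memory once reassembly is done
        return b"".join(data for _offset, _more, data in ordered)


payload = bytes(i % 251 for i in range(3000))
frags = fragment(payload, mtu=1500)
print([(offset, more, len(data)) for offset, more, data in frags])
```

With an MTU of 1500 each full fragment carries 1480 bytes, so the fragments of a 3000-byte payload start at byte offsets 0, 1480 and 2960.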
Network Layer(IP)
 Forward
    process ip option
    skip defragmentation
         Router Alert option
 Route
    Forwarding Information Base(routing table)
    cache
 Netfilter
    HOOK point
         NF_IP_LOCAL_OUT, NF_IP_LOCAL_IN, etc.
 Management
    Long-living IP peer information
          AVL tree
    IP statistics
          per-CPU data (ipstats_mib)
          /proc/net/snmp
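
What a lookup in the Forwarding Information Base has to answer is "which route has the longest prefix matching this destination?". The Python sketch below shows that rule with a plain linear scan; `Fib`, `add_route` and `lookup` are hypothetical names, the addresses and device names are made up, and the kernel uses far more efficient data structures than a list.

```python
import ipaddress


class Fib:
    """Toy routing table implementing longest-prefix match by linear scan."""

    def __init__(self):
        self.routes = []  # list of (network, next_hop, device)

    def add_route(self, prefix, next_hop, device):
        self.routes.append((ipaddress.ip_network(prefix), next_hop, device))

    def lookup(self, address):
        """Return (next_hop, device) of the most specific matching route."""
        addr = ipaddress.ip_address(address)
        best = None
        for net, next_hop, device in self.routes:
            if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, next_hop, device)
        return None if best is None else (best[1], best[2])


fib = Fib()
fib.add_route("0.0.0.0/0", "192.0.2.1", "eth0")     # default route
fib.add_route("10.0.0.0/8", "10.255.0.1", "eth1")
fib.add_route("10.1.0.0/16", "10.1.0.1", "eth2")

print(fib.lookup("10.1.2.3"))   # matched by the /16, the longest prefix
print(fib.lookup("8.8.8.8"))    # falls through to the default route
```

A route cache, as listed above, would sit in front of such a lookup so that repeated destinations skip the table walk.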
Network layer big map
Transport Layer (tcp)
  Init
    bind callback (sock_create)
    Three-way handshake
        accept queue
        syn table
        create new socket fd and change state
  Manage socket
    inet_ehash_bucket
         TCP_ESTABLISHED <= sk->sk_state < TCP_CLOSE
    inet_bind_hashbucket
         local binding port info
    listening_hash
         socket in TCP_LISTEN state
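
The handshake queues and socket tables above fit together roughly as follows. This Python sketch is a simplified model under stated assumptions, not the kernel's implementation: `Listener` and `TcpTables` are hypothetical classes, a connection key is the tuple (remote address, remote port, local address, local port), and sequence numbers, timers and SYN cookies are ignored.

```python
from collections import deque


class Listener:
    """A listening socket with its two queues (simplified model)."""

    def __init__(self, port, backlog):
        self.port = port
        self.backlog = backlog
        self.syn_table = {}          # half-open connections, keyed by 4-tuple
        self.accept_queue = deque()  # completed connections awaiting accept()


class TcpTables:
    def __init__(self):
        self.established = {}  # 4-tuple -> state, in the role of the ehash
        self.listening = {}    # local port -> Listener, in the role of listening_hash

    def listen(self, port, backlog=128):
        self.listening[port] = Listener(port, backlog)

    def on_syn(self, key):
        """First step: record the half-open connection and answer SYN-ACK."""
        listener = self.listening.get(key[3])
        if listener is None:
            return "RST"
        listener.syn_table[key] = "SYN_RECV"
        return "SYN-ACK"

    def on_ack(self, key):
        """Third step: move the connection to the accept queue."""
        listener = self.listening.get(key[3])
        if listener is None or key not in listener.syn_table:
            return "RST"
        if len(listener.accept_queue) >= listener.backlog:
            return "DROP"            # accept queue full
        del listener.syn_table[key]
        self.established[key] = "ESTABLISHED"
        listener.accept_queue.append(key)
        return "ESTABLISHED"

    def accept(self, port):
        """Hand one completed connection to the application, if any."""
        queue = self.listening[port].accept_queue
        return queue.popleft() if queue else None

    def lookup(self, key):
        """Established table first, then fall back to the listening table."""
        if key in self.established:
            return "established"
        if key[3] in self.listening:
            return "listening"
        return None


tables = TcpTables()
tables.listen(80, backlog=1)
conn = ("198.51.100.7", 40000, "192.0.2.10", 80)
print(tables.on_syn(conn), tables.on_ack(conn), tables.accept(80))
```

The lookup order mirrors the split above: an exact 4-tuple match for connected sockets, and a port-only match for sockets in TCP_LISTEN state.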
Transport Layer (tcp)
 Output
    Tcp push
    Congestion control
        state transition
        congestion window
        packet count
 Input
    fast path and slow path
    Interrupt context/ Process context
    sk_backlog/receive_queue/prequeue
 Tcp state transition
    Kernel control
 Timer
    Retransmit/keep-alive/time-wait, etc.
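
As an illustration of the congestion-control items above (state transition, congestion window, packet count), the following Python sketch evolves a congestion window counted in packets. It is a textbook Reno-style simplification chosen for brevity, not the algorithm the kernel runs; `simulate_cwnd` and its event names are hypothetical.

```python
def simulate_cwnd(events, initial_cwnd=2, initial_ssthresh=16):
    """Return the congestion window (in packets) after each event.

    events is a sequence of "ack", "loss" (detected by duplicate ACKs)
    or "timeout" (retransmission timer expired).
    """
    cwnd = initial_cwnd
    ssthresh = initial_ssthresh
    acked = 0          # ACKs counted since the last increase in congestion avoidance
    history = []
    for event in events:
        if event == "ack":
            if cwnd < ssthresh:
                cwnd += 1              # slow start: one packet per ACK
            else:
                acked += 1             # congestion avoidance: one packet per window
                if acked >= cwnd:
                    cwnd += 1
                    acked = 0
        elif event == "loss":
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh            # halve the window and keep sending
            acked = 0
        elif event == "timeout":
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1                   # restart from slow start
            acked = 0
        else:
            raise ValueError("unknown event: %r" % (event,))
        history.append(cwnd)
    return history


print(simulate_cwnd(["ack", "ack", "ack", "loss", "ack", "timeout"],
                    initial_cwnd=2, initial_ssthresh=4))
```

Counting the window in packets rather than bytes matches the "packet count" item above.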
TCP big map
Config and Benchmark Tools

 ethtool
    offload features
 Benchmark and test tools
    netperf/pktgen
    mpstat/tcpstat

 Proc FileSystem
    /proc/net
    /proc/sys/net
        ipv4
        core
 Sys FileSystem
    /sys/class/net/ethx
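
The counters under /proc/net can be read with a few lines of code. /proc/net/snmp is laid out as pairs of lines: a header line with field names and a value line, both starting with the protocol name. The Python sketch below parses that layout; `parse_snmp` is a hypothetical helper, and `SAMPLE` is a shortened, made-up excerpt used so the example does not depend on running under Linux.

```python
def parse_snmp(text):
    """Parse /proc/net/snmp-style text into {protocol: {field: value}}."""
    stats = {}
    lines = [line for line in text.splitlines() if line.strip()]
    # Lines come in pairs: field names first, then the matching values.
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        value_proto, numbers = values.split(":", 1)
        if proto != value_proto:
            raise ValueError("header/value lines do not match: %r" % proto)
        stats[proto] = dict(zip(names.split(), (int(n) for n in numbers.split())))
    return stats


# Made-up excerpt in the same layout as the real file.
SAMPLE = """\
Ip: Forwarding DefaultTTL InReceives
Ip: 1 64 1000
Udp: InDatagrams NoPorts
Udp: 10 2
"""

stats = parse_snmp(SAMPLE)
print(stats["Ip"]["InReceives"], stats["Udp"]["NoPorts"])
```

On a Linux machine the same function can be given the contents of the real file, for example `parse_snmp(open("/proc/net/snmp").read())`.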
Resource

 http://kernelnewbies.org

 http://kernel.org

 http://www.kernelplanet.org

 https://lkml.org

 http://vger.kernel.org/vger-lists.html

 http://www.pagefault.info/?tag=kernel
