© 2014 VMware Inc. All rights reserved.
L2 over L3 Encapsulations
VXLAN, NVGRE, STT, Geneve, etc.
Motonori Shindo
Network & Security Business Unit
VMware
July 13, 2014
Tunneling vs Encapsulation
• Tunneling Protocols
– Signaling + Encapsulation
• Usually equipped with some sort of “signaling” mechanism that manages the tunnel.
• Encapsulation is the other part of a tunneling protocol.
– E.g., PPTP, L2TP, IPsec (IKE), etc.
• Encapsulations
– A way of wrapping (i.e., encapsulating) something
– E.g., GRE, VXLAN, NVGRE, STT (and also Ethernet, IP, TCP, ...)
• What I’m going to talk about today is “encapsulation”
• I am not going to talk about “control plane” today (though it’s very important)
CONFIDENTIAL 2
L2 over L3 encapsulations typically seen in Network Virtualization
• GRE (Generic Routing Encapsulation) *
• VXLAN (Virtual Extensible LAN)
• NVGRE (Network Virtualization using GRE)
• STT (Stateless Transport Tunneling)
* Strictly speaking, GRE is not an L2 over L3 encapsulation, as it can encapsulate not only L2 but also L3
VXLAN
• Proposed by Cumulus / Arista / Broadcom / Cisco / VMware / Citrix / Red Hat
– draft-mahalingam-dutt-dcops-vxlan-09.txt
• Extends the 12-bit VLAN ID (4,096 segments) to a 24-bit VNI (about 16 million segments)
• Encapsulation by UDP/IP
– L3 overlay
– Multipath
• Encapsulates Ethernet frames only
• Simple enough to be implemented in hardware
• Forming an “ecosystem”
VXLAN Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|R|R|R|I|R|R|R|            Reserved                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                VXLAN Network Identifier (VNI) |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
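The header layout above can be sketched in a few lines of Python. This is only an illustration of the bit positions shown in the diagram; the function names are made up for this example and do not come from any VXLAN implementation.

```python
import struct

VXLAN_FLAG_I = 0x08  # "I" bit in the first byte: the VNI field is valid


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI (hypothetical helper)."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)


def unpack_vxlan_header(header: bytes) -> int:
    """Return the VNI carried in an 8-byte VXLAN header."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_I:
        raise ValueError("I flag is not set, so the VNI is not valid")
    return word2 >> 8


if __name__ == "__main__":
    hdr = pack_vxlan_header(5000)
    print(hdr.hex(), unpack_vxlan_header(hdr))
```

All reserved bits are sent as zero here, which is what the draft asks of a sender.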
Fabric Network
• Service Oriented Architecture
• From a 2- or 3-tier network to Leaf & Spine
• High density and bandwidth required
• Layer 3 ECMP
• No oversubscription
• Low and uniform delay characteristics
• Wire-once, configure-once network
• Uniform network configuration
Multipath Network
• Background
– To support the significant increase in East-West traffic, fabric networks based on multipath are becoming popular
• Requisites
– All packets of a given flow must traverse the same path
– Flows must carry enough “entropy” to make efficient use of the fabric
Multipath by VXLAN
Encapsulated packet (header sizes in bytes): IP (20) | UDP (8) | VXLAN (8) | original packet (Ether | IP | TCP | Data)
dst port = 4789
src port = Hash(src/dst MAC addr, src/dst IP addr, src/dst port number, etc.) *
* Which fields to hash or which hash algorithm to use is not defined by the protocol. It is up to the implementation.
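As a rough sketch of the idea, the outer UDP source port can be derived from a hash of the inner flow's fields, so that every packet of a flow gets the same port while different flows spread across ports. Since the protocol leaves the hash open, the choices here are assumptions made for illustration: CRC32 as the hash, the dynamic port range 49152-65535, and the function name itself.

```python
import zlib

VXLAN_DST_PORT = 4789  # destination port shown on the slide


def vxlan_source_port(src_mac, dst_mac, src_ip, dst_ip, proto,
                      src_port, dst_port,
                      port_min=49152, port_max=65535):
    """Map an inner flow to an outer UDP source port (illustrative only).

    CRC32 and the port range are assumptions; a real implementation
    is free to hash other fields with another algorithm.
    """
    fields = (src_mac, dst_mac, src_ip, dst_ip, proto, src_port, dst_port)
    key = "|".join(str(f) for f in fields).encode()
    # Same inner flow -> same hash -> same outer source port.
    return port_min + zlib.crc32(key) % (port_max - port_min + 1)


if __name__ == "__main__":
    port = vxlan_source_port("00:11:22:33:44:55", "66:77:88:99:aa:bb",
                             "10.0.0.1", "10.0.0.2", 6, 12345, 80)
    print(port, "->", VXLAN_DST_PORT)
```

Fabric switches that hash on the outer 5-tuple then see per-flow entropy without having to parse the VXLAN header.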
VXLAN Ecosystem
• Switch / Router
– Arista, Brocade, Cisco, Cumulus, DELL, HP,
Huawei, Juniper, Open vSwitch, Pica8
• Operating System
– Linux, VMware
• Appliances
– A10, Citrix, F5
• Testers
– IXIA, Spirent
• ASIC / NIC
– Broadcom, Intel (Fulcrum), Emulex, Mellanox
• Cloud Orchestrator
– CloudStack, OpenStack, vCAC
Note: this is not an exhaustive list.
This is a list of vendors who participated in the VXLAN interoperability test at INTEROP Tokyo 2014; all of the tests were successful.
NVGRE
• Proposed by Microsoft / Arista / Intel / Google / HP / Broadcom / Emulex
– draft-sridharan-virtualization-nvgre-04.txt
• 24-bit Virtual Subnet ID (VSID) and 8-bit FlowID
• Encapsulation is GRE as is:
– Put VSID + FlowID in Key Field
– L3 Overlay
– Multipath possible (in theory) but difficult
• Strong affinity with Windows
NVGRE Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| |1|0|   Reserved0     | Ver |   Protocol Type 0x6558        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Virtual Subnet ID (VSID)        |    FlowID     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
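A minimal sketch of how the diagram maps to bytes: NVGRE is a plain GRE header with the Key bit set, and the 32-bit Key field split into VSID and FlowID. The helper names below are hypothetical and exist only for this example.

```python
import struct

GRE_KEY_PRESENT = 0x2000  # K bit: bit 2 of the first 16-bit word
PROTO_TEB = 0x6558        # Transparent Ethernet Bridging


def pack_nvgre_header(vsid: int, flow_id: int = 0) -> bytes:
    """Build the 8-byte GRE header used by NVGRE (hypothetical helper)."""
    if not 0 <= vsid < (1 << 24):
        raise ValueError("VSID must fit in 24 bits")
    if not 0 <= flow_id < (1 << 8):
        raise ValueError("FlowID must fit in 8 bits")
    key = (vsid << 8) | flow_id  # Key field = VSID (24 bits) + FlowID (8 bits)
    return struct.pack("!HHI", GRE_KEY_PRESENT, PROTO_TEB, key)


def unpack_nvgre_header(header: bytes):
    """Return (vsid, flow_id) from an 8-byte NVGRE header."""
    flags, proto, key = struct.unpack("!HHI", header[:8])
    if not flags & GRE_KEY_PRESENT or proto != PROTO_TEB:
        raise ValueError("not an NVGRE header")
    return key >> 8, key & 0xFF


if __name__ == "__main__":
    hdr = pack_nvgre_header(0x123456, 0x9A)
    print(hdr.hex(), unpack_nvgre_header(hdr))
```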
Multipath in NVGRE
Encapsulated packet (header sizes in bytes): IP (20) | GRE (8) | original packet (Ether | IP | TCP | Data)
FlowID = Hash(src/dst MAC addr, src/dst IP addr, src/dst port number, etc.) *
Routers / switches need to look up the Key field in the GRE header to do ideal multipath!
* Which fields to hash or which hash algorithm to use is not defined by the protocol. It is up to the implementation.
NVGRE ecosystem
• Switch / Router
– Huawei
– Arista and Brocade say they are going to support it, but products do not seem to have come out yet
• Operating System
– Microsoft (Windows Server 2012 R2)
• Appliances
– F5
• ASIC / NIC
– Emulex, Mellanox
• Cloud Orchestrator
– System Center 2012 R2
Note: this is not an exhaustive list
STT (Stateless Transport Tunneling)
• L2 over L3 encapsulation proposed by VMware
– draft-davie-stt-06.txt
• Why yet another L2 over L3 encapsulation?
– Performance
– Richer context information
– Multipath
– Software oriented
TSO (TCP Segmentation Offload)
• Modern NICs (shipped within the last 4-5 years) provide various hardware acceleration features:
– RSS, GSO/TSO, Checksum Offload, etc.
• With TSO, the NIC performs TCP segmentation on behalf of the operating system (which would otherwise do it in software)
– The operating system can now send packets of up to 64 KB. This significantly reduces the number of packets to process (i.e., interrupts), so far fewer context switches are needed.
• To take advantage of TSO in the NIC, STT encapsulates packets so that they look like “TCP”!
Encapsulation / Segmentation in STT
Before segmentation (header sizes in bytes):
IP (20) | TCP’ (20) | STT (18) | L2 Frame (up to 64K)
After segmentation by hardware:
IP (20) | TCP’ (20) | STT (18) | Payload 1
IP (20) | TCP’ (20) | Payload 2
...
IP (20) | TCP’ (20) | Payload n
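To get a feel for the numbers, the sketch below counts how many segments the NIC would emit for one large frame. It assumes a 1500-byte MTU and option-free IP and TCP’ headers, uses the header sizes from the slide, and treats the STT header as part of the payload of the first segment; the function name is invented for this example.

```python
IP_HDR = 20   # bytes, from the slide
TCP_HDR = 20  # bytes, TCP-like header
STT_HDR = 18  # bytes, carried once, at the start of the payload


def stt_segment_sizes(frame_len: int, mtu: int = 1500):
    """Return the payload size of each segment for one encapsulated frame.

    Assumes a 1500-byte MTU by default and no IP/TCP options.
    """
    mss = mtu - IP_HDR - TCP_HDR   # payload bytes available per segment
    total = STT_HDR + frame_len    # the NIC segments STT header + L2 frame
    full, last = divmod(total, mss)
    return [mss] * full + ([last] if last else [])


if __name__ == "__main__":
    sizes = stt_segment_sizes(64000)
    print(len(sizes), "segments, last one carries", sizes[-1], "bytes")
```

The operating system hands the NIC a single large packet, and the per-segment work listed above happens in hardware.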
TCP-like Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Sequence Number(*)                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Acknowledgment Number(*)                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Fields marked with * are repurposed in STT
STT Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Version      | Flags         |  L4 Offset    |  Reserved     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Max. Segment Size          | PCP |V|     VLAN ID           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                     Context ID (64 bits)                      +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Padding                   |    data                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                                                               |
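The 18 bytes shown above can be packed as follows. This is a sketch based purely on the field widths in the diagram; the function name and its defaults are assumptions, and the meaning of individual Flags bits is left to the draft.

```python
import struct

# Version, Flags, L4 Offset, Reserved (1 byte each), Max. Segment Size (2),
# PCP/V/VLAN ID (2), Context ID (8), Padding (2) = 18 bytes
STT_FORMAT = "!BBBBHHQH"


def pack_stt_header(context_id: int, mss: int, flags: int = 0,
                    l4_offset: int = 0, pcp: int = 0,
                    vlan_id=None, version: int = 0) -> bytes:
    """Build the 18-byte STT header (hypothetical helper)."""
    if not 0 <= context_id < (1 << 64):
        raise ValueError("Context ID must fit in 64 bits")
    tci = 0
    if vlan_id is not None:
        if not 0 <= vlan_id < (1 << 12) or not 0 <= pcp < 8:
            raise ValueError("VLAN ID is 12 bits, PCP is 3 bits")
        # PCP (3 bits) | V (1 bit, VLAN ID valid) | VLAN ID (12 bits)
        tci = (pcp << 13) | (1 << 12) | vlan_id
    return struct.pack(STT_FORMAT, version, flags, l4_offset, 0,
                       mss, tci, context_id, 0)


if __name__ == "__main__":
    hdr = pack_stt_header(0x1122334455667788, mss=1460)
    print(len(hdr), hdr.hex())
```

The 64-bit Context ID is what gives STT its "richer context information" compared with a 24-bit VNI.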
Throughput and CPU Utilization
[Chart: throughput (Gbps) and CPU utilization (%) on receive and send, for Linux Bridge, OVS Bridge, OVS-GRE and OVS-STT]
Source: http://networkheresy.com/2012/06/08/the-overhead-of-software-tunneling/
Multipath in STT
Encapsulated packet (header sizes in bytes): IP (20) | TCP’ (20) | STT (18) | original packet (Ether | IP | TCP | Data)
dst port = 7471 (TBD)
src port = Hash(src/dst MAC addr, src/dst IP addr, src/dst port number, etc.) *
* Which fields to hash or which hash algorithm to use is not defined by the protocol. It is up to the implementation.
Geneve (Generic Network Virtualization Encapsulation)
• New encapsulation being proposed by VMware, Microsoft, Red Hat, Intel
– draft-gross-geneve-00.txt
• Goals
– Extensibility
• Service Chaining, Metadata support, etc.
– Leverage NIC offload
– Both of the above at the same time! (each one is straightforward, but achieving both at once is difficult)
• Highlights
– Information can be added as Option fields in TLV format
– Format carefully designed so that the NIC can perform TSO
– OAM and Critical flags (the latter indicates that parsing the option fields is mandatory)
Geneve Header & Option Header
Geneve Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver|  Opt Len  |O|C|    Rsvd.  |          Protocol Type        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Virtual Network Identifier (VNI)       |    Reserved   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Variable Length Options                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Option
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Option Class         |      Type     |R|R|R| Length  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Variable Option Data                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
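The two layouts above can be combined into a short sketch that builds a Geneve header with TLV options. It assumes, as the draft describes, that both length fields count 4-byte words (Opt Len covers all options, the per-option Length covers only the option data). The function names, the option class 0x0102 and the type 0x80 are made up for illustration.

```python
import struct

ETH_TEB = 0x6558  # Protocol Type for an encapsulated Ethernet frame


def pack_geneve_option(opt_class: int, opt_type: int, data: bytes = b"") -> bytes:
    """Build one TLV option: Class (16), Type (8), R R R + Length (5), data."""
    if len(data) % 4 or len(data) // 4 > 0x1F:
        raise ValueError("option data must be a multiple of 4 bytes, max 124")
    return struct.pack("!HBB", opt_class, opt_type, len(data) // 4) + data


def pack_geneve_header(vni: int, options: bytes = b"", oam: bool = False,
                       critical: bool = False, protocol: int = ETH_TEB,
                       version: int = 0) -> bytes:
    """Build the 8-byte fixed Geneve header followed by the options."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    if len(options) % 4 or len(options) // 4 > 0x3F:
        raise ValueError("options must be a multiple of 4 bytes, max 252")
    byte0 = (version << 6) | (len(options) // 4)   # Ver (2) + Opt Len (6)
    byte1 = (int(oam) << 7) | (int(critical) << 6)  # O, C, then 6 reserved bits
    return struct.pack("!BBHI", byte0, byte1, protocol, vni << 8) + options


if __name__ == "__main__":
    opt = pack_geneve_option(0x0102, 0x80, b"\x00\x00\x00\x01")
    print(pack_geneve_header(10, options=opt, critical=True).hex())
```

Because Opt Len sits in the fixed header, a NIC can find the start of the inner frame without understanding any individual option, which is the property that keeps TSO possible.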
Geneve Implementation
• Recently implemented in Open vSwitch (OVS) and merged into the master branch on GitHub
– VNI can be specified
– Geneve Options can’t be specified (at this point)
– The OAM flag apparently can’t be set (I tried, but it didn’t work)
– The Critical flag appears to be supported as long as critical options are present
• A Geneve dissector for Wireshark has also been implemented and merged into its master branch on GitHub
• Geneve-aware NICs are not available yet
Running Geneve on Open vSwitch
# host-1: br0 takes over eth0 (underlay), br1 carries the Geneve tunnel (overlay)
host-1:~$ sudo ovs-vsctl add-br br0
host-1:~$ sudo ovs-vsctl add-br br1
host-1:~$ sudo ovs-vsctl add-port br0 eth0
host-1:~$ sudo ifconfig eth0 0
host-1:~$ sudo dhclient br0
host-1:~$ sudo ifconfig br1 10.0.0.1 netmask 255.255.255.0
host-1:~$ sudo ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.203.149
# host-2: same setup; remote_ip is the underlay address of the other host
host-2:~$ sudo ovs-vsctl add-br br0
host-2:~$ sudo ovs-vsctl add-br br1
host-2:~$ sudo ovs-vsctl add-port br0 eth0
host-2:~$ sudo ifconfig eth0 0
host-2:~$ sudo dhclient br0
host-2:~$ sudo ifconfig br1 10.0.0.2 netmask 255.255.255.0
host-2:~$ sudo ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.203.151
Dissecting Geneve Packets with Wireshark (1)
Dissecting Geneve Packets with Wireshark (2)
Information about Geneve
• English
– http://tools.ietf.org/html/draft-gross-geneve-00
– http://cto.vmware.com/geneve-vxlan-network-virtualization-encapsulations/
– http://www.enterprisenetworkingplanet.com/netsp/geneve-generic-network-virtualization-encapsulation-protocol-advances-video.html
– http://searchsdn.techtarget.com/news/2240219051/VMware-Microsoft-end-encapsulation-protocol-turf-war-with-GENEVE
– http://www.plexxi.com/2014/06/attention-overlay-tunnel-construction-ahead
– http://blog.shin.do/2014/07/geneve-on-open-vswitch/
• Japanese
– http://blog.shin.do/2014/05/geneve-encapsulation/
– http://blog.shin.do/2014/07/geneve-on-open-vswitch/
Will Geneve replace VXLAN / STT / NVGRE?
• Will Geneve replace VXLAN?
– NO
– The VXLAN ecosystem has already grown big enough that it is unlikely to be replaced by something else
– VMware will continue to support VXLAN and its ecosystem partners
• Will Geneve replace STT?
– In the short term, NO. In the long run, maybe, if
• Geneve is accepted by the market and Geneve-aware NICs become widely available, to the same level as STT today.
• Will Geneve replace NVGRE?
– In the short term, NO. In the long run, maybe, if
• Geneve gets implemented on Windows and its ecosystem grows to the same level as NVGRE's today.
Encapsulation is like a cable: use the right cable in the right place
http://cto.vmware.com/geneve-vxlan-network-virtualization-encapsulations/
The world is not that simple
• Some people are against Geneve
• Their claims are more or less as follows:
– What Geneve tries to accomplish can be achieved by an existing encapsulation (such as L2TP static tunneling or VXLAN), as is or with a small extension!?
– Service Chaining and metadata should not be bound to a particular encapsulation; they should be independent of the encapsulation!?
– A 24-bit VNI is not long enough!?
L2TPv3 static tunneling
• As a tunneling protocol, L2TPv3 inherently has signaling. That said, it can be used as a plain encapsulation method (i.e., a pseudowire) without signaling. This is called “L2TPv3 static tunneling”, where both ends are configured manually.
• L2TPv3 became an RFC in 2005 (RFC 3931) and has been on the market for many years. Cisco IOS and Linux (l2tpd) support L2TPv3 static tunneling.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|T|x|x|x|x|x|x|x|x|x|x|x|  Ver  |          Reserved             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Session ID                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Cookie (optional, maximum 64 bits)...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                                                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
L2TPv3 static tunneling as a L2 over L3 encapsulation
• Session ID (32 bits) corresponds to VNI
• L2TPv3 can be transported directly over IP or over UDP. For multipath, UDP would be better.
• No explicit field for context information (metadata, etc.). It has to be configured manually on both ends (if possible) and expressed implicitly as part of the Session ID
– Therefore the 32-bit Session ID can’t be used entirely for VNI
• Strictly speaking, L2TPv3 has no way to tell (in the packet) where the encapsulated packet starts, which the NIC needs to know to do TSO. However, L2TPv2 had an “offset” option for this purpose. Many L2TPv3 implementations still have this “offset” option for backward compatibility with L2TPv2, so TSO is possible (if the NIC understands this legacy option). Cisco and Linux l2tpd support the offset field.
VXLAN Generic Protocol Extension (a.k.a. eVXLAN)
• Proposed by Cisco, Huawei, Intel, Microsoft
– draft-quinn-vxlan-gpe-03.txt
• An extension to VXLAN
– Supports protocols other than Ethernet
• IPv4 (0x01), IPv6 (0x02), Ethernet (0x03), Network Service Header [NSH] (0x04)
– Note that “Next Protocol” is only 8 bits wide. The protocol type (usually 16 bits) has to be specifically encoded to fit into 8 bits.
– OAM support
– Version field
• Used by Cisco ACI
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|R|R|R|I|P|R|O|Ver|   Reserved                |Next Protocol  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                VXLAN Network Identifier (VNI) |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
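For comparison with plain VXLAN, here is a sketch that packs the header exactly as drawn above (the -03 draft layout shown on this slide; other revisions of the draft may place the bits differently). The function name and the dictionary are assumptions for this example; the Next Protocol values are the ones listed on the previous slide.

```python
import struct

FLAG_I = 0x08  # VNI is valid
FLAG_P = 0x04  # Next Protocol field is present
FLAG_O = 0x01  # OAM packet

NEXT_PROTOCOL = {"ipv4": 0x01, "ipv6": 0x02, "ethernet": 0x03, "nsh": 0x04}


def pack_vxlan_gpe_header(vni: int, next_protocol: str,
                          oam: bool = False, version: int = 0) -> bytes:
    """Build the 8-byte VXLAN-GPE header as drawn on the slide."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    flags = FLAG_I | FLAG_P | (FLAG_O if oam else 0)
    ver_and_reserved = (version & 0x3) << 14  # Ver (2 bits) + 14 reserved bits
    return struct.pack("!BHBI", flags, ver_and_reserved,
                       NEXT_PROTOCOL[next_protocol], vni << 8)


if __name__ == "__main__":
    print(pack_vxlan_gpe_header(5000, "nsh").hex())
```

Compared with the plain VXLAN header, only the first word changes; the VNI word is identical.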
VXLAN-GPE as an L2 over L3 encapsulation
• Mostly identical to VXLAN
– VNI length (24 bits)
– Multipath property
– Hardware friendliness
• The biggest motivation for VXLAN-GPE is probably to allow Service Chaining by NSH (Network Service Header)
• No further extensibility
Thank You!

More Related Content

PDF
Mikrotik fastpath
Achmad Mardiansyah
 
PDF
Using mikrotik with radius
Achmad Mardiansyah
 
PDF
BGP on RouterOS7 -Part 1
GLC Networks
 
PDF
Mikrotik Hotspot
GLC Networks
 
PPT
ospf routing protocol
Ameer Agel
 
PDF
大規模DCのネットワークデザイン
Masayuki Kobayashi
 
PDF
Open vSwitch - Stateful Connection Tracking & Stateful NAT
Thomas Graf
 
Mikrotik fastpath
Achmad Mardiansyah
 
Using mikrotik with radius
Achmad Mardiansyah
 
BGP on RouterOS7 -Part 1
GLC Networks
 
Mikrotik Hotspot
GLC Networks
 
ospf routing protocol
Ameer Agel
 
大規模DCのネットワークデザイン
Masayuki Kobayashi
 
Open vSwitch - Stateful Connection Tracking & Stateful NAT
Thomas Graf
 

What's hot (20)

PPT
OpenFlow tutorial
openflow
 
PDF
Mikrotik Fastpath vs Fasttrack
GLC Networks
 
PDF
Application Centric Infrastructure (ACI), the policy driven data centre
Cisco Canada
 
DOC
Basic command to configure mikrotik
Tola LENG
 
PPTX
FPGA
Syed Saeed
 
PPTX
OpenZeppelin + Remix + BNB smart chain
Gene Leybzon
 
PDF
Introduction of Networking
Netwax Lab
 
PDF
MTCNA Intro to routerOS
GLC Networks
 
PDF
MikroTik Hotspot 2.0 (IEEE 802.11u) - MUM Jakarta 2016
Rofiq Fauzi
 
PDF
ISP Load Balancing with Mikrotik ECMP
GLC Networks
 
PDF
MPLS on Router OS V7 - Part 2
GLC Networks
 
PPTX
Mpls technology
Naveen Sihag
 
PPTX
HTTP2 and gRPC
Guo Jing
 
PDF
DoS and DDoS mitigations with eBPF, XDP and DPDK
Marian Marinov
 
PDF
Mikrotik Hotspot User Manager
KHNOG
 
PDF
Q2.12: Debugging with GDB
Linaro
 
PDF
Configuration of hybrid topology in cisco packet tracer by Tanjilur Rahman
TanjilurRahman6
 
PDF
IPsec on Mikrotik
GLC Networks
 
PDF
Mininet introduction
Vipin Gupta
 
PDF
Network Monitoring with The Dude and Whatsapp
GLC Networks
 
OpenFlow tutorial
openflow
 
Mikrotik Fastpath vs Fasttrack
GLC Networks
 
Application Centric Infrastructure (ACI), the policy driven data centre
Cisco Canada
 
Basic command to configure mikrotik
Tola LENG
 
OpenZeppelin + Remix + BNB smart chain
Gene Leybzon
 
Introduction of Networking
Netwax Lab
 
MTCNA Intro to routerOS
GLC Networks
 
MikroTik Hotspot 2.0 (IEEE 802.11u) - MUM Jakarta 2016
Rofiq Fauzi
 
ISP Load Balancing with Mikrotik ECMP
GLC Networks
 
MPLS on Router OS V7 - Part 2
GLC Networks
 
Mpls technology
Naveen Sihag
 
HTTP2 and gRPC
Guo Jing
 
DoS and DDoS mitigations with eBPF, XDP and DPDK
Marian Marinov
 
Mikrotik Hotspot User Manager
KHNOG
 
Q2.12: Debugging with GDB
Linaro
 
Configuration of hybrid topology in cisco packet tracer by Tanjilur Rahman
TanjilurRahman6
 
IPsec on Mikrotik
GLC Networks
 
Mininet introduction
Vipin Gupta
 
Network Monitoring with The Dude and Whatsapp
GLC Networks
 
Ad

Viewers also liked (20)

PDF
OpenStack Congress and Datalog (English)
Motonori Shindo
 
PDF
OpenStack Congress and Datalog (Japanese)
Motonori Shindo
 
PDF
Korejanai Story
Kentaro Takeda
 
PDF
Mk state in-programming-01
Miya Kohno
 
PDF
Cloud stackユーザ会大阪 運用Tips 20130802
hirokihojo
 
PDF
of_protocol_tremaday5
エイシュン コンドウ
 
PDF
FlexPod Day 2016 - Cisco session (Publish edition)
Takao Setaka
 
PDF
Tokyo meetup 20160224
Takao Setaka
 
PDF
中国にOpenflowを入れてきた話
cloretsblack
 
PDF
TLS, HTTP/2演習
shigeki_ohtsu
 
PDF
Bird in show_net
Tomoya Hibi
 
PDF
Node最新トピックス
shigeki_ohtsu
 
PDF
Loom openflow controller in 10 min
エイシュン コンドウ
 
PDF
自動でできるかな?
_norin_
 
PDF
10分で作るクラスライブラリ
_norin_
 
PDF
HTTP/2, QUIC入門
shigeki_ohtsu
 
PDF
【Interop Tokyo 2016】 Seminar - EA-14 : シスコ スイッチが標的型攻撃を食い止める ~新しい内部対策ソリューション「C...
シスコシステムズ合同会社
 
PDF
最近のBurp Suiteについて調べてみた
zaki4649
 
PDF
Web制作・運用会社に必要なCDNサービスとは?
J-Stream Inc.
 
PDF
ノリとその場の勢いでPocを作った話
zaki4649
 
OpenStack Congress and Datalog (English)
Motonori Shindo
 
OpenStack Congress and Datalog (Japanese)
Motonori Shindo
 
Korejanai Story
Kentaro Takeda
 
Mk state in-programming-01
Miya Kohno
 
Cloud stackユーザ会大阪 運用Tips 20130802
hirokihojo
 
of_protocol_tremaday5
エイシュン コンドウ
 
FlexPod Day 2016 - Cisco session (Publish edition)
Takao Setaka
 
Tokyo meetup 20160224
Takao Setaka
 
中国にOpenflowを入れてきた話
cloretsblack
 
TLS, HTTP/2演習
shigeki_ohtsu
 
Bird in show_net
Tomoya Hibi
 
Node最新トピックス
shigeki_ohtsu
 
Loom openflow controller in 10 min
エイシュン コンドウ
 
自動でできるかな?
_norin_
 
10分で作るクラスライブラリ
_norin_
 
HTTP/2, QUIC入門
shigeki_ohtsu
 
【Interop Tokyo 2016】 Seminar - EA-14 : シスコ スイッチが標的型攻撃を食い止める ~新しい内部対策ソリューション「C...
シスコシステムズ合同会社
 
最近のBurp Suiteについて調べてみた
zaki4649
 
Web制作・運用会社に必要なCDNサービスとは?
J-Stream Inc.
 
ノリとその場の勢いでPocを作った話
zaki4649
 
Ad

Similar to L2 over l3 ecnaspsulations (english) (20)

PDF
How to configure and manage cisco nexus switch
atingupta21
 
PPTX
Session 2
ahmed elmeghiny
 
PDF
Introduction to Computer Networks: Peter L Dordal
Shabista Imam
 
PDF
Tcpip Tutorial And Technical Overview 7th Edition Ibm Redbooks
dolatazalini
 
PDF
Ibm flex system and pure flex system network implementation with cisco systems
Edgar Jara
 
PDF
Cisco ccna-security note
jihad nader
 
ODP
7. protocols
Marian Marinov
 
PPTX
Vxlan deep dive session rev0.5 final
KwonSun Bae
 
PDF
Networking on z/OS
IBM India Smarter Computing
 
PPTX
Advanced Network Chapter I: Which is very best lecture note
abdisani3
 
PDF
ComputerNetworks.pdf
MeetMiyatra
 
PDF
Layer 3 Tunnel Support for Open vSwitch
Netronome
 
PDF
VXLAN BGP EVPN: Technology Building Blocks
APNIC
 
PDF
5 продвинутых технологий Cisco, которые нужно знать
SkillFactory
 
PDF
Lecture 7.pdf
ssusercf79b32
 
PPTX
98 366 mva slides lesson 7
suddenven
 
PPTX
MVA slides lesson 7
Fabio Almeida- Oficina Eletrônica
 
KEY
Fosscon 2012 firewall workshop
jvehent
 
PDF
Huawei_HCNA_Routing_and_Switching.pdf
PauloDiniz60
 
DOCX
Expt no.3
rahul kbp
 
How to configure and manage cisco nexus switch
atingupta21
 
Session 2
ahmed elmeghiny
 
Introduction to Computer Networks: Peter L Dordal
Shabista Imam
 
Tcpip Tutorial And Technical Overview 7th Edition Ibm Redbooks
dolatazalini
 
Ibm flex system and pure flex system network implementation with cisco systems
Edgar Jara
 
Cisco ccna-security note
jihad nader
 
7. protocols
Marian Marinov
 
Vxlan deep dive session rev0.5 final
KwonSun Bae
 
Networking on z/OS
IBM India Smarter Computing
 
Advanced Network Chapter I: Which is very best lecture note
abdisani3
 
ComputerNetworks.pdf
MeetMiyatra
 
Layer 3 Tunnel Support for Open vSwitch
Netronome
 
VXLAN BGP EVPN: Technology Building Blocks
APNIC
 
5 продвинутых технологий Cisco, которые нужно знать
SkillFactory
 
Lecture 7.pdf
ssusercf79b32
 
98 366 mva slides lesson 7
suddenven
 
Fosscon 2012 firewall workshop
jvehent
 
Huawei_HCNA_Routing_and_Switching.pdf
PauloDiniz60
 
Expt no.3
rahul kbp
 

More from Motonori Shindo (19)

PPTX
おうち Lab で GitDNSOps / GitDNS Ops in My Home Lab
Motonori Shindo
 
PPTX
Tanzu Mission Control における Open Policy Agent (OPA) の利用
Motonori Shindo
 
PDF
Open Policy Agent (OPA) と Kubernetes Policy
Motonori Shindo
 
PDF
Open Policy Agent (OPA) 入門
Motonori Shindo
 
PPTX
急速に進化を続けるCNIプラグイン Antrea
Motonori Shindo
 
PPTX
Cluster API によるKubernetes環境のライフサイクル管理とマルチクラウド環境での適用
Motonori Shindo
 
PPTX
宣言的(Declarative)ネットワーキング
Motonori Shindo
 
PPTX
Service Mesh for Enterprises / Cloud Native Days Tokyo 2019
Motonori Shindo
 
PDF
Idea Hackathon at vFORUM 2019 Tokyo
Motonori Shindo
 
PPTX
Containers and Virtual Machines: Friends or Enemies?
Motonori Shindo
 
PPTX
Open Source Projects by VMware
Motonori Shindo
 
PPTX
Serverless Framework "Disptach" の紹介
Motonori Shindo
 
PPTX
コンテナネットワーキング(CNI)最前線
Motonori Shindo
 
PPTX
フロー技術によるネットワーク管理
Motonori Shindo
 
PDF
Viptela 顧客事例
Motonori Shindo
 
PDF
ViptelaのSD-WANとクラウド最適化ネットワーク
Motonori Shindo
 
PDF
L2 over L3 ecnaspsulations
Motonori Shindo
 
PDF
VMware NSXがサポートするトンネル方式について
Motonori Shindo
 
PPTX
CloudStack 4.1 + NVP Integration
Motonori Shindo
 
おうち Lab で GitDNSOps / GitDNS Ops in My Home Lab
Motonori Shindo
 
Tanzu Mission Control における Open Policy Agent (OPA) の利用
Motonori Shindo
 
Open Policy Agent (OPA) と Kubernetes Policy
Motonori Shindo
 
Open Policy Agent (OPA) 入門
Motonori Shindo
 
急速に進化を続けるCNIプラグイン Antrea
Motonori Shindo
 
Cluster API によるKubernetes環境のライフサイクル管理とマルチクラウド環境での適用
Motonori Shindo
 
宣言的(Declarative)ネットワーキング
Motonori Shindo
 
Service Mesh for Enterprises / Cloud Native Days Tokyo 2019
Motonori Shindo
 
Idea Hackathon at vFORUM 2019 Tokyo
Motonori Shindo
 
Containers and Virtual Machines: Friends or Enemies?
Motonori Shindo
 
Open Source Projects by VMware
Motonori Shindo
 
Serverless Framework "Disptach" の紹介
Motonori Shindo
 
コンテナネットワーキング(CNI)最前線
Motonori Shindo
 
フロー技術によるネットワーク管理
Motonori Shindo
 
Viptela 顧客事例
Motonori Shindo
 
ViptelaのSD-WANとクラウド最適化ネットワーク
Motonori Shindo
 
L2 over L3 ecnaspsulations
Motonori Shindo
 
VMware NSXがサポートするトンネル方式について
Motonori Shindo
 
CloudStack 4.1 + NVP Integration
Motonori Shindo
 

L2 over l3 ecnaspsulations (english)

  • 1. © 2014 VMware Inc. All rights reserved. L2 over L3 Encapsulations VXLAN, NVGRE, STT, Geneve, etc. Motonori Shindo Network & Security Business Unit VMware July. 13, 2014
  • 2. Tunneling vs Encapsulation • Tunneling Protocols – Signaling + Encapsulation • Usually equips some sort of “signaling” mechanism, which manages the tunnel. • Encapsulation is another part of tunneling protocol. – E.g. ) PPTP, L2TP, IPsec (IKE), etc. • Encapsulations – A way of wrapping (i.e. encapsulating) something – E.g) GRE, VXLAN, NVGRE, STT, (Ethernet, IP, TCP, ….) • What I’m going to talk about today is “encapsulation” • I am not going to talk about “control plane” today (though it’s very important) CONFIDENTIAL 2
  • 3. L2 over L3 encapsulations typically seen in Network Virtualization • GRE (Generic Routing Encapsulation) * • VXLAN (Virtual Extensible LAN) • NVGRE (Network Virtualization using GRE) • STT (Stateless Transport Tunneling) * Strictly speaking GRE is not an L2 over L3 encapsulation as it can encapsulate not only L2 but also L3 CONFIDENTIAL 3
  • 4. VXLAN • Proposed by Cumulus / Arista / Broadcom / Cisco / VMware / Citrix / RedHat – draft-mahalingam-dutt-dcops-vxlan-09.txt • Extends VLAN ID (12bit) to VNI (24bit) • Encapsulation by UDP/IP – L3 overlay – Multipath • Encapsulates Ethernet Frame only • Simple so that it can be implemented by hardware • Forming an “ecosystem” CONFIDENTIAL 4
  • 5. VXLAN Header 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |R|R|R|R|I|R|R|R| Reserved | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | VXLAN Network Identifier (VNI) | Reserved | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ CONFIDENTIAL 5
  • 6. Fabric Network • Service Oriented Architecture • 2 or 3 layer network to Leaf & Spine • High density and bandwidth required • Layer 3 ECMP • No oversubscription • Low and uniform delay characteristic • Wire & configure once network • Uniform network configuration WAN/Internet WAN/Internet CONFIDENTIAL 6
  • 7. Multipath Network • Background – In order to support significant increase of East-West traffic, Fabric Network based on multipath is getting popular • Requisites – A given flow must traverse over the same paths – Must have enough “entropy” to make an efficient use of fabric CONFIDENTIAL 7
  • 8. Multipath by VXLAN VXLAN (8)UDP (8)IP (20) Hash (src/dst MAC addr, src/dst IP addr, src/dst port number, etc.) * dst port = 4789 src port = Hash() Ether IP TCP Data original packet * Which fields to hash or which hash algorithm to use is not defined by the protocol. It is up to the implementation. CONFIDENTIAL 8
  • 9. VXLAN Ecosystem • Switch / Router – Arista, Brocade, Cisco, Cumulus, DELL, HP, Huawei, Juniper, Open vSwitch, Pica8 • Operating System – Linux, VMware • Appliances – A10, Citrix F5 • Testers – IXIA, Spirent • ASIC / NIC – Broadcom, Intel (Fulcrum), Emulex, Mellanox • Cloud Orchestrator – CloudStack, OpenStack, vCAC CONFIDENTIAL 9 Note: this is not an exhaustive list This is a list of venders who participated in VXLAN interoperability test at INTEROP Tokyo 2014, which went all successful.
  • 10. NVGRE • Proposed by Microsoft / Arista / Intel / Google / HP / Broadcom / Emulex – draft-sridharan-virtualization-nvgre-04.txt • 24bit Virtual Subnet ID (VSID) and 8bit FlowID • Encapsulation is GRE as is: – Put VSID + FlowID in Key Field – L3 Overlay – Multipath possible (in theory) but difficult • Windows affinity CONFIDENTIAL 10
  • 11. NVGRE Header 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0| |1|0| Reserved0 | Ver | Protocol Type 0x6558 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Virtual Subnet ID (VSID) | FlowID | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ CONFIDENTIAL 11
  • 12. Multipath in NVGRE GRE (8)IP (20) Hash (src/dst MAC addr, src/dst IP addr, src/dst port number, etc.) * FlowID = Hash() Ether IP TCP Data Original Packet Router / Switch needs to lookup the Key Field in GRE header to do an ideal multipath! * Which fields to hash or which hash algorithm to use is not defined by the protocol. It is up to the implementation. CONFIDENTIAL 12
  • 13. NVGRE ecosystem • Switch / Router – Huawei – Arista and Brocade claim they are going to support but product hasn’t come out yet?? • Operating System – Microsoft (Windows Server 2012 R2) • Appliances – F5 • ASIC / NIC – Emulex Mellanox • Cloud Orchestrator – System Center 2012 R2 CONFIDENTIAL 13 Note: this is not an exhaustive list
  • 14. STT (Stateless Transport Tunneling) • L2 over L3 encapsulation proposed by VMware – draft-davie-stt-06.txt • Why yet another L2 over L3 encapsulation ? – Performance – Richer context information – Multipath – Software oriented CONFIDENTIAL 14
  • 15. TSO (TCP Segmentation Offload) • Modern NIC (shipped within 4-5 years) equips various hardware acceleration features: – RSS, GSO/TSO, Checksum Offload, etc. • With TSO, NIC will perform TCP segmentation processing on behalf of Operating System (in software) – Operating system can now send up to 64K bytes packet. This will lead to a significant decrease of the number of packet processing (i.e. interrupt) hence much less context switches needed. • To take advantage of TSO in NIC, STT encapsulates packets as if it looks like “TCP”! CONFIDENTIAL 15
  • 16. Encapsulation / Segmentation in STT STT (18)TCP’ (20)IP (20) Payload 1STT (18)TCP’ (20)IP (20) Payload 2TCP’ (20)IP (20) Payload nTCP’ (20)IP (20) L2 Frame (up to 64K) ・ ・ ・ ・ Segmentation By Hardware CONFIDENTIAL 16
  • 17. TCP-like Header 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Source Port | Destination Port | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Sequence Number(*) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Acknowledgment Number(*) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Data | |U|A|P|R|S|F| | | Offset| Reserved |R|C|S|S|Y|I| Window | | | |G|K|H|T|N|N| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Checksum | Urgent Pointer | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Options | Padding | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | data | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Fields marked as * are repurposed in STT CONFIDENTIAL 17
  • 18. STT Header 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Version | Flags | L4 Offset | Reserved | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Max. Segment Size | PCP |V| VLAN ID | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | + Context ID (64 bits) + | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Padding | data | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | CONFIDENTIAL 18
  • 19. Throughput and CPU Utilization 0 10 20 30 40 50 60 70 80 90 100 0 1 2 3 4 5 6 7 8 9 10 Linux Bridge OVS Bridge OVS-GRE OVS-STT スループット CPU (Receive) CPU (Send) (Gbps) (%)Source: http://networkheresy.com/2012/06/08/the-overhead-of-software-tunneling/ CONFIDENTIAL 19
  • 20. Multipath in STT STT (18)TCP’ (20)IP (20) Hash (src/dst MAC addr, src/dst IP addr, src/dst port number, etc.) dst port = 7471 (TBD) src port = Hash() Ether IP TCP Data Original Packet * Which fields to hash or which hash algorithm to use is not defined by the protocol. It is up to the implementation. CONFIDENTIAL 20
  • 21. Geneve (Generic Network Virtualization Encapsulation) • New encapsulation being proposed by VMware, Microsoft, RedHat, Intel – draft-gross-geneve-00.txt • Goals – Extensibility • Service Chaining, Metadata support, etc. – Leverage NIC offload – Above two at the same time! (each one is straightforward, but two at the same time is difficult) • Highlights – Information can be added as Option field in TLV formart – Format carefully designed so that NIC can perform TSO – OAM and Criticality (indicating parsing the option fields mandatory) CONFIDENTIAL 21
  • 22. Geneve Header & Option Header
      Geneve Header
       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |Ver|  Opt Len  |O|C|   Rsvd.   |         Protocol Type         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |        Virtual Network Identifier (VNI)       |   Reserved    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                    Variable Length Options                    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      Option
       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |          Option Class         |     Type      |R|R|R| Length  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                     Variable Option Data                      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
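To make the relationship between the fixed Geneve header and its TLV options more concrete, here is a minimal sketch that packs and parses the two layouts shown above. The helper names (`pack_option`, `pack_geneve`, `unpack_geneve`) are hypothetical. Counting Opt Len and the option Length in 4-byte words comes from the Geneve specification, not from the slide, and 0x6558 (Transparent Ethernet Bridging) is assumed as the Protocol Type for an inner Ethernet frame.

```python
import struct

ETH_P_TEB = 0x6558  # assumed Protocol Type for an encapsulated Ethernet frame


def pack_option(opt_class, opt_type, data=b""):
    """Build one TLV option: Class(16) Type(8) R,R,R + Length(5), then data."""
    if len(data) % 4:
        raise ValueError("option data must be a multiple of 4 bytes")
    words = len(data) // 4
    if words > 0x1F:
        raise ValueError("option data too long for the 5-bit Length field")
    return struct.pack("!HBB", opt_class, opt_type, words) + data


def pack_geneve(vni, options=b"", oam=False, critical=False,
                protocol=ETH_P_TEB, version=0):
    """Build the 8-byte fixed header followed by already-packed options."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    if len(options) % 4 or len(options) // 4 > 0x3F:
        raise ValueError("options must be 4-byte aligned and fit in Opt Len")
    byte0 = (version << 6) | (len(options) // 4)              # Ver(2) + Opt Len(6)
    byte1 = (0x80 if oam else 0) | (0x40 if critical else 0)  # O, C, Rsvd(6)
    return struct.pack("!BBHI", byte0, byte1, protocol, vni << 8) + options


def unpack_geneve(buf):
    """Parse the fixed header and return the raw option bytes."""
    byte0, byte1, protocol, vni_rsvd = struct.unpack_from("!BBHI", buf)
    opt_bytes = (byte0 & 0x3F) * 4
    return {
        "version": byte0 >> 6,
        "oam": bool(byte1 & 0x80),
        "critical": bool(byte1 & 0x40),
        "protocol": protocol,
        "vni": vni_rsvd >> 8,
        "options": bytes(buf[8:8 + opt_bytes]),
    }
```

Because Opt Len sits in the first byte, a receiver (or a NIC) that does not understand any option can still skip straight to the inner frame, which is what lets extensibility and offload coexist.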
  • 23. Geneve Implementation • Recently implemented in Open vSwitch (OVS) and merged into the master branch on GitHub – VNI can be specified – Geneve Options can’t be specified (at this point) – The OAM flag apparently can’t be set (I tried, but it didn’t work) – The Critical flag appears to be supported as long as critical options are present • A Geneve dissector for Wireshark has also been implemented and merged into the master branch on GitHub • Geneve-aware NICs are not available yet
  • 24. Running Geneve on Open vSwitch
      host-1:~$ sudo ovs-vsctl add-br br0
      host-1:~$ sudo ovs-vsctl add-br br1
      host-1:~$ sudo ovs-vsctl add-port br0 eth0
      host-1:~$ sudo ifconfig eth0 0
      host-1:~$ sudo dhclient br0
      host-1:~$ sudo ifconfig br1 10.0.0.1 netmask 255.255.255.0
      host-1:~$ sudo ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.203.149
      host-2:~$ sudo ovs-vsctl add-br br0
      host-2:~$ sudo ovs-vsctl add-br br1
      host-2:~$ sudo ovs-vsctl add-port br0 eth0
      host-2:~$ sudo ifconfig eth0 0
      host-2:~$ sudo dhclient br0
      host-2:~$ sudo ifconfig br1 10.0.0.2 netmask 255.255.255.0
      host-2:~$ sudo ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.203.151
  • 25. Dissecting Geneve Packets by Wireshark (1)
  • 26. Dissecting Geneve Packets by Wireshark (2)
  • 27. Information about Geneve • English – http://tools.ietf.org/html/draft-gross-geneve-00 – http://cto.vmware.com/geneve-vxlan-network-virtualization-encapsulations/ – http://www.enterprisenetworkingplanet.com/netsp/geneve-generic-network-virtualization-encapsulation-protocol-advances-video.html – http://searchsdn.techtarget.com/news/2240219051/VMware-Microsoft-end-encapsulation-protocol-turf-war-with-GENEVE – http://www.plexxi.com/2014/06/attention-overlay-tunnel-construction-ahead – http://blog.shin.do/2014/07/geneve-on-open-vswitch/ • Japanese – http://blog.shin.do/2014/05/geneve-encapsulation/ – http://blog.shin.do/2014/07/geneve-on-open-vswitch/
  • 28. Will Geneve replace VXLAN / STT / NVGRE? • Will Geneve replace VXLAN? – No – The VXLAN ecosystem has already grown big enough that it is unlikely to be replaced by something else – VMware will continue to support VXLAN and its ecosystem partners • Will Geneve replace STT? – In the short term, no. In the long run, maybe, if • Geneve is accepted by the market and Geneve-aware NICs become widely available, at the same level as for STT today. • Will Geneve replace NVGRE? – In the short term, no. In the long run, maybe, if • Geneve gets implemented on Windows and an ecosystem forms at the same level as NVGRE has today.
  • 29. Encapsulation is like a wire: the right cable in the right place http://cto.vmware.com/geneve-vxlan-network-virtualization-encapsulations/
  • 30. The world is not that simple • Some people are against Geneve • Their claims are more or less as follows: – What Geneve tries to accomplish can be achieved by existing encapsulations (such as L2TP static tunneling or VXLAN), as is or with a small extension!? – Service Chaining and Metadata should not be bound to a particular encapsulation. They should be independent of the encapsulation!? – A 24-bit VNI is not long enough!?
  • 31. L2TPv3 static tunneling • As a tunneling protocol, L2TPv3 inherently has signaling. That said, it can also be used as a plain encapsulation method (i.e. pseudowire) without signaling. This is called “L2TPv3 static tunneling”, where both ends are configured manually. • L2TPv3 became an RFC in 2005 (RFC 3931) and has been on the market for many years. Cisco IOS and Linux (l2tpd) support L2TPv3 static tunneling.
       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |T|x|x|x|x|x|x|x|x|x|x|x|  Ver  |           Reserved            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                          Session ID                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |               Cookie (optional, maximum 64 bits)...
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                                                      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  • 32. L2TPv3 static tunneling as an L2 over L3 encapsulation • Session ID (32 bits) corresponds to VNI • L2TPv3 can be transported directly over IP or over UDP. For multipath, UDP is the better choice. • No explicit field for context information (metadata, etc.). It has to be configured manually on both ends (if possible) and expressed implicitly as part of the Session ID – Therefore the 32-bit Session ID can’t be used entirely for VNI • Strictly speaking, L2TPv3 has no way to tell (in the packet) where the encapsulated packet starts so that the NIC can do TSO. However, L2TPv2 had an “offset” option for this purpose, and many L2TPv3 implementations still have this “offset” option for backward compatibility with L2TPv2. So TSO is possible (if the NIC understands this legacy option). Cisco and Linux l2tpd support the offset field.
  • 33. VXLAN Generic Protocol Extension (a.k.a. eVXLAN) • Proposed by Cisco, Huawei, Intel, Microsoft – draft-quinn-vxlan-gpe-03.txt • An extension to VXLAN – Supports protocols other than Ethernet • IPv4 (0x01), IPv6 (0x02), Ethernet (0x03), Network Service Header [NSH] (0x04) – Note that “Next Protocol” is only 8 bits wide. The protocol type (usually 16 bits) has to be specifically encoded to fit into 8 bits. – OAM support – Version field • Used by Cisco ACI
       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |R|R|R|R|I|P|R|O|Ver|         Reserved          |Next Protocol  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |        VXLAN Network Identifier (VNI)         |   Reserved    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
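The 8-bit Next Protocol field is the main thing VXLAN-GPE adds to the VXLAN header, so a small sketch of packing it may help. It follows the bit layout in the diagram on this slide (the draft-03 layout); the function name `pack_vxlan_gpe` is hypothetical, and reading the I bit as "VNI valid" and the P bit as "Next Protocol present" is an assumption based on the drafts rather than on the slide itself.

```python
import struct

# Next Protocol code points as listed on the slide
NEXT_PROTOCOL = {"ipv4": 0x01, "ipv6": 0x02, "ethernet": 0x03, "nsh": 0x04}

# Flag byte layout from the diagram: R R R R I P R O (most significant bit first)
FLAG_I = 0x08  # assumed meaning: VNI field is valid
FLAG_P = 0x04  # assumed meaning: Next Protocol field is present
FLAG_O = 0x01  # OAM


def pack_vxlan_gpe(vni, next_protocol, oam=False, version=0):
    """Build the 8-byte VXLAN-GPE header (hypothetical helper)."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    if not 0 <= version < 4:
        raise ValueError("Ver must fit in 2 bits")
    flags = FLAG_I | FLAG_P | (FLAG_O if oam else 0)
    # Flags(8) | Ver(2) + Reserved(14) | Next Protocol(8) | VNI(24) + Reserved(8)
    return struct.pack("!BHBI", flags, version << 14, next_protocol, vni << 8)


header = pack_vxlan_gpe(5000, NEXT_PROTOCOL["nsh"])
```

Unlike Geneve's 16-bit Protocol Type, the small code-point table above has to be maintained separately from EtherType values, which is the encoding issue the slide points out.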
  • 34. VXLAN-gpe as L2 over L3 encapsulation • Mostly identical to VXLAN – VNI length (24bits) – Multipath property – Hardware friendliness • The biggest motivation of VXLAN-gpe is probably to allow Service Chaining by NSH (network service header) • No further extensibility