Pensando SSDK IPsec GW Reference Pipeline User Guide
AMD Pensando Software-in-Silicon Development Kit
Chapter 2: Architecture
Transport and Tunnel Modes
Encryption Path
Decryption Path
Stages
Features
End-to-End Testing
Chapter 1
Introduction
This guide presents an overview of the IPsec_GW reference pipeline (transport and tunnel
modes). It provides details of the P4 program that you can load onto an AMD Pensando™ second-
generation ("Elba") data processing unit (DPU), or run within the provided x86 simulator. P4
pipeline programmability gives flexible software-defined constructs that enable you to develop
your networking software quickly, load it onto an AMD Pensando DPU, and test it. The reference
pipelines supply P4-16 source code for the P4I and P4E (protocol processing) modules, and P4-16-based
libraries in binary format for P4 RxDMA and P4 TxDMA that you can access via APIs for handling
message transfer with the host and local CPU. This combination of source code and libraries
facilitates the implementation of the provided reference pipeline. The provided P4 binary can be
deployed alongside your P4 code on the same DPU to integrate the functionality into your
system. This approach streamlines the implementation process and ensures seamless
compatibility.
In cloud service provider, enterprise, and public sector environments, encryption services are
commonly delivered via Edge/VPN gateway physical or virtual (VNF) appliances. However, this
approach compromises performance and drives up costs if the encryption offering needs to scale
the number of tunnels or if high IPsec throughput is required. Physical appliances tend to
increase in cost and size as performance requirements increase for both throughput and tunnel
scale. VNF or other software-only solutions typically pin a tunnel to a single CPU and can only
achieve a peak throughput of 1.25-2.5 Gb/s per tunnel. Performance challenges of software-only
IPsec solutions include:
• High CPU utilization: Encryption and decryption of IPsec packets can be CPU-intensive,
leading to high CPU utilization on the server or appliance. This can impact the performance of
other applications running on the same server or appliance.
• Limited throughput: CPU speed limits the throughput of an IPsec tunnel on a software-only
solution. This can be a problem for networks that need to support high-bandwidth traffic.
• Latency: Encryption and decryption of IPsec packets can add latency to network traffic. This
can be a problem for applications that are sensitive to latency, such as VoIP and video
conferencing.
When there is a need to encrypt high-speed links or provide a scalable encryption service that
does not consume racks of CPUs, P4-programmable DPUs provide a more scalable and
performant solution.
Customers of cloud providers often want to encrypt on-ramp circuits between colocation
facilities, enterprise data centers, and their cloud resources. High-speed cloud on-ramp circuits
offer sub-1 Gb/s to 100 Gb/s links with one or more IPsec tunnels, which are challenging to
encrypt using current IPsec implementations that rely on CPUs.
The AMD Pensando DPU can offload encryption services from the x86 server, significantly
increasing throughput per tunnel and the number of tunnels supported, without requiring
additional compute resources from a software-only solution (VNF) or relying on large appliances.
Third-party vendors can use the AMD Pensando DPU to improve the IPsec throughput and
tunnel scale of their appliance offerings, while also reducing their footprint.
The benefits of using an AMD Pensando DPU for encryption services include:
• Scalability: DPUs and current SSDK software can scale to support up to 64,000 encrypted
tunnels.
• Performance: DPUs can encrypt and decrypt IPsec packets at line rate without impacting the
performance of other applications.
• Flexibility: The AMD Pensando DPU can support multi-service offerings, leveraging a flow-
based approach to combine encryption with additional networking and security functions
(packet rewrite, SDN policy offload, flow offloads, NAT, stateful firewall, observability, and
massive control and data plane scale), or a policy-based VPN for stateless environments.
• Flexible encryption: P4 programmability enables you to select what type of traffic should be
encrypted and how the traffic is mapped to security associations (SAs) and IPsec tunnels.
The IPsec_GW P4 reference pipeline is a robust method to enhance security without sacrificing
network performance. The AMD Pensando DPU and P4 pipeline have achieved 100 Gb/s for a
single IPsec tunnel and up to 260 Gb/s of bidirectional throughput for a single DPU. The
IPsec_GW reference pipeline is a bump-in-the-wire (BITW) implementation ideal for a
SmartSwitch or an appliance form factor, as shown in the following figure. When deployed inside
a SmartSwitch as a top-of-rack device or in an appliance, the DPU accelerates IPsec services for
any traffic that enters or leaves the device. A DPU with the IPsec_GW reference pipeline
typically connects to a switching ASIC as a bump-in-the-wire: it takes unencrypted packets from
the wire, encrypts them at line rate, and sends them back to the switching ASIC to be forwarded
to their next hop or destination. Alternatively, the BITW approach includes an appliance mode
using an x86 system with DPUs deployed in PCIe® slots. To scale out performance, multiple
DPUs can be connected to an ASIC.
Figure: Bump-in-the-wire deployment, with N DPUs attached to a switching ASIC and traffic to/from the network on either side (X28791-110623)
The IPsec pipeline can also be enhanced to support a host-to-network deployment, as shown in
the following figure. This allows offloads to be performed on a per-compute-node basis.
Figure: Host-to-network deployment, with a DPU handling traffic to/from the network on a compute node (X28792-110623)
Chapter 2
Architecture
The P4 pipeline is a software-defined data plane programmed to perform various tasks, including
encryption and decryption. In the case of IPsec, the P4 pipeline is programmed to perform inline
encryption and decryption without the need to offload to a separate lookaside crypto engine. This
allows the P4 pipeline to encrypt and decrypt packets at line speed without impacting network
performance or latency-sensitive applications.
Transport and Tunnel Modes
Figure: IPsec transport and tunnel mode packet formats, showing the original packet (IP header, L4 header, data) and the encrypted and authenticated coverage in each mode (X28810-110823)
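Both modes carry the payload inside an Encapsulating Security Payload (ESP) envelope as defined in RFC 4303. As a point of reference, the ESP header and trailer can be described in P4-16 roughly as follows. This is an illustrative sketch only; the type and field names are hypothetical and are not taken from the SSDK source.

header esp_h {
    bit<32> spi;     // Security Parameters Index: identifies the SA
    bit<32> seq_no;  // per-packet sequence number (anti-replay input)
}

// The ESP trailer follows the (padded) payload and precedes the
// integrity check value (ICV) computed by the crypto engine.
header esp_trailer_h {
    bit<8> pad_len;      // number of padding bytes before this field
    bit<8> next_header;  // protocol of the encapsulated payload
}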
Encryption Path
Packet paths for encryption and decryption follow P4 Ingress to P4 Egress pipelines.
For the encryption path, where a packet arrives from an uplink unencrypted, the P4 pipeline
uses the crypto block between the P4I deparser and the packet buffer. The P4 pipeline can also
be used for post-encryption/decryption operations (multi-service), such as packet rewrites, SDN
policy offload, flow offloads, NAT, stateful firewall, observability, or an additional flow lookup. See
the following figure.
Figure: Encryption path through the pipeline: P4 Ingress (P4I), inline crypto, packet buffer/traffic manager, and P4 Egress (P4E), with the P4 RxDMA and P4 TxDMA pipelines alongside (X28794-110623)
It is worth noting that Arm® cores are not in the IPsec data path, unless desired. All
IPsec functionality can be implemented in P4 with the inline crypto engine, which provides near-
line-rate performance for the IPsec data plane. The current reference pipeline supports
configuring static security associations (SAs) and using a custom control plane (strongSwan or
another third party). The SSDK provides corresponding SA APIs, which customers can use to
program the P4 SA table. The P4 pipeline can also be extended to support dynamic SA
configuration (via strongSwan or other third-party vendors), which allows SAs to be configured
automatically based on the traffic flows in the network.
• Because IPsec_GW is a stateless app, the SA index required to encrypt the packet is derived
from the packet's VLAN ID (see the sketch after this list).
• The tunnel-mapping table entry contains the VLAN ID to tunnel ID mapping.
• The tunnel table entry provides the IPsec tunnel mode, the encrypt SA index, nexthop
information, and so on.
• Based on the SA index, the ipsec_info required for encryption is derived in the ipsec_encrypt
table and passed to the deparser.
• The P4I deparser sets the IPSec_info_valid bit, populates the IPSec_info, inserts a valid ESP
header and trailer, and adds an empty ESP authentication trailer. The modified packet is sent
to the inline IPsec engine.
• The inline IPsec engine encrypts the plaintext payload and fills in the ESP authentication
trailer. The resulting packet is sent to the packet buffer.
• The P4E parser identifies the packet and parses the ESP header. The ipsec_post_encrypt table
sets up the ESP IV header and UDP NAT (if enabled).
• The output uplink port and packet MAC address are derived in the nexthop table, and the
packet is sent to the designated uplink port.
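The following P4-16 fragment sketches this lookup chain end to end. It is a minimal illustration under stated assumptions: the metadata fields, action parameters, and table shapes are hypothetical stand-ins, not excerpts from the ipsec_gw P4 source shipped with the SSDK (core.p4 is assumed for NoAction).

control ipsec_encrypt_chain(inout headers_t hdr, inout metadata_t meta) {

    // Stateless mapping: the packet's VLAN ID selects the tunnel.
    action set_tunnel(bit<16> tunnel_id) {
        meta.tunnel_id = tunnel_id;
    }
    table tunnel_mapping {
        key = { hdr.vlan.vid : exact; }
        actions = { set_tunnel; NoAction; }
        default_action = NoAction();
    }

    // The tunnel entry supplies the IPsec mode, encrypt SA index, and nexthop.
    action set_tunnel_info(bit<1> tunnel_mode, bit<16> sa_index, bit<16> nexthop_id) {
        meta.tunnel_mode = tunnel_mode;   // transport (0) or tunnel (1)
        meta.sa_index    = sa_index;
        meta.nexthop_id  = nexthop_id;
    }
    table tunnel {
        key = { meta.tunnel_id : exact; }
        actions = { set_tunnel_info; NoAction; }
        default_action = NoAction();
    }

    // The SA entry yields the ipsec_info consumed by the P4I deparser
    // and, after it, the inline crypto engine.
    action set_ipsec_info(bit<32> spi, bit<32> key_index) {
        meta.ipsec_spi        = spi;
        meta.ipsec_key_index  = key_index;
        meta.ipsec_info_valid = 1;
    }
    table ipsec_encrypt {
        key = { meta.sa_index : exact; }
        actions = { set_ipsec_info; NoAction; }
        default_action = NoAction();
    }

    apply {
        tunnel_mapping.apply();
        tunnel.apply();
        ipsec_encrypt.apply();
    }
}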
Decryption Path
Packet paths for encryption and decryption follow P4 Ingress to P4 Egress pipelines.
For the decryption path, where a packet arrives from the uplink encrypted, the packet is
decrypted in P4I. The decrypted packet goes to P4E, where it is validated and can either be sent
directly to the uplink interface or sent back to the P4 pipeline for a flow lookup on inner
packet-header fields for additional services such as packet rewrites, SDN policy offload, flow
offloads, NAT, stateful firewall, or observability. See the following figure.
Figure: Decryption path deployment: CPU plus DPU, stateful with high performance (X28812-110823)
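On the decryption side, the P4I parser first has to recognize ESP packets (IP protocol 50) so they can be steered through the inline crypto engine. The following schematic P4-16 parser fragment reuses the illustrative esp_h header sketched in the architecture overview; as before, the state and field names are assumptions rather than excerpts from the SSDK source.

const bit<8> IP_PROTO_ESP = 50;   // ESP protocol number, per RFC 4303

state parse_ipv4 {
    packet.extract(hdr.ipv4);
    transition select(hdr.ipv4.protocol) {
        IP_PROTO_ESP : parse_esp;
        default      : accept;
    }
}

state parse_esp {
    // Only the SPI and sequence number are visible in clear text; the
    // payload remains opaque until the inline crypto engine decrypts it.
    packet.extract(hdr.esp);
    transition accept;
}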
Stages
P4I and P4E each consist of eight match-action stages, with a parser unit at the beginning and a
deparser unit at the end of the pipeline. The RxDMA and TxDMA pipelines also have eight match-
action stages each; they have no parser or deparser units and instead feed dedicated
DMA blocks at the end of the pipeline. The SxDMA pipeline in Elba has four match-action stages
and its own DMA block, and is intended for offload unit control and other memory-to-memory
transfers rather than packet manipulation. The match-action stages in all pipelines are identical:
each stage consists of a lookup unit called the Table Engine (TE) and four Match Processing Units
(MPUs) that execute the P4 action code.
Features
The Elba DPU has the following inline IPsec encryption/decryption features:
• Encryption of encapsulated packets; anti-replay and 64-bit ESN support is not added/verified.
• Flow-based forwarding is not supported.
• AES-GCM/XTS encrypt/decrypt
• AES-CCM/AES-CBC/DES/3DES/SHA1/SHA2/SHA3/HMAC
• SHA3-256
• SHA3-384/512
• PKE, DRBG, TLS 1.1 & 1.3
End-to-End Testing
You can perform end-to-end testing for the IPsec_GW reference pipeline.
As shown in the following figure, two DPUs are connected back-to-back using port eth1/2. The
other uplink ports of DSC-1 and DSC-2 (eth1/1) are connected to Ixia traffic generator port-1 and
port-2, respectively. The ipsec_gw P4 program runs on both DSCs. DSC-1 encrypts Ixia IP
traffic from port-1 and feeds it to DSC-2 via port eth1/2. DSC-2 decrypts the IP packet and sends
the decrypted packet to Ixia port-2. For performance analysis, start traffic from Ixia port-1 and
port-2 and run it symmetrically to test bidirectional throughput.
Figure: End-to-end test topology: unencrypted traffic enters Elba DSC-1 (Arm, P4 pipeline) on E1/1, crosses the encrypted E1/2 link to Elba DSC-2 (Arm, P4 pipeline), and exits unencrypted on E1/1
Chapter 3
Note: This chapter assumes you have already installed the developer environment. If not, refer to the
SSDK Quick Start Guide or Getting Started Guide for installation of the SSDK, and to the SSDK
Getting Started Guide to set up your environment. The developer environment consists of:
• A container image that constitutes the build environment and includes generic tools, such as
the Linaro Arm® cross-compile toolchain.
• A tarball with the source code of the reference pipelines and AMD-specific tools, such as the
P4 compiler and the Distributed Services Card (DSC) simulator.
• A tarball with documentation.
Directory                                           Contents
/sw/nic                                             SSDK main folder for build scripts
/sw/nic/rudra                                       Relevant code and files for development
/sw/nic/rudra/src/pds-core                          Core application, environment initialization, service startup, PDS agent, gRPC listener, PDS layer API, and common libraries
/sw/nic/rudra/src/lib                               Common code and files for all pipelines: LIFs, memory utils, debug, tracing, and interface management
/sw/nic/rudra/src/<reference-pipeline>              The reference pipelines
/sw/nic/rudra/src/<reference-pipeline>/p4/p4-16/    P4 source code for the specific pipeline
/sw/nic/rudra/src/conf                              Platform, pipeline, and infrastructure configuration files
/sw/nic/rudra/src/<reference-pipeline>/dp           Dataplane and pipeline-specific DP application
/sw/nic/tools                                       The developer tools
Note: This container can be used to build both x86 and Arm images.
# tools/build.sh --help
ARCH=<architecture> --p4_program <p4_program> [--clean] [--create-sim-target]
Architectures: aarch64
               x86_64 (default)
--p4_program: hello_world
              flow_offload
              classic_host_offload
              classic_rtr
              flow_ha
              ipsec_gw
              sdn_policy_offload
--clean: cleaning target and build directories (optional)
--create-sim-target: build the rudra target sim
At this point both the ASIC simulator and core application are running, and the two uplink
interfaces (uplink0 and uplink1) are exposed to Linux® as tap interfaces (Eth1-1 and Eth1-2).
Validate this by running show commands such as the following:
As part of the simulator startup, the dp-app process is started, which reads the file
/sw/nic/rudra/src/conf/ipsec_gw/config.json. This file contains instructions to create the flow
with policy and route results, and to populate specific entries into the tables. To view the file, run
the command:
cat /sw/nic/rudra/src/conf/ipsec_gw/config.json
If desired, you can expose the uplink interfaces outside the container so that you can run traffic
tools from outside to inject packets into the uplinks. To do so, run the following commands on
the host of the container:
cd $TTOP/src/github.com/pensando/sw/nic/
DSC_INSTANCE=<ssdk-dev-container-name> ./rudra/test/tools/setup-uplink-interfaces.sh
To stop the simulator environment, execute the dsc-sim-stop.sh script. This kills the
processes, including the ASIC simulator and the pds core application:
./tools/dsc-sim-stop.sh
To restart the simulation environment, execute the dsc-sim-restart.sh script. This restarts
all the processes, including the ASIC simulator and the pds core application, which is useful for
recovering from scenarios such as a process crash:
./tools/dsc-sim-restart.sh
This creates an image for the DSC/DPU as a tarball in the /sw/nic directory, for example
dsc_fw_elba_.tar.
4. List the built DSC firmware image:
ls /sw/nic/dsc_fw_elba_.tar
3. Confirm the DSC enumerates as a PCIe® device, and note the PCIe address of its
management controller:
# lspci -d 1dd8:
12:00.0 PCI bridge: Pensando Systems Inc Device 0002
13:00.0 PCI bridge: Pensando Systems Inc DSC Virtual Downstream Port
13:01.0 PCI bridge: Pensando Systems Inc DSC Virtual Downstream Port
13:02.0 PCI bridge: Pensando Systems Inc DSC Virtual Downstream Port
14:00.0 Ethernet controller: Pensando Systems Inc DSC Ethernet Controller
15:00.0 Ethernet controller: Pensando Systems Inc DSC Ethernet Controller
16:00.0 Ethernet controller: Pensando Systems Inc DSC Management
Controller
The final line shows that the driver is using the eth1 interface for the DSC
management controller.
6. Configure an IP address for this interface:
# ifconfig eth1 169.254.<mgmt_ctlr_pcie_bus_number>.2/24 up
where:
• <mgmt_ctlr_pcie_bus_number> is the first component of the PCIe address for the
management controller from step 3, converted from hexadecimal to decimal.
In this example, the PCIe address is 16:00.0, so the bus address is 0x16 which converts
to 22 decimal.
The command corresponding to the example output shown in step 3 is therefore:
# ifconfig eth1 169.254.22.2/24 up
8. Verify that you can ping the DSC from the host:
Note: The internal management NIC for the DSC has a default IP address of 169.254.22.1.
# ping 169.254.22.1 -c 3
PING 169.254.22.1 (169.254.22.1) 56(84) bytes of data.
64 bytes from 169.254.22.1: icmp_seq=1 ttl=64 time=0.205 ms
64 bytes from 169.254.22.1: icmp_seq=2 ttl=64 time=0.091 ms
64 bytes from 169.254.22.1: icmp_seq=3 ttl=64 time=0.085 ms
8. SSH to the DSC from the host as root with a password of pen123:
# ssh -lroot 169.254.22.1
The authenticity of host '169.254.22.1 (169.254.22.1)' can't be
established.
ECDSA key fingerprint is SHA256:AoU0vi8BifouUOfqSg78t08JgaH7vHHBZfK58CnS+EI.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '169.254.22.1' (ECDSA) to the list of known hosts.
root@169.254.22.1's password:
10. Confirm the IP address for the internal Management NIC (MNIC) interface on the DSC is
169.254.<pcie_bus_number>.1:
Note: The int_mnic0 interface is the internal MNIC net device on the DSC that is created by default.
# ifconfig int_mnic0
int_mnic0 Link encap:Ethernet HWaddr 00:AE:CD:01:C6:C4
inet addr:169.254.22.1 Bcast:169.254.22.255 Mask:255.255.255.0
inet6 addr: fe80::2ae:cdff:fe01:c6c4/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:41 errors:0 dropped:0 overruns:0 frame:0
TX packets:65 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:5351 (5.2 KiB) TX bytes:10066 (9.8 KiB)
11. Use scp to copy the image over the internal management NIC interface from the host
(specified by the IP address that you set in step 6) to the /data/ directory on the card:
$ scp <user>@<host_ip_address>:<SSDK_dir>/src/github.com/pensando/sw/nic/dsc_fw_elba_.tar /data/
For example:
$ scp [email protected]:/home/jdoe/ssdk/src/github.com/pensando/sw/nic/dsc_fw_elba_.tar /data/
In this example the PCIe addresses are 13:00.0 and 16:00.0, respectively.
4. Compile the memtun binary for the host using the code from $sw/platform/src/app/memtun.
5. Start the memtun binary on the host:
# ./memtun -s <downstream_port_pcie_address> 169.1.<mgmt_ctlr_pcie_bus_number>.2 &
where:
• <downstream_port_pcie_address> is the PCIe address of the virtual downstream
port from step 3.
In this example it is 13:00.0.
• <mgmt_ctlr_pcie_bus_number> is the first component of the PCIe address for the
management controller from step 3, converted from hexadecimal to decimal.
In this example, the PCIe address is 16:00.0, so the bus address is 0x16 which converts
to 22 decimal.
The command corresponding to the example output shown in step 3 is therefore:
# ./memtun -s 13:00.0 169.1.22.2 &
The memtun utility is started by default on DSCs running goldfw or ssdk firmware. When
memtun starts on the host it establishes a connection with the DSC and creates a tun device
visible in ifconfig on both the host and the DSC.
6. Confirm the IP address has been set:
# ifconfig tun0
tun0 Link encap:UNSPEC HWaddr
00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
inet addr:169.1.22.2 P-t-P:169.1.22.3
Mask:255.255.255.255
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1
RX packets:2 errors:0 dropped:0 overruns:0 frame:0
TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:500
RX bytes:96 (96.0 B) TX bytes:96 (96.0 B)
7. Verify that you can ping the DSC from the host:
Note: In this example, the tun device on the DSC has an IP address of 169.1.22.3.
# ping 169.1.22.3 -c 3
PING 169.1.22.3 (169.1.22.3) 56(84) bytes of data.
64 bytes from 169.1.22.3: icmp_seq=1 ttl=64 time=0.205 ms
64 bytes from 169.1.22.3: icmp_seq=2 ttl=64 time=0.091 ms
64 bytes from 169.1.22.3: icmp_seq=3 ttl=64 time=0.085 ms
8. SSH to the DSC from the host as root with a password of pen123, using the tun device address:
# ssh -lroot 169.1.22.3
The authenticity of host '169.1.22.3 (169.1.22.3)' can't be established.
ECDSA key fingerprint is SHA256:AoU0vi8BifouUOfqSg78t08JgaH7vHHBZfK58CnS+EI.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '169.1.22.3' (ECDSA) to the list of known hosts.
root@169.1.22.3's password:
9. Confirm the IP address for the tun0 interface on the DSC is 169.1.<pcie_bus_number>.3:
Note: The tun0 interface is the memtun device on the DSC that is created by default.
# ifconfig tun0
tun0 Link encap:UNSPEC HWaddr
00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
inet addr:169.1.22.3 P-t-P:169.1.22.2
Mask:255.255.255.255
inet6 addr: fe80::b1a3:b97a:c6a3:3b12/64 Scope:Link
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1
RX packets:2 errors:0 dropped:0 overruns:0 frame:0
TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:500
RX bytes:96 (96.0 B) TX bytes:96 (96.0 B)
10. Use scp to copy the image over the memtun interface from the host (specified by the IP
address that you set in step 5) to the /data/ directory on the card:
$ scp <user>@<host_ip_address>:<SSDK_dir>/src/github.com/pensando/sw/nic/dsc_fw_elba_.tar /data/
For example:
$ scp [email protected]:/home/jdoe/ssdk/src/github.com/pensando/sw/nic/dsc_fw_elba_.tar /data/
Loading an Image
To load an image you have copied to the DSC, do the following on the DSC:
Note: You can also use the table access tools (p4ctl or, if one of the reference pipelines was
used to create the image, pdsctl) to verify that the correct software is running.
To do so:
• To use a different config file specify it with the -c option. For example:
start-dp-app.sh -c /nic/config/config_scale.json
To do so:
/sw/nic/rudra/docs/ipsec_gw/quickstart.md#sim-packet-test
Troubleshooting
The following files in the container are useful for troubleshooting:
Appendix A
The AMD Documentation Hub is an online tool that provides robust search and navigation for
documentation using your web browser. To access the Documentation Hub, go to
https://www.amd.com/en/search/documentation/hub.html.
Support Resources
For support resources such as Answers, Documentation, Downloads, and Forums, see Support.
References
Pensando Documents
Revision History
The following table shows the revision history for this document.
Copyright
© Copyright 2023 Advanced Micro Devices, Inc. AMD, the AMD Arrow logo, Pensando, and
combinations thereof are trademarks of Advanced Micro Devices, Inc. AMBA, AMBA Designer,
Arm, ARM1176JZ-S, CoreSight, Cortex, PrimeCell, Mali, and MPCore are trademarks of Arm
Limited in the US and/or elsewhere. PCI, PCIe, and PCI Express are trademarks of PCI-SIG and
used under license. Other product names used in this publication are for identification purposes
only and may be trademarks of their respective companies.