npp-linux-03-gateway
Firewall
Match-Actions Tables
• Match - compare input packets to the rule to see if there’s a match
• e.g., dstIP = 1.2.3.0/24 and dstPort=80
• Action – what to do with the packet if there is a match
• E.g., drop, allow
Match                                 Action
dstIP = 1.2.3.0/24 and dstPort=80     drop

Match                                 Action
dstIP = 1.2.3.0/24 and dstPort=80     allow
…
default                               drop
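The table lookup above can be sketched in a few lines. This is only an illustration, assuming rules are checked in order and the first match wins; the names `RULES`, `DEFAULT_ACTION`, and `decide` are made up for the example.

```python
import ipaddress

# Each rule: (destination network, destination port, action), checked in order.
RULES = [
    (ipaddress.ip_network("1.2.3.0/24"), 80, "allow"),
]
DEFAULT_ACTION = "drop"  # used when no rule matches


def decide(dst_ip: str, dst_port: int) -> str:
    """Return the action of the first matching rule, else the default."""
    addr = ipaddress.ip_address(dst_ip)
    for network, port, action in RULES:
        if addr in network and dst_port == port:
            return action
    return DEFAULT_ACTION


print(decide("1.2.3.7", 80))  # matches the rule
print(decide("1.2.3.7", 22))  # falls through to the default
```

Putting the default last is what makes this a "default deny" policy: anything not explicitly allowed is dropped.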
Use Case 1: Static Mapping - Internal to External
e.g., each server gets a public and a private IP
Public IP       Private IP
123.11.11.11    10.0.0.1
123.11.11.12    10.0.0.2
(not shown)     10.0.0.3
Example: public address 123.11.11.11, internal host 10.0.1.1 (Minecraft server)
                 IP Src         IP Dest          TCP Src   TCP Dest
External side    123.11.11.11   151.101.2.133    25565     80
Internal side    10.0.1.1       151.101.2.133    25565     1234
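A static one-to-one mapping like the one above amounts to two dictionary lookups. The sketch below is a simplified model, assuming packets are plain dictionaries and only addresses (not ports) are rewritten; the function names are hypothetical.

```python
# Static mapping between private and public addresses (values from the slide).
PRIVATE_TO_PUBLIC = {"10.0.0.1": "123.11.11.11", "10.0.0.2": "123.11.11.12"}
PUBLIC_TO_PRIVATE = {pub: priv for priv, pub in PRIVATE_TO_PUBLIC.items()}


def translate_outbound(pkt: dict) -> dict:
    """Internal to external: rewrite the private source address."""
    return {**pkt, "src": PRIVATE_TO_PUBLIC.get(pkt["src"], pkt["src"])}


def translate_inbound(pkt: dict) -> dict:
    """External to internal: rewrite the public destination address."""
    return {**pkt, "dst": PUBLIC_TO_PRIVATE.get(pkt["dst"], pkt["dst"])}


out = translate_outbound({"src": "10.0.0.1", "dst": "151.101.2.133"})
reply = translate_inbound({"src": "151.101.2.133", "dst": out["src"]})
print(out, reply)
```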
iptables
Course: Networking Principles in Practice – Linux Networking
Module: Creating a Gateway with Linux
Linux netfilter framework
(https://www.netfilter.org/)
User space utilities (iptables, nftables) that can configure the Linux
kernel’s filtering framework
[Figure: iptables runs in user space and configures packet processing in the kernel]
Examples
iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables is used to set up, maintain, and inspect the tables of IP packet
filter rules in the Linux kernel. Several different tables may be defined.
Each table contains a number of built-in chains and may also contain
user-defined chains.
Each chain is a list of rules which can match a set of packets.
iptables (ADD/REM/CHANGE) (TABLE) (CHAIN) (RULE)
Rules
• Match on some criteria
• Take some action
(MATCH) (ACTION)
Rules: Matching - Basic
-p, --protocol [!] protocol
-s, --source [!] address[/mask]
-d, --destination [!] address[/mask]
-i, --in-interface [!] name
-o, --out-interface [!] name
Rules: Matching - Extensions
• If -p or --protocol is used, protocol specific extensions get loaded.
• tcp
--destination-port [!] port[:port]
-p tcp --destination-port 8080
--tcp-flags [!] mask comp
-p tcp --tcp-flags SYN,ACK,FIN,RST SYN
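The `--tcp-flags mask comp` test reads as: look only at the flags listed in mask, and require that exactly the ones listed in comp are set. A small sketch of that check follows; the helper names are made up, while the bit values are the standard TCP header flag bits.

```python
# Standard TCP header flag bits.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def flags_to_bits(names):
    """Combine flag names into a single bit field."""
    bits = 0
    for name in names:
        bits |= FLAG_BITS[name]
    return bits


def tcp_flags_match(packet_flags, mask, comp):
    """True when, among the flags in mask, exactly those in comp are set."""
    return (flags_to_bits(packet_flags) & flags_to_bits(mask)) == flags_to_bits(comp)


MASK = ["SYN", "ACK", "FIN", "RST"]
print(tcp_flags_match(["SYN"], MASK, ["SYN"]))         # connection-opening SYN
print(tcp_flags_match(["SYN", "ACK"], MASK, ["SYN"]))  # SYN/ACK reply
```

With the mask and comp from the slide, the first call matches and the second does not, because ACK is in the mask but is not allowed to be set.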
Tables – 4 defined Tables
• filter – default table
• nat - This table is consulted when a packet that creates a new
connection is encountered.
• mangle - used for specialized packet alteration
• raw - used mainly for configuring exemptions from connection
tracking
Packet Traversal in Kernel
https://wiki.nftables.org/wiki-nftables/index.php/Netfilter_hooks
Packet Traversal in Kernel: zoom in on IP layer
Chains
• Chain is just a set of rules that are evaluated sequentially
• Rules can be terminating or non-terminating
• Recall: -j, --jump target
The target can be a user-defined chain (other than the one this rule is
in), one of the special builtin targets which decide the fate of the
packet immediately, or an extension
Chain 1 Chain 2
Rule 1 Rule 1
Rule 2 Rule 2
Rule 3 Rule 3
… …
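The chain behavior described above (sequential rules, terminating and non-terminating targets, jumps into user-defined chains) can be modeled roughly as follows. This is a toy model, not how the kernel implements it; the chain name "web" and the rule predicates are hypothetical.

```python
TERMINATING = {"ACCEPT", "DROP"}  # builtin targets that decide the packet's fate


def run_chain(chains, name, pkt, log):
    """Evaluate a chain's rules in order; None means 'return to the caller'."""
    for matches, target in chains[name]:
        if not matches(pkt):
            continue
        if target in TERMINATING:
            return target
        if target == "LOG":  # non-terminating: note it, keep going
            log.append((name, pkt["dport"]))
            continue
        verdict = run_chain(chains, target, pkt, log)  # jump to a user-defined chain
        if verdict is not None:
            return verdict
    return None


def filter_packet(chains, builtin, policy, pkt, log):
    """Run a built-in chain; fall back to its policy if no rule decides."""
    verdict = run_chain(chains, builtin, pkt, log)
    return policy if verdict is None else verdict


# Hypothetical rule set: FORWARD sends TCP to the user-defined chain "web".
example_chains = {
    "FORWARD": [(lambda p: p["proto"] == "tcp", "web")],
    "web": [(lambda p: True, "LOG"), (lambda p: p["dport"] == 80, "ACCEPT")],
}
example_log = []
print(filter_packet(example_chains, "FORWARD", "DROP", {"proto": "tcp", "dport": 80}, example_log))
print(filter_packet(example_chains, "FORWARD", "DROP", {"proto": "tcp", "dport": 22}, example_log))
```

The second packet is logged in "web", finds no terminating rule there, returns to FORWARD, and ends up with the chain policy.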
At each hook point, there are predefined chains
Each table consists of multiple chains (named for the hook point)
• filter – INPUT, OUTPUT, FORWARD
• nat – PREROUTING, INPUT, OUTPUT, POSTROUTING
• mangle – all five (PREROUTING, INPUT, FORWARD, OUTPUT, POSTROUTING)
Specifying Tables / Chains
• -t <table>
• -A <chain> (to add an entry)
[Topology: hosts ext1 and ext2 on the external side, gw in the middle, hosts int1 and int2 on the internal side]
Load Balancing
What is Load Balancing
• A load balancer serves as the point of entry for a service, and
directs traffic to one of N servers that can handle the request
• Used for both scaling and for resilience
• Likely involves NAT
Load Balancing Algorithm
• Client initiates (TCP) connection
• Load balancer must select which server to forward the request to
server = lb_alg()
Important Considerations: Server Set
• Which servers are part of the set to choose from
• Set is statically assigned
• May include liveness checks (through heartbeats)
Server      Live (per heartbeat)
10.0.0.1    Yes
10.0.0.2    Yes
10.0.0.3    No
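One way to maintain the "Live" column is to timestamp each heartbeat and treat a server as down once its last heartbeat is too old. The sketch below assumes that approach; the 3-second timeout and the function names are arbitrary choices for illustration.

```python
HEARTBEAT_TIMEOUT = 3.0  # seconds without a heartbeat before a server counts as down (assumed value)

last_seen = {}  # server -> time of the most recent heartbeat


def record_heartbeat(server: str, now: float) -> None:
    last_seen[server] = now


def live_servers(servers, now: float):
    """Return the statically configured servers that have a recent heartbeat."""
    return [s for s in servers
            if now - last_seen.get(s, float("-inf")) <= HEARTBEAT_TIMEOUT]


SERVER_SET = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
record_heartbeat("10.0.0.1", now=10.0)
record_heartbeat("10.0.0.2", now=9.0)
record_heartbeat("10.0.0.3", now=2.0)   # last heard from long ago
print(live_servers(SERVER_SET, now=11.0))
```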
Important Considerations: Flow Affinity
• TCP is stateful, so all packets from the same flow (TCP connection) must go to the same server
• First packet – the algorithm selects a server
• The service IP that clients connect to is called the VIP (virtual IP)
• Subsequent packets – look up the flow in a table
lb_alg():
src = (srcIP, srcPort)
server = hash(src)
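Combining the hash-based `lb_alg()` with a flow table gives flow affinity. A minimal sketch, assuming CRC32 as a stand-in hash (Python's built-in `hash()` on strings varies between runs) and a plain dictionary as the flow table:

```python
import zlib

SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
flow_table = {}  # (srcIP, srcPort) -> server chosen for that flow


def lb_alg(src_ip: str, src_port: int) -> str:
    """Pick a server by hashing the client's address and port."""
    key = f"{src_ip}:{src_port}".encode()
    return SERVERS[zlib.crc32(key) % len(SERVERS)]


def forward(src_ip: str, src_port: int) -> str:
    """First packet of a flow runs lb_alg(); later packets reuse the table entry."""
    flow = (src_ip, src_port)
    if flow not in flow_table:
        flow_table[flow] = lb_alg(src_ip, src_port)
    return flow_table[flow]


first = forward("198.51.100.7", 40000)
again = forward("198.51.100.7", 40000)
print(first, again)
```

With a pure hash the table is not strictly needed while the server set is stable; it matters once servers are added or removed, since the modulo result for an existing flow could otherwise change.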
Alg 2: Round Robin
• Server selection iterates through each server in order (1, 2, 3, 1, 2,…)
• Ensures balanced load, assuming each request is roughly same load
• Load balancer needs to keep state for flow affinity
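Round robin with flow affinity can be sketched as below. The flow keys and the name `forward_rr` are illustrative; the dictionary is the per-flow state the load balancer has to keep.

```python
import itertools

RR_SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
_rotation = itertools.cycle(RR_SERVERS)  # 1, 2, 3, 1, 2, ...
rr_flows = {}                            # flow -> server, kept for flow affinity


def forward_rr(flow):
    """New flows take the next server in the rotation; known flows stay put."""
    if flow not in rr_flows:
        rr_flows[flow] = next(_rotation)
    return rr_flows[flow]


picks = [forward_rr(("198.51.100.7", port)) for port in (1001, 1002, 1003, 1004)]
print(picks)
```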
Alg 3: Least Connections
• Select server based on which has least number of active connections
• Takes into account that some requests may be longer, but assumes
each connection imposes similar load on server
• Load balancer needs to keep state (flow affinity and algorithm)
lb_alg():
server = min_active(conns)
conns[server] = conns[server]+1
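The least-connections pseudocode above leaves out the bookkeeping when a connection ends. A sketch that includes it, with hypothetical function names and ties broken by configuration order:

```python
conns = {"10.0.0.1": 0, "10.0.0.2": 0, "10.0.0.3": 0}  # active connections per server
flow_to_server = {}                                     # flow affinity state


def open_connection(flow):
    """Choose the server with the fewest active connections (ties: first listed)."""
    server = min(conns, key=conns.get)
    conns[server] += 1
    flow_to_server[flow] = server
    return server


def close_connection(flow):
    """Release the flow's slot so the count reflects only active connections."""
    server = flow_to_server.pop(flow)
    conns[server] -= 1


a, b, c = open_connection("a"), open_connection("b"), open_connection("c")
close_connection("b")        # 10.0.0.2 is idle again
d = open_connection("d")
print(a, b, c, d)
```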
Layer 4 vs Layer 7 Load Balancing
• Layer 4 considers network (IP) and transport (TCP) headers
• Also called a network load balancer
• Layer 7 also considers application-layer data (e.g., HTTP headers and URLs)
Two Parts
• Part 1: define the service
• Part 2: define the servers
Define the Service
ipvsadm COMMAND [protocol] service-address [scheduling-method]
[persistence options]
• COMMAND
• -A or --add-service
• -E or --edit-service
• -D or --delete-service
Running example:
ipvsadm -A
• Protocol
• -t or --tcp-service
• -u or --udp-service
Running example:
ipvsadm -A -t
Running example:
ipvsadm -A -t 207.175.44.110:80
• scheduling-method
• -s or --scheduler scheduling-method
• rr – round robin
• lc – least connections
• sh – source hashing
Running example:
ipvsadm -A -t 207.175.44.110:80 -s rr
Adding a Server
ipvsadm command [protocol] service-address server-address [packet-forwarding-method] [weight options]
Command:
• -a or --add-server
• -e or --edit-server
• -d or --delete-server
Running example:
ipvsadm -a
service-address:
• Match whatever specified for adding the service
Running example:
ipvsadm -a -t 207.175.44.110:80
server-address:
• -r or --real-server server-address
Running example:
ipvsadm -a -t 207.175.44.110:80 -r 192.168.10.1:80
Adding a Server – Packet Forwarding Method
packet-forwarding-method
• -g, --gatewaying = Use gatewaying (direct routing). This is the default.
• -i, --ipip = Use ipip encapsulation (tunneling).
• -m, --masquerading = Use masquerading (network address translation, or NAT).
Aside: Direct Server Return (DSR)
When a server replies to a request, how is the reply forwarded?
• Option 1: via the load balancer
• Option 2: directly to the client, bypassing the load balancer (this is DSR)
Running example:
ipvsadm -a -t 207.175.44.110:80 -r 192.168.10.1:80 -m
Weight options
• -w, --weight weight
• Weight is an integer specifying capacity of server (used in scheduling
algorithms with weight – e.g., weighted round robin)
Running example:
ipvsadm -a -t 207.175.44.110:80 -r 192.168.10.1:80 -m
Practice on your own
• Vagrant
• Container Lab
• Set of commands
e.g., run a service on int1 and int2 (using nc), balance between them
[Topology: hosts ext1 and ext2 on the external side, gw in the middle, hosts int1 and int2 on the internal side]
Quality of Service
Quality of Service
Quality of Service (QoS) refers to the overall performance of a service
Classification
Classification Mechanisms
• Based on packet headers
• Look at Layer 3 and 4 headers to identify traffic
• e.g., port 80 = web traffic
• e.g., IP source 1.2.3.0/24 pays more for service
• Based on Deep Packet Inspection (inspect packet payload)
• Look at the payload for known headers or bit patterns
• Computationally expensive, but helps identify traffic that can't be identified at L3/4
• Based on fingerprinting
• Protocols, applications, operating systems all have fingerprints, such as
distribution of packet sizes, inter-packet gap, etc. so performing statistical
analysis over many packets can identify traffic
Traffic Shaping
• Goal is for traffic of a given class to conform to some shape
• Two key properties:
• Traffic rate
• Burst rate
[Figure: traffic rate over time, showing the allowed rate and the allowed burst]
Token Bucket – Rate Limiting
• Tokens are added to a bucket at rate R
• A packet can be transmitted only if there is a token in the bucket; sending consumes the token
[Figure: tokens arrive at rate R; each queued packet is checked against the bucket before transmission]
Token Bucket - Bursts
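• The bucket size (B) determines the maximum burst size
• e.g., if there's no traffic for a while, tokens accumulate (up to B), so a later burst of up to B packets can be sent back to back
[Figure: bucket of size B filling at rate R; queued packets are checked against available tokens]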
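A token bucket with rate R and size B fits in a short class. This is a simplified sketch, assuming one token per packet and timestamps passed in by the caller; real shapers such as tbf count bytes rather than packets.

```python
class TokenBucket:
    def __init__(self, rate: float, burst: float):
        self.rate = rate      # R: tokens added per second
        self.burst = burst    # B: bucket capacity, i.e. the largest burst
        self.tokens = burst   # start with a full bucket
        self.last = 0.0       # time of the previous check

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=1, burst=3)
burst_result = [bucket.allow(0.0) for _ in range(5)]   # 5 packets arrive at once
later_result = [bucket.allow(1.0), bucket.allow(1.0)]  # one second later
print(burst_result, later_result)
```

The first three packets ride on the stored burst; after that, packets pass only as fast as tokens arrive.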
Scheduling
Two main functions:
• Given a queue that is starting to fill up, determine what/when to drop
• Given multiple queues, determine which packet to transmit next
Queues
• Queue Size:
• Too big – long delays with congestion
• What to Drop?
• Tail Drop – when the queue is full, newly arriving packets are dropped
Problem with Tail Drop
• Consider TCP – it uses packet loss as an indication of congestion, and then slows its sending rate.
• With tail drop, packet loss occurs only once congestion has reached its peak (the queue is full).
• TCP then needs about a round-trip time, or a timeout, to detect the loss; in the meantime, many more packets are sent (and likely dropped).
• Alternatives: RED (Random Early Detection), CoDel, FQ-CoDel, etc.
• Drop packets earlier to signal congestion to TCP earlier
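As an illustration of "dropping earlier", the core of classic RED is a drop probability that grows with the average queue length. The sketch below shows only that core, under simplifying assumptions: it omits RED's per-packet count adjustment, and the thresholds and weight are example values, not recommendations.

```python
def update_average(avg_queue: float, current_queue: float, weight: float = 0.002) -> float:
    """Exponentially weighted moving average of the queue length."""
    return (1 - weight) * avg_queue + weight * current_queue


def red_drop_probability(avg_queue: float, min_th: float, max_th: float, max_p: float) -> float:
    """No drops below min_th, certain drop at max_th, linear ramp in between."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


for avg in (5, 20, 30):
    print(avg, red_drop_probability(avg, min_th=10, max_th=30, max_p=0.1))
```

Because drops start while the queue still has room, TCP senders see loss and slow down before the queue overflows.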
What Queue to Transmit From
• Strict Priority – if the top queue has a packet, send that; else look in the next queue…
• Round Robin – cycle through the queues, transmitting one packet from each
• Fair Queuing – approximate bit-by-bit multiplexing by computing a theoretical departure time for each packet and sending the earliest first
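The first two disciplines can be sketched over a list of queues ordered from highest to lowest priority. The function names are illustrative, and fair queuing is left out to keep the example short.

```python
from collections import deque


def strict_priority(queues):
    """Dequeue from the first non-empty queue (index 0 = highest priority)."""
    for q in queues:
        if q:
            return q.popleft()
    return None


def round_robin(queues):
    """Yield one packet from each non-empty queue per cycle until all are empty."""
    while any(queues):
        for q in queues:
            if q:
                yield q.popleft()


prio_queues = [deque(["h1", "h2"]), deque(["l1"])]
prio_order = [strict_priority(prio_queues) for _ in range(4)]
rr_order = list(round_robin([deque(["a1", "a2", "a3"]), deque(["b1"])]))
print(prio_order, rr_order)
```

Under strict priority the low queue is served only when the high queue is empty, which is why it can starve lower classes; round robin avoids that by visiting every queue each cycle.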
Recap: QoS Management
• Classification
• Inspecting packets to identify what class of traffic they belong to
• Shaping
• Ensuring a given class of traffic conforms to desired properties such as rates
• Scheduling
• Determining which packet to transmit next and which packets to drop
[Figure: QoS stages between the NIC and the host; labels include Ingress, Policing, and Classification]
tc
tc – Linux Traffic Control Utility
https://man7.org/linux/man-pages/man8/tc.8.html
Key constructs:
• qdisc
• class
• filter
Queuing Discipline (qdisc)
• Main object for shaping / scheduling
• Every interface has an egress (root) qdisc; an ingress qdisc is optional
• Packets go into a qdisc and then go out the other side
• Simple example: pfifo_fast – first in, first out
• Traditional Linux default (if you don't configure anything); some distributions default to fq_codel instead
Pic: https://medium.com/criteo-engineering/demystification-of-tc-de3dfe4067c2
Adding/Removing a qdisc
tc qdisc [add | delete | replace…] dev DEV \
[ parent qdisc-id | root ] [ handle qdisc-id ] \
qdisc params
• Then need to specify the qdisc to add (sfq, tbf, pfifo_fast, codel) along with its parameters
Example:
tc qdisc add dev eth0 root handle 1: tbf rate 1mbit burst 32kbit latency 400ms
Example qdisc (1)
pfifo, bfifo, pfifo_fast
https://man7.org/linux/man-pages/man8/tc-bfifo.8.html
https://man7.org/linux/man-pages/man8/tc-pfifo_fast.8.html
Pics: https://tldp.org/HOWTO/Traffic-Control-HOWTO/classless-qdiscs.html
Example qdisc (2)
sfq, tbf
https://man7.org/linux/man-pages/man8/tc-sfq.8.html
https://man7.org/linux/man-pages/man8/tc-tbf.8.html
Pics: https://tldp.org/HOWTO/Traffic-Control-HOWTO/classless-qdiscs.html
Classes
• qdiscs can be classless or classful
• Examples on previous two slides were all classless
Pic: https://medium.com/criteo-engineering/demystification-of-tc-de3dfe4067c2
Adding / Removing classes
tc [ OPTIONS ] class [ add | change | replace | delete | show ] dev DEV
parent qdisc-id [ classid class-id ] qdisc [ qdisc specific parameters ]
(the commands below assume an htb qdisc with handle 99: is already attached to eth0)
tc class add dev eth0 parent 99: classid 99:1 htb rate 100mbit
tc class add dev eth0 parent 99:1 classid 99:10 htb rate 30mbit
tc class add dev eth0 parent 99:1 classid 99:11 htb rate 70mbit
tc qdisc add dev eth0 parent 99:10 fq_codel   (optional)
tc qdisc add dev eth0 parent 99:11 fq_codel   (optional)
[Figure: class tree. 99:1 (htb class, rate 100Mbit) has children 99:10 (htb class, rate 30Mbit) and 99:11 (htb class, rate 70Mbit); each child has an optional fq_codel qdisc, labeled 99:100 and 99:101]
Filters: specifying which traffic is in which class
tc filter [ add | replace | delete | … ] dev DEV \
[ parent qdisc-id | root ] [ handle filter-id ] \
protocol protocol prio priority \
filtertype [ filtertype specific parameters ]
Example:
tc filter add dev eth0 parent 99: protocol ip prio 1 u32 match ip src 1.2.3.4 classid 99:10
Match Filters
• u32 Generic filtering on arbitrary packet data, assisted by syntax to
abstract common operations.
https://man7.org/linux/man-pages/man8/tc-u32.8.html
Example:
tc filter add dev eth0 parent 99: protocol ip prio 1 u32 match ip src 1.2.3.4 classid 99:10
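Conceptually, a u32 match compares a masked 32-bit word at some offset in the packet against a value, and `match ip src` is shorthand for the word at byte offset 12 of the IPv4 header. The sketch below models only that comparison; the function names are made up and this is not the tc implementation.

```python
import socket
import struct

IP_SRC_OFFSET = 12  # byte offset of the source address in an IPv4 header


def u32_match(packet: bytes, offset: int, value: int, mask: int) -> bool:
    """Compare the masked 32-bit big-endian word at offset with value."""
    (word,) = struct.unpack_from("!I", packet, offset)
    return (word & mask) == (value & mask)


def match_ip_src(packet: bytes, addr: str, prefix_len: int = 32) -> bool:
    """Model of 'match ip src ADDR/PREFIX' on a raw IPv4 header."""
    value = struct.unpack("!I", socket.inet_aton(addr))[0]
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return u32_match(packet, IP_SRC_OFFSET, value, mask)


# Minimal 20-byte IPv4 header: only the source and destination fields are filled in.
header = bytes(12) + socket.inet_aton("1.2.3.4") + socket.inet_aton("5.6.7.8")
print(match_ip_src(header, "1.2.3.4"), match_ip_src(header, "1.2.3.5"))
```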
Another Interesting qdisc: netem
• Provides Network Emulation functionality for testing protocols by
emulating the properties of real-world networks.
https://man7.org/linux/man-pages/man8/tc-netem.8.html
• Add delays to packets
• Add packet loss with some probability
•…
Example use of netem
• Say I created a new application or protocol and I want to test out how
it would behave under different conditions on a wireless network
[Figure: applications communicating over a wireless network]
[Figure: the wireless network replaced by a Linux host that forwards between eth0 and eth1 (ip route), with tc applied to both interfaces]
tc qdisc replace dev eth0 root netem delay 100ms 12ms 10%
tc qdisc replace dev eth1 root netem delay 100ms 12ms 10%
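In the commands above, the three values after `delay` are the base delay, the jitter, and the correlation between successive delays. The sketch below is a simplified model of what that produces, assuming uniform jitter and a basic blend with the previous sample; it is not the kernel's exact algorithm, and the function name is made up.

```python
import random


def netem_delays(n, delay_ms=100.0, jitter_ms=12.0, correlation=0.10, seed=0):
    """Sample n per-packet delays in [delay - jitter, delay + jitter]."""
    rng = random.Random(seed)
    last = rng.random()
    delays = []
    for _ in range(n):
        # Blend a fresh random value with the previous one to model correlation.
        last = (1 - correlation) * rng.random() + correlation * last
        delays.append(delay_ms - jitter_ms + 2 * jitter_ms * last)
    return delays


samples = netem_delays(1000)
print(min(samples), max(samples))
```

Every sampled delay stays between 88 ms and 112 ms, which is the range a 100 ms delay with 12 ms jitter is meant to cover.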
Practice on your own
• Vagrant
• Container Lab
• Set of commands
e.g., run iperf on ext1 and int1 and control the rate of traffic
[Topology: hosts ext1 and ext2 on the external side, gw in the middle, hosts int1 and int2 on the internal side]