TCP Tuning On Sender Side
TCP is a reliable transport layer protocol that offers a full-duplex, connection-oriented byte
stream service. This reliability makes it appropriate for wide area IP networks, where there is
a higher chance of packet loss or reordering. What really complicates TCP are its flow control
and congestion control mechanisms. These mechanisms often interfere with each other, so
proper tuning is critical for high-performance networks. Here we describe in detail how to tune
TCP, depending on the actual deployment.
Startup Phase
In the startup phase, the TCP sender begins transmitting on a new connection. One of the issues
with a new connection is that there is no information about the capabilities of the network pipe,
so the sender injects packets at a faster and faster rate until it understands those capabilities and
can adjust accordingly. Manual TCP tuning is required to change macro behavior, such as when
pipes are very slow, as in wireless, or very fast, as in 10 Gbit/sec networks. Sending an initial
maximum burst has proven disastrous; it is better to slowly increase the rate at which traffic is
injected based on how well the traffic is absorbed.
During this phase, the congestion window is much smaller than the receive window. This
means the sender controls the traffic injected into the network by computing the congestion
window and capping the amount of injected traffic at the size of the congestion window. Any
minor bursts can be absorbed by queues. There are three important TCP tunable parameters:
• tcp_slow_start_initial: sets up the initial congestion window just after the socket
connection is established.
• tcp_slow_start_after_idle: initializes the congestion window after a period of inactivity.
Since there is some knowledge now about the capabilities of the network, we can take a
shortcut to grow the congestion window and not start from zero, which takes an
unnecessarily conservative approach.
• tcp_cwnd_max: places a cap on the running maximum congestion window. If the receive
window grows, then tcp_cwnd_max grows to the receive window size.
In different types of networks, these values can be tuned slightly to impact the rate at which
you can ramp up. If you have a small network pipe, you want to reduce the packet flow,
whereas if you have a large pipe, you can fill it up faster and inject packets more aggressively.
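To see how these tunables shape the ramp-up, the following sketch models slow-start growth of the congestion window; it assumes the textbook doubling-per-RTT behavior and is a simplification rather than an exact reproduction of any particular TCP stack.

```python
# Hedged sketch: approximate slow-start ramp shaped by the tunables above.
# The doubling-per-RTT growth is a textbook simplification of real TCP stacks.

def slow_start_ramp(mss=1460, slow_start_initial=4, cwnd_max=1_048_576, rtts=10):
    """Return the congestion window (bytes) after each RTT of pure slow start."""
    cwnd = slow_start_initial * mss          # initial window, as set by tcp_slow_start_initial
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        cwnd = min(cwnd * 2, cwnd_max)       # exponential growth, capped by tcp_cwnd_max
    return history

if __name__ == "__main__":
    for rtt, cwnd in enumerate(slow_start_ramp(), start=1):
        print(f"RTT {rtt:2d}: cwnd = {cwnd:>9,d} bytes")
```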
Steady State Phase
After the connection has stabilized and completed the initial startup phase, the socket
connection reaches a fairly steady phase, and tuning is limited to reducing delays due to
network and client congestion. An average condition must be used because there are always
some fluctuations in the network and client data that can be absorbed. When tuning TCP in this
phase, we look at the following network properties:
• Propagation Delay – This is the time it takes one packet to traverse the network, and it is
primarily influenced by distance. In WANs, tuning is required to keep the pipe as full as
possible by increasing the number of allowable outstanding packets.
• Link Speed – This is the bandwidth of the network pipe. Tuning guidelines for 56 kbit/sec
dial-up connections differ from those for 10 Gbit/sec optical local area networks (LANs).
TCP Adjustment
TCP tuning techniques adjust the network congestion avoidance parameters of Transmission
Control Protocol (TCP) connections over high-bandwidth, high-latency networks.
Bandwidth-delay product (BDP) is a term primarily used in conjunction with TCP to refer to
the number of bytes necessary to fill a TCP "path", i.e. it is equal to the maximum number of
simultaneous bits in transit between the transmitter and the receiver.
High performance networks have very large BDPs. To give a practical example, two nodes
communicating over a geostationary satellite link with a round-trip delay time (or round-trip
time, RTT) of 0.5 seconds and a bandwidth of 10 Gbit/s can have up to 0.5 × 10^10 bits, i.e., 5
Gbit = 625 MB of unacknowledged data in flight. Despite having much lower latencies than
satellite links, even terrestrial fiber links can have very high BDPs because their link capacity
is so large. Operating systems and protocols designed as recently as a few years ago, when
networks were slower, were tuned for BDPs orders of magnitude smaller, which limits the
achievable performance on today's networks.
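The satellite figure above follows directly from the definition (BDP = bandwidth × round-trip time); the short computation below reproduces it, along with an illustrative terrestrial fiber path assumed to have a 40 ms RTT.

```python
# BDP = link bandwidth x round-trip time; reproduces the satellite example above.

def bdp_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product in bytes."""
    return bandwidth_bits_per_s * rtt_s / 8

# Geostationary satellite link: 10 Gbit/s with a 0.5 s RTT.
print(bdp_bytes(10e9, 0.5) / 1e6, "MB in flight")    # -> 625.0 MB

# Illustrative terrestrial fiber path: 10 Gbit/s with a 40 ms RTT.
print(bdp_bytes(10e9, 0.040) / 1e6, "MB in flight")  # -> 50.0 MB
```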
Original TCP configurations supported TCP receive window size buffers of up to 65,535
(64 KiB - 1) bytes, which was adequate for slow links or links with small RTTs. Larger
buffers are required by the high performance options described below.
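One way to obtain larger buffers on a per-connection basis is through the socket API; the sketch below uses Python's standard socket module, and the 4 MiB request is an illustrative assumption rather than a recommended value.

```python
import socket

# Hedged sketch: request larger per-socket send/receive buffers so a single
# connection can keep more data in flight than the classic 64 KiB window.
BUF_SIZE = 4 * 1024 * 1024          # 4 MiB, an illustrative value

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF_SIZE)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF_SIZE)

# The kernel may clamp or adjust the request; always read back the result.
print("send buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
print("recv buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
```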
Buffering is used throughout high performance network systems to handle delays in the
system. In general, buffer size will need to be scaled proportionally to the amount of data "in
flight" at any time. For very high performance applications that are not sensitive to network
delays, it is possible to interpose large end-to-end buffering delays by putting in intermediate
data storage points in an end-to-end system, and then to use automated and scheduled
non-real-time data transfers to get the data to their final endpoints.
The maximum throughput that a single loss-limited TCP connection can achieve is approximated
by the Mathis formula:

    Throughput ≤ (MSS / RTT) × (1 / √Ploss)

where MSS is the maximum segment size, RTT is the round-trip time, and Ploss is the probability
of packet loss. If packet loss is so rare that the TCP window becomes regularly fully extended,
this formula doesn't apply.
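A direct computation of this bound looks as follows; the MSS, RTT, and loss values are illustrative only.

```python
from math import sqrt

# Mathis bound: throughput <= (MSS / RTT) * (1 / sqrt(Ploss)).
def mathis_throughput(mss_bytes: int, rtt_s: float, p_loss: float) -> float:
    """Approximate loss-limited TCP throughput in bits per second."""
    return (mss_bytes * 8 / rtt_s) * (1 / sqrt(p_loss))

# Illustrative example: 1460-byte segments, 100 ms RTT, 0.01% loss.
print(f"{mathis_throughput(1460, 0.100, 1e-4) / 1e6:.1f} Mbit/s")
```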
Network Characteristics
As the Internet has progressed, user experience has always been the most important factor. The
new breadth of access technologies leads to a wide spread of network characteristics.
Nowadays, much network access has shifted from wired networks to 3G and 4G cellular
networks.
Modern network traffic is harder to control because packet loss does not necessarily mean
congestion in the networks, and congestion does not necessarily mean packet loss. 3G and 4G
networks each exhibit different types of behavior based on their characteristics, but a server may
view these different behaviors as congestion. This means that an algorithm cannot focus only
on packet loss or latency for determining congestion. Other
modern access technologies, such as fiber to the home (FttH) and WiFi, expand upon the
characteristics represented above by 3G and 4G, making congestion control even more
difficult. With different access technologies having such different characteristics, a variety of
congestion control algorithms has been developed in an attempt to accommodate the various
networks.
Packet-Loss Algorithms
Initial algorithms, such as TCP Reno, use packet loss to determine when to reduce the
congestion window, which influences the send rate. TCP Reno increases the send rate and
congestion window by 1 MSS (maximum segment size) per round trip until it perceives packet loss. Once
this occurs, TCP Reno slows down and cuts the window in half. However, as established in the
previous section, modern networks may have packet loss with no congestion, so this algorithm
is not as applicable.
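The additive-increase/multiplicative-decrease behavior described above can be sketched in a few lines; this is a simplified model of Reno's window adjustment, not a faithful stack implementation.

```python
# Simplified Reno-style AIMD: grow cwnd by one MSS per RTT, halve on loss.
def reno_step(cwnd_segments: float, loss_detected: bool) -> float:
    if loss_detected:
        return max(cwnd_segments / 2, 1)   # multiplicative decrease on loss
    return cwnd_segments + 1               # additive increase (1 MSS per RTT)

cwnd = 10.0
for rtt, loss in enumerate([False, False, False, True, False, False], start=1):
    cwnd = reno_step(cwnd, loss)
    print(f"RTT {rtt}: cwnd = {cwnd:.1f} segments")
```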
Bandwidth-Estimation Algorithms
The next generation of algorithms is based on bandwidth estimation. These algorithms change
the transmission rate depending on the estimated bandwidth at the time of packet loss. TCP
Westwood and its successor, TCP Westwood+, are both bandwidth-estimating algorithms, and
have higher throughput and better fairness over wireless links when compared to TCP Reno.
However, these algorithms do not perform well with smaller buffers or quality of service (QoS)
policies.
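A rough sketch of the bandwidth-estimation idea follows: estimate the delivery rate from acknowledged bytes and, on loss, size the window to the estimated bandwidth-delay product rather than blindly halving. The class name, smoothing factor, and sample values are assumptions for illustration.

```python
# Hedged sketch of Westwood-style bandwidth estimation (simplified):
# smooth a delivery-rate sample from ACKed bytes, and on loss set the target
# window to estimated bandwidth x minimum RTT instead of halving outright.

class BandwidthEstimator:
    def __init__(self, alpha=0.9):
        self.alpha = alpha        # smoothing factor (illustrative assumption)
        self.bw_estimate = 0.0    # bytes per second

    def on_ack(self, acked_bytes: int, interval_s: float) -> None:
        sample = acked_bytes / interval_s
        self.bw_estimate = self.alpha * self.bw_estimate + (1 - self.alpha) * sample

    def window_after_loss(self, min_rtt_s: float) -> float:
        # Target window (bytes) matching the estimated pipe capacity.
        return self.bw_estimate * min_rtt_s

est = BandwidthEstimator()
est.on_ack(acked_bytes=14600, interval_s=0.1)     # 10 segments ACKed in 100 ms
print(est.window_after_loss(min_rtt_s=0.08), "bytes")
```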
Latency-Based Algorithms
The latest congestion control algorithms are latency-based, which means that they determine
how to change the send rate by analyzing changes in round-trip time (RTT). These algorithms
attempt to prevent congestion before it begins, thus minimizing queuing delay at the cost of
goodput (the amount of useful information transferred per second). An example of latency-
based algorithms is TCP Vegas. TCP Vegas is heavily dependent upon an accurate calculation
of a base RTT value, which is how it determines the transmission delay of the network when
buffers are empty. Using the base RTT, TCP Vegas then estimates the amount of buffering in
the network by comparing the base RTT to the current RTT. If the base RTT estimation is too
low, the network will not be optimally used; if it is too high, TCP Vegas may overload the
network. Also, as mentioned earlier, large latency values do not necessarily mean congestion
in some networks, such as 4G.
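The base-RTT versus current-RTT comparison described above can be written as a small decision rule; the alpha and beta thresholds follow the usual Vegas formulation, but the exact values here are illustrative.

```python
# Hedged sketch of the Vegas decision rule: estimate how many segments are
# queued in the network and adjust cwnd to keep that number between two
# thresholds (the alpha/beta values here are illustrative).

def vegas_adjust(cwnd: float, base_rtt: float, current_rtt: float,
                 alpha: float = 2.0, beta: float = 4.0) -> float:
    expected = cwnd / base_rtt                # throughput if buffers were empty
    actual = cwnd / current_rtt               # observed throughput
    queued = (expected - actual) * base_rtt   # estimated segments buffered in the path
    if queued < alpha:
        return cwnd + 1      # network underused: grow
    if queued > beta:
        return cwnd - 1      # queues building: back off before loss occurs
    return cwnd              # within the target band: hold steady

print(vegas_adjust(cwnd=20, base_rtt=0.100, current_rtt=0.120))
```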
By knowing the traffic characteristics and keeping the current inadequate algorithms in mind,
service providers can implement an ideal TCP stack.
Figure 1: Comparison of real network tests across three carriers for the TCP High Speed,
TCP Illinois, and TCP Woodside algorithms. TCP Woodside performs particularly well.
• Buffer Bloat
Buffer bloat occurs when too many packets are buffered, increasing queuing delay and
jitter in the network. Buffer bloat leads to performance issues by impacting interactive
and real-time applications. It also interferes with the RTT calculation and negatively
impacts retransmission behaviors. Thus, minimizing buffer bloat is ideal for an
optimized TCP stack. Loss-based algorithms fail to minimize buffer bloat because they
react after packets have been lost, which only happens once a buffer has been filled.
These algorithms fail to lower the send rate and allow the buffer to drain. Instead, the
algorithms choose rates that maintain the filled buffer.
Buffer bloat can be avoided by pacing the flow of data transmitted across the network. By
knowing the speed at which different flows are being sent, the stack can control how quickly
to send the packets through to the end device. This allows the buffers to fill up gradually without
overflowing. As a result, inconsistent traffic behaviors and packet loss due to network congestion
are prevented.
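Pacing can be sketched as inserting a gap between packets sized by the target rate; the sleep-based loop below is a conceptual model, and the packet size and pacing rate shown are assumed values, not how a production stack actually schedules transmissions.

```python
import time

# Hedged sketch: pace a flow by spacing packets at packet_size / pacing_rate,
# instead of sending them back to back and filling downstream buffers.

def paced_send(send_packet, packets, packet_size=1460, pacing_rate=1_250_000):
    """Send packets at roughly pacing_rate bytes per second."""
    gap = packet_size / pacing_rate           # seconds between packet starts
    for pkt in packets:
        send_packet(pkt)
        time.sleep(gap)                       # leave room for other flows

# Usage with a stand-in send function:
paced_send(lambda p: print("sent", p), ["pkt1", "pkt2", "pkt3"])
```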
• Flow Fairness
Fairness between flows ensures that no one user’s traffic dominates the network to the
detriment of other users. Delay-based algorithms fail to fulfill this criterion because loss-
based flows will fill all of the buffers. This leads to the delay-based flows backing off
and ultimately slowing down to a trickle.
Rate pacing not only helps with buffer bloat, but it also improves fairness across flows.
Without rate pacing, packets are sent immediately and consecutively. When two flows run at the
same time, one flow will see different network conditions than the other flow, usually
with respect to congestion. These conditions will affect the behavior of each flow.
Sometimes one flow has more bandwidth and sends more information; the next
second, another flow may gain that bandwidth and starve the others.
Controlling the speed at which packets are sent on a connection allows gaps to occur between
packets on any individual flow. Instead of both flows attempting to send consecutive packets
that become intermixed, one flow will send a packet, and the second flow can then send another
packet within the time gap of the first flow. This behavior changes how the two flows see the
network as well. Rather than one flow seeing an open network and the other seeing a congested
network, both flows will likely recognize similar congestion conditions and be able to share the
bandwidth more efficiently.
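The interleaving effect can be seen in a toy timeline: each paced flow leaves gaps that the other flow's packets can occupy, so both observe similar conditions. The offsets and gap below are purely illustrative.

```python
# Toy timeline: two paced flows leave gaps that the other flow can use.
def timeline(num_packets=4, gap=1.0, offsets=(0.0, 0.5)):
    events = []
    for flow, offset in enumerate(offsets, start=1):
        for i in range(num_packets):
            events.append((offset + i * gap, f"flow {flow} sends packet {i}"))
    return sorted(events)

for t, event in timeline():
    print(f"t={t:4.1f}s  {event}")
```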