Linux has had multiple TCP congestion control methods added to it, and the subsequent growth of the codebase has made development difficult. As a result a congestion control framework has been introduced.

This paper also outlines how the congestion control framework was used to implement TCP–Nice. TCP–Nice is an experimental congestion control mechanism that uses less than its fair share of bandwidth when there is congestion, much like nice does for CPU usage by processes in the Unix operating system.

At present there is an effort to abstract much of the TCP codebase and move it into a generic IP implementation. The reasoning behind this is to facilitate the implementation of new protocols like DCCP and to improve the implementation of existing protocols such as SCTP.

The rationale behind the Datagram Congestion Control Protocol (DCCP) and the current status of the protocol are also discussed. DCCP is a new IP based transport protocol which aims to replace TCP and UDP in some uses. The implementation of the DCCP protocol in the Linux kernel is outlined.
1 Introduction
2 TCP Congestion
Data can be stored by allocating data referenced through the inet_csk_ca function, which points to an area of private data. In TCP–Nice this is used to record the last loss time through the tcp_nice_data structure.

The following code is run when the TCP congestion state changes:

    static void tcp_nice_state(struct sock *sk, u8 new_state)
    {
        struct tcp_sock *tp = tcp_sk(sk);
        struct tcp_nice_data *ca = inet_csk_ca(sk);

        if (new_state == TCP_CA_Loss) {
            tp->snd_ssthresh = 2;
            tp->snd_cwnd = 2;
            ca->last_loss = jiffies;
        }
    }

In this case the slow start threshold (snd_ssthresh) and congestion window (snd_cwnd) are reduced to two and the time of the loss is recorded.

The congestion avoidance function is implemented as follows:

    #define LOSS_TIME_CLAMP 4

    void tcp_nice_cong_avoid(struct sock *sk, u32 ack,
                             u32 rtt, u32 in_flight, int flag)
    {
        struct tcp_sock *tp = tcp_sk(sk);
        struct tcp_nice_data *ca = inet_csk_ca(sk);

        if (in_flight < tp->snd_cwnd)
            return;

        if ((jiffies - ca->last_loss) <
            LOSS_TIME_CLAMP * HZ) {
            tp->snd_ssthresh = 2;
            tp->snd_cwnd = 2;
            return;
        }

        /* ... existing reno code follows */
    }

In this case the code is modeled on TCP Reno but keeps snd_ssthresh and snd_cwnd at two if there has been loss in the last LOSS_TIME_CLAMP seconds.

Below are the functions for setting the slow start threshold and minimum congestion window:

    /* Slow start threshold is quarter the congestion window (min 2) */
    u32 tcp_nice_ssthresh(struct tcp_sock *tp)
    {
        return max(tp->snd_cwnd >> 2U, 2U);
    }

    /* Lower bound on congestion window. */
    u32 tcp_nice_min_cwnd(struct tcp_sock *tp)
    {
        return 2;
    }

In implementing TCP–Nice it was decided to use a slow start threshold of a quarter of the congestion window, instead of a half as in Reno, and to set min_cwnd to two.

The implementation of TCP–Nice shows that it is relatively simple to implement a new congestion control mechanism in the kernel because of the framework that has been introduced.

3.2 Results

TCP–Nice was tested with competing TCP flows to see how the flow was reduced and the time taken to recover.
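The behaviour under test can be sketched with a small user-space model of the two listings above. The kernel types are replaced with plain integers, the in_flight check is omitted, and the HZ value and growth step are assumptions for illustration only:

```c
#include <assert.h>
#include <stdint.h>

#define HZ 1000                 /* assumed jiffies per second */
#define LOSS_TIME_CLAMP 4       /* seconds to hold the window down */

struct nice_flow {
    uint32_t snd_cwnd;
    uint32_t snd_ssthresh;
    unsigned long last_loss;    /* jiffies of most recent loss */
};

/* Mirror of tcp_nice_state() for the loss case. */
static void nice_on_loss(struct nice_flow *f, unsigned long now)
{
    f->snd_ssthresh = 2;
    f->snd_cwnd = 2;
    f->last_loss = now;
}

/* Mirror of tcp_nice_cong_avoid(): grow only once the clamp expires. */
static void nice_cong_avoid(struct nice_flow *f, unsigned long now)
{
    if ((now - f->last_loss) < LOSS_TIME_CLAMP * HZ) {
        f->snd_ssthresh = 2;
        f->snd_cwnd = 2;
        return;
    }
    f->snd_cwnd++;              /* stand-in for Reno's window growth */
}
```

After a loss the window stays pinned at two until LOSS_TIME_CLAMP seconds' worth of jiffies have elapsed, after which normal growth resumes.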
With the code implemented as per the previous section, or slight variations of it, not all of the desired characteristics were achieved.

To achieve all of the desired characteristics will require further experimentation and/or mathematical modelling.

• CCID2 [5] is TCP-like congestion control and implements a congestion control mechanism based on TCP.

• CCID3 [6] is based on TCP Friendly Rate Control (TFRC) [8]. TFRC aims to provide a smoother response to congestion than TCP while still using a "fair" share of bandwidth compared to other flows. TFRC achieves this goal by estimating the available sending rate rather than halving the window in response to congestion as TCP does.

4.1 History

One of the challenges of implementing CCID3 was implementing the mathematical calculation of the rate. The challenge was twofold: the lack of 64 bit integer operations on 32 bit architectures, and the inability to use floating point instructions in the kernel. This was resolved by converting the Lulea floating point lookup tables to integer based ones, with some reasonably complex manipulations.
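The kind of fixed-point arithmetic involved can be illustrated with a short sketch. This is not the kernel's Lulea-derived table code; it only shows the general technique of replacing a floating point square root (which the TFRC rate formula needs for the loss event rate) with a 64 bit integer square root over scaled integers:

```c
#include <assert.h>
#include <stdint.h>

/* Bit-by-bit integer square root: returns floor(sqrt(x)).
 * Uses only shifts, compares, and subtraction, so it is safe
 * where floating point is unavailable. */
static uint32_t isqrt64(uint64_t x)
{
    uint64_t res = 0, bit = 1ULL << 62;

    while (bit > x)
        bit >>= 2;
    while (bit != 0) {
        if (x >= res + bit) {
            x -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)res;
}

/* Square root of a probability held in micro-units (p * 1e6),
 * returned in milli-units, since sqrt(p * 1e6) == sqrt(p) * 1e3. */
static uint32_t sqrt_micro(uint32_t p_micro)
{
    return isqrt64(p_micro);
}
```

For example, a loss event rate of p = 0.01 stored as 10000 micro-units yields 100 milli-units, i.e. 0.1, with no floating point involved.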
As part of this a large amount of testing was carried out, which showed that most implementations to date had implemented the rate calculation incorrectly.

It has proved relatively trivial to port user level applications to DCCP. Netcat, iperf, ttcp, and ssh have all had DCCP support added relatively quickly. Programs that depend on the format of the packet, e.g. tcpdump and ethereal, have taken more effort.

For applications to make full use of the features available in DCCP, such as rate and loss feedback, a new API will need to be developed. Further research is being carried out in this area by the author [19].

There is still work to be done to achieve full compliance with the DCCP specification. The two major things that need to occur for this to happen are CCID2 implementation and feature negotiation. Further interoperability testing needs to be carried out once other operating systems implement DCCP.

Devices that implement NAT will have to be modified to allow DCCP to traverse them. The implementation will be similar to that of TCP or UDP NAT. To implement NAT, port mapping will need to be put in place and the checksum recalculated, as DCCP checksums the pseudo-header in the same way as TCP. This should be a priority for future versions of the Linux kernel.
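A NAT rewrite of this kind can be sketched in user space. The checksum is the standard Internet ones'-complement sum; the buffer layout assumed below (a 12-byte pseudo-header followed by a header with ports at offsets 0 and 2 and the checksum at offset 6, as in the generic DCCP header) and the helper names are illustrative, not kernel code, and partial checksum coverage (CsCov) is ignored:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Standard Internet checksum (RFC 1071): ones'-complement
 * sum of 16-bit big-endian words, then complemented. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Rewrite the destination port of a DCCP-style header and
 * recompute the checksum over pseudo-header + header + payload.
 * buf holds a 12-byte pseudo-header followed by the packet. */
static void nat_rewrite_port(uint8_t *buf, size_t len, uint16_t new_port)
{
    uint8_t *hdr = buf + 12;
    uint16_t csum;

    hdr[2] = new_port >> 8;          /* destination port */
    hdr[3] = new_port & 0xff;
    hdr[6] = hdr[7] = 0;             /* zero checksum before recomputing */
    csum = inet_checksum(buf, len);
    hdr[6] = csum >> 8;
    hdr[7] = csum & 0xff;
}
```

A receiver-side check exploits the usual property: summing the buffer with a correct checksum in place yields zero.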
6 Testing

A Structure of tcp_congestion_ops

    struct tcp_congestion_ops {
        struct list_head list;
        ...

    #include <linux/config.h>
    #include <linux/module.h>
    #include <net/tcp.h>

    #define LOSS_TIME_CLAMP 4
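To show how a module hangs together through this ops structure, here is a user-space mock of the dispatch. The kernel types are stubbed with minimal stand-ins and only two hooks are wired up, so this is a sketch of the framework's shape, not the kernel's actual definition:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Minimal stand-in for the fields TCP-Nice touches. */
struct tcp_sock {
    u32 snd_cwnd;
    u32 snd_ssthresh;
};

/* Stub of the hook table a congestion control module provides. */
struct tcp_congestion_ops {
    u32 (*ssthresh)(struct tcp_sock *tp);
    u32 (*min_cwnd)(struct tcp_sock *tp);
    const char *name;
};

static u32 tcp_nice_ssthresh(struct tcp_sock *tp)
{
    u32 s = tp->snd_cwnd >> 2;          /* quarter the window... */
    return s > 2 ? s : 2;               /* ...but never below two */
}

static u32 tcp_nice_min_cwnd(struct tcp_sock *tp)
{
    (void)tp;
    return 2;
}

static struct tcp_congestion_ops tcp_nice = {
    .ssthresh = tcp_nice_ssthresh,
    .min_cwnd = tcp_nice_min_cwnd,
    .name     = "nice",
};

/* The stack calls through the ops table, never the module directly. */
static u32 on_loss(struct tcp_congestion_ops *ops, struct tcp_sock *tp)
{
    return ops->ssthresh(tp);
}
```

This indirection is what lets a new mechanism be added by registering one more ops table rather than editing the TCP code itself.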
[1] R. Allman, V. Paxson, and W. Stevens. TCP congestion, 1999.

[2] Lawrence S. Brakmo and Larry L. Peterson. TCP Vegas: end to end congestion avoidance on a global Internet. IEEE Journal on Selected Areas in Communications, 13(8):1465–1480, 1995.

[3] Carlo Caini and Rosario Firrincieli. TCP Hybla: a TCP enhancement for heterogeneous networks. International Journal of Satellite Communications and Networking, 22(5):547–566, August 2004.

[4] S. Floyd. HighSpeed TCP for large congestion windows, 2002.

[5] S. Floyd and E. Kohler. Profile for DCCP congestion control ID 2: TCP-like congestion control, Accessed 2005.

[6] S. Floyd, E. Kohler, and J. Padhye. Profile for DCCP congestion control ID 3: TFRC congestion control, Accessed 2005.

[7] D. Leith and R. Shorten. H-TCP: TCP for high-speed and long-distance networks.

[15] E. Kohler, M. Handley, and S. Floyd. Datagram congestion control protocol (DCCP), March 2005.

[16] Yee-Ting Li and Doug Leith. BIC-TCP implementation in Linux kernels. Technical report, Hamilton Institute, February 2004.

[17] DCCP projects, Accessed 2005.

[18] Saverio Mascolo, Claudio Casetti, Mario Gerla, M. Y. Sanadidi, and Ren Wang. TCP Westwood: bandwidth estimation for enhanced transport over wireless links. In MobiCom '01: Proceedings of the 7th Annual International Conference on Mobile Computing and Networking, pages 287–297, New York, NY, USA, 2001. ACM Press.

[19] I. McDonald. PhD research proposal: congestion control for real time media applications, 2005.

[20] DCCP implementation by Patrick McManus, Accessed 2005.

[21] Arnaldo C. Melo. TCPfying the poor cousins. In Ottawa Linux Symposium, July 2004.

[22] Arnaldo C. Melo. DCCP on Linux. In Ottawa Linux Symposium, pages 305–311, 2005.