Transmission Control Protocol: Introduction to TCP, the Internet's Standard Transport Protocol
Peter R. Egli
indigoo.com
© Peter R. Egli 2017 Rev. 3.70
Contents
1. Transport layer and sockets
2. TCP (RFC793) overview
3. Transport Service Access Point (TSAP) addresses
4. TCP Connection Establishment
5. TCP Connection Release
6. TCP Flow Control
7. TCP Error Control
8. TCP Congestion Control RFC2001
9. TCP Persist Timer
10. TCP Keepalive Timer – TCP connection supervision
11. TCP Header Flags
12. TCP Header Options
13. TCP Throughput / performance considerations
14. A last word on „guaranteed delivery“
1. Transport layer and sockets
The transport layer is made accessible to applications through the socket layer (API).
The transport layer runs in kernel space (Operating System) while application processes run
in user space.
[Figure: applications run in user space (OSI layers > 4, upper layers) and exchange application
messages; they reach the transport layer through the socket API, where the sockets are the TSAPs.
The transport layer (OSI layer 4, TCP/UDP) exchanges TCP segments.]
2. TCP (RFC793) overview
TCP is a byte stream oriented transmission protocol:
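[Figure: the sending application writes 300, 500 and 150 bytes through its socket interface; TCP
carries the data in segments of 100, 700 and 150 bytes; the receiving application reads 600, 210
and 140 bytes through its socket interface.]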
N.B.: The size of application data chunks (data units passed over socket interface)
may be different on the sending and receiving side; the segments sent by TCP may
again have different sizes.
TCP error control provides reliable transmission (packet order preservation, retransmissions
in case of transmission errors and packet loss).
TCP uses flow control to maximize throughput and avoid packet loss.
Congestion control mechanisms allow TCP to react to and recover from network congestion.
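The byte stream behaviour can be tried out with a minimal Python sketch. It assumes a loopback connection on 127.0.0.1; the function name stream_demo, the chunk sizes and the read size of 210 bytes are illustrative choices, not part of the slides. The sizes returned by the reads depend on timing and generally do not match the sizes of the writes.

```python
import socket
import threading


def stream_demo(write_sizes=(300, 500, 150), read_size=210):
    """Send chunks over a loopback TCP connection.

    Returns (bytes_sent, sizes_of_each_read, bytes_received).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    # One distinct filler byte per chunk so the chunks are distinguishable.
    chunks = [bytes([65 + i]) * n for i, n in enumerate(write_sizes)]

    def sender():
        with socket.create_connection(("127.0.0.1", port)) as s:
            for chunk in chunks:
                s.sendall(chunk)  # one application write per chunk
        # Leaving the with-block closes the socket and ends the stream.

    thread = threading.Thread(target=sender)
    thread.start()

    conn, _ = server.accept()
    received = b""
    read_sizes = []
    with conn:
        while True:
            data = conn.recv(read_size)  # returns at most read_size bytes
            if not data:  # empty result: the peer closed the stream
                break
            read_sizes.append(len(data))
            received += data

    thread.join()
    server.close()
    return b"".join(chunks), read_sizes, received
```

Only the order and the content of the bytes are preserved; the boundaries of the application writes are not visible to the reader.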
3. Transport Service Access Point (TSAP) addresses (1/3)
TSAPs are identified by 16-bit port numbers. 65536 port numbers are available (0...65535, but
0 is reserved and never used as a regular port).
[Figure: IP header followed by the TCP header, which carries the port numbers.]
3. Transport Service Access Point (TSAP) addresses (2/3)
Each TSAP (TCP port) is bound to at most 1 socket, i.e. a TCP port cannot be opened
multiple times.
[Figure: an application process (TSAP / port number 1208) and a server process (TSAP / port number
31452) in the application layer communicate through TCP in the transport layer; below TCP each
host is addressed through its NSAP.]
3. Transport Service Access Point (TSAP) addresses (3/3)
There are different definitions for the port ranges. ICANN (former IANA) defines:
  Well-known ports        [1, 1023]        (standard ports, e.g. POP3 110)
  Registered ports        [1024, 49151]
  Dynamic/private ports   [49152, 65535]
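The ranges above can be expressed as a small lookup. This is a sketch only: the function name port_class and the label "reserved" for port 0 are illustrative and not taken from the slides.

```python
def port_class(port: int) -> str:
    """Classify a TCP port number according to the ranges listed above."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("a TCP port number must fit into 16 bits")
    if port == 0:
        return "reserved"  # illustrative label: port 0 is not used
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic/private"
```

For example, port_class(110) yields "well-known" for POP3.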
4. TCP Connection Establishment (1/4)
A TCP connection is established with 3 TCP packets (segments) going back and forth.
4. TCP Connection Establishment (3/4)
If no server is listening on the addressed port number TCP rejects the connection
and sends back a RST (reset) packet (TCP segment where RST bit = 1).
[Figure: Host 1 sends SYN (SEQ=x) to Host 2; Host 2 answers with RST.]
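From the application's point of view the RST shows up as a refused connection. The following Python sketch assumes a loopback address and a port that was just released by the OS (so very likely nobody listens on it); the function name connect_to_closed_port is illustrative.

```python
import socket


def connect_to_closed_port() -> str:
    """Try to connect to a local port without a listener."""
    # Find a port that is very likely closed: bind to an OS-chosen port,
    # remember its number and close the socket again without listening.
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    probe.bind(("127.0.0.1", 0))
    port = probe.getsockname()[1]
    probe.close()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(2.0)
    try:
        client.connect(("127.0.0.1", port))
        return "connected"
    except ConnectionRefusedError:
        # The peer TCP answered the SYN with a RST segment.
        return "refused"
    finally:
        client.close()
```

On a remote host behind a packet filter the SYN may be dropped silently instead, in which case the connect call times out rather than being refused.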
4. TCP Connection Establishment (4/4)
A TCP connection is identified by the quadruple of source/destination IP address and
source/destination port number.
If even one element of this quadruple differs, the quadruple identifies a different
TCP connection.
[Figure: Host 1 (208.1.2.3) opens two connections from TCP ports 3544 and 3545, and Host 2
(17.6.5.4) opens one connection from TCP port 37659, all to TCP port 80 on Host 3 (177.44.4.2).]
Connection 1: 208.1.2.3 / 3544 / 177.44.4.2 / 80
Connection 2: 208.1.2.3 / 3545 / 177.44.4.2 / 80
Connection 3: 17.6.5.4 / 37659 / 177.44.4.2 / 80
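A connection table keyed by this quadruple can be sketched as follows. The type ConnectionId and the function lookup are hypothetical names used for illustration; real TCP stacks use their own internal structures.

```python
from typing import NamedTuple, Optional


class ConnectionId(NamedTuple):
    """The quadruple that identifies one TCP connection."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


# The three connections of the example above.
connections = {
    ConnectionId("208.1.2.3", 3544, "177.44.4.2", 80): "Connection 1",
    ConnectionId("208.1.2.3", 3545, "177.44.4.2", 80): "Connection 2",
    ConnectionId("17.6.5.4", 37659, "177.44.4.2", 80): "Connection 3",
}


def lookup(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> Optional[str]:
    """Return the connection an incoming segment belongs to, if any."""
    return connections.get(ConnectionId(src_ip, src_port, dst_ip, dst_port))
```

All three entries share the same destination address and port; they stay distinct because the source address or source port differs.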
5. TCP Connection Release (1/2)
The 2 half-duplex connections are closed independently of each other.
[Figure: message sequence chart between Host 1 and Host 2, starting with both half-duplex
connections established.]
Half-close: only one half-duplex connection is closed (still traffic in the other direction).
FIN segments occupy 1 number in the sequence number space (as do SYN segments)!
5. TCP Connection Release (2/2)
Different scenarios as to how both half-duplex connections are closed:
[Figure: four message sequence charts between Host 1 and Host 2, each starting with both
half-duplex connections established and exchanging FIN and ACK segments: normal 4-way close,
3-way close, simultaneous close and half-close. In the half-close chart data (and its ACK) still
flows after the first FIN has been acknowledged.]
Few applications use half-close, e.g. the UNIX rsh command:
#rsh <host> sort < datafile
This command is executed remotely. The command needs all input from Host1. The closing of the
connection Host1 -> Host2 is the only way to tell Host2 that it can start executing the command.
The output of the command is sent back to Host1 through the still existing half-duplex connection
Host2 -> Host1.
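The half-close used by the rsh example can be imitated with the socket call shutdown(). The sketch below is not rsh; it is a hypothetical stand-in (function name remote_sort, loopback server in a thread) that sorts lines on the "remote" side once the client has half-closed its sending direction.

```python
import socket
import threading


def remote_sort(lines):
    """Send lines to a loopback server, half-close, and read back the sorted lines."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve():
        conn, _ = server.accept()
        with conn:
            data = b""
            while True:
                chunk = conn.recv(4096)
                if not chunk:  # end of stream: the client sent its FIN
                    break
                data += chunk
            # Only now is the complete input known, so sorting can start.
            result = b"\n".join(sorted(data.split(b"\n")))
            conn.sendall(result)  # the server -> client direction is still open

    thread = threading.Thread(target=serve)
    thread.start()

    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall("\n".join(lines).encode())
        client.shutdown(socket.SHUT_WR)  # half-close: no more sending, still receiving
        reply = b""
        while True:
            chunk = client.recv(4096)
            if not chunk:
                break
            reply += chunk

    thread.join()
    server.close()
    return reply.decode().split("\n")
```

Calling close() instead of shutdown(socket.SHUT_WR) would also end the receiving direction on the client, so the sorted output could no longer be read.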
6. TCP Flow Control (1/10)
Sliding window mechanism:
Unlike lock-step protocols, TCP allows data bursts for maximizing throughput.
The receiver advertises the size of its receive buffer.
The sequence and acknowledge numbers are per byte (not per segment/packet).
The receiver's ack number is the number of the next byte it expects; this implicitly acknowledges
all previously received bytes. Thus acks are cumulative: Ack=X acknowledges all bytes up to and
including X-1.
[Figure: TCP header with the sequence number (ISN = Initial Sequence Number), the acknowledgment
number and the window size advertisement.]
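How a receiver derives its cumulative ack number can be sketched in a few lines of Python. The function name cumulative_ack is illustrative, and the sketch assumes that received segments never overlap and always start exactly where another one ends.

```python
def cumulative_ack(next_expected: int, segments) -> int:
    """Return the ack number after receiving the given segments.

    next_expected: sequence number of the next byte the receiver expects.
    segments: iterable of (seq, length) pairs, in any order of arrival.
    """
    pending = dict(segments)  # seq of first byte -> number of bytes
    ack = next_expected
    # Advance over every segment that directly continues the byte stream.
    while ack in pending:
        ack += pending.pop(ack)
    return ack
```

With segments (1001, 1500) and (2501, 500) the result is 3001; if the first segment is missing, the ack number stays at 1001 no matter how much later data has arrived.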
6. TCP Flow Control (2/10)
Step 1: 3-way handshake for connection establishment. Through corresponding socket calls
(indicated by 'socket()..') hosts 'A' and 'B' open a TCP connection (host 'A' performs an active
open while host 'B' listens for an incoming connection request). Hosts 'A' and 'B' exchange and
acknowledge each other's sequence numbers (ISN = Initial Sequence Number). Host 'A' has a receive
buffer for 6000 bytes and announces this with Win=6000. Host 'B' has a receive buffer for 10000
bytes and announces this with Win=10000. Note that the SYNs occupy 1 number in the sequence number
space, thus the first data byte has Seq=ISN+1.
Step 2: Application Process (AP) 'A' writes 2KB into TCP. These 2KB are stored in the transmit
buffer (Tx buffer). The data remains in the Tx buffer until its reception is acknowledged by
TCP 'B'. In case of packet loss TCP 'A' still has the data in the Tx buffer for retransmissions.
Step 3: TCP 'A' sends a first chunk of 1500 bytes as one TCP segment. Note that Seq=1001 = ISN+1.
These 1500 bytes are stored in TCP 'B's receive buffer.
Step 4: TCP 'A' sends a chunk of 500 bytes. Again, these 500 bytes are stored in TCP 'B's receive
buffer (along with the previous 1500 bytes). The sequence number is Seq=2501 = previous sequence
number + size of previous data chunk. Note that the initial 2KB of data are still in TCP 'A's
Tx buffer waiting to be acknowledged by TCP 'B'.
Step 5: TCP 'B' sends an acknowledge segment acknowledging successful reception of 2KB. This is
indicated through Ack=3001 which means that TCP 'B' expects that the sequence number of the next
data byte is 3001 or, in other words, the sequence number of the last data byte successfully
received is 3000. Upon reception of the acknowledge TCP 'A' flushes the Tx buffer.
Step 6: AP 'B' reads out 2KB with 1 socket call. Application 'B' may have called 'receive()' much
earlier and only now TCP 'B' (the socket interface) unblocked the call and returned the chunk of
2KB data. The 2KB of data are deleted from host 'B's Rx buffer.
Step 7: TCP 'B' sends a pure window update segment to signal to TCP 'A' that the receive window
size is now 10000 again (Rx buffer). Note that a real TCP implementation would not send a window
update if the Rx buffer still has reasonable free capacity. A real TCP implementation would wait
for more data to arrive, acknowledge this next data and together with the acknowledge segment also
signal the new receive window size. Only when the Rx buffer's capacity falls below a threshold is
it advisable to send a TCP segment merely updating the window size.
6. TCP Flow Control (3/10)
Step 8: AP 'A' writes 4KB into TCP. Shortly after application 'B' writes 2KB into TCP.
Step 9: TCP 'A' sends a chunk of 1500 bytes as one TCP segment. Seq = 3001 = last sequence number
+ size of last data segment.
Step 10: TCP 'B' sends a chunk of 1500 bytes as one TCP segment. Seq = 4001 = ISN + 1 (since it is
the first TCP segment with data). Win = 8500 = Rx buffer size - size of data in buffer.
Ack = 4501 = sequence number of last received segment + size of data. TCP 'A' deletes the
acknowledged 1500 bytes from the Tx buffer (they were successfully received by TCP 'B', even
though not necessarily by application 'B'; thus these 1500 bytes do not need to be kept in the
Tx buffer for retransmissions and can be deleted).
Step 11: AP 'A' writes another 2KB into TCP 'A'. These 2KB are stored in TCP 'A's Tx buffer along
with the previous 2.5KB of data. Around the same time AP 'B' reads 1KB of data from its socket.
This 1KB is immediately freed from the Rx buffer to make room for more data from TCP 'A'.
Step 12: TCP 'B' sends a chunk of 500 bytes (Seq = 5501 = last sequence number + data size of last
segment). The window update in this segment indicates that the Rx buffer has room for 9500 bytes.
Step 13: TCP 'A' sends a segment with 1000 bytes. TCP 'B' writes this data into its Rx buffer,
where it joins the previous 500 bytes. Shortly after AP 'B' reads 1KB from its socket interface,
leaving 500 bytes of data in the Rx buffer.
Step 14: Thereupon TCP 'A' sends an acknowledgment segment with Ack = 4001 + sizes of the last 2
received data segments = 6001. This Ack segment makes TCP 'B' delete the 2KB in its Tx buffer
since these are no longer needed for possible retransmissions.
Step 15: TCP 'A' sends a segment with 1500 bytes. TCP 'B' writes this data into its Rx buffer,
where it joins the previous 500 bytes. Shortly after AP 'B' reads 2KB from its socket interface,
thus emptying the Rx buffer.
6. TCP Flow Control (4/10)
Step 16: TCP 'B' sends an acknowledge segment acknowledging successful reception of all data
received so far. This leaves 2KB of unsent data in TCP 'A's Tx buffer. Since no data is in the
Rx buffer the receive window size is at its maximum again (Win=10000). Around the same time
AP 'A' reads out 2KB from the Rx buffer.
Step 17: TCP 'A' sends a segment with 1500 bytes. TCP 'B' writes this data into its Rx buffer.
Step 18: TCP 'A' sends a last data segment with 500 bytes. TCP 'B' writes this data into its
Rx buffer, where it joins the previous 1500 bytes. Shortly after that AP 'B' reads out 2KB and
thus empties the Rx buffer.
Step 19: TCP 'B' sends an acknowledgment segment that acknowledges all data received from TCP 'A'.
Step 20: AP 'A' is finished with sending data and closes its socket (close()). This causes TCP 'A'
to send a FIN segment in order to close the half-duplex connection 'A' -> 'B'. TCP 'B'
acknowledges this FIN and closes the connection 'B' -> 'A' with a FIN. Note well that FINs also
occupy 1 number in the sequence number space. Thus the acknowledgment sent back by TCP 'B' has
Ack = sequence number in 'A's FIN segment + 1.
Legend:
CTL = Control bits in TCP header (URG, ACK, PSH, RST, SYN, FIN); if a bit is listed its value is 1
Seq = Sequence number
Win = Window size
Ack = Acknowledgement number
Data = Size of application data (TCP payload)
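The sequence number arithmetic used throughout the walk-through (data bytes, SYN and FIN each advance the sequence number) can be sketched as a single function. The name next_seq is illustrative; the modulo reflects that TCP sequence numbers are 32-bit values.

```python
def next_seq(seq: int, data_len: int = 0, syn: bool = False, fin: bool = False) -> int:
    """Sequence number that follows a segment starting at seq.

    Every data byte occupies one sequence number; SYN and FIN occupy one each.
    Sequence numbers are 32 bits wide and wrap around.
    """
    return (seq + data_len + int(syn) + int(fin)) % 2**32
```

For host 'A' in the example: next_seq(1000, syn=True) gives 1001 for the first data byte, next_seq(1001, 1500) gives 2501 and next_seq(2501, 500) gives 3001, which is the Ack value sent by TCP 'B' in step 5.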
[Figure, steps 1 to 5: message sequence chart between hosts 'A' and 'B' with the fill levels of
their Tx and Rx buffers. Recoverable segment headers and socket calls:]
Step 1: <CTL=SYN, ACK><Ack=1001><Seq=4000><Win=10000>
        <CTL=ACK><Ack=4001><Seq=1001><Win=6000>
Step 2: send(2KB)
Step 3: <CTL=ACK><Ack=4001><Seq=1001><Win=6000><Data=1500B>
Step 4: <CTL=ACK><Ack=4001><Seq=2501><Win=6000><Data=500B>
Step 5: <CTL=ACK><Ack=3001><Seq=4001><Win=8000><Data=0 (no data)>
[Figure, steps 6 to 10: message sequence chart between hosts 'A' and 'B' with the fill levels of
their Tx and Rx buffers. Recoverable segment headers and socket calls:]
Step 6: (no segment shown)
Step 7: <CTL=ACK><Ack=3001><Seq=4001><Win=10000><Data=0 (no data)>
Step 8: send(4KB)
Step 9: <CTL=ACK><Ack=4001><Seq=3001><Win=6000><Data=1500B>
Step 10: <CTL=ACK><Ack=4501><Seq=4001><Win=8500><Data=1500B>
[Figure, steps 11 to 15: message sequence chart between hosts 'A' and 'B' with the fill levels of
their Tx and Rx buffers. Recoverable segment headers and socket calls:]
Step 11: send(2KB), receive(1KB)
Step 12: <CTL=ACK><Ack=4501><Seq=5501><Win=9500><Data=500B>
Step 13: <CTL=ACK><Ack=6001><Seq=4501><Win=4000><Data=1000B>, receive(1KB)
Step 14: <CTL=ACK><Ack=6001><Seq=5501><Win=4000>
Step 15: <CTL=ACK><Ack=6001><Seq=5501><Win=4000><Data=1500B>, receive(2KB)
[Figure, steps 16 to 19: message sequence chart between hosts 'A' and 'B' with the fill levels of
their Tx and Rx buffers. Recoverable segment headers and socket calls:]
Step 16: <Data=0 (no data)>, receive(2KB) (header fields not recoverable)
Step 17: <CTL=ACK><Ack=6001><Seq=7001><Win=6000><Data=1500B>
Step 18: <CTL=ACK><Ack=6001><Seq=8501><Win=6000><Data=500B>, receive(2KB)
Step 19: <CTL=ACK><Seq=6001><Ack=9001><Win=10000><Data=0 (no data)>
6. TCP Flow Control (5/10)
Sliding window (TCP) versus lock-step protocol:
1. Lock-step protocol: sender must wait for Ack before sending next data packet.
2. Sliding window: sender can send (small) burst before waiting for Ack.
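[Figure: two message sequence charts. Lock-step: the sender is blocked after each Data packet
until the Ack arrives. Sliding window: the sender sends several Data packets in a burst and is
blocked only afterwards until the Acks arrive.]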
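The benefit of sending bursts can be estimated with a deliberately simplified model: per round trip the sender may have at most one window of segments outstanding, and processing and transmission times are ignored. The function name round_trips and this model are assumptions for illustration only.

```python
import math


def round_trips(segments: int, window: int = 1) -> int:
    """Round trips needed to deliver the given number of segments.

    window = 1 models a lock-step protocol; window > 1 models a sliding
    window that allows a burst of that many segments per round trip.
    """
    if segments < 0 or window < 1:
        raise ValueError("segments must be >= 0 and window >= 1")
    return math.ceil(segments / window)
```

Sending 12 segments takes 12 round trips in lock-step but only 3 with a window of 4 segments.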
6. TCP Flow Control (6/10)
Sliding window mechanism:
Window size, acknowledgments and sequence numbers are byte based, not segment based.
The lower window edge is incremented as bytes are acknowledged; it is initialized to ISN+1.
[Figure: byte stream (bytes 1 to 10) along the time / sequence number axis, divided into four
regions: bytes sent and acknowledged; bytes waiting to be acknowledged; bytes that can be sent
anytime; bytes waiting to be sent in the send buffer.]
6. TCP Flow Control (7/10)
TCP layer and application interworking:
ACK and window size advertisements are „piggy-backed“ onto TCP segments.
The TCP sender and receiver „processes“ on a host are totally independent of each other in terms
of sequence and acknowledge numbers.
[Figure: AP 'A' writes into the Tx buffer of the sender process of TCP 'A'; the receiver process
of TCP 'B' fills the Rx buffer from which AP 'B' reads. In the opposite direction AP 'B' writes
into the Tx buffer of TCP 'B' and AP 'A' reads from the Rx buffer of TCP 'A'.
TCP connection = 2 half-duplex connections.]
6. TCP Flow Control (8/10)
Delayed acknowledgments for reducing number of segments:
The receiver does not send an Ack immediately after the receipt of an (error-free) packet but
waits up to ~200ms/500ms (depending on the host OS) to see whether a packet becomes available in
the send buffer. If so, it „piggybacks“ the Ack onto that transmit packet; if no transmit packet
becomes available, the receiver sends an Ack after ~200ms/500ms at the latest.
[Figure: message sequence chart with Data and Ack segments; an Ack is delayed by up to 500ms and
then sent either on its own or combined as Data + ACK.]
RTT = Round Trip Time (time between sending a packet and receiving the response).
6. TCP Flow Control (10/10)
Silly window syndrome:
This problem occurs when the receiver reads only small amounts of data from the receive
buffer while the sender sends relatively fast (and in large chunks). The flow-control
mechanism then becomes inefficient and ruins TCP performance.
[Figure: message sequence chart. Recoverable labels: TCP segment (MSS); SYN ACK; ACK, Win=4096;
Seq=X; ACK=X+512, Win=3584; Seq=X+1024.]
7. TCP Error Control (2/6)
Lost packets cause retransmissions triggered by the expiry of the retransmission timer:
[Figure: message sequence chart between Host 1 and Host 2 (both half-duplex connections
established) while the network is temporarily out of order. Recoverable labels: segment (6),
SEQ = X+2560, retransmission timer expires, write (6) to RB, ACK=X+3072, retransmission timer
stopped; segment (7), SEQ = X+3072, write (7) to RB, ACK=X+3584.]
Legend: MSS = max. segment size (= 512 in the example); RB = application's receive buffer. The
chart symbols mark: retransmission timer stopped (not expired), retransmission timer expired,
packet loss.
7. TCP Error Control (3/6)
Lost packets / retransmissions, 3 acks („fast retransmit“):
[Figure: message sequence chart between Host 1 and Host 2 (both half-duplex connections
established). Segment (1) is written to RB and acknowledged with ACK=X+512. Segment (2),
SEQ = X+512, is lost. Segments (3) SEQ = X+1024, (4) SEQ = X+1536 and (5) SEQ = X+2048 arrive
while Host 2 keeps answering with ACK=X+512. When the 3rd such ACK is received, Host 1
retransmits (2), SEQ = X+512. Host 2 then writes (2) (3) (4) (5) to RB and sends ACK=X+2560;
the retransmission timers are stopped and Host 1 continues with (6), SEQ = X+2560.]
Legend: MSS = max. segment size (= 512 in the example); RB = application's receive buffer. The
chart symbols mark: retransmission timer stopped (not expired), retransmission timer expired,
packet loss.
7. TCP Error Control (4/6)
Lost packets / retransmissions, lost ack:
[Figure: message sequence chart between Host 1 and Host 2 (both half-duplex connections
established). Recoverable labels: (1) SEQ = X, write (1) to RB, ACK=X+512, retransmission timer
stopped; later write (5) to RB and ACK=X+2560.]
Legend: MSS = max. segment size (= 512 in the example); RB = application's receive buffer. The
chart symbols mark: retransmission timer stopped (not expired), retransmission timer expired,
packet loss.
TCP does not specify how often a specific segment has to be retransmitted (in case of
repeated packet loss of the same segment). Typical values are max. 5 or 8 retransmissions of the
same segment (if more retransmissions would be necessary, the connection is closed).
7. TCP Error Control (5/6)
Retransmission timer (RTO = Retransmission Timeout) assessment:
Problem:
a. Single data link: RTO = (2 + e) * RTT where e << 2 (RTT rather deterministic).
b. Internet (multiple data links): RTT varies considerably, also during the lifetime of a TCP
connection.
Solution:
Constantly adjust RTO based on measurement of RTT. For each segment that was sent start a
(retransmission) timer.
RTO = RTT + 4 * D
where D is the mean deviation, updated as D_new = α * D_old + (1 - α) * |RTT - M|
M = observed RTT value for a specific ACK
α = smoothing factor that determines how much weight is given to the old value (typ. α = 7/8)
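The RTO formula can be turned into a small estimator. This is a sketch under stated assumptions: the class name RtoEstimator is illustrative, the smoothed RTT is updated with the same factor α (the slides only give the update rule for D), and the initial deviation is set to half of the first RTT sample, as RFC 6298 does for its variance term.

```python
class RtoEstimator:
    """Adaptive retransmission timeout: RTO = RTT + 4 * D."""

    def __init__(self, first_rtt: float, alpha: float = 7 / 8):
        self.alpha = alpha
        self.rtt = first_rtt      # smoothed round trip time
        self.dev = first_rtt / 2  # assumption: initial mean deviation D

    @property
    def rto(self) -> float:
        return self.rtt + 4 * self.dev

    def update(self, m: float) -> float:
        """Feed one observed RTT value M and return the new RTO."""
        a = self.alpha
        # D_new = alpha * D_old + (1 - alpha) * |RTT - M|
        self.dev = a * self.dev + (1 - a) * abs(self.rtt - m)
        # Assumption: the RTT itself is smoothed with the same factor.
        self.rtt = a * self.rtt + (1 - a) * m
        return self.rto
```

With stable measurements the deviation shrinks and the RTO approaches the RTT; a single late ACK increases both the deviation and the RTO.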
7. TCP Error Control (6/6)
TCP Header Checksum Calculation:
TCP uses a 16-bit checksum for the detection of transmission errors (bit errors). The checksum
field is the 16-bit one's complement of the one's complement sum of all 16-bit words in the
pseudo IP header, TCP header and data (the checksum field is set to zero before
the checksum calculation). By including the pseudo IP header in the checksum calculation TCP
depends on IP; it may not run on anything other than IP (this is a violation of the layering
principle!).
[Figure: pseudo header, TCP header and data; the checksum is calculated over all three.]
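The checksum calculation can be sketched in Python as follows. The function names are illustrative, and the pseudo header layout shown (source address, destination address, a zero byte, the protocol number and the TCP length) is the one used for IPv4.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """One's complement sum of all 16-bit words (odd input is zero padded)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the lower 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, tcp_segment: bytes) -> int:
    """Checksum over IPv4 pseudo header, TCP header and data.

    The checksum field inside tcp_segment must be zero when computing the
    value to send; a received segment is intact if the result is 0.
    """
    pseudo_header = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(tcp_segment))
    )
    return internet_checksum(pseudo_header + tcp_segment)
```

A receiver runs the same calculation over the segment including the transmitted checksum; any result other than 0 indicates a transmission error.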
8. TCP Congestion Control RFC2001 (1/6)
Congestion control is a control (feedback control) mechanism to prevent congestion
(congestion avoidance).
Problem (a): A fast network feeding a low capacity receiver (congestion in receiver).
Problem (b): A slow network feeding a high-capacity receiver (congestion in network).
[Figure: Router 1 and Router 2 feed Router 3. The aggregate bandwidth exceeds the available
bandwidth of the output link, so packets are dropped by Router 3.]
8. TCP Congestion Control RFC2001 (2/6)
Prior to 1988 TCP did not define a congestion control mechanism. TCP stacks just sent as
many TCP segments as the receiver's buffer could hold (based on the advertised window). With the
growth of the Internet router links became congested and thus buffers got full, which resulted
in packet loss. Packet loss led to retransmissions which in turn aggravated the congestion
problem.
It became necessary to avoid congestion in the first place (it's always better to
avoid a problem in the first place rather than cure it!).
A congestion control window is used in addition to the flow control window (both windows
work in parallel).
The max. number of segments that can be sent = min(Ws, Wc).
Ws = Flow control window
Wc = Congestion control window
[Figure: numbered stream 1 to 10 illustrating the two windows.]
8. TCP Congestion Control RFC2001 (4/6)
Normal congestion control window procedure:
Slow start phase: Wc starts opening „slowly“ from 1 segment until the slow start threshold (SST)
is reached (exponential growth of Wc):
  Transmission number 0: 1 segment sent
  Transmission number 1: 2 segments sent (burst)
  Transmission number 2: 4 segments sent (burst)
  etc.
Congestion avoidance phase: follows the slow start phase once the threshold is reached.
Constant phase: Wc remains constant (Wc fully open).
[Figure: Wc (segments or kbytes, axis 1 to 64) over RTT (axis 1 to 37) with the slow start
threshold at 32, showing the slow start phase, the congestion avoidance phase and the constant
phase.]
[Figure: second plot of Wc (segments/kbytes) over RTT (same axes), annotated with „Wc remains
constant“ and „Wc=17“.]
8. TCP Congestion Control RFC2001 (6/6)
Heavy congestion (no reception of Acks):
When the retransmission timer expires Wc is immediately reset to 1. From there slow-start
restarts normally.
[Figure: Wc (segments/kbytes, axis 1 to 64) over RTT (axis 1 to 37); Wc collapses at the first
RTO and again at the second RTO.]
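The evolution of Wc described in this chapter can be simulated with a short sketch. Several details are assumptions and not taken from the slides: the function name congestion_window, the growth by 1 segment per RTT during congestion avoidance, and the halving of the slow start threshold when a retransmission timeout occurs.

```python
def congestion_window(rtts: int, ssthresh: int = 32, w_max: int = 64, timeouts=()):
    """Return the value of Wc (in segments) for each of the given round trips.

    timeouts: round trip indices at which the retransmission timer expires.
    """
    wc = 1
    history = []
    for rtt in range(rtts):
        if rtt in timeouts:
            ssthresh = max(wc // 2, 2)  # assumption: threshold is halved
            wc = 1                      # Wc is immediately reset to 1
        history.append(wc)
        if wc < ssthresh:
            wc = min(wc * 2, ssthresh)  # slow start: exponential growth
        else:
            wc = min(wc + 1, w_max)     # assumption: +1 segment per RTT
    return history
```

With the default values Wc runs through 1, 2, 4, 8, 16, 32 and then grows by one segment per round trip until it stays at 64.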
9. TCP Persist Timer (1/2)
Problem: Possible deadlock situation in TCP flow control:
The receiver buffer is full thus the receiver sends an ACK with Win=0 (window is closed).
Thereafter data is read from the receiver buffer to the application. The receiver is ready again
to receive data and sends an ACK with Win>0 (window is re-opened). This segment however is
lost. Sender and receiver are now deadlocked: sender waits for the window to be re-opened,
the receiver waits for data.
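[Figure: message sequence chart between Host 1 and Host 2 (both half-duplex connections
established). Host 2 sends ACK=X+2048, Win=0 while its buffer is full, writes (3) to RB and then
sends ACK=X+2048, Win=512, which is lost (packet loss).]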
9. TCP Persist Timer (2/2)
Solution: The sender sends probe segments containing a single byte of data to prompt the
receiver to acknowledge previous bytes again.
Persist timer values are determined by an „exponential backoff algorithm“ that produces output
values from 5 (min.) to 60 (max.) seconds.
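[Figure: message sequence chart between Host 1 and Host 2 (both half-duplex connections
established). Host 2 sends ACK=X+2048, Win=0, writes (3) to RB and sends ACK=X+2048, Win=512,
which is lost. Host 1 sends a probe (1); Host 2 answers with ACK=X+2049, Win=2048; the persist
timer is stopped and Host 1 continues with SEQ=X+2049.]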
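The persist timer values can be sketched as a clamped exponential backoff. The 5 s and 60 s bounds come from the text above; the function name persist_intervals and the starting value of 1.5 s are assumptions chosen for illustration.

```python
def persist_intervals(probes: int, base: float = 1.5, lo: float = 5.0, hi: float = 60.0):
    """Return the waiting time in seconds before each window probe.

    The raw value doubles for every probe and is clamped to [lo, hi].
    base is a hypothetical starting value, not taken from the slides.
    """
    intervals = []
    value = base
    for _ in range(probes):
        intervals.append(min(max(value, lo), hi))
        value *= 2
    return intervals
```

Under these assumptions the first eight probes are sent after 5, 5, 6, 12, 24, 48, 60 and 60 seconds.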
10. TCP Keepalive Timer – TCP connection supervision (1/2)
The keepalive timer is used to periodically check if TCP connections are still active.
Case 1: TCP connection is still alive (i.e. client is still alive and the connection open).
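[Figure: message sequence chart between Host 1 (client) and Host 2 (server), both half-duplex
connections established. After a data segment and ACK=Y the server starts its keepalive timer;
the chart shows keepalive intervals of 2 hours.]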
10. TCP Keepalive Timer – TCP connection supervision (2/2)
The keepalive timer is used to periodically check if TCP connections are still active.
Case 2: The TCP connection is dead.
[Figure: message sequence chart between Host 1 (client) and Host 2 (server), both half-duplex
connections established. After a data segment and ACK=Y no data is exchanged for 2 hours and the
keepalive timer expires. The server then sends keepalive probes 1, 2 and 3 (SEQ=X-1, no data) at
intervals of 75s, the keepalive timer expiring again before each further probe.]
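Applications switch on the keepalive mechanism per socket. The sketch below uses Python's socket module; the function name enable_keepalive is illustrative. The 2 hours idle time and the 75 s probe interval follow the figure, while the probe count of 3 is an assumption. The TCP_KEEP* tuning options are platform specific (available on Linux, for example), so they are only set where the socket module exposes them.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 7200,
                     interval_s: int = 75, probes: int = 3) -> None:
    """Enable TCP keepalive probes on the given socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Optional per-socket tuning; skipped where the platform lacks the option.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Assumption: give up after this many unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

Without the tuning options the system-wide defaults of the host OS apply.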
11. TCP Header Flags (1/2)
PUSH flag:
Problem: Segment size and time of transmission are determined by TCP (flow control, congestion
control, internal buffering). This is unsuitable for interactive applications like
X Windows or TELNET where data should be sent as soon as possible.
Solution: With the PUSH flag data can be expedited:
Sender: Sends data immediately without further buffering.
Receiver: On receiving the PUSH flag, „pushes“ all buffered data to the
application (no further buffering).
[Figure: message sequence chart between sender and receiver. For each Send(data,PUSH) the sender
transmits a segment with Data, PUSH=1; the receiver writes the data immediately to the
application process (no buffering in TCP) and returns an Ack.]
11. TCP Header Flags (2/2)
URGENT flag:
This flag allows a sender to place urgent (=important) data into the send stream; the
urgent pointer is an offset from the segment's sequence number that marks where the urgent data
ends; the receiver can then directly jump to the processing of that data; example: Ctrl+C abort
sequence after other data.
[Figure: the sender calls Send(data) and then Send(data,URGENT); the resulting TCP segment
(Data, URGENT=1) carries the urgent data and a TCP header with URG=1 and URGPTR=2219.]
12. TCP Header Options (1/3)
Window scale option:
Problem: High-speed networks with long delay; the network (transmission path) then stores large
amounts of data („long fat pipe“).
E.g. 155Mbps line with 40ms delay => 775Kbytes are underway in the transmission path,
but the maximum window size is restricted to 65535 bytes (16-bit field).
Solution: With the window scaling option the window size can be increased (exponentially),
thus making the window large enough to „saturate“ the transmission path.
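The numbers of the example can be reproduced with a short calculation. The function names are illustrative. The sketch assumes that the window scale option is a shift count applied to the 16-bit window field, with a maximum shift of 14.

```python
def bytes_in_flight(bandwidth_bps: int, delay_ms: int) -> int:
    """Amount of data (bytes) stored in the transmission path."""
    return bandwidth_bps * delay_ms // 8000  # bits * ms -> bytes


def window_scale_shift(required_window: int, max_unscaled: int = 65535,
                       max_shift: int = 14) -> int:
    """Smallest shift count whose scaled window covers required_window."""
    shift = 0
    while (max_unscaled << shift) < required_window:
        shift += 1
        if shift > max_shift:
            raise ValueError("window too large even for the maximum shift count")
    return shift
```

For the 155Mbps line with 40ms delay, bytes_in_flight(155_000_000, 40) gives 775000 bytes, and a shift count of 4 (window up to 65535 * 16 = 1048560 bytes) is the smallest one that covers it.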
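Send window (over the byte stream):
Lower window edge: initialized to ISN+1, incremented as bytes are acknowledged.
Upper window edge: initialized to ISN+1 + advertised window, incremented by the number in the
window field.
[Figure residue of a timestamp exchange: at T1 a segment with SEQ=X; TS=T1 is sent; further
labels are T3 and „store TSrecent = T4“.]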
12. TCP Header Options (3/3)
SACK permitted option (selective retransmissions):
Receiver can request sender to retransmit specific segments (due to loss).
The SACK does not change the meaning of the ACK field; if the SACK option is not supported
by the sender TCP will still function (less optimally though in case of retransmissions).
[Figure: Host 1 sends a SYN with the SACK-permitted option to Host 2; Host 2 answers with a
SYN ACK. Encoding of the SACK-permitted option as shown: 1=nop, 1=nop, kind=4, len=2.]
13. TCP Throughput / performance considerations („goodput“) (1/2)
TCP is delay sensitive!
max. throughput = Ws / RTT [Padhye 98]
This is not an exact formula and it is valid only for average RTT values.
The maximum throughput is bound by the window size Ws and decreases with increased
RTT (=delay). TCP as such is NOT suited for networks with long delays (e.g. satellite and
interplanetary links).
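The bound can be evaluated directly. The function name max_throughput_bps is illustrative; like the formula itself the result is an upper bound for average RTT values, not an exact prediction.

```python
def max_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound of TCP throughput in bit/s: Ws / RTT."""
    if rtt_s <= 0:
        raise ValueError("RTT must be positive")
    return window_bytes * 8 / rtt_s
```

With the maximum unscaled window of 65535 bytes, an RTT of 40 ms allows at most about 13.1 Mbit/s, and an RTT of 0.5 s (an assumed value for a long-delay link) allows only about 1.05 Mbit/s, regardless of the link speed.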
TCP is not independent of the underlying network (as should be the case in theory)!
TCP was designed to run over wired networks (low Bit Error Rate BER, packet loss mostly
due to congestion). TCP performs badly on radio links (high BER, packet loss due to errors).
In case of packet loss on a radio link the sender should try harder instead of slowing down.
Slowing down just further decreases the throughput.
On wired networks the sender should slow down in case of packet loss (caused by
congestion) in order to alleviate the problem.
How to handle a TCP connection that spans a wired and a radio link?
„Split TCP“: Effectively 2 separate TCP connections interconnected by the base station
passing TCP payload data between the connections. Each of the TCP connections is optimized
for its respective use. Example:
TCP accelerator devices for satellite links.
13. TCP Throughput / performance considerations („goodput“) (2/2)
TCP adaptations for radio (satellite) links as per RFC2488, RFC3481:
a. Data link protocol:
Use data link protocols that do not do retransmissions and flow control or make sure that
retransmission mechanism and flow control of data link do not negatively affect TCP performance.
b. Path MTU discovery:
Path MTU discovery enables TCP to use maximum sized segments without the cost of
fragmentation/reassembly. Larger segments allow the congestion window to be more rapidly
increased (because slow start mechanism is segment and not byte based).
c. Forward Error Correction:
TCP assumes that packet loss is always due to network congestion and not due to bit
errors. On radio links (satellite, microwave) this is not the case – packet loss is primarily
due to bit errors. The congestion recovery algorithm of TCP is time consuming since it
only gradually recovers from packet loss. The solution is to use FEC to avoid false signals
as much as possible. FEC means that the receiver can detect and correct bit errors in the
data based on a FEC code.
d. Selective ACKs (SACK):
The TCP algorithms „Fast Retransmit“ and „Fast Recovery“ may considerably reduce TCP
throughput (because after a packet loss TCP gingerly probes the network for available
bandwidth; in a radio environment this is the wrong strategy). To overcome this use
selective ACKs (SACK) option to selectively acknowledge segments.
e. Window scale option:
Long delay on fast transmission links limits TCP throughput because large amounts of data are
underway but TCP's sliding window only allows up to 65535 bytes to be „in the air“.
To overcome this use the window scaling option, thus increasing the size of the sliding window.
14. A last word on „guaranteed delivery“
TCP provides guaranteed transmission between 2 APs. That's it. There are
still zillions of reasons why things can fail! Thus it is still up to the application to
cater for application error detection and error handling (see RFC793 chapter 2.6 p.8).
[Figure: message sequence chart between Host 1 and Host 2 (both half-duplex connections
established) with packet loss.]