TCP and UDP Protocols
Purdue e-Pubs
Department of Computer Science Technical
Department of Computer Science
Reports
1995
Lebin Cheng
Bharat Bhargava
Purdue University, [email protected]
Zhiyuan Zhao
Report Number:
95-046
Liu, Xiangning; Cheng, Lebin; Bhargava, Bharat; and Zhao, Zhiyuan, "Experimental Study of TCP and UDP Protocols for Future
Distributed Databases" (1995). Department of Computer Science Technical Reports. Paper 1221.
https://docs.lib.purdue.edu/cstech/1221
EXPERIMENTAL STUDY OF TCP AND UDP
PROTOCOLS FOR FUTURE DISTRIBUTED DATABASES
Xiangning Liu
Lebin Cheng
Bharat Bhargava
Zhiyuan Zhao
CSD-TR-95-046
July 1995
Experimental Study of TCP and UDP Protocols for Future
Distributed Databases
Xiangning Liu, Lebin Cheng, Bharat Bhargava, and Zhiyuan Zhao
Contents
1 Introduction
1.1 Research Overview
1.2 Related Research
1.3 Structure of the Paper
4 Conclusions
Abstract
applications and network systems. Of particular relevance to database applications is
our discussion of the impact of network behavior on the performance and availability
of distributed database systems, especially on databases that are scaled up in multiple
dimensions. We suggest that database systems should adapt to different environments
to enhance efficiency and reliability. We weigh the merits of TCP and UDP for use
with different applications on currently existing networks and the future ATM network.
Abnormal network behavior observed in connection with the transmission of messages of
particular sizes via TCP over an ATM network is discussed and solutions are proposed.
Finally, we attempt a description of the ATM-based database systems of the future.
Keywords: Adaptability, availability, performance, scale-up, TCP, UDP/IP, Internet,
LAN (Local Area Network), WAN (Wide Area Network), ATM.
1 Introduction
Future database applications, such as multimedia applications and digital libraries, will
involve large volumes of data, long distances between distributed sites, large numbers of
sites, and large data units with complex structures [SSU91]. An increase in the number of
data items in a database is termed a volume scale-up, while an increase in the number of
structural scale-up occurs when data items become complex objects. When databases scale
up in all these dimensions, communication of data items assumes a highly significant role
At present, TCP and UDP are the network protocols most commonly used for data
communication. Most distributed database systems are implemented with TCP or UDP,
and this is likely to remain the case into the foreseeable future. The aspects of network
performance and behavior to be discussed in this paper can therefore form the basis of
further database performance analysis and simulation. Database designers can use these
It has frequently been suggested that the computer networks of the future will be based
on the fast ATM (Asynchronous Transfer Mode) network [Vet95]. The current network
structure, with its slow speed and unreliability, will not be adequate to the needs of the
future applications, which will be scaled up in all dimensions. The ATM Forum has been
formed by a large group of companies which have made a commitment to the support of
ATM development. For these reasons, databases in the future are likely to incorporate the
ATM technology. The behavior and construction of such distributed database systems are
Experiments were conducted using TCP (Transmission Control Protocol) and UDP (User
Datagram Protocol) [PSC81, Com91], with objects of various sizes and with several trans-
mission options, on currently-existing LANs and WANs. Communication delay and failure
rates were measured. Experiments were also conducted using 19 NASA image files. On the
basis of these results, we analyzed the respective merits of TCP and UDP and the various
transmission options.
Local experiments using the ATM indicated that the desired performance level can be
achieved for the transmission of messages smaller than 4 Kbytes. With messages of certain
sizes, however, TCP is not fast enough to utilize the bandwidth provided by the ATM
network at the physical layer. The implementation of the TCP protocol becomes in itself
a performance barrier. The potential advantages of using the ATM to efficiently transmit
In one experiment, we removed the ATM switch, directly connected the two end-point
hosts with fiber optic cable, and compared these results to those of tests in which an ATM
switch was used to connect the end points. Little difference was observable between the
results of the two tests. This experiment shows that the ATM switch itself is not responsible
for the performance degradation. The problem was caused by the mismatch of the computer
system, especially the operating system network software, with the ATM physical layer.
Many tools and methods have been developed to measure various aspects of computer
network performance and availability. Layered refinement [PKL91] permits the effective
measurement of the performance of different layers of TCP/IP by filtering noise. [Gol92]
introduced network measurement of end-to-end performance to predict Internet behavior.
[CJRS89] described the overhead entailed in running TCP and IP by identifying the normal
paths through the compiled code of a TCP implementation. Many efficient communication
Mach [Ras86], Amoeba [TRvS+90], Sprite [OCD+88], and x-kernel [PHOR90]. The development
of distributed databases has greatly benefited from the results of these research
efforts.
Intense research activity directed toward the development of the computer networks of
the future is taking place in both industrial and academic settings. [Kun92] discussed
the high-speed local area network in which gigabit bandwidths will dramatically change the
domain of future computer applications. [Vet95] surveyed the basic concepts and technology
of the ATM. [PR95] presented a simulation study for the TCP protocol of the problem of
data flooding on ATM switches. Solutions such as increasing the ATM switch buffer and
additional congestion control methods were proposed to address the mismatch between
ATM technology and the existing TCP implementation. The poor performance of TCP
has also been reported in a number of studies [Rom93, AM94]. [CL94] discussed the TCP
buffer problem of the UNIX 4.3 BSD-Tahoe implementation for ATM communication. [CS92]
discussed some internetworking strategies and issues that arise when ATM networks are
tributed) database system [BR89b, BR89a] was developed. With RAID, the adaptability
of the communication subsystem is addressed through the UDP protocol. Several major
improvements have since been implemented to achieve high performance [BZM91, MB91].
• Using the efficient UDP protocol, rather than TCP, to provide services of simple
naming, communication surveillance, and long datagram mechanisms for large data
items;
The WANCE tools developed as part of RAID support emulation of distributed trans-
action systems over a WAN [ZB93] by taking advantage of the echo facility on each Internet
host. The RAID project has continued to investigate efficient communication mechanisms
which incorporate both the TCP protocol and the ATM network. These investigations,
which are directed toward the anticipated topological, metric, and volume scale-ups of future
database applications, are described in this paper.
ATM network. The problem of TCP implementation on the ATM is described and solutions
are proposed. In Section 4, we draw conclusions and provide directions for future work.
2 Data Communication on Existing Networks
When database systems scale up topologically, metrically, or in data size, the impact on
the widely used Internet with its TCP and UDP/IP protocols must be considered. Our
experiments use the Ethernet as the LAN (local area network) physical data transfer medium.
For WAN (wide area network) communication, the Internet is the testbed which is an integra-
vides reliable and ordered stream delivery of data [Pos81]. UDP is a simple connectionless
datagram transport protocol that provides peer-to-peer addressing and fast but unreliable
and unordered delivery of data [Pos80]. The two protocols are thus best suited to different
Problem Statement
The purpose of these experiments was to measure and compare data communication
performance of TCP and UDP on a LAN for 1) short messages in traditional databases and
2) large messages in future applications such as digital libraries. UDP is faster than TCP
but is less reliable. Measurements of message round-trip time and loss rate made in the
course of these investigations will permit an objective assessment of the tradeoff between
speed and reliability. The result of this experiment will be the basis for the discussion of
The development of new applications has brought with it increasing demands for shorter
delays, larger bandwidths, lower packet loss rates, and higher throughput. The capacities
of the physical layer may be incompletely exploited by inefficient high-level software.
Applications which do not choose appropriate and properly configured protocols will not make
optimal use of network capabilities.
Procedure
All experiments on the local area network were conducted between two Sun Sparc
workstations, raid9.cs.purdue.edu and raid11.cs.purdue.edu, connected by a 10 Mbps
Ethernet. An extension of the ping program originally authored by M. Muuss [Ste90] was
used as the experimental vehicle. In this set-up, a sender process running on one machine
sends messages to the echo receiver on another machine. The receiver does not echo the
message back to the sender until it has received the entire message. The round-trip time is then
calculated by the sender as the time interval between the delivery of a message and the
arrival of the echo message. Each experimental trial consisted of 50 message echoes, and
the experiment was repeated twenty times. The experiments were conducted using three
TCP modes and two UDP modes. These modes were as follows:
sage sent. Some applications can send several messages through one TCP connection,
while in other instances the number of messages passed through a given connection
is unpredictable. A system can maintain only a limited number of simultaneous
connections. The closing of a connection may increase the overhead for later messages
but will allow other applications to use these limited connection resources.
• No-delay TCP: Small packets are sent immediately by setting the option TCP_NODELAY.
By default, following the small message avoidance protocol, the TCP software
saves small packets in buffers instead of sending them immediately [Nag84]. The
TCP_NODELAY option allows users to force a packet to be sent immediately after it
is created.
• TCP or single-connection TCP: Regular TCP messages, by default, are sent within
a single connection without invoking the TCP_NODELAY option.
• Non-aggressive UDP: Large messages are divided into 8 Kbyte packets which are
sent at short intervals by UDP. A for loop was used to increment an integer 2000
times, thus generating the interpacket interval used in our LAN experiments. The
• UDP or aggressive UDP: For large messages, each 8 Kbyte packet is sent immediately
following the previous packet.
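The transmission modes listed above can be sketched with standard socket calls. The following Python fragment is a minimal illustration only, not the modified ping program used in the experiments: the function names and the `gap_s` pause parameter are hypothetical, and a timed pause stands in for the counting loop described in the text.

```python
import socket
import time

PACKET_SIZE = 8 * 1024  # 8 Kbyte packets, as in the experiments


def enable_nodelay(sock):
    """No-delay TCP: ask the stack to send small segments immediately."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def split_into_packets(message, size=PACKET_SIZE):
    """Divide a large message into fixed-size packets (the last may be shorter)."""
    return [message[i:i + size] for i in range(0, len(message), size)]


def send_udp(sock, addr, message, gap_s=0.0):
    """Send a message as a sequence of UDP packets and return the packet count.

    gap_s == 0 models aggressive UDP (packets sent back to back).
    gap_s > 0 models non-aggressive UDP (a short pause between packets).
    The pause length is a hypothetical parameter, not a value from the paper.
    """
    packets = split_into_packets(message)
    for n, packet in enumerate(packets):
        if n and gap_s > 0:
            time.sleep(gap_s)  # interpacket interval
        sock.sendto(packet, addr)
    return len(packets)
```

Single-connection TCP corresponds to reusing one connected socket without calling `enable_nodelay`; individual-connection TCP would open and close a socket around each message.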
connection TCP) and no-delay transmission of small packets (for No-delay TCP)).
• Measurement of the precise round-trip time. The echo receiver awaits arrival of the
entire message before returning it to the sender. The original ping program starts to
echo back when part of the message has arrived.
• Automatic adjustment of the time interval between the individual messages sent out
by the sender. If the interval (1 second by default) is too short and the messages
are large, the system may break down if a new message is sent before the previous
messages return.
• Avoiding intermediate output. Under UDP, large messages are divided into small
packets. If the sender must output the round-trip time after the arrival of each packet,
the accumulated round-trip time measured for a large message will be artificially
increased by the lag of the print statements, which involve slow device I/O. For
example, as shown in Figure 1, test results with the original program seem to indicate
that UDP was sometimes slower than TCP. When the extra print statements were
• Adding an option to adjust the interpacket interval for non-aggressive UDP data
transmission.
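The round-trip measurement described in the Procedure, in which the receiver echoes only after the whole message has arrived and the sender prints nothing until timing is finished, might look roughly like the sketch below. It is an assumption-laden illustration: a local socket pair stands in for the two workstations, and all function names are hypothetical.

```python
import socket
import threading
import time


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket (fewer only if the peer closes)."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            break
        buf.extend(chunk)
    return bytes(buf)


def echo_once(sock, n):
    """Echo receiver: wait for the entire n-byte message, then send it back."""
    sock.sendall(recv_exact(sock, n))


def round_trip_time(sock, message):
    """Sender: elapsed time from the start of the send to the arrival of the full echo."""
    start = time.perf_counter()
    sock.sendall(message)
    echoed = recv_exact(sock, len(message))
    # No output is produced inside the timed region.
    return time.perf_counter() - start, echoed


def demo(size=100_000):
    """Run one echo over a local socket pair (a stand-in for two hosts)."""
    a, b = socket.socketpair()
    receiver = threading.Thread(target=echo_once, args=(b, size))
    receiver.start()
    rtt, echoed = round_trip_time(a, bytes(size))
    receiver.join()
    a.close()
    b.close()
    return rtt, echoed
```

Calling `demo()` returns the measured time in seconds together with the echoed bytes; any reporting can then happen after the measurement is complete.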
[Plot: "TCP vs. UDP on LAN"; x-axis: Message Size (bytes).]
Figure 2: Round-trip time for small messages in a LAN a) shorter than 512 bytes; b) shorter
than 2 Kbytes.
Table 1: Large NASA image files
Size (bytes)  File: description. Resolution
7708    earth1.gif: Very sharp contours, green on blue globe. Res: 160x160
17027   gal_line.gif: Red on black, a whole line of only dots. Res: 450x450
29668   gal_green.gif: Green on green, lots of dots, striations of colors. Res: 384x330
35543   comet.gif: White eye, blue Lill, tail fades into background. Res: 512x480
74058   surface.gif: Sloping surface of white and blue, sharp contours, shades. Res: 550x450
80385   jupiter.gif: Huge circle of red and yellow shades, yellow text on black. Res: 710x765
104365  hubble.costar.gif: Shades of concentric red, orange, yellow colors; shading, text. Res: 566x384
114323  earth_detail.gif: Blurred contours; text; pink color; black bg. Res: 1152x864
153634  4gal_red.gif: Bright red, orange; black and white dots; shading. Res: 441x400
175405  sf.gif: Sharp boundary contours, blue, white and red colors. Res: 500x500
236199  mitwave1.gif: Orange and white shades, delicate, multicolored ridges. Res: 1024x1024
279786  earth_highres.gif: Blurred contours; text; blue color; black bg. Res: 1152x864
486430  eclipse1.gif: A huge number of orange and yellow shades. Res: 1280x1024
Results
1. Figures 2 a) and b) show the round-trip times for messages under 512 bytes and 2
2. Figure 3 provides results for larger messages of between 1 and 80 Kbytes in size. Since
the TCP_NODELAY option would have no impact on these results, we did not repeat
the test with this option involved.
3. Figure 4 shows the results obtained from sending very large messages of between 50
and 1000 Kbytes.
4. Figure 5 provides the results of sending large multimedia NASA files. The sizes and
[Figure 3 plots: a) "TCP vs. UDP for large messages on LAN" (series: individual-connection TCP, TCP, UDP); b) "UDP loss for large messages on LAN"; x-axis: Message Size (bytes).]
Figure 3: a) Round-trip time for large messages; b) UDP packet loss rate for large messages
Discussion
[Figure 4 plots: a) series: individual-connection TCP, TCP, UDP; b) "UDP loss for very large messages on LAN"; x-axis: Message Size (bytes).]
Figure 4: a) Round-trip time for very large messages; b) UDP packet loss rate for very large
messages
[Figure 5 plots: a) "TCP vs. UDP for NASA image files on LAN"; b) "UDP loss for NASA image files on LAN"; x-axis: Message Size (bytes).]
Figure 5: a) Round-trip time for large files; b) UDP packet loss rate for large files
For small messages, the experiments show that UDP is the most efficient mechanism, with
almost no message loss noted. In view of the absence of message loss, TCP need undertake
no retransmission; its performance here is therefore comparable to that of UDP. Furthermore,
TCP_NODELAY has no effect when only one small message is sent. The effects of
TCP_NODELAY will be discussed further in Section 3.2. Connection establishment has
been noted to add significant overhead for the sending of small messages. Therefore, the
It is clear that, while UDP is much faster than other mechanisms for messages of larger
size, its reliability gradually declines under these circumstances. The Ethernet employs the
be generated for each message, the chance of collisions and out-of-order delivery becomes
significantly higher. For this reason, TCP must retransmit large amounts of data when
The extent of misordered data and data loss increases significantly when the message size
exceeds 200 Kbytes. The overhead caused by connection re-establishment, in this case, is
negligible.
For very large messages, non-aggressive UDP transmission can reduce packet loss
significantly while causing little increase in speed-related overhead. This situation is depicted in
for (i = 0; i < 2000; i++)
were used to generate the interval between each UDP packet. Experiments show that
intervals longer than these may increase transmission delays with little further improvement
in reliability. Shorter intervals, on the other hand, may not be enough to bring significant
improvement. The number of iterations is system-dependent.
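Because the interval produced by a counting loop depends on the machine, one way to see the system dependence is to time the loop directly. The sketch below is only an illustration in Python (the original loop was written in C); the function names and the repeat count are assumptions, and the absolute durations it reports will differ from those on the workstations used in the experiments.

```python
import time


def spin(iterations):
    """Busy-wait by incrementing an integer, in the manner of the loop in the text."""
    i = 0
    while i < iterations:
        i += 1


def measure_spin(iterations=2000, repeats=100):
    """Return the average duration of one spin loop on this machine, in seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        spin(iterations)
    return (time.perf_counter() - start) / repeats
```

Running `measure_spin()` on two different machines gives two different interpacket intervals for the same iteration count, which is why the count would need to be tuned per system.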
2.2 Wide Area Network (WAN) Experiments
Problem Statement
These experiments were intended to investigate the questions discussed above within the
context of a wide area network (WAN). Based on the result, database topological and metric
Procedure
The ping program as described in Section 2.1 was again employed for these experiments.
An echo server was installed on a Sun Sparc workstation at Stanford University; unlike the
common echo server on port 7, this server has the capacity to handle the large messages
used for these experiments. The input data from the LAN experiments was used here again.
Results
1. Figure 6 shows round-trip time for messages under 1 Kbyte. The UDP packet loss
was small (less than 7%), and the TCP_NODELAY option had no impact in this case.
2. Figure 7 provides results obtained from sending large messages of between 1 and 80
Kbytes.
3. Figure 8 shows the results of sending very large messages of between 50 and 1000
Kbytes. The overhead arising from connection time is negligible in this case, but
there is an unacceptably high UDP loss rate. It therefore proved impossible to gather
4. Figure 9 provides results obtained from sending the same large NASA multimedia
Discussion
[Figure 6 plots: a) "TCP vs. UDP for small messages on WAN" (series include individual-connection TCP and no-delay TCP); b) "UDP loss for small messages on WAN".]
Figure 6: a) Round-trip time, b) UDP packet loss rate for small messages over WAN
[Figure 7 plots: x-axis: Message Size (bytes).]
Figure 7: a) Round-trip time, b) UDP packet loss rate for large messages over WAN
[Figure 8 plots: b) "UDP loss for very large messages on WAN"; x-axis: Message Size (bytes).]
Figure 8: a) Round-trip time for very large messages; b) UDP message loss rate for very
large messages
[Figure 9 plots: a) "TCP vs. UDP for NASA image files on WAN"; b) "UDP loss for NASA image files on WAN".]
Figure 9: a) Round-trip time for NASA image files on WAN; b) UDP packet loss rate for
These results from the LAN and WAN experiments lead to the following conclusions:
pecially when databases scale up topologically and metrically. Despite its speed ad-
vantages, the reliability of UDP is inadequate for remote communication and large
data transmission. Data loss and out-of-order messages may cause transaction failure
from which the database may have difficulty recovering. When databases scale up in the
topological and metric dimensions, the problem becomes more severe. A reliable data
On a Local Area Network (LAN), the efficiency of TCP is usually adequate for the
transmission of small messages and is only slightly less than that of UDP.
• UDP should be used only for time-critical applications, such as video services and
digital libraries, that may tolerate a certain amount of data loss. Since the aggressive
sending of continuous UDP packets will result in heavy data losses, transmissions
should instead be spaced at short intervals. Experimental results using this method
on a LAN show a significant reduction in losses, and, although the speed is somewhat
decreased, it is still much faster than TCP. Unfortunately, this method does little to
improve UDP reliability in a WAN context.
• Applications which act immediately upon partially-received data can also employ
UDP. For example, an image can be partially or roughly displayed to a user on the
basis of partial data received from a remote site, with the image completed as the
remaining data arrives. Lost packets can be retransmitted for later arrival. The
Database scale-up on topological, metric, and volume dimensions requires fast and re-
supporting both circuit and packet-switching services. ATM cells, or fixed-length packets,
form the basic data units. An ATM cell, as defined by ITU (formerly CCITT) Recommendation
I.361, contains 48 octets of data and 5 octets of control information. This fixed-length
cell and the early binding of routing information during the connection setup make the
ATM suitable for high-speed data communication. Its bandwidth reservation and graceful
multiplexing also render it suitable for multimedia traffic. Four major benefits of the ATM
discussed in [KW95] are scalability, statistical multiplexing, traffic integration, and network
simplicity. These characteristics are the very things required by database scale-up.
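The cell format mentioned above (48 octets of data plus 5 octets of control information) implies a fixed per-cell overhead. The short Python sketch below works through that arithmetic; it is a simplification that assumes the payload is carried directly in cells and ignores any adaptation-layer trailer or padding, and its function names are hypothetical.

```python
import math

CELL_PAYLOAD = 48                       # octets of data per ATM cell
CELL_HEADER = 5                         # octets of control information per cell
CELL_SIZE = CELL_PAYLOAD + CELL_HEADER  # 53 octets in total


def cells_needed(payload_bytes):
    """Number of cells required to carry a payload (assumption: no AAL trailer)."""
    return math.ceil(payload_bytes / CELL_PAYLOAD)


def wire_bytes(payload_bytes):
    """Total octets transmitted, counting the 5-octet header of every cell."""
    return cells_needed(payload_bytes) * CELL_SIZE


def header_overhead():
    """Fraction of every cell taken up by control information."""
    return CELL_HEADER / CELL_SIZE
```

Under these assumptions an 8 Kbyte message occupies 171 cells, and a little over 9% of the link capacity is spent on cell headers.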
Many applications will benefit from the flexibility in switching and high speed provided by
the ATM. For example, consider the case of an oil company with several computing centers
generating large numerical simulations of drill sites. Engineers would find it useful to display
of x-ray images involving several gigabits of information. In many cases, real-time access
is required for collaborative diagnosis between physicians in different locations. This large
quantity of data can only be delivered in real time by networks running at multimegabit
speeds. Ideally, the collaborating physicians may wish to set up video conference connections
for their discussion. The ATM provides flexible multiplexing technology to meet these needs.
The ATM was originally designed for the B-ISDN (Broadband Integrated Services Digital
Network), a network intended to integrate time-critical and data-heavy graphic, imaging,
video, and audio services. As database systems scale up in multiple dimensions, they must
also involve applications of this sort, for which the ATM will be the most suitable platform.
In order to take full advantage of an ATM network, adaptation must be made to match the
ATM with the popular TCP/IP and UDP protocols upon which most existing applications
of TCP/IP and UDP when applied to an ATM network. The experiment was designed to
observe the impact of this change in the lower layer on the upper-level database systems.
We expected that the ATM would provide fast and reliable data transmission for the
applications. Due to resource limitations, we could conduct experiments only on an ATM
LAN. The scale-up experiments were conducted only on the data volume dimension, but we
can still use the results for the discussion of scale-up on the topological and metric dimensions.
Our experiments involved two Sun Microsystems SPARCstation IPCs, called percival
and fibonacci, running SunOS 4.1.3 with the 4.3 BSD-Tahoe TCP implementation. As
shown in Figure 10, each host connects to a Fore Systems ASX-100 ATM switch via 100
Mb/s multi-mode fiber cables, which supplement the conventional 10 Mb/s Ethernet.
Both hosts use Fore Systems SBA-200 ATM adapter cards, the driver of which supports
ATM Adaptation Layer 5 (AAL5). The Fore ATM switch comes with a dedicated processor
[Figure 10 diagram: hosts percival and fibonacci (SunOS 4.1.3), each with an SBA-200 adapter card, connect to the ASX-100 ATM switch via 100 Mb/s fiber links and to each other via 10 Mb/s Ethernet.]
3.2.2 Experiment I: TCP Implementation on ATM
Problem Statement
These experiments are intended to measure the performance of the TCP and UDP
protocols on an ATM network and to investigate the relative merits of an ATM network.
Compared to the conventional Ethernet, the message transfer delay on an ATM network,
with its high-speed fiber cable, is greatly reduced. TCP and UDP were developed before
the existence of the ATM, and problems may therefore arise from the combination of new
[Figure 11: plot not recoverable; surviving caption fragment: "Ethernet".]
[Figure 12 plot: "TCP and UDP round-trip times on ATM and Ethernet" (series: TCP on ATM, TCP on Ethernet, UDP on ATM, UDP on Ethernet).]
Figure 12: Round-trip time for 8 Kbyte message with TCP and UDP using ATM and
Ethernet
Figure 13: Round-trip time for 100 Kbyte message with TCP using ATM and Ethernet
In the experiments shown in Figures 11-13, 50 messages were sent repeatedly through both
the ATM and Ethernet interfaces using the ping program described in Section 2.1. Figure
11 illustrates the results for the transmission of 1 Kbyte data. In this case, the ATM proved
faster than the Ethernet for both UDP and TCP. Figure 12 shows the abnormal behavior
encountered when sending 8 Kbyte messages with TCP on the ATM. Round-trip times in
this context fluctuated widely and were generally long. In contrast, Figure 13 reveals stable
and short round-trip times for messages sent over the ATM network and long and oscillating
round-trip times on the Ethernet. Figure 14 provides experimental results for sending files
of the sizes discussed in Section 2.1. While the average round-trip time on the ATM is long
and almost constant for small files, the round-trip times measured for large files fit well
with predictions. These large-file times are shorter on the ATM than on the Ethernet and
[Figure 14 plots: series: with TCP on ATM, with UDP on ATM.]
Figure 14: NASA image files on ATM: a) Round-trip time; b) UDP packet loss rate
Discussion
The origin of the abnormal behavior noted above lies in the implementation of the
TCP/IP protocol. Both [CL94] and [LMK90] provide detailed discussions on this
question. Those factors which are most pertinent to our experiments are:
• The small packet avoidance algorithm; and
By default, the sender and receiver window sizes are 16 Kbytes. The MTU of the Ethernet
is 1500 bytes, while the MTU of the ATM is 9188 bytes. Accordingly, the TCP software
sets the MSS to 1460 bytes for the Ethernet and 9148 bytes for the ATM. The MSS for
TCP is calculated by the formula MSS = MTU - 40, where 40 is the TCP and IP header size.
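The MSS formula can be checked against the two MTU values quoted in the text. The Python sketch below is illustrative only; the `segments_needed` helper is a hypothetical addition that simply counts how many full-MSS segments a message of a given size would occupy.

```python
import math

TCP_IP_HEADER_BYTES = 40  # combined TCP and IP header size used in the text
ETHERNET_MTU = 1500
ATM_MTU = 9188


def mss_for_mtu(mtu):
    """MSS = MTU - 40, as stated in the text."""
    return mtu - TCP_IP_HEADER_BYTES


def segments_needed(message_bytes, mtu):
    """Hypothetical helper: number of MSS-sized segments a message would span."""
    return math.ceil(message_bytes / mss_for_mtu(mtu))
```

With these values an 8 Kbyte message spans six segments on the Ethernet but fits within a single segment on the ATM, which is one reason the two networks interact differently with the small packet avoidance algorithm.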
TCP buffers are implemented as chains of mbufs [LMK90] in SunOS 4.1.3. Each
mbuf can store either up to 112 bytes of data or a pointer to a 1 Kbyte data page. The
TCP output routine allocates mbufs to contain the TCP headers and the data. As long as
4 Kbyte of data is available in the buffer, the operating system invokes network routines to
transmit a packet.
TCP will delay the sending of small packets according to the small packet avoidance
algorithm [Nag84]. Thus, TCP will send packets only under the following conditions:
2. when min(D, U) ≥ MSS, where D is the size of the data to be sent and U is the usable
window size. The latter is defined as the receiver window size minus the amount of
outstanding data sent by the sender for which no acknowledgement has been received;
According to the silly window syndrome avoidance algorithm [Cla82, Jac88] in TCP, the
receiver should avoid small window advertisements as a waste of bandwidth. The receiver
TCP will send an ACK with a window update only under the following conditions:
2. when R ≥ 35% of the receiver window size;
Based on the previous discussion, when sending an 8 Kbyte message, TCP first places 4
Kbytes of data in the mbufs. This data is transmitted immediately, since the connection is
initially idle. The remaining 4 Kbytes of data are then stored in the buffer. This second data
installment cannot be sent before the ACK of the first 4 Kbytes is returned,
since 4K < MSS = 9148 and 4K < half of the max_sndwnd (which is 16 Kbytes). On
the receiver's side, when the first 4 Kbytes are received, none of the conditions for sending
an ACK have been met unless the tcp_fasttimo() timer has expired. The ACK is therefore
returned to the sender with an average delay of 100 milliseconds. As a result, the sender
must wait for the ACK for an average of 100 milliseconds before the next packet can be
sent. Our results indicate that this delay was uniformly distributed as U[0,200]. Because
the same situation arose when echoed messages were returned to the sender, the overall
average round-trip time is 200 milliseconds when sending 8 Kbyte messages.
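The reasoning in this paragraph can be traced with a small predicate. The Python sketch below is a simplified model, not the SunOS code: it uses only the three conditions the paragraph itself mentions (an idle connection, a full segment, or at least half of max_sndwnd), it ignores the usable-window term, and its function names are hypothetical.

```python
K = 1024


def sender_may_transmit(data, mss, max_sndwnd, idle):
    """Simplified send test: idle connection, a full segment, or half the send window."""
    return idle or data >= mss or data >= max_sndwnd // 2


def expected_round_trip_ms(timer_period_ms=200.0):
    """Mean delay of U[0, period] is period / 2, incurred once in each direction."""
    return 2 * (timer_period_ms / 2)
```

For the first 4 Kbyte installment on the ATM (`idle=True`) the predicate is true; for the second installment (`idle=False`, MSS 9148, max_sndwnd 16K) it is false, so the sender waits for the delayed ACK. With the Ethernet MSS of 1460 the same 4 Kbytes would be sent at once.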
In the experiment shown in Figure 12, 50 messages were sent at one-second intervals. This
The two routines were separately tested and it was found that the sig_handler response
result, the round-trip time measured in the experiment seems to fall into a cycle, declining
by about 10 milliseconds until reaching the lower limit, and then initiating another cycle
Based on our previous discussion, we can predict the range of message size that may
The normal ranges are:
We conducted experiments with messages of size 4K, 4K+l; 12K-I, 12K; 16K, 16K+l;
24K-l, 24K; and 28K, 28K +1. The test results confirm the predicted behavior. For
example, Figure 15 illustrates the results of sending 12K-1 byte data and 12 Kbyte data.
The delay experienced when sending 12K-l byte messages is dramatically longer than the
delay incurred when sending a 12 Kbyte message. Since 1 Kbyte and 100 Kbyte messages
are not in the range in which we predicted abnormalities, as shown in Figures 11 and 13,
the delay we observed is normal. This explanation also justifies the abnormalities shown
in Figure 14, in that round-trip times are exceedingly large for file sizes in our predicted
abnormal ranges.
[Figure 15 plot: "TCP round-trip time on ATM for 12K and 12K-1 byte messages"; x-axis: Repetition.]
A similar abnormality can also be observed on the Ethernet. Given MTU = 1500 for the
Ethernet, we find that the abnormal ranges are
The normal ranges are
For messages of 1 to 4 Kbyte in size, the network behavior of the Ethernet is normal.
Abnormalities are found in the ranges from 4K+1 to 4K+M-1 bytes. Experimental results
obtained have been consistent with this prediction. While the abnormal range is 8 Kbytes
and the normal range is 4 Kbytes on the ATM, the lengths of both the abnormal and normal
ranges on the Ethernet are 1460 bytes. Therefore, if message sizes are uniformly distributed, the
probability of abnormal behavior on the ATM is approximately 66%, versus 50% on the
ATM network will be even worse compared with the Ethernet. In addition, since Ethernet
speeds are low in comparison with the ATM network, the impact of abnormal behavior on
Problem Statement
The TCP_NODELAY option, as discussed in Section 2.1, permits immediate delivery of
small messages with TCP. The abnormal behavior described in the previous section was
caused by the delayed transmission of packets waiting for an ACK. This problem may be
Figure 16 provides the results from sending messages 8 Kbytes and 100 Kbytes in size,
with the TCP_NODELAY option set, on both the ATM and the Ethernet. A comparison with
Figures 12 and 13 shows that the abnormal behavior disappears when the TCP_NODELAY
option is set. Figures 17 and 18 provide additional test results with the TCP_NODELAY option set.
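For readers who want to reproduce this setting, the following is a minimal sketch of how the option is enabled through the standard sockets interface. It is written in Python rather than in the language of the original test programs, and the helper names are hypothetical.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with TCP_NODELAY set.

    Setting the option disables the small packet avoidance (Nagle)
    algorithm on this socket, so small segments are sent without
    waiting for an outstanding ACK.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nodelay_enabled(sock):
    """Return True if TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    s = make_nodelay_socket()
    print(nodelay_enabled(s))
    s.close()
```

The option is set per socket, so it can be applied only to the connections that carry messages in the abnormal size ranges.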
Discussion
Figure 16: TCP round-trip time on ATM and Ethernet with the TCP_NODELAY option (series:
100 Kbytes on ATM, 100 Kbytes on Ethernet, 8 Kbytes on ATM, 8 Kbytes on Ethernet;
horizontal axis: repetition)
Simply setting the TCP_NODELAY option when transmitting messages in the abnormal
range is the easiest way to avoid abnormal delays without modifying the operating
system. This method also addresses the buffering problem described in [CL94]. We
repeated the same test as [CL94] in the same experimental environment, except that the
TCP_NODELAY option was set in our tests. In comparison with the result shown in Table
2 from [CL94], Table 3 indicates that the abnormal behavior caused by the small packet
avoidance algorithm disappears. One may observe that there are still cases in which the
TCP throughput is abnormal in Table 3. This abnormal behavior, however, is due to the
silly window syndrome avoidance algorithm. According to the silly window syndrome avoidance
algorithm discussed in Section 3.2, an ACK will be delayed for 200 milliseconds unless
the amount of data received is greater than either 2*MSS or 35% of the receiver buffer size.
On an ATM network where MSS = 9148, when the sender buffer size is only 16 Kbytes,
the maximum size of the TCP packets sent is less than 2*MSS. Therefore, neither of the
conditions for an immediate ACK is met when the receiver buffer is larger than 46 Kbytes.
Consequently, after every delivery of 16 Kbytes of data, the sender halts the transmission
because its buffer is full and it is waiting for an ACK, while the receiver delays the ACK until its timer
expires. Effectively, the TCP throughput is downgraded substantially because of the 200-millisecond ACK delay.
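The two conditions for an immediate ACK can be checked numerically. The sketch below encodes our reading of the rule as stated above (data received greater than 2*MSS, or greater than 35% of the receiver buffer); the function name and constants are illustrative and are not taken from any TCP implementation.

```python
KBYTE = 1024
MSS_ATM = 9148  # MSS on the ATM network, as given in the text


def ack_is_immediate(bytes_received, mss, receiver_buffer):
    """Return True if the receiver would ACK without the 200 ms delay.

    Assumed rule: an immediate ACK is sent when the data received exceeds
    either 2*MSS or 35% of the receiver buffer size.
    """
    exceeds_two_segments = bytes_received > 2 * mss
    # Integer form of: bytes_received > 0.35 * receiver_buffer
    exceeds_buffer_share = bytes_received * 100 > 35 * receiver_buffer
    return exceeds_two_segments or exceeds_buffer_share


if __name__ == "__main__":
    burst = 16 * KBYTE  # the sender can deliver at most one 16 Kbyte buffer
    for rcv_kbytes in (16, 32, 45, 46, 64):
        immediate = ack_is_immediate(burst, MSS_ATM, rcv_kbytes * KBYTE)
        print(rcv_kbytes, "Kbyte receiver buffer ->",
              "immediate ACK" if immediate else "ACK delayed")
```

With a 16 Kbyte burst, 2*MSS = 18296 bytes is never exceeded, and the 35% condition stops holding once the receiver buffer reaches 46 Kbytes, which matches the threshold given in the discussion.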
Setting the TCP_NODELAY option to avoid the abnormality, however, is not costless.
Compared with Table 2, Table 3 indicates that the average throughput in the normal
cases is lower in our tests. Delivery of small packets has higher overhead and thus waste
is sacrificed to avoid abnormality for sending small packets. One should decide whether to set the
TCP_NODELAY option only after carefully investigating the sender and receiver buffer sizes, the
MTU of the network, and, more importantly, the applications existing on the network.
Table 3: TCP buffer size and mean throughput with TCP_NODELAY option
3.2.4 Experiment III: Transmission of Data with Larger Buffer Size
Problem Statement
For large messages, enlarging the buffer size may also improve network performance. The
ATM network used in these experiments has an MTU = 9188; this is larger than that of
the Ethernet (MTU=1500). The experimental results should show that using larger buffers
Figure 17 shows results for the transmission of 100 Kbyte messages with buffer sizes of 32
Kbytes and 16 Kbytes. Figure 18 provides results for large-size files. In these experiments,
Figure 17: Round-trip time for 100 Kbyte messages on ATM with 32 Kbyte buffer and the
TCP_NODELAY option
Discussion
The experimental results confirm our assumption that a larger buffer size enhances ATM
performance. Figures 17 and 18 indicate that ATM performance improved by approximately
30% with 32 Kbyte buffers in comparison with 16 Kbyte buffers for messages
larger than 100 Kbytes. Furthermore, we observed that the performance of the Ethernet
was not improved as significantly as that of the ATM when the buffer size changed from 16
Kbytes to 32 Kbytes; this improvement was only about 20%. It appears that a buffer of 16
Kbytes is sufficient to utilize the slower Ethernet, although it is insufficient for the ATM
network.

Figure 18: Round-trip time for NASA image files on ATM with 32 Kbyte buffer and the
TCP_NODELAY option (plot title: "Large files on ATM vs. Ethernet"; legend includes
"TCP on Ethernet, 32K buffer")
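As an illustration of the buffer adjustment discussed here, the sketch below requests 32 Kbyte send and receive buffers through the standard SO_SNDBUF and SO_RCVBUF socket options. It is a Python sketch with hypothetical helper names, not the code used in the experiments, and the operating system may round or otherwise adjust the requested sizes.

```python
import socket


def make_buffered_socket(buffer_bytes=32 * 1024):
    """Create a TCP socket and request the given send/receive buffer size."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return sock


def effective_buffers(sock):
    """Return the (send, receive) buffer sizes the system actually applied."""
    snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return snd, rcv


if __name__ == "__main__":
    s = make_buffered_socket()
    print(effective_buffers(s))  # values are system dependent
    s.close()
```

Reading the values back is worthwhile because the effective size, not the requested one, determines how much data can be outstanding.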
The results of our investigations indicate that the performance of UDP on the ATM is
superior to its performance on the Ethernet. While loss rates on the ATM are about 30%
lower, the unreliability of UDP still argues against its use as the database communication
protocol in most cases. The problems discussed in Section 2 with regard to the
Internet apply as well to the ATM. However, while non-aggressive UDP packet delivery is
not effective on an Internet WAN, its effectiveness on an ATM WAN is still unresolved,
Based on our investigation and analysis of the TCP/IP protocol and implementation, we
have the following summary conclusions:
• Problems with the use of TCP have been discussed in [CL94, PR95, Rom93, AM94].
Our findings also indicate that implementations of TCP might fail to fully utilize the
capabilities of an ATM network. The TCP protocol and its implementation must
undergo further evaluation before it becomes the protocol of choice for the future
network. In particular, modifications must be made to the small packet and silly
• Users may employ a larger sender and receiver buffer. This should improve perfor-
• The advantages of ATM are more obvious in a WAN environment since an ATM switch
routes messages much faster than current routers. Reserved channel capacity can reduce
the loss of cells and improve reliability. Therefore, the ATM should be the choice
for the database topological and metric scale-up. As shown in both our local experiments
and other research [PR95, Rom93, AM94], however, current TCP buffering
suitable for the ATM. Further experimental study is necessary on ATM technologies
in a WAN environment.
4 Conclusions
Experiments were conducted on both the conventional Ethernet network and the ATM
network with the TCP and UDP/IP protocols in order to generate findings of use to developers
of distributed databases. The relative merits of each scenario were assessed on the
basis of these experimental data. In general, it was found that the TCP protocol is very
slow but reliable for large data messages and transmission between remote sites. The UDP
protocol with a few adjustments, however, was found useful for time-critical applications that
can tolerate a certain amount of data loss. Abnormal behaviors were observed during the
performance test of TCP and UDP on an ATM LAN. We found that the higher capacity of an
ATM network does not necessarily mean enhanced performance for higher level protocols
if those protocols are not configured appropriately to take advantage of the ATM
technology. Careful investigation is advisable when distributed database developers choose the
Previous research [BZM91, MB91] suggested the use of UDP to send small packets in a
LAN. In a WAN environment, however, TCP is preferable, especially for large messages.
Therefore, TCP is the choice of data communication for database scale-up. Data transmission
in a WAN usually suffers a high packet loss rate that common applications cannot
tolerate. Despite the low speed, TCP's guaranteed reliability makes it the protocol of choice
for common applications. Exceptions can be found in applications such as video conferencing,
which are time critical but not as seriously affected by data loss. Using UDP
datagrams, instead of TCP packets, to transmit data can provide the required high speed
with acceptable quality to applications of this kind. Furthermore, we found in our experiments
that a small increase in the time interval between UDP datagrams during
the transmission may decrease the data loss rate significantly. Adjustments, such as the
insertion of a short for loop between the transmission of each datagram, were found quite
effective in improving the reliability of UDP with little negative impact on the speed. It is
worth mentioning that such adjustments are machine and application dependent. Developers
of a distributed database system must balance their need for reliability and speed in
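The pacing adjustment described above can be sketched as follows. Note the substitution: the experiments inserted a short for loop between datagrams, whereas this Python sketch uses time.sleep for the gap. The function name, the default gap, and the loopback demonstration are assumptions made for illustration; an appropriate gap is machine and application dependent.

```python
import socket
import time


def send_paced(sock, address, payloads, gap_seconds=0.001):
    """Send each payload as one UDP datagram, pausing between datagrams.

    Returns the number of datagrams handed to the network stack. The pause
    gives the receiver time to drain its buffer, which is the effect the
    short delay loop is meant to achieve.
    """
    sent = 0
    for index, payload in enumerate(payloads):
        if index > 0 and gap_seconds > 0:
            time.sleep(gap_seconds)  # stand-in for the short for loop
        sock.sendto(payload, address)
        sent += 1
    return sent


if __name__ == "__main__":
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # loopback only, for demonstration
    receiver.settimeout(2.0)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    count = send_paced(sender, receiver.getsockname(), [b"x" * 512] * 10)
    print("datagrams sent:", count)
    sender.close()
    receiver.close()
```

A larger gap lowers the sending rate, so the value has to be tuned against the loss rate actually observed on the target network.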
In general, an ATM network provides much higher transmission throughput compared
to the conventional Ethernet. But higher level protocols must be adjusted in order to
take advantage of such improvement. The TCP/IP implementation of existing systems
assumes a low speed physical network such as Ethernet, and thus may not be suitable for the
high throughput ATM. Abnormal behaviors were experienced during our experiments. A simple
adjustment, such as setting the TCP_NODELAY option, works to address this problem. But
the TCP_NODELAY option should be taken as a quick fix rather than an optimal solution
because it downgrades the throughput in the average cases. Choosing larger buffers on
both the sender's and the receiver's sides addresses this issue while generating higher throughput.
But buffers are allocated in the host machine's memory, which is usually a limited resource
shared by multiple users and applications. It is not possible for a real time system to allocate
arbitrarily large buffers to optimize performance for each individual application. While
choosing large buffers works in our experiment to optimize data transmission throughput
between two test applications on a dedicated ATM network, how well this solution works
Despite the abnormalities, our experiments indicate that a high capacity ATM network
will enhance network performance significantly if it is fully utilized. In the normal cases,
the ATM network outperforms the Ethernet for both TCP and UDP. Considering its
high capacity and the potential to grow, the ATM network is ideal for the next generation
standard of the ATM exists. Neither the hardware nor the software of the ATM
Because the issues discussed in this research are machine and application dependent,
the experiment results and solutions may not apply to different systems. Research
for a widely applicable solution is not yet possible because of the lack of standards in the
field of ATM.
• The existing network protocols, such as TCP/IP, should undergo evolution in order
to adapt to the improving physical network. Higher level applications, such as the
How TCP/IP can take full advantage of the higher capacity provided by the ATM
• Performance study of a real world ATM network may reveal issues that are left out
dedicated machines enjoy an otherwise idle ATM network. Studies have shown that
an ATM network may behave quite differently when it is saturated and is forced to
drop cells [PR95, Rom93, AM94]. It is still an open issue how to optimize the
• Studies of an ATM WAN may bring interesting findings. Our performance test on
the ATM network is restricted to a LAN because of our limited resources. Since ATM
is intended to carry data traffic for both LAN and WAN, an ATM network of large
ber of remote sites simultaneously. How well the switching mechanism of ATM could
support group communication of this kind is now an open question.
Acknowledgement
The authors would like to thank Professor Douglas E. Comer for providing support for our
experiments. The authors are also very grateful to John Chueng-Hsien Lin for answering
many of our questions. Melliyal Annamalai selected NASA image files for our experiments.
Yue Zhuge at Stanford helped us with the WAN experiments. We would like to acknowledge
Rachel Ramadhyani for her input regarding the presentation of this paper.
References
[AM94] Aboul-Magd. TCP Performance in ATM Networks Employing End-to-End Flow Control.
Technical Report 94-0442, ATM Forum Contribution, May 1994.
[BR89a] B. Bhargava and J. Riedl. A Model for Adaptable Systems for Transaction Processing.
IEEE Transactions on Knowledge and Data Engineering, 1(4), December 1989.
[BR89b] B. Bhargava and J. Riedl. The RAID Distributed Database System. IEEE Transactions
on Software Engineering, 15(6), June 1989.
[Che88] D. R. Cheriton. The V distributed system. Communications of the ACM, 31(3), March
1988.
[CL94] D. E. Comer and J. C. H. Lin. TCP Buffering and Performance over an ATM Network.
Journal of Internetworking: Research and Experience, 10(4):70-80, October 1994.
[Cla82] D. D. Clark. Window and Acknowledgement Strategy in TCP. Request for Comments,
(RFC-813), July 1982.
[Com91] D. E. Comer. Internetworking with TCP/IP Vol I: Principles, Protocols, and Architecture,
volume I. Prentice Hall, Inc, Englewood Cliffs, NJ, second edition, 1991.
[CS92] J. D. Cavanaugh and T. J. Salo. Internetworking with ATM WANs. Technical report,
Minnesota Supercomputer Center, Inc, December 1992.
[Gol92] R. A. Golding. End-to-end performance prediction for the Internet. Technical Report
UCSC-CRL-92-26, University of California at Santa Cruz, June 1992.
[Kun92] H. T. Kung. Gigabit Local Area Networks: A Systems Perspective. IEEE Communications
Magazine, 30(4):79-89, April 1992.
[KW95] B. G. Kim and P. Wang. ATM Network: Goals and Challenges. Communications of
the ACM, 38(2):39-44, February 1995.
[LMK90] S. J. Leffler, M. K. McKusick, and M. J. Karels. The Design and Implementation of the
4.3BSD UNIX Operating System. Addison-Wesley, Inc, Reading, MA, 1990.
[Nag84] J. Nagle. Congestion Control in IP/TCP Internetworks. Request for Comments, (RFC-
896), January 1984.
[PHOR90] L. Peterson, N. Hutchinson, S. O'Malley, and H. Rao. The x-kernel: A Platform for
Accessing Internet Resources. IEEE Computer, 23(5):23-33, May 1990.
[PKL91] C. Pu, F. Korz, and R. C. Lehman. A Measurement Methodology for Wide Area
Internets. Technical Report CUCS-044-90, Columbia University, March 1991.
[Pos80] J. B. Postel. User Datagram Protocol. Request for Comments, (RFC-768), August 1980.
[PR95] M. Perloff and K. Reiss. Improvements to TCP Performance in High-speed ATM Networks.
Communications of the ACM, 38(2):90-100, February 1995.
[PSC81] J. B. Postel, C. A. Sunshine, and D. Cohen. The ARPA Internet Protocol. Computer
Networks, 5(4):261-271, July 1981.
[Ras86] R. F. Rashid. Threads of a New System. Unix Review, 4(8):37-49, August 1986.
[Rom93] A. Romanow. Preliminary Report of Performance Results for TCP over ATM with
Congestion. Technical report, July 1993.
[TRvS+90] A. S. Tanenbaum, R. V. Renesse, H. van Staveren, G. J. Sharp, S. J. Mullender,
J. Jansen, and G. van Rossum. Experiences with the Amoeba Distributed Operating
System. Communications of the ACM, 33(12):46-63, December 1990.
[ZB93] Y. Zhang and B. Bhargava. WANCE: A Wide Area Network Communication Emulation
System. In Proceedings of the IEEE Workshop on Advances in Parallel and Distributed
Systems, pages 40-45, Princeton, New Jersey, October 1993.