TCP and UDP Protocols

This document summarizes an experimental study of TCP and UDP protocols for data communication on existing networks and future ATM networks. Experiments were conducted measuring performance of TCP and UDP on LANs and WANs. Additional experiments tested TCP and UDP performance over ATM networks, finding that TCP implementation is not fast enough to fully utilize high bandwidth of ATM for certain message sizes. The document discusses implications for future distributed database systems using ATM networks.


Purdue University

Purdue e-Pubs
Department of Computer Science Technical Reports
Department of Computer Science

1995

Experimental Study of TCP and UDP Protocols for Future Distributed Databases
Xiangning Liu

Lebin Cheng

Bharat Bhargava
Purdue University, [email protected]

Zhiyuan Zhao
Report Number:
95-046

Liu, Xiangning; Cheng, Lebin; Bhargava, Bharat; and Zhao, Zhiyuan, "Experimental Study of TCP and UDP Protocols for Future
Distributed Databases" (1995). Department of Computer Science Technical Reports. Paper 1221.
https://docs.lib.purdue.edu/cstech/1221

This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact [email protected] for
additional information.
EXPERIMENTAL STUDY OF TCP AND UDP
PROTOCOLS FOR FUTURE DISTRIBUTED DATABASES

Xiangning Liu
Lebin Cheng
Bharat Bhargava
Zhiyuan Zhao

Department of Computer Sciences
Purdue University
West Lafayette, IN 47907

CSD-TR-95-046
July 1995
Experimental Study of TCP and UDP Protocols for Future
Distributed Databases
Xiangning Liu, Lebin Cheng, Bharat Bhargava, and Zhiyuan Zhao

Department of Computer Sciences
Purdue University
West Lafayette, IN 47907
E-mail: {xl,lcheng,bb,zhao}@cs.purdue.edu

Contents
1 Introduction
1.1 Research Overview
1.2 Related Research
1.3 Structure of the Paper

2 Data Communication on Existing Networks
2.1 Local Area Network (LAN) Experiments
2.2 Wide Area Network (WAN) Experiments

3 Data Communication on ATM
3.1 ATM Technology
3.2 Experiments on ATM with TCP and UDP
3.2.1 Experimental environments
3.2.2 Experiment I: TCP Implementation on ATM
3.2.3 Experiment II: Transmission of Data with TCP_NODELAY Option
3.2.4 Experiment III: Transmission of Data with Larger Buffer Size
3.3 Summary of TCP and UDP Performance on ATM

4 Conclusions

Abstract

Experiments were conducted to measure the data communication performance of
the TCP and UDP/IP protocols on both currently existing computer networks and the
ATM (Asynchronous Transfer Mode) network of the future. Such directly observed
information regarding performance and availability will be useful to developers of both
applications and network systems. Of particular relevance to database applications is
our discussion of the impact of network behavior on the performance and availability
of distributed database systems, especially on databases that are scaled up in multiple
dimensions. We suggest that database systems should adapt to different environments
to enhance efficiency and reliability. We weigh the merits of TCP and UDP for use
with different applications on currently existing networks and the future ATM network.
Abnormal network behavior observed in connection with the transmission of messages of
particular sizes via TCP over an ATM network is discussed and solutions are proposed.
Finally, we attempt a description of the ATM-based database systems of the future.

Keywords: Adaptability, availability, performance, scale up, TCP, UDP/IP, Internet,
LAN (Local Area Network), WAN (Wide Area Network), ATM.

1 Introduction

Future database applications, such as multimedia applications and digital libraries, will
involve large volumes of data, long distances between distributed sites, large numbers of
sites, and large data units with complex structures [SSU91]. An increase in the number of
data items in a database is termed a volume scale-up, while an increase in the number of
sites is a topological scale-up. An increase in intersite distance is a metric scale-up, and a
structural scale-up occurs when data items become complex objects. When databases scale
up in all these dimensions, communication of data items assumes a highly significant role
in system performance and availability.

At present, TCP and UDP are the network protocols most commonly used for data
communication. Most distributed database systems are implemented with TCP or UDP,
and this is likely to remain the case for the foreseeable future. The aspects of network
performance and behavior discussed in this paper can therefore form the basis of
further database performance analysis and simulation. Database designers can use these
results to choose a suitable network mechanism.

It has frequently been suggested that the computer networks of the future will be based
on the fast ATM (Asynchronous Transfer Mode) network [Vet95]. The current network
structure, with its slow speed and unreliability, will not be adequate to the needs of
future applications, which will be scaled up in all dimensions. The ATM Forum has been
formed by a large group of companies which have made a commitment to the support of
ATM development. For these reasons, databases in the future are likely to incorporate
ATM technology. The behavior and construction of such distributed database systems are
discussed in this paper, with examples cited from local experiments.

1.1 Research Overview

Experiments were conducted using TCP (Transmission Control Protocol) and UDP (User
Datagram Protocol) [PSC81, Com91], with objects of various sizes and with several
transmission options, on currently existing LANs and WANs. Communication delay and failure
rates were measured. Experiments were also conducted using 19 NASA image files. On the
basis of these results, we analyzed the respective merits of TCP and UDP and the various
transmission options.

Local experiments using the ATM indicated that the desired performance level can be
achieved for the transmission of messages smaller than 4 Kbytes. With messages of certain
sizes, however, TCP is not fast enough to utilize the bandwidth provided by the ATM
network at the physical layer. The implementation of the TCP protocol becomes in itself
a performance barrier. The potential advantages of using the ATM to efficiently transmit
large messages are thus unacceptably negated.

In one experiment, we removed the ATM switch, directly connected the two end-point
hosts with fiber optic cable, and compared these results to those of tests in which an ATM
switch was used to connect the end points. Little difference was observable between the
results of the two tests. This experiment shows that the ATM switch itself is not responsible
for the performance downgrade. The problem was caused by the mismatch of the computer
system, especially the operating system network software, with the ATM physical layer.

1.2 Related Research

Many tools and methods have been developed to measure various aspects of computer
network performance and availability. Layered refinement [PKL91] permits the effective
measurement of the performance of different layers of TCP/IP by filtering noise. [Gol92]
introduced network measurement of end-to-end performance to predict Internet behavior.
[CJRS89] described the overhead entailed in running TCP and IP by identifying the normal
paths through the compiled code of a TCP implementation. Many efficient communication
mechanisms are provided by experimental operating systems. Examples include V [Che88],
Mach [Ras86], Amoeba [TRvS+90], Sprite [OCD+88], and x-kernel [PHOR90]. The
development of distributed databases has greatly benefited from the results of these research
efforts.

Intense research activity directed toward the development of the computer networks of
the future is taking place in both industrial and academic settings. [Kun92] discussed
the high-speed local area network in which gigabit bandwidths will dramatically change the
domain of future computer applications. [Vet95] surveyed the basic concepts and technology
of the ATM. [PR95] presented a simulation study for the TCP protocol of the problem of
data flooding on ATM switches. Solutions such as increasing the ATM switch buffer and
additional congestion control methods were proposed to address the mismatch between
ATM technology and the existing TCP implementation. The poor performance of TCP
has also been reported in a number of studies [Rom93, AM94]. [CL94] discussed the TCP
buffer problem of the UNIX 4.3 BSD-Tahoe implementation for ATM communication. [CS92]
discussed some internetworking strategies and issues that arise when ATM networks are
used to carry the TCP/IP protocol suite on WANs.

At Purdue University, an experimental RAID (Robust Adaptable and Interoperable
Distributed) database system [BR89b, BR89a] was developed. With RAID, the adaptability
of the communication subsystem is addressed through the UDP protocol. Several major
improvements have since been implemented to achieve high performance [BZM91, MB91].
These enhancements, which result in an approximately 70% improvement in response time
(e.g., a decrease from 40 to 22 milliseconds), include:

• Using the efficient UDP protocol, rather than TCP, to provide services of simple
naming, communication surveillance, and long datagram mechanisms for large data
items;

• Supporting lightweight communication, reduced kernel interaction, minimal message
copying, and physical multicasting; and

• Enforcing a policy of immediate rescheduling to explicitly pass control over
interprocessing communication on one machine.

The WANCE tools developed as part of RAID support emulation of distributed
transaction systems over a WAN [ZB93] by taking advantage of the echo facility on each Internet
host. The RAID project has continued to investigate efficient communication mechanisms
which incorporate both the TCP protocol and the ATM network. These investigations,
which are directed toward the anticipated topological, metric, and volume scale-ups of future
database applications, are described in this paper.

1.3 Structure of the Paper

The remainder of this paper is organized as follows. In Section 2, we present results
of experiments conducted on existing networks. Two sets of experiments, on a LAN and
on a WAN, are discussed. In Section 3, we discuss and analyze experiments conducted on an
ATM network. The problem of TCP implementation on the ATM is described and solutions
are proposed. In Section 4, we draw conclusions and provide directions for future work.
2 Data Communication on Existing Networks

When database systems scale up topologically, metrically, or in data size, the impact on
the widely used Internet with its TCP and UDP/IP protocols must be considered. Our
experiments use the Ethernet as the LAN (local area network) physical data transfer medium.
For WAN (wide area network) communication, the testbed is the Internet, which is an
integration of many networks.

TCP is a connection-oriented, flow-controlled, end-to-end transport protocol that
provides reliable and ordered stream delivery of data [Pos81]. UDP is a simple connectionless
datagram transport protocol that provides peer-to-peer addressing and fast but unreliable
and unordered delivery of data [Pos80]. The two protocols are thus best suited to different
purposes and applications. Based on experimental performance measurements, we provide
guidelines for selection between these two communication protocols.

2.1 Local Area Network (LAN) Experiments

Problem Statement

The purpose of these experiments was to measure and compare data communication
performance of TCP and UDP on a LAN for 1) short messages in traditional databases and
2) large messages in future applications such as digital libraries. UDP is faster than TCP
but is less reliable. Measurements of message round-trip time and loss rate made in the
course of these investigations will permit an objective assessment of the tradeoff between
speed and reliability. The result of this experiment will be the basis for the discussion of
database metric scale-up.
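To make the two measured quantities concrete, the following Python sketch shows one way a single trial could be reduced to a mean round-trip time and a loss rate. It is only an illustration, not the measurement code used in the study: the function name `summarize_trial` and the convention of recording a lost echo as `None` are assumptions introduced here.

```python
from statistics import mean


def summarize_trial(round_trip_ms):
    """Return (mean round-trip time, loss rate) for one trial.

    round_trip_ms holds one entry per message sent; None marks a message
    whose echo never arrived (an assumed convention for this sketch).
    """
    if not round_trip_ms:
        raise ValueError("a trial needs at least one message")
    received = [t for t in round_trip_ms if t is not None]
    loss_rate = 1.0 - len(received) / len(round_trip_ms)
    # With every message lost there is no round-trip time to average.
    mean_rtt = mean(received) if received else None
    return mean_rtt, loss_rate
```

Averaging only over the echoes that actually returned keeps the speed figure and the reliability figure separate, which is the tradeoff the experiments set out to assess.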

The development of new applications has brought with it increasing demands for shorter
delays, larger bandwidths, lower packet loss rates, and higher throughput. The capacities
of the physical layer may be incompletely exploited by inefficient high-level software.
Applications which do not choose appropriate and properly configured protocols will not make
optimal use of network capabilities.

Procedure

All experiments on the local area network were conducted between two Sun Sparc
workstations, raid9.cs.purdue.edu and raid11.cs.purdue.edu, connected with 10 Mbps
Ethernet. An extension of the ping programs originally authored by M. Muuss [Ste90] was
used as the experimental vehicle. In this set-up, a sender process running on one machine
sends messages to the echo receiver on another machine. The receiver will not echo the
message back to the sender until it receives the entire message. The round-trip time is then
calculated by the sender using the time interval between the delivery of a message and the
arrival of the echo message. Each experimental trial consisted of 50 message echoes, and
the experiment was repeated twenty times. The experiments were conducted using three
TCP modes and two UDP modes. These modes were as follows:

• Individual-connection TCP: A connection is established and closed for each
message sent. Some applications can send several messages through one TCP connection,
while in other instances the number of messages passed through a given connection
is unpredictable. A system can maintain only a limited number of simultaneous
connections. The closing of a connection may increase the overhead for later messages
but will allow other applications to use these limited connection resources.

• No-delay TCP: Small packets are sent immediately by setting the option TCP_NODELAY.
By default and following the small message avoidance protocol, the TCP software
saves small packets in buffers instead of sending them immediately [Nag84]. The
TCP_NODELAY option allows users to force a packet to be sent immediately after it
is created.

• TCP or single-connection TCP: Regular TCP messages, by default, are sent within
a single connection without invoking the TCP_NODELAY option.

• Non-aggressive UDP: Large messages are divided into 8 Kbyte packets which are
sent at short intervals by UDP. A for loop was used to increment an integer 2000
times, thus generating the interpacket interval used in our LAN experiments. The
length of this interval is system-dependent.

• UDP or aggressive UDP: For large messages, each 8 Kbyte packet is sent immediately
following the previous packet.
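As a rough illustration of the difference between the default and no-delay TCP modes listed above, the following Python sketch sets and reads back the TCP_NODELAY socket option. The helper names `make_tcp_socket` and `nodelay_enabled` are hypothetical and were chosen for this example; the study itself used extended ping programs rather than this code.

```python
import socket


def make_tcp_socket(no_delay=False):
    """Create a TCP socket, optionally with small-packet buffering disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if no_delay:
        # TCP_NODELAY asks the stack to send small segments right away
        # instead of holding them back to coalesce with later data.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nodelay_enabled(sock):
    """Report whether TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```

The option is set per socket before data is sent, so the same program can run the default and no-delay modes by toggling a single flag.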

The following extensions were made to the ping programs:

• Provision of options to support separate connection experiments (for
individual-connection TCP) and no-delay transmission of small packets (for no-delay TCP).

• Support for large messages of up to 4 megabytes, an extension from the 8 Kbyte
maximum of the original programs.

• Measurement of the precise round-trip time. The echo receiver awaits arrival of the
entire message before returning it to the sender. The original ping program starts to
echo back when part of the message has arrived.

• Automatic adjustment of the time interval between the individual messages sent out
by the sender. If the interval (1 second by default) is too short and the messages
are large, the system may break down if a new message is sent before the previous
messages return.

• Avoiding intermediate output. Under UDP, large messages are divided into small
packets. If the sender must output the round-trip time after the arrival of each packet,
the accumulated round-trip time measured for a large message will be artificially
increased by the lag of the print statements, which involve slow device I/O. For
example, as shown in Figure 1, test results with the original program seem to indicate
that UDP was sometimes slower than TCP. When the extra print statements were
removed, this abnormal behavior disappeared.

• Adding an option to adjust the interpacket interval for non-aggressive UDP data
transmission.
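The whole-message echo and the print-free timing described in the list above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the extended ping program itself: the 4-byte length prefix, the function names, and the use of a local `socket.socketpair()` in place of two networked hosts are all choices made for this example.

```python
import socket
import struct
import threading
import time


def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv may return fewer."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo_once(sock):
    """Receiver: wait for the entire message, and only then echo it back."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    payload = recv_exact(sock, length)
    sock.sendall(struct.pack("!I", length) + payload)


def timed_echo(sock, payload):
    """Sender: return (round-trip seconds, echoed payload) with no output
    inside the timed region."""
    start = time.perf_counter()
    sock.sendall(struct.pack("!I", len(payload)) + payload)
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    echoed = recv_exact(sock, length)
    return time.perf_counter() - start, echoed


def demo(size=200_000):
    """Run one echo over a local socket pair standing in for the network."""
    sender, receiver = socket.socketpair()
    worker = threading.Thread(target=echo_once, args=(receiver,))
    worker.start()
    try:
        return timed_echo(sender, bytes(size))
    finally:
        sender.close()
        worker.join()
        receiver.close()
```

Any reporting of the result would happen after `timed_echo` returns, so slow device I/O cannot inflate the measured interval.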

[Plot omitted. Panel title: TCP vs. UDP on LAN. Horizontal axis: message size (bytes).]

Figure 1: Experimental results with the original ping program

[Plots omitted. Panel titles: TCP vs. UDP for small messages on LAN. Series: individual-connection TCP, no-delay TCP, UDP.]

Figure 2: Round-trip time for small messages in a LAN a) shorter than 512 bytes; b) shorter
than 2 Kbytes.

Table 1: Large NASA image files

SIZE (bytes)  FILE               CONTENT
6988          earth-round.gif    Sharp contours, green on blue globe. Res: 187x158
7708          earth1.gif         Very sharp contours, green on blue globe. Res: 160x160
17027         gal_line.gif       Red on black, a whole line of only dots. Res: 450x450
29668         gal_green.gif      Green on green, lots of dots, striations of colors. Res: 384x330
35543         comet.gif          White eye, blue tail, tail fades into background. Res: 512x480
60379         mars.gif           Huge circle of light brown shades. Res: 340x340
74058         surface.gif        Sloping surface of white and blue, sharp contours, shades. Res: 550x450
80385         jupiter.gif        Huge circle of red and yellow shades, yellow text on black. Res: 710x765
97835         gal_blue.gif       Blue on blue, some dots. Res: 607x373
104365        hubble.costar.gif  Shades of concentric red, orange, yellow colors; shading, text. Res: 566x384
114323        earth_detail.gif   Blurred contours; text; pink color; black bg. Res: 1152x864
135701        eclipse2.gif       A huge number of red shades. Res: 784x630
153634        4gal_red.gif       Bright red, orange; black and white dots; shading. Res: 441x400
175405        sf.gif             Sharp boundary contours, blue, white and red colors. Res: 500x500
205747        ast_spray.gif      Black background, lots of small particles. Res: 701x659
236199        mitwave1.gif       Orange and white shades, delicate, multicolored ridges. Res: 1024x1024
279786        earth_highres.gif  Blurred contours; text; blue color; black bg. Res: 1152x864
406851        text+image.gif     Text, many dots, subtle shading. Res: 936x867
486430        eclipse1.gif       A huge number of orange and yellow shades. Res: 1280x1024
Results

We conducted experiments using messages of various sizes:

1. Figures 2 a) and b) show the round-trip times for messages under 512 bytes and 2
Kbytes, respectively. No UDP packet loss was detected in this case.

2. Figure 3 provides results for larger messages of between 1 and 80 Kbytes in size. Since
the TCP_NODELAY option would have no impact on these results, we did not repeat
the test with this option involved.

3. Figure 4 shows the results obtained from sending very large messages of between 50
and 1000 Kbytes.

4. Figure 5 provides the results of sending large multimedia NASA files. The sizes and
descriptions of these files are listed in Table 1.

[Plots omitted. Panel titles: TCP vs. UDP for large messages on LAN; UDP loss for large messages on LAN. Horizontal axis: message size (bytes).]

Figure 3: a) Round-trip time for large messages; b) UDP packet loss rate for large messages

Discussion

For small messages, the experiments show that UDP is the most efficient mechanism, with
almost no message loss noted. In view of the absence of message loss, TCP need undertake
no retransmission; its performance here is therefore comparable to that of UDP.
Furthermore, TCP_NODELAY has no effect when only one small message is sent. The effects of
TCP_NODELAY will be discussed further in Section 3.2. Connection establishment has
been noted to add significant overhead for the sending of small messages. Therefore, the
reestablishment of connections should be avoided for such messages.

[Plots omitted. Series: individual-connection TCP, TCP, non-aggressive UDP, UDP. Horizontal axis: message size (bytes).]

Figure 4: a) Round-trip time for very large messages; b) UDP packet loss rate for very large
messages

[Plots omitted. Panel titles: TCP vs. UDP for NASA image files on LAN; UDP loss for NASA image files on LAN. Horizontal axis: message size (bytes).]

Figure 5: a) Round-trip time for large files; b) UDP packet loss rate for large files

It is clear that, while UDP is much faster than other mechanisms for messages of larger
size, its reliability gradually declines under these circumstances. The Ethernet employs the
CSMA/CD (Carrier Sense Multiple Access/Collision Detection) mechanism to avoid
collisions of transmitted signals. When messages become larger and more IP packets must thus
be generated for each message, the chance of collisions and out-of-order delivery becomes
significantly higher. For this reason, TCP must retransmit large amounts of data when
sending large messages, thus degrading its performance.

The extent of misordered data and data loss increases significantly when the message size
exceeds 200 Kbytes. The overhead caused by connection re-establishment, in this case, is
negligible.

For very large messages, non-aggressive UDP transmission can reduce packet loss
significantly while causing little increase in speed-related overhead. This situation is depicted in
Figure 4. In these experiments, the loop

for(i=0;i<2000;i++);

was used to generate the interval between each UDP packet. Experiments show that
intervals longer than these may increase transmission delays with little further improvement
in reliability. Shorter intervals, on the other hand, may not be enough to bring significant
improvement. The number of iterations is system-dependent.
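The pacing idea can be sketched in Python as below. This is a hedged illustration rather than the code used in the experiments: because the text notes that the iteration count is system-dependent, this sketch swaps the counting loop for a busy-wait against a clock, and the names `split_into_packets`, `send_paced`, and the `send` callback are assumptions introduced here.

```python
import time

PACKET_SIZE = 8 * 1024  # 8 Kbyte packets, as described in the text


def split_into_packets(message, packet_size=PACKET_SIZE):
    """Cut a large message into consecutive packets of at most packet_size."""
    return [message[i:i + packet_size]
            for i in range(0, len(message), packet_size)]


def send_paced(message, send, gap_seconds=0.0):
    """Hand each packet to `send`, idling gap_seconds between packets.

    gap_seconds == 0 corresponds to aggressive UDP; a small positive gap
    corresponds to non-aggressive UDP. Returns the number of packets sent.
    """
    packets = split_into_packets(message)
    for index, packet in enumerate(packets):
        if index > 0 and gap_seconds > 0:
            deadline = time.perf_counter() + gap_seconds
            while time.perf_counter() < deadline:
                pass  # busy-wait, in the spirit of the empty counting loop
        send(packet)
    return len(packets)
```

With a real UDP socket, `send` could be something like `lambda p: sock.sendto(p, (host, port))`, where `sock`, `host`, and `port` are placeholders; a suitable `gap_seconds` would have to be found by experiment, just as the iteration count was.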

2.2 Wide Area Network (WAN) Experiments

Problem Statement

These experiments were intended to investigate the questions discussed above within the

context of a wide area network (WAN). Based on the result, database topological and metric

scale-up is discussed with respect to the data communication.

Procedure

The ping program as described in Section 2.1 was again employed for these experiments.
An echo server was installed on a Sun Sparc workstation at Stanford University; unlike the
common echo server on port 7, this server has the capacity to handle the large messages
used for these experiments. The input data from the LAN experiments was used here again.

Results

The experimental results for messages of different sizes were as follows:

1. Figure 6 shows round-trip time for messages under 1 Kbyte. The UDP packet loss
was small (less than 7%), and the TCP_NODELAY option had no impact in this case.

2. Figure 7 provides results obtained from sending large messages of between 1 and 80
Kbytes.

3. Figure 8 shows the results of sending very large messages of between 50 and 1000
Kbytes. The overhead arising from connection time is negligible in this case, but
there is an unacceptably high UDP loss rate. It therefore proved impossible to gather
a sufficient sample of round-trip times.

4. Figure 9 provides results obtained from sending the same large NASA multimedia
data files discussed in Section 2.1 and detailed in Table 1.

[Plots omitted. Panel titles: TCP vs. UDP for small messages on WAN; UDP loss for small messages on WAN. Horizontal axis: message size (bytes).]

Figure 6: a) Round-trip time, b) UDP packet loss rate for small messages over WAN

[Plots omitted. Series: individual-connection TCP, TCP, UDP. Horizontal axis: message size (bytes).]

Figure 7: a) Round-trip time, b) UDP packet loss rate for large messages over WAN

[Plots omitted. Series: TCP, UDP loss rate. Horizontal axis: message size (bytes).]

Figure 8: a) Round-trip time for very large messages; b) UDP message loss rate for very
large messages

[Plots omitted. Panel titles: TCP vs. UDP for NASA image files on WAN; UDP loss for NASA image files on WAN. Horizontal axis: message size (bytes).]

Figure 9: a) Round-trip time for NASA image files on WAN; b) UDP packet loss rate for
NASA image files

Discussion

These results from the LAN and WAN experiments lead to the following conclusions:

• In general, TCP is recommended for data communication in database systems,
especially when databases scale up topologically and metrically. Despite its speed
advantages, the reliability of UDP is inadequate for remote communication and large
data transmission. Data loss and out-of-order messages may cause transaction failure
from which the database may have difficulty recovering. When databases scale up in the
topological and metric dimensions, the problem becomes more severe. A reliable data
communication service is therefore essential, and TCP is preferable in this regard.
On a Local Area Network (LAN), the efficiency of TCP is usually adequate for the
transmission of small messages and is only slightly less than that of UDP.

• UDP should be used only for time-critical applications, such as video services and
digital libraries, that may tolerate a certain amount of data loss. Since the aggressive
sending of continuous UDP packets will result in heavy data losses, transmissions
should instead be spaced at short intervals. Experimental results using this method
on a LAN show a significant reduction in losses, and, although the speed is somewhat
decreased, it is still much faster than TCP. Unfortunately, this method does little to
improve UDP reliability in a WAN context.

• Applications which act immediately upon partially received data can also employ
UDP. For example, an image can be partially or roughly displayed to a user on the
basis of partial data received from a remote site, with the image completed as the
remaining data arrives. Lost packets can be retransmitted for later arrival. The
flexibility of UDP is well suited to this approach.
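The partial-display idea in the last conclusion can be sketched in Python as a small reassembly buffer. This is a hypothetical illustration only: the class name `PartialImage`, the per-chunk sequence numbers, and the filler used for gaps are assumptions, and the paper does not describe a specific scheme.

```python
class PartialImage:
    """Collects numbered chunks that may arrive out of order or not at all."""

    def __init__(self, total_chunks):
        self.total_chunks = total_chunks
        self.chunks = {}

    def add(self, seq, data):
        """Store one chunk; a duplicate simply overwrites the earlier copy."""
        if 0 <= seq < self.total_chunks:
            self.chunks[seq] = data

    def missing(self):
        """Sequence numbers still to be requested for retransmission."""
        return [s for s in range(self.total_chunks) if s not in self.chunks]

    def render(self, filler=b""):
        """Bytes available so far, with `filler` standing in for each gap."""
        return b"".join(self.chunks.get(s, filler)
                        for s in range(self.total_chunks))

    def complete(self):
        return len(self.chunks) == self.total_chunks
```

An application could call `render` after every arrival to refresh a rough display and use `missing` to decide which packets to ask for again.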

3 Data Communication on ATM

3.1 ATM Technology

Database scale-up on topological, metric, and volume dimensions requires fast and
reliable data communication. ATM (Asynchronous Transfer Mode) is a family of protocols
supporting both circuit and packet-switching services. ATM cells, or fixed-length packets,
form the basic data units. An ATM cell, as defined by ITU (formerly CCITT)
recommendation I.361, contains 48 octets of data and 5 octets of control information. This fixed-length
cell and the early binding of routing information during the connection setup make the
ATM suitable for high-speed data communication. Its bandwidth reservation and graceful
multiplexing also render it suitable for multimedia traffic. Four major benefits of the ATM
discussed in [KW95] are scalability, statistical multiplexing, traffic integration, and network
simplicity. These characteristics are the very things required by database scale-up.
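The cell format implies a simple piece of arithmetic, sketched in Python below. The function names are introduced for this example, and the calculation is deliberately simplified: it counts only the 48-octet payload and 5-octet header stated above and ignores any additional overhead added by adaptation layers.

```python
CELL_PAYLOAD = 48  # octets of data per cell (from the text)
CELL_HEADER = 5    # octets of control information per cell (from the text)


def cells_needed(message_bytes):
    """Number of cells required; the last cell may be only partly filled."""
    return -(-message_bytes // CELL_PAYLOAD)  # ceiling division


def wire_bytes(message_bytes):
    """Total octets transmitted, counting every cell's header and payload."""
    return cells_needed(message_bytes) * (CELL_PAYLOAD + CELL_HEADER)


def payload_efficiency(message_bytes):
    """Fraction of transmitted octets that carry the message itself."""
    if message_bytes <= 0:
        raise ValueError("message_bytes must be positive")
    return message_bytes / wire_bytes(message_bytes)
```

Under these assumptions a 1 Kbyte (1024-byte) message occupies 22 cells, or 1166 octets on the wire, so efficiency can never exceed 48/53 and drops further when the final cell is mostly empty.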

Many applications will benefit from the flexibility in switching and high speed provided by
the ATM. For example, consider the case of an oil company with several computing centers
generating large numerical simulations of drill sites. Engineers would find it useful to display
these results on their graphical workstations, an application requiring a high-throughput
network. As another example, consider a digital medical application transmitting dozens
of x-ray images involving several gigabits of information. In many cases, real-time access
is required for collaborative diagnosis between physicians in different locations. This large
quantity of data can only be delivered in real time by networks running at multimegabit
speeds. Ideally, the collaborating physicians may wish to set up video conference connections
for their discussion. The ATM provides flexible multiplexing technology to meet these needs.

The ATM was originally designed for the B-ISDN (Broadband Integrated Services Digital
Network), a network intended to integrate time-critical and data-heavy graphic, imaging,
video, and audio services. As database systems scale up in multiple dimensions, they must
also involve applications of this sort, for which the ATM will be the most suitable platform.

3.2 Experiments on ATM with TCP and UDP

In order to take full advantage of an ATM network, adaptation must be made to match the
ATM with the popular TCP/IP and UDP protocols upon which most existing applications
are built. We conducted experiments to investigate the data communication characteristics
of TCP/IP and UDP when applied to an ATM network. The experiment was designed to
observe the impact of this change in the lower layer on the upper-level database systems.
We expected that the ATM would provide fast and reliable data transmission for the
applications. Due to resource limitations, we could only conduct experiments on an ATM
LAN. The scale-up experiments were conducted only on the data volume dimension, but we
can still use the results for the discussion of scale-up on the topological and metric dimensions.

3.2.1 Experimental environments

Our experiments involved two Sun Microsystems SPARCstation IPCs, called percival
and fibonacci, running SunOS 4.1.3 with the 4.3 BSD-Tahoe TCP implementation. As
shown in Figure 10, each host connects to a Fore Systems ASX-100 ATM switch via 100
Mb/s multi-mode fiber cables, which supplement the conventional 10 Mb/s Ethernet.
Both hosts use Fore Systems SBA-200 ATM adapter cards, the driver of which supports
ATM Adaptation Layer 5 (AAL5). The Fore ATM switch comes with a dedicated processor
and special-purpose software for user configurations.

[Diagram: percival and fibonacci (both SunOS 4.1.3), each fitted with an SBA-200 adapter card and attached to the ASX-100 ATM switch by a 100 Mb/s fiber link; the two hosts are also joined by 10 Mb/s Ethernet.]

Figure 10: Experimental environment setup with Fore ATM switch

3.2.2 Experiment I: TCP Implementation on ATM

Problem Statement

These experiments are intended to measure the performance of the TCP and UDP
protocols on an ATM network and to investigate the relative merits of an ATM network.
Compared to the conventional Ethernet, the message transfer delay on an ATM network,
with its high-speed fiber cable, is greatly reduced. TCP and UDP were developed before
the existence of the ATM, and problems may therefore arise from the combination of new
and existing technologies.

[Plot: TCP and UDP round-trip time on ATM and Ethernet. Series: TCP on ATM, TCP on Ethernet, UDP on ATM, UDP on Ethernet.]

Figure 11: Round-trip time for 1 Kbyte messages with TCP and UDP using ATM and Ethernet

Procedure and results

In the experiments shown in Figures 11-13, 50 messages were sent repeatedly through both
the ATM and Ethernet interfaces using the ping program described in Section 2.1. Figure
11 illustrates the results for the transmission of 1 Kbyte of data. In this case, the ATM proved
faster than the Ethernet for both UDP and TCP. Figure 12 shows the abnormal behavior
encountered when sending 8 Kbyte messages with TCP on the ATM. Round-trip times in

[Plot: TCP and UDP round-trip time on ATM and Ethernet, by repetition. Series: TCP on ATM, TCP on Ethernet, UDP on ATM, UDP on Ethernet.]

Figure 12: Round-trip time for 8 Kbyte messages with TCP and UDP using ATM and Ethernet

[Plot: TCP round-trip time on ATM and Ethernet, by repetition. Series: TCP on Ethernet, TCP on ATM.]

Figure 13: Round-trip time for 100 Kbyte messages with TCP using ATM and Ethernet

this context fluctuated widely and were generally long. In contrast, Figure 13 reveals stable
and short round-trip times for 100 Kbyte messages sent over the ATM network and long,
oscillating round-trip times on the Ethernet. Figure 14 provides experimental results for
sending files of the sizes discussed in Section 2.1. While the average round-trip time on the
ATM is long and almost constant for small files, the round-trip times measured for large files
fit well with predictions. These large-file times are shorter on the ATM than on the Ethernet
and increase with the size of the files sent.


[Plots: a) TCP and UDP round-trip time for NASA image files on ATM and Ethernet, against message size in bytes; series: TCP on Ethernet, TCP on ATM, UDP on ATM. b) UDP loss rate for NASA image files on ATM.]

Figure 14: NASA image files on ATM: a) round-trip time; b) UDP packet loss rate

Discussion

The origin of the abnormal behavior noted above lies in the implementation of the
TCP/IP protocol. Both [CL94] and [LMK90] provide detailed discussions of this question.
The factors most pertinent to our experiments are:

• Sender and receiver TCP buffer sizes;

• Physical MTU (Maximum Transmission Unit);

• TCP MSS (Maximum Segment Size);

• TCP buffer implementation and data delivery mechanism;

• The small packet avoidance algorithm; and

• The silly window syndrome avoidance algorithm of TCP.

By default, the sender and receiver window sizes are 16 Kbyte. The MTU of the Ethernet

is 1500 bytes, while the MTU of the ATM is 9188 bytes. Accordingly, the TCP software

sets the MSS to 1460 bytes for the Ethernet and 9148 bytes for the ATM. The MSS for
TCP is calculated by the formula MSS = MTU - 40, where 40 bytes is the combined TCP and IP header size.

TCP buffers are implemented as chains of mbufs [LMK90] in SunOS 4.1.3. Each
mbuf can store either up to 112 bytes of data or a pointer to a 1 Kbyte data page. The
TCP output routine allocates mbufs to contain the TCP headers and the data. As soon as
4 Kbyte of data is available in the buffer, the operating system invokes network routines to
transmit a packet.

TCP will delay the sending of small packets according to the Small packet avoidance

algorithm [Nag84]. Thus, TCP will send packets only under the following conditions:

1. when the connection is idle;

2. when $\min(D, U) \ge \mathit{MSS}$, where D is the size of the data to be sent and U is the usable
window size. The latter is defined as the receiver window size minus the amount of
outstanding data sent by the sender for which no acknowledgement has been received;

3. when all outstanding data has been acknowledged;

4. when $\min(D, U) \ge \mathit{max\_sndwnd}/2$, where max_sndwnd is the largest receiver
window size.
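
The four sending conditions can be read as a single predicate. The following is a minimal sketch of that reading, not the SunOS 4.1.3 code: the function and parameter names are hypothetical, and the 40-byte header allowance is taken from the MSS formula quoted earlier.

```python
def mss_for_mtu(mtu: int, header_bytes: int = 40) -> int:
    """MSS = MTU - 40, the formula quoted in the text."""
    return mtu - header_bytes


def sender_may_transmit(pending: int, usable_window: int, mss: int,
                        max_sndwnd: int, idle: bool, unacked: int) -> bool:
    """Return True if any of the four listed conditions holds.

    pending       -- D, bytes waiting to be sent
    usable_window -- U, receiver window minus unacknowledged bytes
    unacked       -- bytes sent but not yet acknowledged
    """
    sendable = min(pending, usable_window)
    return (
        idle                            # condition 1: connection is idle
        or sendable >= mss              # condition 2: min(D, U) >= MSS
        or unacked == 0                 # condition 3: nothing outstanding
        or 2 * sendable >= max_sndwnd   # condition 4: min(D, U) >= max_sndwnd / 2
    )


if __name__ == "__main__":
    atm_mss = mss_for_mtu(9188)  # 9148 bytes
    # Second 4 Kbyte half of an 8 Kbyte message; the first half is unacknowledged.
    print(sender_may_transmit(4096, 16384 - 4096, atm_mss, 16384,
                              idle=False, unacked=4096))  # prints False
```

With the Ethernet MSS of 1460 bytes the same call returns True, because 4096 bytes already exceed one full segment.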

According to the silly window syndrome avoidance algorithm [Cla82, Jac88] in TCP, the

receiver should avoid small window advertisements as a waste of bandwidth. The receiver

TCP will send an ACK with a window update only under the following conditions:

1. when the amount of data received $R \ge 2 \cdot \mathit{MSS}$;

2. when $R \ge 35\%$ of the receiver window size;

3. when the tcp_fasttimo() timer expires (every 200 milliseconds).
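
The first two receiver conditions decide whether an ACK goes out at once or waits for the timer. The sketch below encodes only those two tests; the function name and the default fraction argument are illustrative assumptions, and the timer itself is not modeled.

```python
def receiver_acks_at_once(received: int, mss: int, recv_window: int,
                          window_fraction: float = 0.35) -> bool:
    """True if the received byte count meets condition 1 or condition 2.

    When this returns False, the ACK waits for the periodic timer
    (condition 3), which the text says fires every 200 milliseconds.
    """
    return received >= 2 * mss or received >= window_fraction * recv_window


if __name__ == "__main__":
    # 4 Kbyte arriving over ATM (MSS 9148) with a 16 Kbyte receive window.
    print(receiver_acks_at_once(4096, 9148, 16384))  # prints False
```

Under the Ethernet MSS of 1460 bytes the same 4 Kbyte arrival exceeds 2 * MSS = 2920 bytes, so the ACK is not held back.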

Based on the previous discussion, when sending an 8 Kbyte message, TCP first places 4
Kbyte of data in the mbufs. This data is transmitted immediately, since the connection is
initially idle. The remaining 4 Kbyte of data is then stored in the buffer. This second data
installment cannot be sent before the ACK of the first 4 Kbyte is returned, since
4K < MSS = 9148 and 4K < max_sndwnd/2 = 8K (max_sndwnd being 16 Kbyte). On
the receiver's side, when the first 4 Kbyte are received, none of the conditions for sending
an ACK is met until the tcp_fasttimo() timer expires. Because the data arrives at an
arbitrary point in the 200-millisecond timer period, the ACK is returned to the sender with an
average delay of 100 milliseconds, and the sender must wait that long on average before the
next packet can be sent. Our results indicate that this delay was uniformly distributed as
U[0, 200]. Because the same situation arose when echoed messages were returned to the
sender, the overall average round-trip time is 200 milliseconds when sending 8 Kbyte messages.

In the experiment shown in Figure 12, 50 messages were sent at one-second intervals. This
was implemented by calling the alarm(1) and signal(SIGALRM, sig_handler) routines.
The two routines were tested separately, and it was found that the sig_handler response
experienced a delay of approximately 10 milliseconds each time. Consequently, relative to
each message, the tcp_fasttimo() timer expires about 10 milliseconds earlier than it did for
the previous message. As a result, the round-trip time measured in the experiment falls into
a cycle, declining by about 10 milliseconds per message until reaching the lower limit, and
then starting another cycle by jumping back to the maximum time.
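
The cycle can be reproduced with a small arithmetic model. This is a sketch under simplifying assumptions that are not in the text: transit time is ignored, the timer period is exactly 200 ms, the handler delay is exactly 10 ms, and the first wait is an arbitrary starting value. Messages then leave every 1010 ms; since 1000 ms is a whole number of timer periods, each message lands 10 ms later in the timer cycle and waits 10 ms less for the next expiry.

```python
def ack_wait_sequence(first_wait_ms: int, count: int,
                      drift_ms: int = 10, timer_period_ms: int = 200) -> list:
    """Wait (ms) until the next timer expiry for each successive message.

    Each message arrives drift_ms later in the timer cycle than the one
    before, so the wait shrinks by drift_ms and wraps around at zero.
    """
    return [(first_wait_ms - k * drift_ms) % timer_period_ms
            for k in range(count)]


if __name__ == "__main__":
    print(ack_wait_sequence(35, 5))  # prints [35, 25, 15, 5, 195]
```

The wrap from 5 ms to 195 ms corresponds to the jump back to the maximum time described above.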

Based on our previous discussion, we can predict the range of message size that may

suffer abnormal network behavior:

$$(12n + 4)K + 1 \le \text{message\_size} \le 12(n + 1)K - 1.$$

The normal ranges are:

$$12nK \le \text{message\_size} \le (12n + 4)K.$$
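
The two ATM ranges repeat with a period of 12K bytes, so membership can be checked from the remainder alone. The sketch below assumes n = 0, 1, 2, ... and K = 1024 (the value given with the Ethernet ranges later in this section); the function name is hypothetical.

```python
K = 1024


def is_abnormal_on_atm(message_size: int) -> bool:
    """True if message_size falls in a predicted abnormal range on ATM.

    Within each 12K period, offsets 0..4K are normal and offsets
    4K+1..12K-1 are abnormal, matching the two formulas above.
    """
    return message_size % (12 * K) > 4 * K


if __name__ == "__main__":
    for size in (4 * K, 4 * K + 1, 8 * K, 12 * K - 1, 12 * K):
        print(size, is_abnormal_on_atm(size))
```

Under this reading the 8 Kbyte messages of Figure 12 are abnormal, while 1 Kbyte and 100 Kbyte messages are normal (100K leaves a remainder of exactly 4K).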

We conducted experiments with messages of size 4K, 4K+1; 12K-1, 12K; 16K, 16K+1;
24K-1, 24K; and 28K, 28K+1. The test results confirm the predicted behavior. For
example, Figure 15 illustrates the results of sending 12K-1 byte and 12 Kbyte messages.
The delay experienced when sending 12K-1 byte messages is dramatically longer than the
delay incurred when sending 12 Kbyte messages. Since 1 Kbyte and 100 Kbyte messages
are not in the range in which we predicted abnormalities, the delays shown for them in
Figures 11 and 13 are normal. This explanation also accounts for the abnormalities shown
in Figure 14, in that round-trip times are exceedingly large for file sizes in our predicted
abnormal ranges.

[Plot: TCP round-trip time on ATM for 12K and 12K-1 byte messages, by repetition. Series: 12K-1 bytes, 12K bytes.]

Figure 15: Transmission of 12 Kbyte and 12K-1 byte messages on ATM

A similar abnormality can also be observed on the Ethernet. Given MTU = 1500 for the
Ethernet, we find that the abnormal ranges are

$$4K + (2n + 1)M + 1 \le \text{message\_size} \le 4K + (2n + 2)M - 1$$

where n = 0, 1, ...; K = 1024, and M = 1460.

The normal ranges are

$$4K + 2nM \le \text{message\_size} \le 4K + (2n + 1)M$$

where n = 1, 2, 3, ...; K and M are as above.

For messages of 1 to 4 Kbyte in size, the network behavior of the Ethernet is normal.
Abnormalities are found in the range from 4K+1 to 4K+M-1 bytes. Experimental results
obtained have been consistent with this prediction. While the abnormal range is 8 Kbyte
long and the normal range 4 Kbyte long on the ATM, both the abnormal and the normal
ranges on the Ethernet are 1460 bytes long. Therefore, if message sizes are uniformly
distributed, the probability of abnormal behavior on the ATM is approximately 66%, versus
50% on the Ethernet. If the message length distribution is skewed toward small messages,
the performance of an ATM network will be even worse compared with the Ethernet. In
addition, since Ethernet speeds are low in comparison with the ATM network, the impact of
abnormal behavior on the Ethernet is less severe.
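
The two percentages follow directly from the range lengths. As a minimal sketch, assuming message sizes are spread uniformly over whole periods and using the rounded range lengths quoted above (the helper name is hypothetical):

```python
from fractions import Fraction

K = 1024   # bytes per Kbyte
M = 1460   # Ethernet MSS in bytes


def abnormal_share(abnormal_len: int, normal_len: int) -> Fraction:
    """Fraction of one period that lies in the abnormal range."""
    return Fraction(abnormal_len, abnormal_len + normal_len)


if __name__ == "__main__":
    print(abnormal_share(8 * K, 4 * K))  # ATM: 2/3
    print(abnormal_share(M, M))          # Ethernet: 1/2
```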

3.2.3 Experiment II: Transmission of Data with TCP_NODELAY Option

Problem Statement

The TCP_NODELAY option, as discussed in Section 2.1, permits immediate delivery of
small messages with TCP. The abnormal behavior described in the previous section was
caused by the delayed transmission of packets waiting for an ACK. This problem may be
addressed by using the TCP_NODELAY option to force transmission.
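
For readers who want to try the option, the sketch below shows how it is typically set from an application. It is not the code used in the experiments: it uses Python's standard socket module, which wraps the same setsockopt call, and the helper name is hypothetical.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with small packet avoidance switched off."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value enables TCP_NODELAY on this socket only.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    s = make_nodelay_socket()
    # Read the option back to confirm that the system accepted it.
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
    s.close()
```

The option applies per socket, so it can be enabled only for connections that carry messages in the abnormal size range.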

Procedure and results

Figure 16 provides the results from sending messages 8 Kbytes and 100 Kbytes in size,
with the TCP_NODELAY option set on both the ATM and the Ethernet. A comparison with
Figures 12 and 13 shows that the abnormal behavior disappears when TCP_NODELAY
is set. Figures 17 and 18 provide additional test results with TCP_NODELAY set.

Discussion

[Plots: TCP round-trip time on ATM and Ethernet with the TCP_NODELAY option, by repetition. Series: 100 Kbytes on ATM, 100 Kbytes on Ethernet, 8 Kbytes on ATM, 8 Kbytes on Ethernet.]

Figure 16: Transmission of data on ATM with the TCP_NODELAY option

Simply setting the TCP_NODELAY option when transmitting messages in the abnormal
range is the easiest way to avoid abnormal delays without modifying the operating
system. This method also addresses the buffering problem described in [CL94]. We
repeated the test of [CL94] in the same experimental environment, except that the
TCP_NODELAY option was set in our tests. In comparison with the result shown in Table
2 from [CL94], Table 3 indicates that the abnormal behavior caused by the small packet
avoidance algorithm disappears. There are still cases in Table 3 in which the TCP
throughput is abnormal. This abnormal behavior, however, is due to the silly window
syndrome avoidance algorithm. According to that algorithm, discussed in Section 3.2, an
ACK will be delayed until the 200-millisecond timer expires unless the amount of data
received is at least either 2*MSS or 35% of the receiver buffer size.
On an ATM network where MSS = 9148, when the sender buffer size is only 16 Kbytes,
the maximum amount of TCP data sent at once is less than 2*MSS. Therefore, neither of
the conditions for an immediate ACK is met when the receiver buffer is larger than 46 Kbytes.
Consequently, after every delivery of 16 Kbytes of data, the sender halts the transmission
because its buffer is full and it is waiting for an ACK, while the receiver delays the ACK
until its timer expires. In effect, the TCP throughput is degraded substantially by the 200
millisecond delay for every 16 Kbytes sent.
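
The size of this effect can be estimated with two lines of arithmetic. The sketch below is an idealized model, not a measurement: it assumes the sender moves exactly one 16 Kbyte buffer per 200 ms timer period and ignores transmission time; the function names are hypothetical.

```python
def stalled_throughput_bps(burst_bytes: int, ack_delay_s: float) -> float:
    """Bits per second when one burst is sent per delayed ACK."""
    return burst_bytes * 8 / ack_delay_s


def receive_buffer_threshold(burst_bytes: int,
                             window_fraction: float = 0.35) -> float:
    """Receive buffer size (bytes) above which the burst is below 35%."""
    return burst_bytes / window_fraction


if __name__ == "__main__":
    bps = stalled_throughput_bps(16 * 1024, 0.2)
    print(bps / 1e6)        # about 0.66 with decimal megabits
    print(bps / 2 ** 20)    # 0.625 if a megabit is taken as 2**20 bits
    print(receive_buffer_threshold(16 * 1024) / 1024)  # about 45.7 Kbytes
```

The 45.7 Kbyte threshold agrees with the 46 Kbyte figure quoted above. The 0.625 value coincides with the low entries of Table 3 for a 16 Kbyte sender buffer, although reading the table's unit as 2^20 bits per second is an assumption.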

Setting the TCP_NODELAY option to avoid the abnormality, however, is not without cost.
Compared with Table 2, Table 3 indicates that the average throughput in the normal
cases is lower in our tests. Delivery of small packets has higher overhead and thus wastes
available bandwidth. By setting TCP_NODELAY, the overall performance of the network
is sacrificed to avoid abnormal behavior when sending small packets. Before setting the
TCP_NODELAY option, one should carefully examine the sender and receiver buffer sizes,
the MTU of the network, and, more importantly, the applications existing on the network.

                            Receive Buffer Size (octets)
Send
buffer    16K     20K     24K     28K     32K     36K     40K     44K     48K     51K
16K     15.05   13.60  0.322*  0.319*  0.319*  0.467*  0.499*  0.466*  0.409*  0.4??*
20K     15.99   14.60   15.01   14.87   15.40   14.24  1.005*  1.095*  0.548*  0.54?*
24K     11.11   16.19   16.14   16.32   11.40   11.31   11.42   11.12  0.76?*  0.140*
28K     16.57   11.69   11.93   18.13   18.36   19.20   19.14   19.18   18.38   18.20
32K     14.63   18.96   18.42   19.23   19.14   19.14   19.96   20.31   19.69   19.17
36K     14.33   19.22   18.12   19.82   19.11   19.92   20.56   20.49   20.13   20.20
40K     15.16   19.34   18.85   19.73   20.11   20.41   20.81   20.74   20.69   20.57
44K     14.80   19.40   18.27   20.39   20.16   20.14   20.99   20.87   20.89   20.70
48K     14.62   19.46   18.34   20.48   20.26   20.41   20.85   20.83   20.93   20.83
51K     13.92   19.41   18.26   20.50   20.06   20.21   20.88   20.91   21.21   21.06

Note 1: Throughput numbers are in megabits per second (Mb/s).
     2: Shaded area (entries marked *) indicates abnormal TCP throughput.

Table 2: TCP buffer size and mean throughput [CL94]

                            Receive Buffer Size (octets)
Send
buffer    16K     20K     24K     28K     32K     36K     40K     44K     48K     51K
16K     14.44   13.36   12.30   12.43   12.51   13.61   12.94   13.00  0.625*  0.625*
20K     15.14   16.49   16.04   16.00   16.81   15.23   11.85   15.40   13.92   13.82
24K     16.39   17.22   17.04   17.17   17.16   18.0?   18.01   17.91   16.51   15.90
28K     12.80   17.10   17.32   17.44   17.54   11.56   18.00   17.92   16.25   16.11
32K     15.91   17.20   17.58   17.50   17.69   11.72   11.99   17.99   17.39   11.54
36K     17.16   17.84   16.84   18.04   18.06   18.39   18.09   18.31   18.39   18.20
40K     17.21   11.14   16.49   17.24   17.14   18.42   18.30   17.80   19.55   18.96
44K     16.02   15.39   14.67   15.43   14.09   15.23   14.32   17.38   19.05   17.86
48K     15.08   15.74   15.26   17.30   16.61   11.16   13.69   14.91   18.38   13.18
51K     14.00   16.33   14.56   14.41   16.22   14.35   15.80   15.63   15.50   15.45

Note 1: Throughput numbers are in megabits per second (Mb/s).
     2: Shaded area (entries marked *) indicates abnormal TCP throughput.

Table 3: TCP buffer size and mean throughput with TCP_NODELAY option

3.2.4 Experiment III: Transmission of Data with Larger Buffer Size

Problem Statement

For large messages, enlarging the buffer size may also improve network performance. The

ATM network used in these experiments has an MTU of 9188 bytes, which is larger than that of

the Ethernet (MTU = 1500). The experimental results should show that using larger buffers

improves the performance of the ATM network significantly.
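
Socket buffer sizes are usually requested per connection. The following sketch shows one way to ask for 32 Kbyte buffers using Python's standard socket module; it is illustrative only and is not the code used in the experiments. The helper name is hypothetical, and the operating system may round, double, or cap the requested value, so it is read back afterwards.

```python
import socket


def make_buffered_socket(buffer_bytes: int = 32 * 1024) -> socket.socket:
    """Create a TCP socket and request send and receive buffers of the given size."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return sock


if __name__ == "__main__":
    s = make_buffered_socket()
    # The values actually granted are system dependent.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
          s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
    s.close()
```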

Procedure and results

Figure 17 shows results for the transmission of 100 Kbyte messages with buffer sizes of 32

Kbytes and 16 Kbytes. Figure 18 provides results for large-size files. In these experiments,

the ATM provides a significant increase in speed over the Ethernet.


[Plot: 100 Kbyte messages on ATM and Ethernet. Series: TCP on ATM, 32K buffer; TCP on ATM, 16K buffer.]

Figure 17: Round-trip time for 100 Kbyte messages on ATM with 32 Kbyte buffer and the
TCP_NODELAY option

Discussion

The experimental results confirm our assumption that a larger buffer size enhances ATM
performance. Figures 17 and 18 indicate that ATM performance improved by approximately
30% with 32 Kbyte buffers in comparison with 16 Kbyte buffers for messages
larger than 100 Kbytes. Furthermore, we observed that the performance of the Ethernet
did not improve as much as that of the ATM when the buffer size changed from 16
Kbytes to 32 Kbytes; this improvement was only about 20%. It appears that a buffer of 16
Kbytes is sufficient to utilize the slower Ethernet, although it is insufficient for the ATM
network.

[Plot: Large files on ATM vs. Ethernet. Series: TCP on Ethernet, 32K buffer; TCP on ATM, 32K buffer; TCP on Ethernet, 16K buffer; TCP on ATM, 16K buffer.]

Figure 18: Round-trip time for NASA image files on ATM with 32 Kbyte buffer and the
TCP_NODELAY option

3.3 Summary of TCP and UDP Performance on ATM

The results of our investigations indicate that the performance of UDP on the ATM is
superior to its performance on the Ethernet. Although loss rates on the ATM are about 30%
lower, the unreliability of UDP still argues against its use as the database communication
protocol in most cases. The problems discussed in Section 2 with regard to the
Internet apply as well to the ATM. However, while non-aggressive UDP packet delivery is
not effective on an Internet WAN, its effectiveness on an ATM WAN is still unresolved,
since the higher ATM speed produces lower loss rates.

Based on our investigation and analysis of the TCP/IP protocol and implementation, we
have the following summary conclusions:

• Problems with the use of TCP have been discussed in [CL94, PR95, Rom93, AM94].
Our findings also indicate that current implementations of TCP might fail to fully utilize the
capabilities of an ATM network. The TCP protocol and its implementation must
undergo further evaluation before it becomes the protocol of choice for the future
network. In particular, modifications must be made to the small packet and silly
window syndrome avoidance mechanisms and the packet delivery mechanism.

• If it is undesirable to alter the TCP implementation, the TCP_NODELAY option may
be set at the database application level to avoid abnormal behaviors.

• Users may employ a larger sender and receiver buffer. This should improve perfor-

mance at the application level on a higher-capacity network such as ATM.

• If fully utilized, an ATM network provides high-throughput transmission for large
messages. Therefore, ATM is ideal for multimedia applications, in which large volumes
of higher-level data transmission are not uncommon.

• The advantages of ATM are more obvious in a WAN environment, since an ATM switch
routes messages much faster than current routers. Reserved channel capacity can
reduce the loss of cells and improve reliability. Therefore, the ATM should be the choice
for database topological and metric scale-up. As shown in both our local experiments
and other research [PR95, Rom93, AM94], however, the current TCP buffering
mechanism, transmission implementation, and congestion control protocol may not be
suitable for the ATM. Further experimental study of ATM technologies
in a WAN environment is necessary.

4 Conclusions

Experiments were conducted on both the conventional Ethernet network and the ATM
network with the TCP and UDP/IP protocols in order to generate findings of use to
developers of distributed databases. The relative merits of each scenario were assessed on the
basis of these experimental data. In general, it was found that the TCP protocol is very
slow but reliable for large data messages and transmission between remote sites. The UDP
protocol with a few adjustments, however, was found useful for time-critical applications that
can tolerate a certain amount of data loss. Abnormal behaviors were observed during the
performance tests of TCP and UDP on an ATM LAN. We found that the higher capacity of an
ATM network does not necessarily mean enhanced performance for higher-level protocols
if those protocols are not configured appropriately to take advantage of the ATM
technology. Careful investigation is advisable for distributed database developers choosing the
right parameters for their applications.

Previous research [BZM91, MB91] suggested the use of UDP to send small packets in a
LAN. In a WAN environment, however, TCP is preferable, especially for large messages.
Therefore, TCP is the choice of data communication protocol for database scale-up. Data
transmission in a WAN usually suffers a high packet loss rate that common applications cannot
tolerate. Despite its low speed, TCP's guaranteed reliability makes it the protocol of choice
for common applications. Exceptions can be found in applications such as video
conferencing, which are time critical but not as seriously affected by data loss. Using UDP
datagrams instead of TCP packets to transmit data can provide the required high speed
with acceptable quality for applications of this kind. Furthermore, we found in our
experiments that a small increase in the time interval between UDP datagrams during
transmission may decrease the data loss rate significantly. Adjustments such as the
insertion of a short for loop between the transmission of each datagram were found quite
effective in improving the reliability of UDP with little negative impact on speed. It is
worth mentioning that such adjustments are machine and application dependent.
Developers of a distributed database system must balance their need for reliability and speed in
order to optimize the network performance of their system.
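
The pacing adjustment can be sketched as follows. This is not the code used in the experiments: where the text describes a short for loop as the delay, the sketch substitutes time.sleep, and the function name, the default gap, and the loopback demonstration are all illustrative assumptions. A suitable gap is machine and application dependent, as noted above.

```python
import socket
import time


def send_paced(sock: socket.socket, address, datagrams,
               gap_seconds: float = 0.001) -> int:
    """Send each datagram, pausing briefly between transmissions.

    Returns the number of datagrams handed to the network stack.
    """
    sent = 0
    for payload in datagrams:
        sock.sendto(payload, address)
        sent += 1
        time.sleep(gap_seconds)  # stands in for the short delay loop
    return sent


if __name__ == "__main__":
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))          # any free local port
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    count = send_paced(sender, receiver.getsockname(), [b"x" * 512] * 5)
    print(count)
    sender.close()
    receiver.close()
```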

In general, an ATM network provides much higher transmission throughput than
the conventional Ethernet. But higher-level protocols must be adjusted in order to
take advantage of this improvement. The TCP/IP implementation of existing systems
assumes a low-speed physical network such as Ethernet, and thus may not be suitable for the
high-throughput ATM. Abnormal behaviors were experienced during our experiments. A simple
adjustment, such as setting the TCP_NODELAY option, addresses this problem. But
the TCP_NODELAY option should be taken as a quick fix rather than an optimal solution,
because it lowers throughput in the average case. Choosing larger buffers on
both the sender's and the receiver's side addresses this issue while generating higher throughput.
But buffers are allocated in the host machine's memory, which is usually a limited resource
shared by multiple users and applications. It is not possible for a real-time system to allocate
arbitrarily large buffers to optimize performance for each individual application. While
choosing large buffers worked in our experiment to optimize data transmission throughput
between two test applications on a dedicated ATM network, how well this solution works
in practice remains an open question.

Despite the abnormalities, our experiments indicate that a high-capacity ATM network
will enhance network performance significantly if it is fully utilized. In the normal cases,
the ATM network outperforms the Ethernet for both TCP and UDP. Considering its
high capacity and potential to grow, the ATM network is ideal for the next generation
of distributed database applications.

Based on our studies, we suggest future research on the following issues:

• Further study on a standardized ATM network will be necessary when a universal
standard for ATM exists. Neither the hardware nor the software of the ATM
network used in our experiment was produced under a universally recognized standard.
Because the issues discussed in this research are machine and application dependent,
the experimental results and solutions may not apply to different systems. Research
toward a widely applicable solution is not yet possible because of the lack of standards in the
field of ATM.

• The existing network protocols, such as TCP/IP, should evolve in order
to adapt to the improving physical network. Higher-level applications, such as
distributed database applications, rely on TCP/IP to achieve better performance.
How TCP/IP can take full advantage of the higher capacity provided by the ATM
network is crucial to the overall performance of the system.

• Performance study of a real world ATM network may reveal issues that are left out

in this paper. Our experimental environment is an ideal environment, in which two

dedicated machines enjoy an otherwise idle ATM network. Studies have shown that

an ATM network may behave quite differently when it is saturated and is forced to

drop cells [PR95, Rom93, AM94]. How to optimize the

overall performance of a system in practice is still an open issue.

• Studies of an ATM WAN may bring interesting findings. Our performance test on
the ATM network was restricted to a LAN because of our limited resources. Since ATM
is intended to carry data traffic for both LANs and WANs, a large-scale ATM network
should undergo thorough study when one becomes available.

• Message multicasting on ATM is another interesting research topic. When a
distributed database scales up topologically, many messages will be sent to a large
number of remote sites simultaneously. How well the switching mechanism of ATM can
support group communication of this kind is an open question.

Acknowledgement

The authors would like to thank Professor Douglas E. Comer for providing support for our

experiments. The authors are also very grateful to John Chueng-Hsien Lin for answering

many of our questions. Melliyal Annamalai selected NASA image files for our experiments.

Yue Zhuge at Stanford helped us with the WAN experiments. We would like to acknowledge

Rachel Ramadhyani for her input regarding the presentation of this paper.

References

[AM94] Aboul-Magd. TCP Performance in ATM Networks Employing End-to-End Flow Control.
Technical Report 94-0442, ATM Forum Contribution, May 1994.

[BR89a] B. Bhargava and J. Riedl. A Model for Adaptable Systems for Transaction Processing.
IEEE Transactions on Knowledge and Data Engineering, 1(4), December 1989.

[BR89b] B. Bhargava and J. Riedl. The RAID Distributed Database System. IEEE Transactions
on Software Engineering, 15(6), June 1989.

[BZM91] B. Bhargava, Y. Zhang, and E. Mafla. Evolution of Communication System for
Distributed Transaction Processing in Raid. USENIX Journal Computing Systems,
4(3):277-313, Summer 1991.

[Che88] D. R. Cheriton. The V distributed system. Communications of the ACM, 31(3), March
1988.

[CJRS89] D. D. Clark, V. Jacobson, J. Romkey, and H. Salwen. An Analysis of TCP Processing
Overhead. IEEE Communication Magazine, 27(6):23-29, June 1989.

[CL94] D. E. Comer and J. C. H. Lin. TCP Buffering and Performance over an ATM Network.
Journal of Internetworking: Research and Experience, 10(4):70-80, October 1994.

[Cla82] D. D. Clark. Window and Acknowledgement Strategy in TCP. Request for Comments,
(RFC-813), July 1982.

[Com91] D. E. Comer. Internetworking with TCP/IP Vol I: Principles, Protocols, and Architecture,
volume I. Prentice Hall, Inc, Englewood Cliffs, NJ, second edition, 1991.

[CS92] J. D. Cavanaugh and T. J. Salo. Internetworking with ATM WANs. Technical report,
Minnesota Supercomputer Center, Inc, December 1992.

[Gol92] R. A. Golding. End-to-end performance prediction for the Internet. Technical Report
UCSC-CRL-92-26, University of California at Santa Cruz, June 1992.

[Jac88] V. Jacobson. Congestion Avoidance and Control. In Proceedings of ACM
SIGCOMM '88, pages 314-328, August 1988.

[Kun92] H. T. Kung. Gigabit Local Area Networks: A Systems Perspective. IEEE Communication
Magazine, 30(4):79-89, April 1992.

[KW95] B. G. Kim and P. Wang. ATM Network: Goals and Challenges. Communications of
the ACM, 38(2):39-44, February 1995.

[LMK90] S. J. Leffler, M. K. McKusick, and M. J. Karels. The Design and Implementation of the
4.3BSD UNIX Operating System. Addison-Wesley, Inc, Reading, MA, 1990.

[MB91] E. Mafla and B. Bhargava. Communication facilities for distributed transaction
processing systems. IEEE Computer, pages 61-66, August 1991.

[Nag84] J. Nagle. Congestion Control in IP/TCP Internetworks. Request for Comments,
(RFC-896), January 1984.

[OCD+88] J. K. Ousterhout, A. R. Cherenson, F. Douglis, M. N. Nelson, and B. B. Welch. The
Sprite network operating system. IEEE Computer, February 1988.

[PHOR90] L. Peterson, N. Hutchinson, S. O'Malley, and H. Rao. The x-kernel: A Platform for
Accessing Internet Resources. IEEE Computer, 23(5):23-33, May 1990.

[PKL91] C. Pu, F. Korz, and R. C. Lehman. A Measurement Methodology for Wide Area
Internets. Technical Report CUCS-044-90, Columbia University, March 1991.

[Pos80] J. B. Postel. User Datagram Protocol. Request for Comments, (RFC-768), August 1980.

[Pos81] J. B. Postel. Transmission Control Protocol. Request for Comments, (RFC-793),
September 1981.

[PR95] M. Perloff and K. Reiss. Improvements to TCP Performance in High-speed ATM
Networks. Communications of the ACM, 38(2):90-100, February 1995.

[PSC81] J. B. Postel, C. A. Sunshine, and D. Cohen. The ARPA Internet Protocol. Computer
Networks, 5(4):261-271, July 1981.

[Ras86] R. F. Rashid. Threads of a New System. Unix Review, 4(8):37-49, August 1986.

[Rom93] A. Romanow. Preliminary Report of Performance Results for TCP over ATM with
Congestion. Technical report, July 1993.

[SSU91] A. Silberschatz, M. Stonebraker, and J. Ullman. Database Systems: Achievements and
Opportunities. Communication of ACM, 34(10):110-120, 1991.

[Ste90] W. R. Stevens. Unix Network Programming. Prentice-Hall, Inc., 1990.

[TRvS+90] A. S. Tanenbaum, R. V. Renesse, H. van Staveren, G. J. Sharp, S. J. Mullender,
J. Jansen, and G. van Rossum. Experiences with the Amoeba Distributed Operating
System. Communications of the ACM, 33(12):46-63, December 1990.

[Vet95] R. J. Vetter. ATM Concepts, Architectures and Protocols. Communications of the
ACM, 38(2):30-38, February 1995.

[ZB93] Y. Zhang and B. Bhargava. WANCE: A Wide Area Network Communication Emulation
System. In Proceedings of the IEEE Workshop on Advances in Parallel and Distributed
Systems, pages 40-45, Princeton, New Jersey, October 1993.
