Accepted Manuscript: 10.1016/j.jnca.2017.12.016
PII: S1084-8045(17)30423-X
DOI: 10.1016/j.jnca.2017.12.016
Reference: YJNCA 2036
Please cite this article as: Bahnasy, M., Elbiaze, H., Boughzala, B., Zero-queue ethernet congestion
control protocol based on available bandwidth estimation, Journal of Network and Computer
Applications (2018), doi: 10.1016/j.jnca.2017.12.016.
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to
our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and all
legal disclaimers that apply to the journal pertain.
Zero-Queue Ethernet Congestion Control Protocol Based on Available Bandwidth Estimation

Mahmoud Bahnasy (a), Halima Elbiaze (b), Bochra Boughzala (c)

(a) École de Technologie Supérieure, Montréal, Canada
(b) Université du Québec à Montréal, Canada
(c) Ericsson Research, Canada
Abstract
Router's switch fabric has strict characteristics in terms of packet loss, latency, fairness and head-of-line (HOL) blocking. Network manufacturers address these requirements using specialized, proprietary and highly expensive switches. Simultaneously, IEEE introduces Data Center Bridging (DCB) as an enhancement to existing Ethernet bridge specifications, which includes technological enhancements addressing packet loss, HOL blocking and latency issues. Motivated by the DCB enhancements, we investigate the possibility of using commodity Ethernet switches to build a flexible and cost-efficient switch fabric that fulfills the strict router characteristics, and we propose the Ethernet Congestion Control Protocol (ECCP) for this purpose. Furthermore, we present a mathematical model of ECCP using Delay Differential Equations (DDEs) and analyze its stability using the phase plane method. The stability of ECCP is mainly ensured by the sliding mode motion, causing ECCP to keep cross traffic close to the maximum link capacity and the queue length close to zero. Extensive simulation scenarios are driven to validate the analytical results of ECCP behavior. Our analysis shows that ECCP is practical in avoiding congestion while maintaining minimum latency.

Keywords: Data Center Bridging, Congestion Control, Congestion Prevention, Priority-based Flow Control (PFC), Quantized Congestion Notification (QCN), Ethernet Congestion Control Protocol (ECCP)
1. Introduction
Router's switch fabric requirements are traditionally addressed using a custom Application-Specific Integrated Circuit (ASIC). This ASIC must fulfill particular characteristics including low packet loss, fairness between flows, and low latency [1]. The emergence of very-high-speed serial interfaces and new router architectures increases the design and manufacturing cost of the switch fabric chipset. Traditionally, the switch fabric is manufactured using either shared memory or a crossbar switch, as shown in Fig. 1a and Fig. 1b respectively. The shared memory architecture requires memory that works N times faster than the port speed, where N is the number of ports, which raises a scalability issue. On the other hand, the crossbar architecture tries to keep the buffering at the edge of the router (Virtual Output Queues, VOQs, inside the line cards). Because this architecture requires N VOQs at each ingress port and a central arbiter unit, it also faces scalability and cost challenges.
Figure 1: Traditional switch fabric architectures: (a) shared memory; (b) crossbar with line cards and terminal interfaces.
IEEE has recently presented Data Center Bridging (DCB) [3], which comprises several enhancements to the Ethernet network. However, the Ethernet network still suffers from HOL blocking, congestion spreading and high latency. To overcome these limitations and achieve a non-blocking switch fabric, we present the Ethernet Congestion Control Protocol (ECCP), which maintains the Ethernet network non-blocked by preserving switches' queue lengths close to zero, leading to minimum latency and no HOL blocking. Unlike traditional congestion control mechanisms that use packet accumulation in buffers to trigger the rate control process, ECCP estimates the available bandwidth and uses this information to control transmission rates before link saturation or data accumulation. Accordingly, it achieves minimum latency by trading off a small margin of link capacity. Therefore, ECCP achieves (i) low queue length, (ii) low latency, and (iii) high throughput, (iv) with no switch modification. Such a mechanism could be used in manufacturing a cost-efficient router switch fabric while guaranteeing traditional router characteristics.

Furthermore, we present a mathematical model of ECCP using Delay Differential Equations (DDEs) and derive its stability conditions by analyzing the phase trajectories of the rate increase and rate decrease subsystems. Extensive simulations are conducted using OMNEST [8] to verify our mathematical analysis. Finally, a Linux-based implementation of ECCP is conducted to verify ECCP's performance through experiments.
The rest of this paper is organized as follows. Related work is introduced in Section 2. Section 3 presents the ECCP mechanism. Section 4 briefly introduces the phase plane analysis method. The mathematical model of ECCP is derived in Section 5. The stability analysis of ECCP is presented in Section 6. The Linux-based implementation is presented in Section 7. Finally, Section 8 concludes the paper and outlines future work.
2. Related Work

In this section, we present research work that is closely related to congestion control in both the Ethernet layer and the Transmission Control Protocol (TCP) layer. IEEE has recently presented Data Center Bridging (DCB) [3], which comprises several enhancements for Ethernet networks to create a consolidation of I/O connectivity through data centers. DCB aims to eliminate packet loss due to queue overflow. Ethernet PAUSE IEEE 802.3x and Priority-based Flow Control (PFC) [9] are presented in DCB as link-level (hop-by-hop) mechanisms. Ethernet PAUSE was issued to solve the packet loss problem by sending a PAUSE request to the sender when the receiver buffer reaches a certain threshold. Thus, the sender stops sending data until a local timer expires or a resume notification is received from the receiver. PFC divides the data path into eight traffic classes, each of which can be controlled individually. Yet, PFC is still limited because it operates at the port-plus-priority level, which can cause congestion spreading and HOL blocking [9, 10].
Figure 2: QCN framework: CP in the bridge, and RP in the host's NIC.
In QCN, the Congestion Point (CP) in the bridge samples incoming data frames and calculates a feedback (Fb) value, in a probabilistic manner, to reflect the congestion severity (Equation 1):

    Fb = -((Q - Q_eq) + w × (Q - Q_old)),        (1)

where Q is the current queue length, Q_old is the previous queue length, and w is a constant which equals 2 (for more details refer to [11]). If the calculated Fb is negative, the CP creates a Congestion Notification Message (CNM) and sends it to the Reaction Point (RP).

QCN reduces the overhead of control-information traffic and the required computational power by calculating Fb in a probabilistic manner. At the end host, when the RP receives a CNM, it decreases its transmission rate accordingly. However, QCN has known issues regarding fairness [13, 14] and queue length fluctuation [15]. In addition, QCN does not achieve minimum latency as it keeps the queue length at a certain level (Q_eq).

Several research papers have discussed various enhancements for QCN.
Data Center TCP (DCTCP) [17] uses switches that support Explicit Congestion Notification (ECN) to mark packets that arrive while the queue length is greater than a predefined threshold. A DCTCP source reacts by reducing the window proportionally to the fraction of marked packets. Data Center QCN (DCQCN) [18] combines the characteristics of DCTCP [17] and QCN in order to achieve QCN-like behavior while using the ECN marking feature. DCQCN requires a very strict parameter selection regarding the byte counter and the marking probability.
Trading a little bandwidth to achieve low queue length and low latency is discussed in a number of papers. For example, HULL (High-bandwidth Ultra-Low Latency) is presented in [19] to reduce average and tail latencies in data centers by sacrificing a small amount of bandwidth (e.g., 10%). HULL presents the Phantom Queue (PQ) as a new marking algorithm. Phantom queues simulate draining data at a fraction (< 1) of the link rate. This process generates a virtual backlog that is used to mark data packets before congestion. The main challenge of HULL is that it requires switch modification.

TIMELY [20] is a congestion control scheme for data centers that uses the RTT as a congestion signal. Enhanced Forward Explicit Congestion Notification (E-FECN) [21] and the Proactive congestion control algorithm (PERC) [22] are presented as congestion control mechanisms that exploit the measured available bandwidth to control data rates. However, these two methods require switch modifications, which we aim to avoid.

Few centralized solutions are proposed in the literature. For example, Fastpass [23] embraces central control for every packet transmission, which raises a scalability issue.
Another approach to enhance the performance of the TCP protocol is to distinguish between congestive packet loss and non-congestive packet loss [24, 25], so that the TCP congestion avoidance algorithm is activated only when congestive packet loss is detected. For example, TCP INVS [24] estimates the network queue length and compares this estimate to a threshold. If the estimated queue length exceeds the threshold, the loss is considered to be caused by congestion; consequently, TCP INVS activates the traditional congestion avoidance algorithm. Otherwise, the loss is considered a non-congestion loss, and TCP INVS ignores it and avoids limiting the congestion window growth. In addition, [25] proposes an RTT estimation algorithm using an Autoregressive Integrated Moving Average (ARIMA) model. By analyzing the estimated RTT, one can detect sharp and sudden changes in the RTT, thereby differentiating non-congestive packet loss from congestive packet loss. While these mechanisms achieve better throughput on lossy networks, they introduce extra packet loss, which is not suitable for a router switch fabric or a data center network.
AN
Optimizing the routing decision to control the congestion is also proposed in
140 several research papers. Most of this research follows a key idea called the back-
M
pressure algorithm [26] where traffic is directed around a queuing network to
achieve maximum network throughput. An example of this scheme is presented
in [27] where the authors developed a second-order joint congestion control and
D
145 fast convergence. Such a scheme can significantly reduce queuing delay and it
would be interesting to investigate this scheme in future work.
Unlike the schemes above, ECCP is a congestion control algorithm that works at the Ethernet layer. ECCP controls data traffic according to the estimated Available Bandwidth (AvBw) through a network path. ECCP strives to keep the link occupancy below the maximum capacity by a percentage called the Availability Threshold (AvT). Traditional congestion control mechanisms aim to keep the queue around a target level. These mechanisms can reduce queuing latency, but they cannot eliminate it: a non-zero queue must be observed before any reaction, and sources need one RTT to react. In contrast, ECCP maintains a close-to-zero queue length, leading to minimum network latency.
ECCP estimates the AvBw through a network path by periodically sending trains of probe frames through this path. The sender adds the sending time and other information, such as a train identifier and the sequence number within the train, to each probe frame. On the receiver side, ECCP receives these frames and estimates the AvBw using a modified version of Bandwidth Available in Real-Time (BART) [28]. Afterward, ECCP transmits this information back to the sender. At the sender side, ECCP controls the transmission rate based on the received AvBw value. ECCP advocates rate-based control schemes instead of window-based control schemes because window-based schemes encounter significant challenges, particularly with the rapid increase of the control cycle time, defined mainly by the propagation delay, compared to the transmission time in modern networks [29]. In addition, [30] and [31] state that at high line rates, queue size fluctuations become fast and difficult to control because the queuing delay is shorter than the control loop delay. Thus, rate-based control schemes are more reliable.
3. ECCP Mechanism

In this section, the ECCP architecture is described in detail together with the interaction between its components. ECCP maintains a bandwidth stability margin equal to AvT × C. This bandwidth stability margin allows ECCP to send probe traffic without causing queue accumulation. ECCP does not require switch modification because all its functionalities are implemented inside line cards or hosts.

Fig. 3 depicts the ECCP components: (1) probe sender, (2) probe receiver, (3) bandwidth estimator, and (4) rate controller. These modules are implemented in every line card in the router or in every host.
Figure 3: ECCP components: probe sender, probe receiver, bandwidth estimator, and rate controller, connected through the data path, the probe frames path, and the AvBw estimation (control) path.
The probe sender periodically sends trains of N probe frames. Each probe frame is tagged with a sending time. Other information is added to the probes, such as a sequence number and a train identifier. The ECCP probe sender transmits probe trains at a rate limited to R × AvT, where R is the current transmission rate. Thus, ECCP gets enough information to control (decrease or increase) the data rate while limiting the probe rate to R × AvT. Hence, the probe traffic for M flows crossing one link (M × R × AvT) never exceeds the link bandwidth stability margin (AvT × C).
The bandwidth estimator uses a modified version of BART, which is based on a self-induced congestion model.

Figure 4: Self-induced congestion model: the strain ε remains close to zero while the probing rate µ is below the AvBw (no congestion), and grows linearly (ε = α µ + β) once µ exceeds the AvBw (network congested).

In this model, when the probe rate µ is greater than the AvBw, the network queues start accumulating data, which increases the inter-frame time measured at the receiver, ∆out. Otherwise, ∆out will be, on average, equal to the inter-frame time at the sender, ∆in (Fig. 4). This model does not require clock synchronization between hosts. Rather, it uses the relative queuing delay between probe frames.
BART derives a new metric to capture the change of the inter-frame time, called the strain ε = (∆out − ∆in)/∆in. For a probe rate µ that is less than the AvBw, the strain will be, on average, equal to zero (ε ≈ 0). Otherwise, the strain increases proportionally to the probe rate µ (Fig. 4). This linear relation between the strain and the probe rate is expressed in (2):

    ε = 0              if µ ≤ AvBw
    ε = α µ + β        if µ > AvBw.        (2)

Based on this linear relationship between the strain and the probe rate µ, ECCP estimates the AvBw as the maximum probe rate that keeps the strain equal to zero.
For that purpose, the bandwidth estimator calculates the strain ε_i for each probe pair {i = 1, . . . , N − 1}. Then, the calculated average and its variance R are forwarded to a Kalman Filter (KF). In addition, an estimation of the system noise covariance Q and the measurement error P are provided. The Kalman filter works on continuous linear systems, while this model has a discontinuity separating two linear segments, as shown in Fig. 4. Thus, BART ignores the probe rates µ that are not on the horizontal line, where µ is less than the last estimated AvBw (µ < AvBw). Unlike BART, ECCP does not ignore probe train information that is not on the straight line. Instead, it uses that probe rate µ to provide an estimation of AvBw using (4) (for more details see [32]):

    AvBw = max(µ_j)           if ε < ε_t
    AvBw = KF(ε, Q, P)        if ε ≥ ε_t,        (4)

where j is the probe train number and ε_t is the strain threshold that identifies the starting point of the straight line. After that, the Kalman filter calculates α and β of the straight line of (2), from which the AvBw estimate is derived.
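To make the estimation step concrete, the sketch below computes the per-pair strain from probe timestamps and applies the piecewise rule of (2) and (4). It is a simplified illustration under stated assumptions: the strain threshold value is assumed, and an ordinary least-squares fit of ε = α µ + β stands in for the Kalman filter used by BART/ECCP; the AvBw is then taken as the rate at which the fitted line crosses ε = 0.

def train_strain(send_times, recv_times):
    """Average strain of one probe train: (d_out - d_in) / d_in per probe pair."""
    strains = []
    for i in range(len(send_times) - 1):
        d_in = send_times[i + 1] - send_times[i]      # inter-frame time at the sender
        d_out = recv_times[i + 1] - recv_times[i]     # inter-frame time at the receiver
        strains.append((d_out - d_in) / d_in)
    return sum(strains) / len(strains)

def estimate_avbw(samples, strain_threshold=0.02):
    """samples: list of (probe_rate_bps, avg_strain) for recent trains.

    Below the threshold, AvBw is the largest probed rate (max(mu_j)); above it,
    fit eps = alpha*mu + beta and return the eps = 0 crossing.  A least-squares
    fit is used here in place of the Kalman filter (illustrative substitution).
    """
    congested = [(mu, e) for mu, e in samples if e >= strain_threshold]
    if not congested:
        return max(mu for mu, _ in samples)           # no congestion observed yet
    n = len(congested)
    sx = sum(mu for mu, _ in congested)
    sy = sum(e for _, e in congested)
    sxx = sum(mu * mu for mu, _ in congested)
    sxy = sum(mu * e for mu, e in congested)
    denom = n * sxx - sx * sx
    if denom == 0:                                    # need at least two distinct rates
        return max(mu for mu, _ in samples)
    alpha = (n * sxy - sx * sy) / denom
    beta = (sy - alpha * sx) / n
    return -beta / alpha                              # rate where the fitted strain line hits zero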
The ECCP rate controller follows an Additive Increase Multiplicative Decrease (AIMD) model after receiving the AvBw value. Based on the received estimated AvBw, it calculates the availability ratio Ar as in (5):

    Ar = AvBw / (R × AvT).        (5)
Figure 5: Relationship between AvBw and Ar.
ECCP works on keeping Ar at an equilibrium level Aeq. Therefore, it calculates a feedback parameter Fb to represent the severity of the congestion using (6):

    Fb = -((Ar - Aeq) + w × (Ar - Aold)),        (6)

where Aold is the previous value of Ar and w is equal to 2 (similar to QCN).
Furthermore, the ECCP rate controller monitors two variables: (1) the transmission rate R and (2) the target rate TR. TR is the transmission rate before congestion and represents an objective rate for the rate increase process. The ECCP rate controller uses a rate decrease process if the calculated Fb value is negative, and a self-increase process otherwise, as given in (7):

    if Fb < 0 (rate decrease process):   TR ← R,   R ← R(1 + Gd × Fb)
    otherwise (self-increase process):   R ← (1/2)(R + TR),        (7)

where Gd is a fixed value chosen such that the maximum rate reduction equals 1/2.
Figure 6: ECCP rate control stages.
Fig. 6 shows the ECCP rate control process in detail. The figure shows that when ECCP calculates a negative Fb, it executes the rate decrease process. In addition, Fig. 6 depicts that ECCP divides the self-increase process into three stages: (i) Fast Recovery (FR), (ii) Active Increase (AI) and (iii) Hyper-Active Increase (HAI). ECCP determines the increase stage based on a byte counter BC and a timer T. The Fast Recovery stage consists of five cycles, where each cycle is defined by sending BC bytes of data or the expiration of a timer T. The timer defines the end of cycles in the case of low-rate flows. At each cycle, R is updated using (7) while keeping TR unchanged. If the byte counter or the timer completes five cycles, the rate controller enters the Active Increase (AI) stage, in which TR is increased by a predefined value R_AI. Moreover, the byte counter and the timer limits are set to BC/2 and T/2 respectively. Afterward, the rate controller enters the Hyper-Active Increase (HAI) stage if both the byte counter and the timer finish five cycles. In the HAI stage, TR is increased by a predefined value R_HAI, as in (8):

    TR ← TR + R_AI      (AI)
    TR ← TR + R_HAI     (HAI),        (8)

where R_AI is the rate increase step in the AI stage and R_HAI is the rate increase step in the HAI stage. Algorithms 1 and 2 depict the ECCP rate decrease and self-increase processes respectively.
Algorithm 1: ECCP rate decrease process
Input: Available Bandwidth AvBw
1   Ar ← AvBw / (R × AvT);
2   Fb ← -((Ar - Aeq) + w × (Ar - Aold));
3   if Fb < 0 then
4       TR ← R;
5       R ← R(1 + Gd × Fb);              /* Rate decrease */
6       SelfIncreaseStarted ← TRUE;
7       ByteCycleCnt ← 0;
8       TimeCycleCnt ← 0;
9       ByteCnt ← BC;
10      Timer ← T;
11  end
Algorithm 2: ECCP self-increase process
1   foreach sentFrame do
2       if SelfIncreaseStarted == TRUE then
3           ByteCnt ← ByteCnt - Byte(frameSize);
4           if (ByteCnt ≤ 0) then
5               ByteCycleCnt++;
6               if (ByteCycleCnt < 5) then
7                   ByteCnt ← BC;                /* FR stage */
8               else
9                   ByteCnt ← BC/2;              /* AI/HAI stages */
10              AdjustRate();
11  end
12  foreach timeout do
13      if SelfIncreaseStarted == TRUE then
14          TimeCycleCnt++;
15          if (TimeCycleCnt ≥ 5) then
16              RestartTimer(T/2);               /* AI/HAI stages */
17          else
18              RestartTimer(T);                 /* FR stage */
19          AdjustRate();
20  end
21
22  AdjustRate():
23      if (ByteCycleCnt ≥ 5) and (TimeCycleCnt ≥ 5) then
24          TR ← TR + R_HAI;                     /* HAI stage */
25      else if (ByteCycleCnt ≥ 5) or (TimeCycleCnt ≥ 5) then
26          TR ← TR + R_AI;                      /* AI stage */
27      R ← 1/2 × (R + TR);
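The following Python sketch mirrors the AIMD logic of (6)-(8) and Algorithms 1-2 in a single class, as one possible way to structure the rate controller. It is a simplified illustration: cycle counting is driven by the byte counter only (the timer path of Algorithm 2 is analogous), the HAI trigger is approximated by ten completed byte cycles, and all constructor values (line rate, BC, R_AI, R_HAI) are caller-supplied assumptions rather than the paper's parameters.

class EccpRateController:
    """Simplified ECCP rate controller: rate decrease (Alg. 1) and self-increase (Alg. 2)."""

    def __init__(self, line_rate, avt, bc_bytes, r_ai, r_hai, w=2.0, gd=100.0 / 128.0):
        self.line_rate = line_rate      # maximum line rate (bps), illustrative
        self.avt = avt                  # availability threshold AvT
        self.bc_limit = bc_bytes        # byte counter limit BC
        self.r_ai, self.r_hai = r_ai, r_hai
        self.w, self.gd = w, gd
        self.rate = line_rate           # current transmission rate R
        self.target = line_rate         # target rate TR
        self.a_old = None               # previous availability ratio Ar
        self.byte_cnt = bc_bytes
        self.byte_cycles = 0

    def on_avbw(self, avbw, a_eq):
        """Rate decrease process (Algorithm 1): run when a new AvBw estimate arrives."""
        a_r = avbw / (self.rate * self.avt)                       # Eq. (5)
        if self.a_old is None:
            self.a_old = a_r
        fb = -((a_r - a_eq) + self.w * (a_r - self.a_old))        # Eq. (6)
        self.a_old = a_r
        if fb < 0:
            self.target = self.rate                               # TR <- R
            self.rate *= (1.0 + self.gd * fb)                     # R <- R(1 + Gd*Fb)
            self.byte_cycles = 0
            self.byte_cnt = self.bc_limit

    def on_sent(self, frame_size):
        """Self-increase process (Algorithm 2, byte-counter path): run per sent frame."""
        self.byte_cnt -= frame_size
        if self.byte_cnt > 0:
            return
        self.byte_cycles += 1
        self.byte_cnt = self.bc_limit if self.byte_cycles < 5 else self.bc_limit / 2
        if self.byte_cycles >= 10:                                # simplified HAI trigger
            self.target += self.r_hai                             # Eq. (8), HAI stage
        elif self.byte_cycles >= 5:                               # AI stage
            self.target += self.r_ai                              # Eq. (8), AI stage
        self.rate = 0.5 * (self.rate + self.target)               # Eq. (7), self-increase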
4. Phase Plane Analysis Method

In this paper, we use the phase plane method to visually represent certain characteristics of the differential equations of ECCP. The phase plane is used to analyze the behavior of nonlinear systems. The solutions of differential equations are a set of functions which can be plotted graphically in the phase plane as a two-dimensional vector field. Given an autonomous system represented by a differential equation x''(t) = f(x(t), x'(t)), one can plot the phase trajectory of such a system by following the direction in which time increases. Fig. 7a depicts a system x(t) in the time domain, and a phase trajectory of this system is displayed in Fig. 7b. One can notice that x(t) and x'(t) in the time domain can be inferred from the phase trajectory plot. Thus, the phase trajectory provides enough information about the behavior of the system. Moreover, sketching phase trajectories is easier than finding an analytical solution of differential equations, which is sometimes not possible.
Figure 7: (a) The trajectory of x(t) in the time domain; (b) the phase trajectory of x'(t) versus x(t).
Thus, the phase plane method is adequate for analyzing segmented systems like congestion control protocols [33]. In addition, system parameter limitations can be taken into consideration explicitly. Therefore, we should consider only the phase trajectories that satisfy our system limitations (i.e., link capacity and buffer size), even if the system is stable according to the derived stability conditions.
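As a concrete illustration of the method (not of the ECCP model itself), the short sketch below traces the phase trajectory (x(t), x'(t)) of an arbitrary second-order autonomous system x'' = f(x, x') with forward Euler integration; the damped-oscillator example and the step size are illustrative assumptions. Plotting the returned pairs gives the spiral toward the stable point discussed above.

def phase_trajectory(f, x0, v0, dt=1e-3, steps=20000):
    """Trace (x, x') points for the autonomous system x''(t) = f(x(t), x'(t))."""
    x, v = x0, v0
    points = [(x, v)]
    for _ in range(steps):
        a = f(x, v)          # acceleration from the system definition
        x += v * dt          # forward Euler integration step
        v += a * dt
        points.append((x, v))
    return points

# Example: a damped oscillator, whose trajectory spirals into the origin.
trajectory = phase_trajectory(lambda x, v: -2.0 * x - 0.5 * v, x0=1.0, v0=0.0)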
5. ECCP Modeling
In this section, we derive a mathematical model of ECCP and analyze its behavior around the target point. For the purpose of simplicity, we make these assumptions:

• All sources are homogeneous, namely they have the same characteristics such as round-trip time.

• Data flows in data center networks have high rates and appear like a continuous fluid flow.

Under these assumptions, the available bandwidth on the bottleneck link is

    AvBw(t) = C - M × R(t),        (9)

where AvBw(t) is the available bandwidth at time t, C is the maximum link capacity, M is the number of flows that share the same bottleneck link, and R(t) is the host's transmission rate at time t.

By substituting (9) into (5) we get:

    Ar(t) = C / (AvT × R(t)) - M / AvT,        (10)

where Ar(t) is the available bandwidth ratio at time t.
In continuous time, the feedback (6) can be written as

    Fb(t) = -((Ar(t) - Aeq) + w × T × A'r(t)),        (11)

where T is the time interval between trains, which defines the control cycle time, and (Ar - Aold) becomes the derivative of the availability ratio, A'r, multiplied by the control cycle time T.
Given the ECCP rate update equation (7), the derivative of the transmission rate R'(t) can be represented by the delay differential equation (12):

    R'(t) = (Gd / T) × R(t) × Fb(t - τ)        if Fb(t - τ) < 0
    R'(t) = (TR - R(t)) / (2 × T_BC)           if Fb(t - τ) ≥ 0,        (12)

where τ is the propagation delay and T_BC is the BC counter time.
We define the state variable

    y(t) = Ar(t) - Aeq,    y'(t) = A'r(t).        (13)

Thus, from (10) we get:

    y(t) = C / (AvT × R(t)) - M / AvT - Aeq.

Let ζ = M / AvT + Aeq; we get:

    y(t) = C / (AvT × R(t)) - ζ,
    R(t) = C / (AvT × (y(t) + ζ)),
    R'(t) = -C × y'(t) / (AvT × (y(t) + ζ)^2).        (14)
The feedback equation can be represented by substituting (13) into (11):

    Fb(t - τ) = -(y(t - τ) + w × T × y'(t - T - τ)).        (15)

Substituting (14) and (15) into the rate decrease part of (12), we get the rate decrease subsystem equation (16):

    -C × y'(t) / (AvT × (y(t) + ζ)^2) = (Gd / T) × Fb(t - τ) × (C / (AvT × (y(t) + ζ)))
    -y'(t) / (y(t) + ζ) = (Gd / T) × Fb(t - τ)
    -y'(t) = (Gd / T) × Fb(t - τ) × (y(t) + ζ).        (16)

Thus, the ECCP rate decrease subsystem can be represented by substituting (15) into (16):

    -y'(t) = (Gd / T) × [y(t - τ) + w × T × y'(t - T - τ)] × (y(t) + ζ).        (17)

6. Stability Analysis of ECCP

Lemma 1. The ECCP rate decrease subsystem is stable if inequality (18) is satisfied (the proof is given in Appendix A):

    τ/T < min( w - 1/(Gd ζ) + sqrt(2w^2 - 2/(Gd ζ)^2 + 4w),  w + 1/(Gd ζ),  w + sqrt(w^2 + 2w) ).        (18)
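A small helper for evaluating the right-hand side of inequality (18) is sketched below; the parameter values in the example call are placeholders for illustration and are not the values used in the paper's simulations.

import math

def tau_over_t_limit(w, gd, zeta):
    """Evaluate the delay bound of inequality (18): tau/T must stay below this value."""
    x = 1.0 / (gd * zeta)
    # Guard the radicand against going negative for extreme parameter choices.
    term1 = w - x + math.sqrt(max(0.0, 2 * w * w - 2 * x * x + 4 * w))
    term2 = w + x
    term3 = w + math.sqrt(w * w + 2 * w)
    return min(term1, term2, term3)

# Illustrative call with placeholder parameters (w = 2, Gd = 100/128, zeta = 1).
limit = tau_over_t_limit(w=2.0, gd=100.0 / 128.0, zeta=1.0)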
Several simulations are conducted to verify the analytical analysis of ECCP. Using the OMNEST network simulation framework [8], we simulate a dumbbell topology of four data sources and four receivers connected to two 10-Gbps switches, as shown in Fig. 8. All links in this topology have a maximum capacity of 10 Gbps. We consider the worst case, which happens when all sources send at their maximum link capacity. Thus, we have four data sources that send data at the maximum line capacity (10 Gbps) toward four receivers through one bottleneck link (Fig. 8). Table 1 depicts the simulation parameters.
Figure 8: Simulation topology: four 10-Gbps sources and four receivers connected through two switches and a 10-Gbps bottleneck link.
Based on the ECCP parameters shown in Table 1 and inequality (18), ECCP is stable for all τ < 1.482 T. Fig. 9 shows a box plot of the cross traffic. It depicts that the ECCP system reduces the cross traffic rate to a value lower than its minimum limit ((1 − AvT) × C = 9 Gbps) when τ exceeds the analytically calculated limit (1.482 T). In addition, Fig. 9 clearly shows that when τ = 1.8 ms > 1.482 T, the variation of the cross traffic exceeds the maximum allowed margin (AvT × C = 1 Gbps). One can notice that when τ = 3.3 T the average cross traffic starts to increase again. The reason behind that is the data accumulation in the queue, as shown in Fig. 10b.
Figure 9: Box plot of the cross traffic (Gbps) versus the propagation delay (0.3 ms to 3.3 ms, i.e., 0.3 T to 3.3 T).

Figure 10: Queue length (KB) over time while varying the propagation delay: (a) delays satisfying the stability condition; (b) delays violating it.

Fig. 10 depicts the queue length while varying the propagation delay (τ =
0.3 T, 0.6 T, 1.2 T, 1.8 T, 2.4 T, and 3.3 T). Fig. 10a shows that if the stability conditions are satisfied (τ < 1.482 T), the ECCP system succeeds in maintaining a close-to-zero queue length. Otherwise, data start to accumulate and the queue fluctuates significantly, as shown in Fig. 10b.
Figure 11: CDF of the queue length for different propagation delays.

Fig. 11 shows the CDF of the queue length. It shows that when the stability conditions are satisfied and τ = 0.3 T, 0.6 T and 1.2 T, the 99th percentiles of the queue length are less than 6.72 KB, 6.78 KB and 21.9 KB respectively. But when these conditions are violated, the 99th percentile of the queue length reaches up to 294.4 KB.
Fig. 12 depicts the transmission rates while varying the propagation delay. It shows that as long as τ does not exceed the stability limit (1.482 T), the ECCP system achieves fairness between flows.
Figure 12: Transmission rates (Gbps) of the four hosts over time for τ = 300 µs (0.3 T), 600 µs (0.6 T), and larger propagation delays.
Table 1: Simulation parameters

    Frame size                Normal distribution
    Min frame size            200 Byte
    Max frame size            1500 Byte

    ECCP probing parameters
    System noise              Q = [0.00001  0.0 ; 0.0  0.01]
    Measurement error         P = [1.0  0.0 ; 0.0  100.0]
    Gd                        Gd = 100/128
The sliding mode motion between the self-increase and rate decrease subsystems constitutes the core mechanism of ECCP. However, the ECCP system is also constrained by physical boundaries such as the maximum link capacity and the buffer size. For example, when the ECCP system reaches the equilibrium point, hosts keep increasing their data rates until a positive Fb is calculated. Thus, the cross traffic might reach the maximum limit and data start to be queued in the system. In order to avoid this, the integral of the self-increase function from t to (t + (T + 2τ)) must be less than the available bandwidth margin (AvT × Aeq × C), where (T + 2τ) is the control cycle time. The boundary limitation of the ECCP queue system is summarized by the following lemma.

Lemma 2. ECCP keeps the queue length close to zero, thereby ensuring minimum network latency and preventing congestion, if inequality (19) is satisfied:

    BC > C(T + 2τ) / (2M).        (19)
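The byte-counter condition of Lemma 2 translates directly into code. The helper below is a minimal sketch; the bit-to-byte conversion and the example values are assumptions for illustration and are not meant to reproduce the thresholds reported in the simulations below.

def min_byte_counter(link_capacity_bps, train_interval_s, prop_delay_s, num_flows):
    """Minimum BC required by inequality (19): BC > C(T + 2*tau) / (2M)."""
    bits = link_capacity_bps * (train_interval_s + 2 * prop_delay_s) / (2 * num_flows)
    return bits / 8.0   # assumed conversion from bits to bytes

# Illustrative call: 10 Gbps link, T = 1 ms, tau = 300 us, 4 flows.
bc_min = min_byte_counter(10e9, 1e-3, 300e-6, 4)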
The same topology is simulated to verify the analytical model. Figs. 13, 14, 15 and 16 depict the simulation results while varying the byte counter (BC = 150 KB, 450 KB, 600 KB, and 750 KB). Fig. 13 shows that when inequality (19) is not satisfied (BC < 500 KB), the ECCP system becomes unstable and the cross traffic variation exceeds the (AvT × C) limit (1 Gbps). It is clearly shown that reducing BC decreases the average cross traffic rate and increases its variation. One can notice that at BC = 150 KB the average cross traffic rate starts to increase again, which is a result of data accumulation in the bottleneck link queue, as shown in Fig. 14. Besides, Fig. 14 depicts that when the byte counter does not satisfy the analytically calculated limit (BC < 500 KB), the queue starts accumulating data.
Figure 13: Box plot of the cross traffic (Gbps) versus the byte counter BC (150 KB, 450 KB, 600 KB, 750 KB).
In contrast, when the byte counter limit is satisfied (BC > 500 KB), ECCP succeeds in maintaining a close-to-zero queue length.
Figure 14: Queue length (KB) over time for BC = 150 KB, 450 KB, 600 KB and 750 KB.
Fig. 15 shows the CDF of the queue length. It depicts that when BC is equal to 750 KB and 600 KB, the 99th percentiles of the queue length are less than 6.9 KB and 6.8 KB respectively. But when inequality (19) is not satisfied (BC < 500 KB), the 99th percentile of the queue length reaches up to 299 KB.
Figure 15: CDF of the queue length for different byte counter values.

Figure 16: Transmission rates (Gbps) of the four hosts over time for BC = 150 KB, 450 KB, 600 KB and 750 KB.

Fig. 16 depicts the effect of varying the byte counter BC on the transmission rates. It shows that when BC ≤ 600 KB, flows with a high rate start recovering faster than flows with a low rate (Fig. 16a, 16b and 16c), but when BC > 600 KB, hosts start to recover at a relatively equal speed, which achieves fairness between
flows (Fig. 16d). This limit slightly exceeds the value predicted by the analytical analysis (inequality (19)) but stays within an acceptable range.
6.6. Discussion
The time interval between trains, T, must be greater than the sending time of the whole train (N frames, of 1500 Byte each) at a rate equal to AvT × Rmin:

    T > (N × 1500 Byte) / (AvT × Rmin).        (20)

Furthermore, T determines the control cycle, which controls the buffer boundary. For example, for a stable system of M flows, ECCP will keep the queue length close to zero. If a new flow arrives with a rate equal to R0, then R0 must satisfy:

    R0 × T ≤ B,        (21)

where B is the maximum switch buffer size. In other words, the hardware buffer inside the switch must satisfy B ≥ T × R0, or any new flow has to start with a rate R0 ≤ B/T.
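The two dimensioning rules of this discussion can be checked with a couple of short helpers; the frame size, unit conversions and example numbers below are assumptions for illustration only.

def min_train_interval(n_frames, frame_bytes, avt, r_min_bps):
    """Lower bound on T from (20): time to send one train of N frames at rate AvT x Rmin."""
    return (n_frames * frame_bytes * 8) / (avt * r_min_bps)

def max_new_flow_rate(buffer_bytes, train_interval_s):
    """Upper bound on a new flow's start rate R0 from (21): R0 <= B / T (in bps)."""
    return (buffer_bytes * 8) / train_interval_s

# Illustrative calls with placeholder values.
t_min = min_train_interval(n_frames=32, frame_bytes=1500, avt=0.1, r_min_bps=1e9)
r0_max = max_new_flow_rate(buffer_bytes=512 * 1024, train_interval_s=1e-3)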
7. Linux-Based Implementation
We have implemented an ECCP testbed using 3 Linux hosts and a 10 Gbps switch. The testbed is connected as shown in Fig. 17 and is configured according to Table 1. In this implementation, we built a Java GUI to periodically collect statistics and plot the actual transmission rate R and the cross traffic rate at the receiver (Fig. 20).
Figure 17: Testbed topology: two senders and a receiver connected through a 10 Gbps switch, with a statistics collector.
In the next section, we present several experiments to validate our bandwidth estimation method, and in the following section we present the ECCP testbed implementation.
We first validate the bandwidth estimation method using the aforementioned testbed. In this topology, sender 0 sends constant bit rate traffic to the receiver and sender 1 sends probe traffic with a randomly generated rate µ. Fig. 18 shows the measured strains versus the probe rate µ at the receiver in three scenarios: (i) AvBw = 6 Gbps, (ii) AvBw = 5 Gbps, (iii) AvBw = 1.5 Gbps. Fig. 18 depicts that the probe rate µ at which the strain starts increasing is always identical to the AvBw in all cases. Thus, we conclude that this method is trustworthy and can be used to estimate the AvBw.
Figure 18: Measured strain versus probe rate (Gbps): (a) AvBw = 6 Gbps, (b) AvBw = 5 Gbps, (c) AvBw = 1.5 Gbps.
We use the Linux Hierarchical Token Bucket (HTB) queuing discipline [34, 35] to emulate different simulated links using one physical link. HTB is used to ensure that the maximum service provided for each class is the minimum of the desired rate DR or the rate R assigned by ECCP. Fig. 19a shows the two classes that we create to represent the data flow and the probe flow. In addition, two virtual schedulers (Qdiscs) are created and linked to these classes (Fig. 19b). Thus, ECCP can limit the data rate by setting the rate on class 1:11 equal to the maximum allowed rate, while keeping the probe class (class 1:22) uncontrolled. Note that these two queues have different priorities: data flows enter the queue with low priority while probe flows are forwarded through the queue with high priority.
Figure 19: (a) Data class and probe class created using HTB; (b) virtual queues (leaf Qdiscs) created for data and probe traffic under the root HTB Qdisc.
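A minimal sketch of how an HTB hierarchy like the one in Fig. 19 could be created and updated from Python is shown below. It assumes the interface name eth0, SFQ leaf queues, and example rates and priorities; the exact classes, rates and priorities used in the authors' testbed are not reproduced here.

import subprocess

def tc(*args):
    """Run a tc(8) command; raises if it fails (requires root privileges)."""
    subprocess.run(["tc", *args], check=True)

def setup_htb(dev="eth0", link_rate="10gbit"):
    """Create a root HTB qdisc, a data class (1:11) and a probe class (1:22) with leaf SFQ qdiscs."""
    tc("qdisc", "add", "dev", dev, "root", "handle", "1:", "htb", "default", "11")
    tc("class", "add", "dev", dev, "parent", "1:", "classid", "1:1", "htb", "rate", link_rate)
    # Data class: its rate is later throttled by ECCP; lower priority (higher prio number).
    tc("class", "add", "dev", dev, "parent", "1:1", "classid", "1:11",
       "htb", "rate", link_rate, "prio", "1")
    # Probe class: left uncontrolled; higher priority.
    tc("class", "add", "dev", dev, "parent", "1:1", "classid", "1:22",
       "htb", "rate", "1gbit", "ceil", link_rate, "prio", "0")
    tc("qdisc", "add", "dev", dev, "parent", "1:11", "handle", "11:", "sfq")
    tc("qdisc", "add", "dev", dev, "parent", "1:22", "handle", "22:", "sfq")

def set_data_rate(rate, dev="eth0"):
    """Throttle the data class to the rate R computed by ECCP, e.g. set_data_rate('4gbit')."""
    tc("class", "change", "dev", dev, "parent", "1:1", "classid", "1:11",
       "htb", "rate", rate, "ceil", rate)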
In this experiment, each host sends with a desired rate DR that is throttled by HTB to the sending rate R calculated by ECCP. The DRs are varied 4 times in this test. In the first period (0 s < t < 4 s), host 0 sends with DR = 4 Gbps while host 1 sends with DR = 1 Gbps (Fig. 20). In this period, there is no congestion and the transmission rates R are not controlled (they equal the DRs). In the second period (4 s < t < 12.4 s), host 1 increases its DR to 6 Gbps. Thus, ECCP starts limiting the transmission rate by setting R to a value that keeps the cross traffic close to 9.5 Gbps. One can notice that in this period, ECCP controls only the greedy flow (host 1) while allowing host 0 to send at its DR. In the third period (12.4 s < t < 14.2 s), host 0 increases its DR to 6 Gbps. Therefore, ECCP starts to control both hosts' rates severely to prevent congestion. Finally, when t > 14.2 s, host 0 decreases its DR to 3 Gbps, which ends the congestion. Thus, ECCP alleviates its control, and each host sends at its desired rate (R = DR).
Figure 20: Transmission rates and cross traffic measured during the testbed experiment.
8. Conclusion

We analyzed ECCP using the phase plane method while taking the propagation delay into consideration. Our stability analysis identifies the sufficient conditions for ECCP system stability. In addition, this research shows that the stability of the ECCP system is ensured by the sliding mode motion. However, the stability of ECCP depends not only on its parameters but also on the network configuration.

Several simulations were driven to verify our ECCP stability analysis. The obtained numerical results reveal that the ECCP system is stable when the delay is bounded. Finally, a Linux-based testbed experimentation is conducted to evaluate ECCP performance.

As a perspective of this work, we are presently (i) studying the effect of available bandwidth estimation error on ECCP stability, and (ii) evaluating ECCP in larger and various network topologies using our simulator.
Acknowledgment
This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC). Sincere gratitude is hereby extended to Brian Alleyne and Andre Beliveau for their help and support in constructing this work.
Appendix A. Proof of Lemma 1 (stability conditions of the ECCP rate decrease subsystem)
Proof. We start with the ECCP rate decrease subsystem equation (17), which can be presented as follows:

    y'(t) + (Gd / T) × [y(t - τ) + w × T × y'(t - T - τ)] × (y(t) + ζ) = 0.        (A.1)

Lyapunov has shown that the stability of nonlinear differential equations in the neighborhood of an equilibrium point can be found from their linear version around the equilibrium point [36] when the Lipschitz condition is satisfied. For delay differential equations, [37] has proven a similar result. Hence, the stability of the delay differential equation is defined by the stability of the linearized part near the equilibrium point.
Thus, the linear part of the rate decrease subsystem equation becomes:

    y'(t) + (Gd ζ / T) × [y(t - τ) + w × T × y'(t - T - τ)] = 0.        (A.2)

We use Taylor series to approximate (A.2) by substituting y(t - τ) and y'(t - T - τ) using (A.3) and (A.4) respectively:

    y(t - τ) ≈ y(t) - τ y'(t) + (τ^2 / 2) y''(t),        (A.3)
    y'(t - T - τ) ≈ y'(t) - (T + τ) y''(t).        (A.4)

Hence (A.2) becomes:

    y'(t) + (Gd ζ / T) [ y(t) - τ y'(t) + (τ^2 / 2) y''(t) ] + Gd ζ w [ y'(t) - (T + τ) y''(t) ] ≈ 0
    (τ^2/(2T) - w(T + τ)) y''(t) + (w + 1/(Gd ζ) - τ/T) y'(t) + (1/T) y(t) ≈ 0,        (A.5)

where Gd ζ ≠ 0. Therefore, one can derive the characteristic equation of (A.5) as:

    (τ^2/(2T) - w(T + τ)) λ^2 + (w + 1/(Gd ζ) - τ/T) λ + 1/T = 0,        (A.6)

where a = τ^2/(2T) - w(T + τ), b = w + 1/(Gd ζ) - τ/T, and c = 1/T.
In order to study the stability of the ECCP rate decrease subsystem, the roots of (A.6) must be either (i) complex roots with a negative real part, for a system with a stable spiral point (Fig. A.21a), or (ii) negative real roots, for a system with a stable node point (Fig. A.21b).
Figure A.21: Phase trajectories of the rate decrease subsystem: (a) complex roots with negative real part; (b) negative real roots.
The roots of (A.6) are

    λ_{1,2} = (-b ± sqrt(b^2 - 4ac)) / (2a).        (A.7)

For case (i), the roots are complex with a negative real part when:

    b^2 - 4ac < 0,        (A.8)
    b/a > 0.        (A.9)

Substituting a, b and c in (A.8), we get:

    (w + 1/(Gd ζ) - τ/T)^2 < 4 (τ^2/(2T) - w(T + τ)) (1/T).

Let H = w + 1/(Gd ζ); we get:

    (H - τ/T)^2 < 2(τ/T)^2 - 4w - 4w(τ/T)
    (τ/T)^2 + (2H - 4w)(τ/T) - H^2 - 4w > 0.        (A.10)
One can say that the left-hand side of (A.10) represents a convex function (its second derivative with respect to τ/T equals 2 > 0), as shown in Fig. A.22. Thus, inequality (A.8) holds when τ/T < min(r_{1,2}) or τ/T > max(r_{1,2}). Hence, we calculate the roots r_{1,2} of (A.10):

    r_{1,2} = (-(2H - 4w) ± sqrt((2H - 4w)^2 + 4(H^2 + 4w))) / 2
    r_{1,2} = -H + 2w ± sqrt((H - 2w)^2 + (H^2 + 4w))
    r_{1,2} = 2w - H ± sqrt(H^2 - 4wH + 4w^2 + H^2 + 4w)
    r_{1,2} = w - 1/(Gd ζ) ± sqrt(2H^2 - 4wH + 4w^2 + 4w).        (A.11)
Figure A.22: Roots r_{1,2} of a convex function.
By substituting the value of H, we get:

    r_{1,2} = w - 1/(Gd ζ) ± sqrt(2w^2 - 2/(Gd ζ)^2 + 4w).        (A.12)

Thus, inequality (A.8) holds when:

    τ/T < w - 1/(Gd ζ) - sqrt(2w^2 - 2/(Gd ζ)^2 + 4w)
    or
    τ/T > w - 1/(Gd ζ) + sqrt(2w^2 - 2/(Gd ζ)^2 + 4w).        (A.13)

One can conclude that inequality (A.8) does not hold because τ/T by definition must be limited by a certain value k (τ/T < k, where k ∈ R+). Therefore, (A.6) does not have complex roots and the ECCP rate decrease subsystem does not have a stable spiral point.
For case (ii), the roots of (A.6) must be real and negative; thus, the following conditions must hold:

    b^2 - 4ac ≥ 0,        (A.14)
    (-b ± sqrt(b^2 - 4ac)) / (2a) < 0.        (A.15)

By treating (A.14) similarly to (A.8), we get:
    w - 1/(Gd ζ) - sqrt(2w^2 - 2/(Gd ζ)^2 + 4w) < τ/T < w - 1/(Gd ζ) + sqrt(2w^2 - 2/(Gd ζ)^2 + 4w).        (A.16)

As τ and T are always greater than zero, we can consider the positive root only:

    τ/T < w - 1/(Gd ζ) + sqrt(2w^2 - 2/(Gd ζ)^2 + 4w).        (A.17)
The second condition (inequality (A.15)) can be simplified as follows:

    (-b / (2a)) × (1 ± sqrt(1 - 4ac/b^2)) < 0.        (A.18)

This condition holds in one of the following two states: (i)

    -b/a > 0
    1 ± sqrt(1 - 4ac/b^2) < 0,        (A.19)

or (ii)

    -b/a < 0
    1 ± sqrt(1 - 4ac/b^2) > 0.        (A.20)
The second part of the first state (A.19) does not hold in its worst case when we consider the positive root; i.e., 1 + sqrt(1 - 4ac/b^2) will never be less than zero. Thus we consider only the second state. The worst case of the second part of (A.20) gives:

    -sqrt(1 - 4ac/b^2) > -1
    1 - 4ac/b^2 < 1.        (A.21)

For all b^2 > 0 and c = 1/T > 0, we conclude that a·c must be greater than 0. Consequently, b must be greater than zero (first part of A.20) when -a is greater than zero.
Hence, to satisfy the second inequality (A.20), these conditions must hold:

    -a > 0
    wT + wτ - τ^2/(2T) > 0
    -(1/2)(τ/T)^2 + w(τ/T) + w > 0,        (A.22)

and

    b > 0
    w + 1/(Gd ζ) - τ/T > 0
    τ/T < w + 1/(Gd ζ).        (A.23)
Dissimilar to inequality (A.10), the left-hand side of (A.22) represents a concave function. Thus, inequality (A.22) holds when min(r_{1,2}) < τ/T < max(r_{1,2}), where the roots r_{1,2} of (A.22) are:

    r_{1,2} = w ± sqrt(w^2 + 2w).        (A.24)

Hence, inequality (A.22) holds when:

    w - sqrt(w^2 + 2w) < τ/T < w + sqrt(w^2 + 2w).        (A.25)

Because τ and T ∈ R+, we consider only the positive root; thus (A.25) becomes:

    τ/T < w + sqrt(w^2 + 2w).        (A.26)
To conclude, the ECCP rate decrease subsystem is stable with a stable point (Fig. A.21b) when inequalities (A.17), (A.23) and (A.26) hold, that is, when

    τ/T < min( w + 1/(Gd ζ),  w + sqrt(w^2 + 2w),  w - 1/(Gd ζ) + sqrt(2H^2 - 4wH + 4w^2 + 4w) ).
Appendix B. Phase trajectories of the ECCP self-increase subsystem

The self-increase subsystem can be represented as:

    y'(t) = (M / (AvT × C)) × (TR - (AvT × C / M)(y(t) + ζ)) / (2 × T_BC)
    y'(t) = (M / (2 × AvT × C × T_BC)) × TR - ζ / (2 × T_BC) - y(t) / (2 × T_BC).        (B.1)
This can be written as:

    λ^2 + λ / (2 × T_BC) = K,        (B.2)

where:

    K = (M / (2 × AvT × C × T_BC)) × TR - ζ / (2 × T_BC)
      = (1 / (2 × AvT × C × T_BC)) × (M × TR - (1 - Aeq × AvT) × C).        (B.3)
The phase trajectories of (B.2) can be drawn using the isoclinal method [38]. Figs. B.23a and B.23b show the phase trajectories of the self-increase subsystem for K > 0 and K < 0 respectively.

Figure B.23: Phase trajectories in the self-increase subsystem (K > 0 and K < 0).
Figure B.24: Combined phase trajectories of the rate decrease and self-increase subsystems (lines l1-l5; Fb = 0 is the switching line).
Combining Fig. A.21b, Fig. B.23a and Fig. B.23b for the rate decrease and self-increase subsystems, we get Fig. B.24. In this figure, one can notice that if the system starts in the self-increase subsystem, it follows line l1 (K > 0) toward the asymptotic line (Fb = 0), or it follows line l3 (K ≤ 0) for 5 cycles until ECCP enters the AI stage and TR is increased; then it follows l4 toward the asymptotic line. Afterward, the system follows either line l2, coming from the FR stage into the rate decrease subsystem, or l5, from the AI stage into the rate decrease subsystem. Both trajectories lead ECCP toward the equilibrium point, as shown in Fig. B.24.

Therefore, the ECCP rate increase subsystem is not stable by itself, and the stability of the ECCP system mainly depends on the sliding mode motion [7] from the self-increase subsystem into the rate decrease subsystem when the system crosses the asymptotic line (Fb = 0).
Appendix C. Proof of Lemma 2

Proof. To avoid data accumulation in the queue, the integral of the self-increase function from t to t + (T + 2τ) must be less than the available bandwidth margin, as depicted by (C.1):

    ∫_t^{t+(T+2τ)} M × R'(t) dt < AvT × Aeq × C.        (C.1)
Since ECCP is a discrete system and R(t) is constant within a control cycle, (C.1) can be approximated within one control cycle to:

    M × R'(t) × (T + 2τ) < AvT × Aeq × C
    M × ((TR - R) / (2 T_BC)) × (T + 2τ) < AvT × Aeq × C.

At the equilibrium point R = (1 - AvT × Aeq) C / M, and TR > C/M. Thus:

    ((C - (C - AvT × Aeq × C)) / (2 T_BC)) × (T + 2τ) < AvT × Aeq × C
    (T + 2τ) < 2 T_BC
    (T + 2τ) < 2 BC / (C/M)
    BC > C(T + 2τ) / (2M).        (C.2)
References
[3] I. 802.1, The data center bridging (DCB) task group (2013). URL http://www.ieee802.org/1/pages/dcbridges.html

[4] M. Snir, The future of supercomputing, in: Proceedings of the 28th ACM International Conference on Supercomputing, ICS '14, ACM, New York, NY, USA, 2014, pp. 261-262. doi:10.1145/2597652.2616585.

[5] S. Bailey, T. Talpey, The architecture of direct data placement (DDP) and remote direct memory access (RDMA) on internet protocols, Architecture. URL https://tools.ietf.org/html/rfc4296

[6] P. Kale, A. Tumma, H. Kshirsagar, P. Ramrakhyani, T. Vinode, Fibre channel over ethernet: A beginners perspective, in: 2011 International Conference on Recent Trends in Information Technology (ICRTIT), 2011, pp. 438-443. doi:10.1109/ICRTIT.2011.5972328.

[7] V. Utkin, Variable structure systems with sliding modes, IEEE Transactions on Automatic Control 22 (2) (1977) 212-222. doi:10.1109/TAC.1977.1101446.

[8] A. Varga, R. Hornig, An overview of the omnet++ simulation environment, in: Proceedings of the 1st international conference on Simulation tools and

[9] IEEE standard for local and metropolitan area networks--media access con-

[11] IEEE standard for local and metropolitan area networks-- virtual bridged local area networks amendment 13: Congestion notification, IEEE Std

[12] M. Alizadeh, B. Atikoglu, A. Kabbani, A. Lakshmikantha, R. Pan, B. Prabhakar, M. Seaman, Data center transport mechanisms: Congestion control theory and IEEE standardization, in: 2008 46th Annual Allerton Conference on Communication, Control, and Computing, 2008, pp. 1270-1277. doi:10.1109/ALLERTON.2008.4797706.

[13] A. Kabbani, M. Alizadeh, M. Yasuda, R. Pan, B. Prabhakar, AF-QCN: Approximate fairness with quantized congestion notification for multi-tenanted data centers, in: 2010 18th IEEE Symposium on High Performance Interconnects, 2010, pp. 58-65. doi:10.1109/HOTI.2010.26.

[14] Y. Zhang, N. Ansari, Fair quantized congestion notification in data center networks, IEEE Transactions on Communications 61 (11) (2013) 4690-4699. doi:10.1109/TCOMM.2013.102313.120809.

for limited queue fluctuation in data center networks, in: 2013 IEEE 2nd International Conference on Cloud Networking (CloudNet), 2013, pp. 42-49. doi:10.1109/CloudNet.2013.6710556.

URL http://dl.acm.org/citation.cfm?id=2228298.2228324

A. Vahdat, Y. Wang, D. Wetherall, D. Zats, TIMELY: RTT-based congestion control for the datacenter, SIGCOMM Comput. Commun. Rev. 45 (4) (2015) 537-550. doi:10.1145/2829988.2787510.

[21] C. So-In, R. Jain, J. Jiang, Enhanced forward explicit congestion notification (e-fecn) scheme for datacenter ethernet networks, in: 2008 International Symposium on Performance Evaluation of Computer and Telecommunication Systems, 2008, pp. 542-546.

[22] L. Jose, L. Yan, M. Alizadeh, G. Varghese, N. McKeown, S. Katti, High speed networks need proactive congestion control, in: Proceedings of the

doi:https://doi.org/10.1016/j.jnca.2017.05.008. URL http://www.sciencedirect.com/science/article/pii/S1084804517302060

[26] L. Tassiulas, A. Ephremides, Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks, IEEE Transactions on Automatic Control 37 (12) (1992) 1936-1948. doi:10.1109/9.182479.

[27] J. Liu, N. B. Shroff, C. H. Xia, H. D. Sherali, Joint congestion control and routing optimization: An efficient second-order distributed approach, IEEE/ACM Transactions on Networking 24 (3) (2016) 1404-1420. doi:10.1109/TNET.2015.2415734.

[28] S. Ekelin, M. Nilsson, E. Hartikainen, A. Johnsson, J. E. Mangs, B. Melander, M. Bjorkman, Real-time measurement of end-to-end available
doi:10.1109/NOMS.2006.1687540.

[29] A. Charny, D. D. Clark, R. Jain, Congestion control with explicit rate indi-

[30] G. Raina, D. Towsley, D. Wischik, Part II: Control theory for buffer sizing,

munication and Networks (ICCCN), 2015, pp. 1-8. doi:10.1109/ICCCN.2015.7288483.

[33] W. Jiang, F. Ren, C. Lin, Phase plane analysis of quantized congestion notification for data center ethernet, IEEE/ACM Transactions on Networking 23 (1) (2015) 1-14. doi:10.1109/TNET.2013.2292851.

[34] B. Hubert, et al., Linux advanced routing & traffic control howto.

[35] M. Devera, Hierarchical token bucket theory (2002). URL http://luxik.cdi.cz/~devik/qos/htb/manual/theory.htm

[37] R. D. Driver, Ordinary and delay differential equations, Vol. 20, Springer Science & Business Media, 2012.

doi:10.1109/TSMC.1977.4309773.
Author Biographies
Halima Elbiaze received the Master degree from the University of Versailles, France, in 1998, and the PhD from the University of Versailles in March 2002. She has been a professor at Université du Québec à Montréal since June 2003. In 2005, Dr. Elbiaze received the Canada Foundation for Innovation Award to build her IP over DWDM network laboratory. She is the author or coauthor of many journal and conference papers. Her research interests include intelligent optical networks, network performance evaluation, traffic engineering, quality of service management, wireless networks, and next-generation IP networks. She is a member of IEEE and OSA.
Bochra Boughzala received her engineering national diploma from INSAT, Tunisia, in 2011 and her Master degree in Computer Science from UQAM in 2013. In 2013, she joined the Ericsson Research group in Montreal, where she now works on network programmability and software defined networking (SDN), congestion control and traffic management, with new interest in 5G Ethernet fronthauling and information centric networking.
Highlights

- Proposing ECCP that controls transmission rate using estimated available bandwidth.
- Deducing the stability conditions of ECCP using the phase plane method.
- Validating the ECCP performance using simulation and testbed implementation.