Accepted Manuscript: 10.1016/j.jnca.2017.12.016
PII: S1084-8045(17)30423-X
DOI: 10.1016/j.jnca.2017.12.016
Reference: YJNCA 2036
Please cite this article as: Bahnasy, M., Elbiaze, H., Boughzala, B., Zero-queue ethernet congestion
control protocol based on available bandwidth estimation, Journal of Network and Computer
Applications (2018), doi: 10.1016/j.jnca.2017.12.016.
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to
our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and all
legal disclaimers that apply to the journal pertain.
Zero-Queue Ethernet Congestion Control Protocol Based on Available Bandwidth Estimation

Mahmoud Bahnasy (a), Halima Elbiaze (b), Bochra Boughzala (c)

(a) École de Technologie Supérieure, Montréal, Canada
(b) Université du Québec à Montréal, Canada
(c) Ericsson Research, Canada
Abstract
Router's switch fabric has strict characteristics in terms of packet loss, latency, fairness and head-of-line (HOL) blocking. Network manufacturers address these requirements using specialized, proprietary and highly expensive switches. Simultaneously, IEEE introduces Data Center Bridging (DCB) as an enhancement to existing Ethernet bridge specifications, which includes technological enhancements addressing packet loss, HOL blocking and latency issues. Motivated by the DCB enhancements, we investigate the possibility of using commodity Ethernet switches to build a flexible and cost-efficient switch fabric that fulfills the strict router characteristics, and we propose the Ethernet Congestion Control Protocol (ECCP) for this purpose. Furthermore, we present a mathematical model of ECCP using Delay Differential Equations (DDEs) and analyze its stability using the phase plane method. The stability of ECCP is mainly ensured by the sliding mode motion, causing ECCP to keep cross traffic close to the maximum link capacity and the queue length close to zero. Extensive simulation scenarios are driven to validate the analytical results of ECCP behavior. Our analysis shows that ECCP is practical in avoiding congestion while maintaining minimum latency.

Keywords: Data Center Bridging, Congestion Control, Congestion Prevention, Priority-based Flow Control (PFC), Quantized Congestion Notification (QCN), Ethernet Congestion Control Protocol (ECCP)
1. Introduction
Router's switch fabric requirements are traditionally addressed using a custom Application-Specific Integrated Circuit (ASIC). This ASIC must fulfill particular characteristics including low packet loss, fairness between flows, and low latency [1]. The emergence of very-high-speed serial interfaces and new router architectures increases the design and manufacturing cost of the switch fabric chipset. Traditionally, the switch fabric is manufactured using either shared memory or a crossbar switch, as shown in Fig. 1a and Fig. 1b respectively. The shared memory architecture requires memory that works N times faster than the port speed, where N is the number of ports, which raises a scalability issue. On the other hand, the crossbar architecture tries to keep the buffering at the edge of the router (Virtual Output Queues, VOQs, inside the line cards). Because this architecture requires N VOQs at each ingress port and a central arbiter unit, it also faces scalability and cost challenges.
Figure 1: Traditional switch fabric architectures: (a) shared memory; (b) crossbar with line cards and terminal interfaces.
IEEE has recently presented Data Center Bridging (DCB) [3], which comprises several enhancements to the Ethernet network. However, the Ethernet network still suffers from HOL blocking, congestion spreading and high latency. To overcome these limitations and achieve a non-blocking switch fabric, we present the Ethernet Congestion Control Protocol (ECCP), which maintains the Ethernet network non-blocked by preserving switches' queue lengths close to zero, leading to minimum latency and no HOL blocking. Unlike traditional congestion control mechanisms that use packet accumulation in buffers to trigger the rate control process, ECCP estimates the available bandwidth and uses this information to control transmission rates before link saturation or data accumulation. Accordingly, it achieves minimum latency by trading off a small margin of link capacity. Therefore, ECCP achieves (i) low queue length, (ii) low latency, and (iii) high throughput, (iv) with no switch modification. Such a mechanism could be used in manufacturing a cost-efficient router switch fabric while guaranteeing traditional router characteristics.

Furthermore, we present a mathematical model of ECCP using Delay Differential Equations (DDEs) and derive its stability conditions by analyzing the phase trajectories of the rate increase and rate decrease subsystems. Extensive simulations are conducted using OMNEST [8] to verify our mathematical analysis. Finally, a Linux-based implementation of ECCP is conducted to verify ECCP's performance through experiments.
The rest of this paper is organized as follows. Related work is introduced in Section 2. Section 3 presents the ECCP mechanism. Section 4 briefly introduces the phase plane analysis method. The mathematical model of ECCP is derived in Section 5. The stability analysis of ECCP is presented in Section 6. The Linux-based implementation is presented in Section 7. Finally, Section 8 concludes the paper and outlines future work.
2. Related Work

In this section, we present research work that is closely related to congestion control in both the Ethernet layer and the Transmission Control Protocol (TCP) layer. IEEE has recently presented Data Center Bridging (DCB) [3], which comprises several enhancements for Ethernet networks to create a consolidation of I/O connectivity through data centers. DCB aims to eliminate packet loss due to queue overflow. Ethernet PAUSE IEEE 802.3x and Priority-based Flow Control (PFC) [9] are presented in DCB as link-level (hop-by-hop) mechanisms. Ethernet PAUSE was issued to solve the packet loss problem by sending a PAUSE request to the sender when the receiver buffer reaches a certain threshold. Thus, the sender stops sending data until a local timer expires or a resume notification is received from the receiver. PFC divides the data path into eight traffic classes, each of which can be controlled individually. Yet, PFC is still limited because it operates at the port-plus-priority level, which can cause congestion spreading and HOL blocking [9, 10].
Figure 2: QCN framework: CP in the bridge, and RP in the host's NIC.
In QCN, the Congestion Point (CP) in the bridge samples incoming data frames and calculates a feedback (Fb) value, in a probabilistic manner, to reflect the congestion severity (Equation 1):

    Fb = -((Q - Q_eq) + w × (Q - Q_old)),        (1)

where Q is the current queue length, Q_old is the previous queue length, and w is a constant which equals 2 (for more details refer to [11]). If the calculated Fb is negative, the CP creates a Congestion Notification Message (CNM) and sends it to the Reaction Point (RP).

QCN reduces the overhead of control-information traffic and the required computational power by calculating Fb in a probabilistic manner. At the end host, when the RP receives a CNM, it decreases its transmission rate accordingly. However, QCN has known issues regarding fairness [13, 14] and queue length fluctuation [15]. In addition, QCN does not achieve minimum latency as it keeps the queue length at a certain level (Q_eq).

Several research papers have discussed various enhancements for QCN.
Data Center TCP (DCTCP) [17] uses switches that support Explicit Congestion Notification (ECN) to mark packets that arrive while the queue length is greater than a predefined threshold. A DCTCP source reacts by reducing the window proportionally to the fraction of marked packets. Data Center QCN (DCQCN) [18] combines the characteristics of DCTCP [17] and QCN in order to achieve QCN-like behavior while using the ECN marking feature. DCQCN requires a very strict parameter selection regarding the byte counter and the marking probability.
Trading a little bandwidth to achieve low queue length and low latency is discussed in a number of papers. For example, HULL (High-bandwidth Ultra-Low Latency) is presented in [19] to reduce average and tail latencies in data centers by sacrificing a small amount of bandwidth (e.g., 10%). HULL presents the Phantom Queue (PQ) as a new marking algorithm. Phantom queues simulate draining data at a fraction (< 1) of the link rate. This process generates a virtual backlog that is used to mark data packets before congestion. The main challenge of HULL is that it requires switch modification.

TIMELY [20] is a congestion control scheme for data centers that uses the RTT as a congestion signal. Enhanced Forward Explicit Congestion Notification (E-FECN) [21] and the Proactive congestion control algorithm (PERC) [22] are presented as congestion control mechanisms that exploit the measured available bandwidth to control data rates. However, these two methods require switch modifications, which we aim to avoid.

Few centralized solutions are proposed in the literature. For example, Fastpass [23] embraces central control for every packet transmission, which raises a scalability issue.
Another approach to enhance the performance of the TCP protocol is to distinguish between congestive packet loss and non-congestive packet loss [24, 25], so that the TCP congestion avoidance algorithm is activated only when congestive packet loss is detected. For example, TCP INVS [24] estimates the network queue length and compares this estimate to a threshold. If the estimated queue length exceeds the threshold, the loss is considered to be caused by congestion; consequently, TCP INVS activates the traditional congestion avoidance algorithm. Otherwise, the loss is considered a non-congestion loss, and TCP INVS ignores it and avoids limiting the congestion window growth. In addition, [25] proposes an RTT estimation algorithm using an Autoregressive Integrated Moving Average (ARIMA) model. By analyzing the estimated RTT, one can detect sharp and sudden changes in the RTT, thereby differentiating non-congestive packet loss from congestive packet loss. While these mechanisms achieve better throughput on lossy networks, they introduce extra packet loss, which is not suitable for a router switch fabric or a data center network.
AN
Optimizing the routing decision to control the congestion is also proposed in
140 several research papers. Most of this research follows a key idea called the back-
M
pressure algorithm [26] where traffic is directed around a queuing network to
achieve maximum network throughput. An example of this scheme is presented
in [27] where the authors developed a second-order joint congestion control and
D
145 fast convergence. Such a scheme can significantly reduce queuing delay and it
would be interesting to investigate this scheme in future work.
Unlike the schemes above, ECCP is a congestion control algorithm that works at the Ethernet layer. ECCP controls data traffic according to the estimated Available Bandwidth (AvBw) through a network path. ECCP strives to keep the link occupancy below the maximum capacity by a percentage called the Availability Threshold (AvT). Traditional congestion control mechanisms aim to keep the queue around a target level. These mechanisms can reduce queuing latency, but they cannot eliminate it: a non-zero queue must be observed before any reaction, and sources need one RTT to react. In contrast, ECCP maintains a close-to-zero queue length, leading to minimum network latency.
ECCP estimates the AvBw through a network path by periodically sending trains of probe frames through this path. The sender adds the sending time and other information, such as a train identifier and the sequence number within the train, to each probe frame. On the receiver side, ECCP receives these frames and estimates the AvBw using a modified version of Bandwidth Available in Real-Time (BART) [28]. Afterward, ECCP transmits this information back to the sender. At the sender side, ECCP controls the transmission rate based on the received AvBw value. ECCP advocates rate-based control schemes instead of window-based control schemes because window-based schemes encounter significant challenges, particularly with the rapid increase of the control cycle time, defined mainly by the propagation delay, compared to the transmission time in modern networks [29]. In addition, [30] and [31] state that at high line rates, queue size fluctuations become fast and difficult to control because the queuing delay is shorter than the control loop delay. Thus, rate-based control schemes are more reliable.
3. ECCP Mechanism

In this section, the ECCP architecture is described in detail together with the interaction between its components. ECCP maintains a bandwidth stability margin equal to AvT × C. This bandwidth stability margin allows ECCP to send probe traffic without causing queue accumulation. ECCP does not require switch modification because all its functionalities are implemented inside line cards or hosts.

Fig. 3 depicts the ECCP components: (1) probe sender, (2) probe receiver, (3) bandwidth estimator, and (4) rate controller. These modules are implemented in every line card in the router or in every host.
Figure 3: ECCP components: probe sender, probe receiver, bandwidth estimator, and rate controller, connected through the data path, the probe frames path, and the AvBw estimation (control) path.
The probe sender periodically sends trains of N probe frames. Each probe frame is tagged with a sending time. Other information is added to the probes, such as a sequence number and a train identifier. The ECCP probe sender transmits probe trains at a rate limited to R × AvT, where R is the current transmission rate. Thus, ECCP gets enough information to control (decrease or increase) the data rate while limiting the probe rate to R × AvT. Hence, the probe traffic for M flows crossing one link (M × R × AvT) never exceeds the link bandwidth stability margin (AvT × C).
The bandwidth estimator uses a modified version of BART, which is based on a self-induced congestion model.

Figure 4: Self-induced congestion model: the strain ε remains close to zero while the probing rate µ is below the AvBw (no congestion), and grows linearly (ε = α µ + β) once µ exceeds the AvBw (network congested).

In this model, when the probe rate µ is greater than the AvBw, the network queues start accumulating data, which increases the inter-frame time measured at the receiver, ∆out. Otherwise, ∆out will be, on average, equal to the inter-frame time at the sender, ∆in (Fig. 4). This model does not require clock synchronization between hosts. Rather, it uses the relative queuing delay between probe frames.
BART derives a new metric to capture the change of the inter-frame time, called the strain ε = (∆out − ∆in)/∆in. For a probe rate µ that is less than the AvBw, the strain will be, on average, equal to zero (ε ≈ 0). Otherwise, the strain increases proportionally to the probe rate µ (Fig. 4). This linear relation between the strain and the probe rate is expressed in (2):

    ε = 0              if µ ≤ AvBw
    ε = α µ + β        if µ > AvBw.        (2)

Based on this linear relationship between the strain and the probe rate µ, ECCP estimates the AvBw as the maximum probe rate that keeps the strain equal to zero.
For that purpose, the bandwidth estimator calculates the strain ε_i for each probe pair {i = 1, . . . , N − 1}. Then, the calculated average and its variance R are forwarded to a Kalman Filter (KF). In addition, an estimation of the system noise covariance Q and the measurement error P are provided. The Kalman filter works on continuous linear systems, while this model has a discontinuity separating two linear segments, as shown in Fig. 4. Thus, BART ignores the probe rates µ that are not on the horizontal line, where µ is less than the last estimated AvBw (µ < AvBw). Unlike BART, ECCP does not ignore probe train information that is not on the straight line. Instead, it uses that probe rate µ to provide an estimation of AvBw using (4) (for more details see [32]):

    AvBw = max(µ_j)           if ε < ε_t
    AvBw = KF(ε, Q, P)        if ε ≥ ε_t,        (4)

where j is the probe train number and ε_t is the strain threshold that identifies the starting point of the straight line. After that, the Kalman filter calculates α and β of the straight line of (2), from which the AvBw estimate is derived.
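To make the estimation step concrete, the sketch below computes the per-pair strain from probe timestamps and applies the piecewise rule of (2) and (4). It is a simplified illustration under stated assumptions: the strain threshold value is assumed, and an ordinary least-squares fit of ε = α µ + β stands in for the Kalman filter used by BART/ECCP; the AvBw is then taken as the rate at which the fitted line crosses ε = 0.

def train_strain(send_times, recv_times):
    """Average strain of one probe train: (d_out - d_in) / d_in per probe pair."""
    strains = []
    for i in range(len(send_times) - 1):
        d_in = send_times[i + 1] - send_times[i]      # inter-frame time at the sender
        d_out = recv_times[i + 1] - recv_times[i]     # inter-frame time at the receiver
        strains.append((d_out - d_in) / d_in)
    return sum(strains) / len(strains)

def estimate_avbw(samples, strain_threshold=0.02):
    """samples: list of (probe_rate_bps, avg_strain) for recent trains.

    Below the threshold, AvBw is the largest probed rate (max(mu_j)); above it,
    fit eps = alpha*mu + beta and return the eps = 0 crossing.  A least-squares
    fit is used here in place of the Kalman filter (illustrative substitution).
    """
    congested = [(mu, e) for mu, e in samples if e >= strain_threshold]
    if not congested:
        return max(mu for mu, _ in samples)           # no congestion observed yet
    n = len(congested)
    sx = sum(mu for mu, _ in congested)
    sy = sum(e for _, e in congested)
    sxx = sum(mu * mu for mu, _ in congested)
    sxy = sum(mu * e for mu, e in congested)
    denom = n * sxx - sx * sx
    if denom == 0:                                    # need at least two distinct rates
        return max(mu for mu, _ in samples)
    alpha = (n * sxy - sx * sy) / denom
    beta = (sy - alpha * sx) / n
    return -beta / alpha                              # rate where the fitted strain line hits zero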
The ECCP rate controller follows an Additive Increase Multiplicative Decrease (AIMD) model after receiving the AvBw value. Based on the received estimated AvBw, it calculates the availability ratio Ar as in (5):

    Ar = AvBw / (R × AvT).        (5)
Figure 5: Relationship between AvBw and Ar.
ECCP works on keeping Ar at an equilibrium level Aeq. Therefore, it calculates a feedback parameter Fb to represent the severity of the congestion using (6):

    Fb = -((Ar - Aeq) + w × (Ar - Aold)),        (6)

where Aold is the previous value of Ar and w is equal to 2 (similar to QCN).
Furthermore, the ECCP rate controller monitors two variables: (1) the transmission rate R and (2) the target rate TR. TR is the transmission rate before congestion and represents an objective rate for the rate increase process. The ECCP rate controller uses a rate decrease process if the calculated Fb value is negative, and a self-increase process otherwise, as given in (7):

    if Fb < 0 (rate decrease process):   TR ← R,   R ← R(1 + Gd × Fb)
    otherwise (self-increase process):   R ← (1/2)(R + TR),        (7)

where Gd is a fixed value chosen such that the maximum rate reduction equals 1/2.
Figure 6: ECCP rate control stages.
Fig. 6 shows the ECCP rate control process in detail. The figure shows that when ECCP calculates a negative Fb, it executes the rate decrease process. In addition, Fig. 6 depicts that ECCP divides the self-increase process into three stages: (i) Fast Recovery (FR), (ii) Active Increase (AI) and (iii) Hyper-Active Increase (HAI). ECCP determines the increase stage based on a byte counter BC and a timer T. The Fast Recovery stage consists of five cycles, where each cycle is defined by sending BC bytes of data or the expiration of a timer T. The timer defines the end of cycles in the case of low-rate flows. At each cycle, R is updated using (7) while keeping TR unchanged. If the byte counter or the timer completes five cycles, the rate controller enters the Active Increase (AI) stage, in which TR is increased by a predefined value R_AI. Moreover, the byte counter and the timer limits are set to BC/2 and T/2 respectively. Afterward, the rate controller enters the Hyper-Active Increase (HAI) stage if both the byte counter and the timer finish five cycles. In the HAI stage, TR is increased by a predefined value R_HAI, as in (8):

    TR ← TR + R_AI      (AI)
    TR ← TR + R_HAI     (HAI),        (8)

where R_AI is the rate increase step in the AI stage and R_HAI is the rate increase step in the HAI stage. Algorithms 1 and 2 depict the ECCP rate decrease and self-increase processes respectively.
Algorithm 1: ECCP rate decrease process
Input: Available Bandwidth AvBw
1   Ar ← AvBw / (R × AvT);
2   Fb ← -((Ar - Aeq) + w × (Ar - Aold));
3   if Fb < 0 then
4       TR ← R;
5       R ← R(1 + Gd × Fb);              /* Rate decrease */
6       SelfIncreaseStarted ← TRUE;
7       ByteCycleCnt ← 0;
8       TimeCycleCnt ← 0;
9       ByteCnt ← BC;
10      Timer ← T;
11  end
Algorithm 2: ECCP self-increase process
1   foreach sentFrame do
2       if SelfIncreaseStarted == TRUE then
3           ByteCnt ← ByteCnt - Byte(frameSize);
4           if (ByteCnt ≤ 0) then
5               ByteCycleCnt++;
6               if (ByteCycleCnt < 5) then
7                   ByteCnt ← BC;                /* FR stage */
8               else
9                   ByteCnt ← BC/2;              /* AI/HAI stages */
10              AdjustRate();
11  end
12  foreach timeout do
13      if SelfIncreaseStarted == TRUE then
14          TimeCycleCnt++;
15          if (TimeCycleCnt ≥ 5) then
16              RestartTimer(T/2);               /* AI/HAI stages */
17          else
18              RestartTimer(T);                 /* FR stage */
19          AdjustRate();
20  end
21
22  AdjustRate():
23      if (ByteCycleCnt ≥ 5) and (TimeCycleCnt ≥ 5) then
24          TR ← TR + R_HAI;                     /* HAI stage */
25      else if (ByteCycleCnt ≥ 5) or (TimeCycleCnt ≥ 5) then
26          TR ← TR + R_AI;                      /* AI stage */
27      R ← 1/2 × (R + TR);
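The following Python sketch mirrors the AIMD logic of (6)-(8) and Algorithms 1-2 in a single class, as one possible way to structure the rate controller. It is a simplified illustration: cycle counting is driven by the byte counter only (the timer path of Algorithm 2 is analogous), the HAI trigger is approximated by ten completed byte cycles, and all constructor values (line rate, BC, R_AI, R_HAI) are caller-supplied assumptions rather than the paper's parameters.

class EccpRateController:
    """Simplified ECCP rate controller: rate decrease (Alg. 1) and self-increase (Alg. 2)."""

    def __init__(self, line_rate, avt, bc_bytes, r_ai, r_hai, w=2.0, gd=100.0 / 128.0):
        self.line_rate = line_rate      # maximum line rate (bps), illustrative
        self.avt = avt                  # availability threshold AvT
        self.bc_limit = bc_bytes        # byte counter limit BC
        self.r_ai, self.r_hai = r_ai, r_hai
        self.w, self.gd = w, gd
        self.rate = line_rate           # current transmission rate R
        self.target = line_rate         # target rate TR
        self.a_old = None               # previous availability ratio Ar
        self.byte_cnt = bc_bytes
        self.byte_cycles = 0

    def on_avbw(self, avbw, a_eq):
        """Rate decrease process (Algorithm 1): run when a new AvBw estimate arrives."""
        a_r = avbw / (self.rate * self.avt)                       # Eq. (5)
        if self.a_old is None:
            self.a_old = a_r
        fb = -((a_r - a_eq) + self.w * (a_r - self.a_old))        # Eq. (6)
        self.a_old = a_r
        if fb < 0:
            self.target = self.rate                               # TR <- R
            self.rate *= (1.0 + self.gd * fb)                     # R <- R(1 + Gd*Fb)
            self.byte_cycles = 0
            self.byte_cnt = self.bc_limit

    def on_sent(self, frame_size):
        """Self-increase process (Algorithm 2, byte-counter path): run per sent frame."""
        self.byte_cnt -= frame_size
        if self.byte_cnt > 0:
            return
        self.byte_cycles += 1
        self.byte_cnt = self.bc_limit if self.byte_cycles < 5 else self.bc_limit / 2
        if self.byte_cycles >= 10:                                # simplified HAI trigger
            self.target += self.r_hai                             # Eq. (8), HAI stage
        elif self.byte_cycles >= 5:                               # AI stage
            self.target += self.r_ai                              # Eq. (8), AI stage
        self.rate = 0.5 * (self.rate + self.target)               # Eq. (7), self-increase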
4. Phase Plane Analysis Method

In this paper, we use the phase plane method to visually represent certain characteristics of the differential equations of ECCP. The phase plane is used to analyze the behavior of nonlinear systems. The solutions of differential equations are a set of functions which can be plotted graphically in the phase plane as a two-dimensional vector field. Given an autonomous system represented by a differential equation x''(t) = f(x(t), x'(t)), one can plot the phase trajectory of such a system by following the direction in which time increases. Fig. 7a depicts a system x(t) in the time domain, and a phase trajectory of this system is displayed in Fig. 7b. One can notice that x(t) and x'(t) in the time domain can be inferred from the phase trajectory plot. Thus, the phase trajectory provides enough information about the behavior of the system. Moreover, sketching phase trajectories is easier than finding an analytical solution of differential equations, which is sometimes not possible.
Figure 7: (a) The trajectory of x(t) in the time domain; (b) the phase trajectory of x'(t) versus x(t).
Thus, the phase plane method is adequate for analyzing segmented systems like congestion control protocols [33]. In addition, system parameter limitations can be taken into consideration explicitly. Therefore, we should consider only the phase trajectories that satisfy our system limitations (i.e., link capacity and buffer size), even if the system is stable according to the derived stability conditions.
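As a concrete illustration of the method (not of the ECCP model itself), the short sketch below traces the phase trajectory (x(t), x'(t)) of an arbitrary second-order autonomous system x'' = f(x, x') with forward Euler integration; the damped-oscillator example and the step size are illustrative assumptions. Plotting the returned pairs gives the spiral toward the stable point discussed above.

def phase_trajectory(f, x0, v0, dt=1e-3, steps=20000):
    """Trace (x, x') points for the autonomous system x''(t) = f(x(t), x'(t))."""
    x, v = x0, v0
    points = [(x, v)]
    for _ in range(steps):
        a = f(x, v)          # acceleration from the system definition
        x += v * dt          # forward Euler integration step
        v += a * dt
        points.append((x, v))
    return points

# Example: a damped oscillator, whose trajectory spirals into the origin.
trajectory = phase_trajectory(lambda x, v: -2.0 * x - 0.5 * v, x0=1.0, v0=0.0)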
5. ECCP Modeling
In this section, we derive a mathematical model of ECCP and analyze its behavior around the target point. For the purpose of simplicity, we make these assumptions:

• All sources are homogeneous, namely they have the same characteristics such as round-trip time.

• Data flows in data center networks have high rates and appear like a continuous fluid flow.

Under these assumptions, the available bandwidth on the bottleneck link is

    AvBw(t) = C - M × R(t),        (9)

where AvBw(t) is the available bandwidth at time t, C is the maximum link capacity, M is the number of flows that share the same bottleneck link, and R(t) is the host's transmission rate at time t.

By substituting (9) into (5) we get:

    Ar(t) = C / (AvT × R(t)) - M / AvT,        (10)

where Ar(t) is the available bandwidth ratio at time t.
In continuous time, the feedback (6) can be written as

    Fb(t) = -((Ar(t) - Aeq) + w × T × A'r(t)),        (11)

where T is the time interval between trains, which defines the control cycle time, and (Ar - Aold) becomes the derivative of the availability ratio, A'r, multiplied by the control cycle time T.
Given the ECCP rate update equation (7), the derivative of the transmission rate R'(t) can be represented by the delay differential equation (12):

    R'(t) = (Gd / T) × R(t) × Fb(t - τ)        if Fb(t - τ) < 0
    R'(t) = (TR - R(t)) / (2 × T_BC)           if Fb(t - τ) ≥ 0,        (12)

where τ is the propagation delay and T_BC is the BC counter time.
We define the state variable

    y(t) = Ar(t) - Aeq,    y'(t) = A'r(t).        (13)

Thus, from (10) we get:

    y(t) = C / (AvT × R(t)) - M / AvT - Aeq.

Let ζ = M / AvT + Aeq; we get:

    y(t) = C / (AvT × R(t)) - ζ,
    R(t) = C / (AvT × (y(t) + ζ)),
    R'(t) = -C × y'(t) / (AvT × (y(t) + ζ)^2).        (14)
The feedback equation can be represented by substituting (13) into (11):

    Fb(t - τ) = -(y(t - τ) + w × T × y'(t - T - τ)).        (15)

Substituting (14) and (15) into the rate decrease part of (12), we get the rate decrease subsystem equation (16):

    -C × y'(t) / (AvT × (y(t) + ζ)^2) = (Gd / T) × Fb(t - τ) × (C / (AvT × (y(t) + ζ)))
    -y'(t) / (y(t) + ζ) = (Gd / T) × Fb(t - τ)
    -y'(t) = (Gd / T) × Fb(t - τ) × (y(t) + ζ).        (16)

Thus, the ECCP rate decrease subsystem can be represented by substituting (15) into (16):

    -y'(t) = (Gd / T) × [y(t - τ) + w × T × y'(t - T - τ)] × (y(t) + ζ).        (17)

6. Stability Analysis of ECCP

Lemma 1. The ECCP rate decrease subsystem is stable if inequality (18) is satisfied (the proof is given in Appendix A):

    τ/T < min( w - 1/(Gd ζ) + sqrt(2w^2 - 2/(Gd ζ)^2 + 4w),  w + 1/(Gd ζ),  w + sqrt(w^2 + 2w) ).        (18)
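A small helper for evaluating the right-hand side of inequality (18) is sketched below; the parameter values in the example call are placeholders for illustration and are not the values used in the paper's simulations.

import math

def tau_over_t_limit(w, gd, zeta):
    """Evaluate the delay bound of inequality (18): tau/T must stay below this value."""
    x = 1.0 / (gd * zeta)
    # Guard the radicand against going negative for extreme parameter choices.
    term1 = w - x + math.sqrt(max(0.0, 2 * w * w - 2 * x * x + 4 * w))
    term2 = w + x
    term3 = w + math.sqrt(w * w + 2 * w)
    return min(term1, term2, term3)

# Illustrative call with placeholder parameters (w = 2, Gd = 100/128, zeta = 1).
limit = tau_over_t_limit(w=2.0, gd=100.0 / 128.0, zeta=1.0)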
Several simulations are conducted to verify the analytical analysis of ECCP. Using the OMNEST network simulation framework [8], we simulate a dumbbell topology of four data sources and four receivers connected to two 10-Gbps switches, as shown in Fig. 8. All links in this topology have a maximum capacity of 10 Gbps. We consider the worst case, which happens when all sources send at their maximum link capacity. Thus, we have four data sources that send data at the maximum line capacity (10 Gbps) toward four receivers through one bottleneck link (Fig. 8). Table 1 depicts the simulation parameters.
Figure 8: Simulation topology: four 10-Gbps sources and four receivers connected through two switches and a 10-Gbps bottleneck link.
Based on the ECCP parameters shown in Table 1 and inequality (18), ECCP is stable for all τ < 1.482 T. Fig. 9 shows a box plot of the cross traffic. It depicts that the ECCP system reduces the cross traffic rate to a value lower than its minimum limit ((1 − AvT) × C = 9 Gbps) when τ exceeds the analytically calculated limit (1.482 T). In addition, Fig. 9 clearly shows that when τ = 1.8 ms > 1.482 T, the variation of the cross traffic exceeds the maximum allowed margin (AvT × C = 1 Gbps). One can notice that when τ = 3.3 T the average cross traffic starts to increase again. The reason behind that is the data accumulation in the queue, as shown in Fig. 10b.
Figure 9: Box plot of the cross traffic (Gbps) versus the propagation delay (0.3 ms to 3.3 ms, i.e., 0.3 T to 3.3 T).

Figure 10: Queue length (KB) over time while varying the propagation delay: (a) delays satisfying the stability condition; (b) delays violating it.

Fig. 10 depicts the queue length while varying the propagation delay (τ =
0.3 T, 0.6 T, 1.2 T, 1.8 T, 2.4 T, and 3.3 T). Fig. 10a shows that if the stability conditions are satisfied (τ < 1.482 T), the ECCP system succeeds in maintaining a close-to-zero queue length. Otherwise, data start to accumulate and the queue fluctuates significantly, as shown in Fig. 10b.
Figure 11: CDF of the queue length for different propagation delays.

Fig. 11 shows the CDF of the queue length. It shows that when the stability conditions are satisfied and τ = 0.3 T, 0.6 T and 1.2 T, the 99th percentiles of the queue length are less than 6.72 KB, 6.78 KB and 21.9 KB respectively. But when these conditions are violated, the 99th percentile of the queue length reaches up to 294.4 KB.
Fig. 12 depicts the transmission rates while varying the propagation delay. It shows that as long as τ does not exceed the stability limit (1.482 T), the ECCP system achieves fairness between flows.
Figure 12: Transmission rates (Gbps) of the four hosts over time for τ = 300 µs (0.3 T), 600 µs (0.6 T), and larger propagation delays.
Table 1: Simulation parameters

    Frame size                Normal distribution
    Min frame size            200 Byte
    Max frame size            1500 Byte

    ECCP probing parameters
    System noise              Q = [0.00001  0.0 ; 0.0  0.01]
    Measurement error         P = [1.0  0.0 ; 0.0  100.0]
    Gd                        Gd = 100/128
The sliding mode motion between the self-increase and rate decrease subsystems constitutes the core mechanism of ECCP. However, the ECCP system is also constrained by physical boundaries such as the maximum link capacity and the buffer size. For example, when the ECCP system reaches the equilibrium point, hosts keep increasing their data rates until a positive Fb is calculated. Thus, the cross traffic might reach the maximum limit and data start to be queued in the system. In order to avoid this, the integral of the self-increase function from t to (t + (T + 2τ)) must be less than the available bandwidth margin (AvT × Aeq × C), where (T + 2τ) is the control cycle time. The boundary limitation of the ECCP queue system is summarized by the following lemma.

Lemma 2. ECCP keeps the queue length close to zero, thereby ensuring minimum network latency and preventing congestion, if inequality (19) is satisfied:

    BC > C(T + 2τ) / (2M).        (19)
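The byte-counter condition of Lemma 2 translates directly into code. The helper below is a minimal sketch; the bit-to-byte conversion and the example values are assumptions for illustration and are not meant to reproduce the thresholds reported in the simulations below.

def min_byte_counter(link_capacity_bps, train_interval_s, prop_delay_s, num_flows):
    """Minimum BC required by inequality (19): BC > C(T + 2*tau) / (2M)."""
    bits = link_capacity_bps * (train_interval_s + 2 * prop_delay_s) / (2 * num_flows)
    return bits / 8.0   # assumed conversion from bits to bytes

# Illustrative call: 10 Gbps link, T = 1 ms, tau = 300 us, 4 flows.
bc_min = min_byte_counter(10e9, 1e-3, 300e-6, 4)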
The same topology is simulated to verify the analytical model. Figs. 13, 14, 15 and 16 depict the simulation results while varying the byte counter (BC = 150 KB, 450 KB, 600 KB, and 750 KB). Fig. 13 shows that when inequality (19) is not satisfied (BC < 500 KB), the ECCP system becomes unstable and the cross traffic variation exceeds the (AvT × C) limit (1 Gbps). It is clearly shown that reducing BC decreases the average cross traffic rate and increases its variation. One can notice that at BC = 150 KB the average cross traffic rate starts to increase again, which is a result of data accumulation in the bottleneck link queue, as shown in Fig. 14. Besides, Fig. 14 depicts that when the byte counter does not satisfy the analytically calculated limit (BC < 500 KB), the queue starts accumulating data.
Figure 13: Box plot of the cross traffic (Gbps) versus the byte counter BC (150 KB, 450 KB, 600 KB, 750 KB).
In contrast, when the byte counter limit is satisfied (BC > 500 KB), ECCP succeeds in maintaining a close-to-zero queue length.
Figure 14: Queue length (KB) over time for BC = 150 KB, 450 KB, 600 KB and 750 KB.
Fig. 15 shows the CDF of the queue length. It depicts that when BC is equal to 750 KB and 600 KB, the 99th percentiles of the queue length are less than 6.9 KB and 6.8 KB respectively. But when inequality (19) is not satisfied (BC < 500 KB), the 99th percentile of the queue length reaches up to 299 KB.
Figure 15: CDF of the queue length for different byte counter values.

Figure 16: Transmission rates (Gbps) of the four hosts over time for BC = 150 KB, 450 KB, 600 KB and 750 KB.

Fig. 16 depicts the effect of varying the byte counter BC on the transmission rates. It shows that when BC ≤ 600 KB, flows with a high rate start recovering faster than flows with a low rate (Fig. 16a, 16b and 16c), but when BC > 600 KB, hosts start to recover at a relatively equal speed, which achieves fairness between
flows (Fig. 16d). This limit slightly exceeds the value predicted by the analytical analysis (inequality (19)) but stays within an acceptable range.
6.6. Discussion
The time interval between trains, T, must be greater than the sending time of the whole train (N frames, of 1500 Byte each) at a rate equal to AvT × Rmin:

    T > (N × 1500 Byte) / (AvT × Rmin).        (20)

Furthermore, T determines the control cycle, which controls the buffer boundary. For example, for a stable system of M flows, ECCP will keep the queue length close to zero. If a new flow arrives with a rate equal to R0, then R0 must satisfy:

    R0 × T ≤ B,        (21)

where B is the maximum switch buffer size. In other words, the hardware buffer inside the switch must satisfy B ≥ T × R0, or any new flow has to start with a rate R0 ≤ B/T.
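The two dimensioning rules of this discussion can be checked with a couple of short helpers; the frame size, unit conversions and example numbers below are assumptions for illustration only.

def min_train_interval(n_frames, frame_bytes, avt, r_min_bps):
    """Lower bound on T from (20): time to send one train of N frames at rate AvT x Rmin."""
    return (n_frames * frame_bytes * 8) / (avt * r_min_bps)

def max_new_flow_rate(buffer_bytes, train_interval_s):
    """Upper bound on a new flow's start rate R0 from (21): R0 <= B / T (in bps)."""
    return (buffer_bytes * 8) / train_interval_s

# Illustrative calls with placeholder values.
t_min = min_train_interval(n_frames=32, frame_bytes=1500, avt=0.1, r_min_bps=1e9)
r0_max = max_new_flow_rate(buffer_bytes=512 * 1024, train_interval_s=1e-3)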
7. Linux-Based Implementation
We have implemented an ECCP testbed using 3 Linux hosts and a 10 Gbps switch. The testbed is connected as shown in Fig. 17 and is configured according to Table 1. In this implementation, we built a Java GUI to periodically collect statistics and plot the actual transmission rate R and the cross traffic rate at the receiver (Fig. 20).
Figure 17: Testbed topology: two senders and a receiver connected through a 10 Gbps switch, with a statistics collector.
In the next section, we present several experiments to validate our bandwidth estimation method, and in the following section we present the ECCP testbed implementation.
We first validate the bandwidth estimation method using the aforementioned testbed. In this topology, sender 0 sends constant bit rate traffic to the receiver and sender 1 sends probe traffic with a randomly generated rate µ. Fig. 18 shows the measured strains versus the probe rate µ at the receiver in three scenarios: (i) AvBw = 6 Gbps, (ii) AvBw = 5 Gbps, (iii) AvBw = 1.5 Gbps. Fig. 18 depicts that the probe rate µ at which the strain starts increasing is always identical to the AvBw in all cases. Thus, we conclude that this method is trustworthy and can be used to estimate the AvBw.
Figure 18: Measured strain versus probe rate (Gbps): (a) AvBw = 6 Gbps, (b) AvBw = 5 Gbps, (c) AvBw = 1.5 Gbps.
We use the Linux Hierarchical Token Bucket (HTB) queuing discipline [34, 35] to emulate different simulated links using one physical link. HTB is used to ensure that the maximum service provided for each class is the minimum of the desired rate DR or the rate R assigned by ECCP. Fig. 19a shows the two classes that we create to represent the data flow and the probe flow. In addition, two virtual schedulers (Qdiscs) are created and linked to these classes (Fig. 19b). Thus, ECCP can limit the data rate by setting the rate on class 1:11 equal to the maximum allowed rate, while keeping the probe class (class 1:22) uncontrolled. Note that these two queues have different priorities: data flows enter the queue with low priority while probe flows are forwarded through the queue with high priority.
Figure 19: (a) Data class and probe class created using HTB; (b) virtual queues (leaf Qdiscs) created for data and probe traffic under the root HTB Qdisc.
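A minimal sketch of how an HTB hierarchy like the one in Fig. 19 could be created and updated from Python is shown below. It assumes the interface name eth0, SFQ leaf queues, and example rates and priorities; the exact classes, rates and priorities used in the authors' testbed are not reproduced here.

import subprocess

def tc(*args):
    """Run a tc(8) command; raises if it fails (requires root privileges)."""
    subprocess.run(["tc", *args], check=True)

def setup_htb(dev="eth0", link_rate="10gbit"):
    """Create a root HTB qdisc, a data class (1:11) and a probe class (1:22) with leaf SFQ qdiscs."""
    tc("qdisc", "add", "dev", dev, "root", "handle", "1:", "htb", "default", "11")
    tc("class", "add", "dev", dev, "parent", "1:", "classid", "1:1", "htb", "rate", link_rate)
    # Data class: its rate is later throttled by ECCP; lower priority (higher prio number).
    tc("class", "add", "dev", dev, "parent", "1:1", "classid", "1:11",
       "htb", "rate", link_rate, "prio", "1")
    # Probe class: left uncontrolled; higher priority.
    tc("class", "add", "dev", dev, "parent", "1:1", "classid", "1:22",
       "htb", "rate", "1gbit", "ceil", link_rate, "prio", "0")
    tc("qdisc", "add", "dev", dev, "parent", "1:11", "handle", "11:", "sfq")
    tc("qdisc", "add", "dev", dev, "parent", "1:22", "handle", "22:", "sfq")

def set_data_rate(rate, dev="eth0"):
    """Throttle the data class to the rate R computed by ECCP, e.g. set_data_rate('4gbit')."""
    tc("class", "change", "dev", dev, "parent", "1:1", "classid", "1:11",
       "htb", "rate", rate, "ceil", rate)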
In this experiment, each host sends with a desired rate DR that is throttled by HTB to the sending rate R calculated by ECCP. The DRs are varied 4 times in this test. In the first period (0 s < t < 4 s), host 0 sends with DR = 4 Gbps while host 1 sends with DR = 1 Gbps (Fig. 20). In this period, there is no congestion and the transmission rates R are not controlled (they equal the DRs). In the second period (4 s < t < 12.4 s), host 1 increases its DR to 6 Gbps. Thus, ECCP starts limiting the transmission rate by setting R to a value that keeps the cross traffic close to 9.5 Gbps. One can notice that in this period, ECCP controls only the greedy flow (host 1) while allowing host 0 to send at its DR. In the third period (12.4 s < t < 14.2 s), host 0 increases its DR to 6 Gbps. Therefore, ECCP starts to control both hosts' rates severely to prevent congestion. Finally, when t > 14.2 s, host 0 decreases its DR to 3 Gbps, which ends the congestion. Thus, ECCP alleviates its control, and each host sends at its desired rate (R = DR).
Figure 20: Transmission rates and cross traffic measured during the testbed experiment.
8. Conclusion

We analyzed ECCP using the phase plane method while taking the propagation delay into consideration. Our stability analysis identifies the sufficient conditions for ECCP system stability. In addition, this research shows that the stability of the ECCP system is ensured by the sliding mode motion. However, the stability of ECCP depends not only on its parameters but also on the network configuration.

Several simulations were driven to verify our ECCP stability analysis. The obtained numerical results reveal that the ECCP system is stable when the delay is bounded. Finally, a Linux-based testbed experimentation is conducted to evaluate ECCP performance.

As a perspective of this work, we are presently (i) studying the effect of available bandwidth estimation error on ECCP stability, and (ii) evaluating ECCP in larger and various network topologies using our simulator.
Acknowledgment
This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC). Sincere gratitude is hereby extended to Brian Alleyne and Andre Beliveau for their help and support in constructing this work.
Appendix A. Proof of Lemma 1 (stability conditions of the ECCP rate decrease subsystem)
Proof. We start with the ECCP rate decrease subsystem equation (17), which can be presented as follows:

    y'(t) + (Gd / T) × [y(t - τ) + w × T × y'(t - T - τ)] × (y(t) + ζ) = 0.        (A.1)

Lyapunov has shown that the stability of nonlinear differential equations in the neighborhood of an equilibrium point can be found from their linear version around the equilibrium point [36] when the Lipschitz condition is satisfied. For delay differential equations, [37] has proven a similar result. Hence, the stability of the delay differential equation is defined by the stability of the linearized part near the equilibrium point.
Thus, the linear part of the rate decrease subsystem equation becomes:

    y'(t) + (Gd ζ / T) × [y(t - τ) + w × T × y'(t - T - τ)] = 0.        (A.2)

We use Taylor series to approximate (A.2) by substituting y(t - τ) and y'(t - T - τ) using (A.3) and (A.4) respectively:

    y(t - τ) ≈ y(t) - τ y'(t) + (τ^2 / 2) y''(t),        (A.3)
    y'(t - T - τ) ≈ y'(t) - (T + τ) y''(t).        (A.4)

Hence (A.2) becomes:

    y'(t) + (Gd ζ / T) [ y(t) - τ y'(t) + (τ^2 / 2) y''(t) ] + Gd ζ w [ y'(t) - (T + τ) y''(t) ] ≈ 0
    (τ^2/(2T) - w(T + τ)) y''(t) + (w + 1/(Gd ζ) - τ/T) y'(t) + (1/T) y(t) ≈ 0,        (A.5)

where Gd ζ ≠ 0. Therefore, one can derive the characteristic equation of (A.5) as:

    (τ^2/(2T) - w(T + τ)) λ^2 + (w + 1/(Gd ζ) - τ/T) λ + 1/T = 0,        (A.6)

where a = τ^2/(2T) - w(T + τ), b = w + 1/(Gd ζ) - τ/T, and c = 1/T.
In order to study the stability of the ECCP rate decrease subsystem, the roots of (A.6) must be either (i) complex roots with a negative real part, for a system with a stable spiral point (Fig. A.21a), or (ii) negative real roots, for a system with a stable node point (Fig. A.21b).
Figure A.21: Phase trajectories of the rate decrease subsystem: (a) complex roots with negative real part; (b) negative real roots.
The roots of (A.6) are

    λ_{1,2} = (-b ± sqrt(b^2 - 4ac)) / (2a).        (A.7)

For case (i), the roots are complex with a negative real part when:

    b^2 - 4ac < 0,        (A.8)
    b/a > 0.        (A.9)

Substituting a, b and c in (A.8), we get:

    (w + 1/(Gd ζ) - τ/T)^2 < 4 (τ^2/(2T) - w(T + τ)) (1/T).

Let H = w + 1/(Gd ζ); we get:

    (H - τ/T)^2 < 2(τ/T)^2 - 4w - 4w(τ/T)
    (τ/T)^2 + (2H - 4w)(τ/T) - H^2 - 4w > 0.        (A.10)
One can say that the left-hand side of (A.10) represents a convex function (its second derivative with respect to τ/T equals 2 > 0), as shown in Fig. A.22. Thus, inequality (A.8) holds when τ/T < min(r_{1,2}) or τ/T > max(r_{1,2}). Hence, we calculate the roots r_{1,2} of (A.10):

    r_{1,2} = (-(2H - 4w) ± sqrt((2H - 4w)^2 + 4(H^2 + 4w))) / 2
    r_{1,2} = -H + 2w ± sqrt((H - 2w)^2 + (H^2 + 4w))
    r_{1,2} = 2w - H ± sqrt(H^2 - 4wH + 4w^2 + H^2 + 4w)
    r_{1,2} = w - 1/(Gd ζ) ± sqrt(2H^2 - 4wH + 4w^2 + 4w).        (A.11)
Figure A.22: Roots r_{1,2} of a convex function.
By substituting the value of H, we get:

    r_{1,2} = w - 1/(Gd ζ) ± sqrt(2w^2 - 2/(Gd ζ)^2 + 4w).        (A.12)

Thus, inequality (A.8) holds when:

    τ/T < w - 1/(Gd ζ) - sqrt(2w^2 - 2/(Gd ζ)^2 + 4w)
    or
    τ/T > w - 1/(Gd ζ) + sqrt(2w^2 - 2/(Gd ζ)^2 + 4w).        (A.13)

One can conclude that inequality (A.8) does not hold because τ/T by definition must be limited by a certain value k (τ/T < k, where k ∈ R+). Therefore, (A.6) does not have complex roots and the ECCP rate decrease subsystem does not have a stable spiral point.
For case (ii), the roots of (A.6) must be real and negative; thus, the following conditions must hold:

    b^2 - 4ac ≥ 0,        (A.14)
    (-b ± sqrt(b^2 - 4ac)) / (2a) < 0.        (A.15)

By treating (A.14) similarly to (A.8), we get:
    w - 1/(Gd ζ) - sqrt(2w^2 - 2/(Gd ζ)^2 + 4w) < τ/T < w - 1/(Gd ζ) + sqrt(2w^2 - 2/(Gd ζ)^2 + 4w).        (A.16)

As τ and T are always greater than zero, we can consider the positive root only:

    τ/T < w - 1/(Gd ζ) + sqrt(2w^2 - 2/(Gd ζ)^2 + 4w).        (A.17)
The second condition (inequality (A.15)) can be simplified as follows:

    (-b / (2a)) × (1 ± sqrt(1 - 4ac/b^2)) < 0.        (A.18)

This condition holds in one of the following two states: (i)

    -b/a > 0
    1 ± sqrt(1 - 4ac/b^2) < 0,        (A.19)

or (ii)

    -b/a < 0
    1 ± sqrt(1 - 4ac/b^2) > 0.        (A.20)
The second part of the first state (A.19) does not hold in its worst case when we consider the positive root; i.e., 1 + sqrt(1 - 4ac/b^2) will never be less than zero. Thus we consider only the second state. The worst case of the second part of (A.20) gives:

    -sqrt(1 - 4ac/b^2) > -1
    1 - 4ac/b^2 < 1.        (A.21)

For all b^2 > 0 and c = 1/T > 0, we conclude that a·c must be greater than 0. Consequently, b must be greater than zero (first part of A.20) when -a is greater than zero.
Hence, to satisfy the second inequality (A.20), these conditions must hold:

    -a > 0
    wT + wτ - τ^2/(2T) > 0
    -(1/2)(τ/T)^2 + w(τ/T) + w > 0,        (A.22)

and

    b > 0
    w + 1/(Gd ζ) - τ/T > 0
    τ/T < w + 1/(Gd ζ).        (A.23)
Dissimilar to inequality (A.10), the left-hand side of (A.22) represents a concave function. Thus, inequality (A.22) holds when min(r_{1,2}) < τ/T < max(r_{1,2}), where the roots r_{1,2} of (A.22) are:

    r_{1,2} = w ± sqrt(w^2 + 2w).        (A.24)

Hence, inequality (A.22) holds when:

    w - sqrt(w^2 + 2w) < τ/T < w + sqrt(w^2 + 2w).        (A.25)

Because τ and T ∈ R+, we consider only the positive root; thus (A.25) becomes:

    τ/T < w + sqrt(w^2 + 2w).        (A.26)
To conclude, the ECCP rate decrease subsystem is stable with a stable point (Fig. A.21b) when inequalities (A.17), (A.23) and (A.26) hold, that is, when

    τ/T < min( w + 1/(Gd ζ),  w + sqrt(w^2 + 2w),  w - 1/(Gd ζ) + sqrt(2H^2 - 4wH + 4w^2 + 4w) ).
Appendix B. Phase trajectories of the ECCP self-increase subsystem

The self-increase subsystem can be represented as:

    y'(t) = (M / (AvT × C)) × (TR - (AvT × C / M)(y(t) + ζ)) / (2 × T_BC)
    y'(t) = (M / (2 × AvT × C × T_BC)) × TR - ζ / (2 × T_BC) - y(t) / (2 × T_BC).        (B.1)
This can be written as:

    λ^2 + λ / (2 × T_BC) = K,        (B.2)

where:

    K = (M / (2 × AvT × C × T_BC)) × TR - ζ / (2 × T_BC)
      = (1 / (2 × AvT × C × T_BC)) × (M × TR - (1 - Aeq × AvT) × C).        (B.3)
The phase trajectories of (B.2) can be drawn using the isoclinal method [38]. Figs. B.23a and B.23b show the phase trajectories of the self-increase subsystem for K > 0 and K < 0 respectively.

Figure B.23: Phase trajectories in the self-increase subsystem (K > 0 and K < 0).
Figure B.24: Combined phase trajectories of the rate decrease and self-increase subsystems (lines l1-l5; Fb = 0 is the switching line).
Combining Fig. A.21b, Fig. B.23a and Fig. B.23b for the rate decrease and self-increase subsystems, we get Fig. B.24. In this figure, one can notice that if the system starts in the self-increase subsystem, it follows line l1 (K > 0) toward the asymptotic line (Fb = 0), or it follows line l3 (K ≤ 0) for 5 cycles until ECCP enters the AI stage and TR is increased; then it follows l4 toward the asymptotic line. Afterward, the system follows either line l2, coming from the FR stage into the rate decrease subsystem, or l5, from the AI stage into the rate decrease subsystem. Both trajectories lead ECCP toward the equilibrium point, as shown in Fig. B.24.

Therefore, the ECCP rate increase subsystem is not stable by itself, and the stability of the ECCP system mainly depends on the sliding mode motion [7] from the self-increase subsystem into the rate decrease subsystem when the system crosses the asymptotic line (Fb = 0).
Appendix C. Proof of Lemma 2

Proof. To avoid data accumulation in the queue, the integral of the self-increase function from t to t + (T + 2τ) must be less than the available bandwidth margin, as depicted by (C.1):

    ∫_t^{t+(T+2τ)} M × R'(t) dt < AvT × Aeq × C.        (C.1)
Since ECCP is a discrete system and R(t) is constant within a control cycle, (C.1) can be approximated within one control cycle to:

    M × R'(t) × (T + 2τ) < AvT × Aeq × C
    M × ((TR - R) / (2 T_BC)) × (T + 2τ) < AvT × Aeq × C.

At the equilibrium point R = (1 - AvT × Aeq) C / M, and TR > C/M. Thus:

    ((C - (C - AvT × Aeq × C)) / (2 T_BC)) × (T + 2τ) < AvT × Aeq × C
    (T + 2τ) < 2 T_BC
    (T + 2τ) < 2 BC / (C/M)
    BC > C(T + 2τ) / (2M).        (C.2)
References
[3] I. 802.1, The data center bridging (DCB) task group (2013). URL http://www.ieee802.org/1/pages/dcbridges.html

[4] M. Snir, The future of supercomputing, in: Proceedings of the 28th ACM International Conference on Supercomputing, ICS '14, ACM, New York, NY, USA, 2014, pp. 261-262. doi:10.1145/2597652.2616585.

[5] S. Bailey, T. Talpey, The architecture of direct data placement (DDP) and remote direct memory access (RDMA) on internet protocols, Architecture. URL https://tools.ietf.org/html/rfc4296

[6] P. Kale, A. Tumma, H. Kshirsagar, P. Ramrakhyani, T. Vinode, Fibre channel over ethernet: A beginners perspective, in: 2011 International Conference on Recent Trends in Information Technology (ICRTIT), 2011, pp. 438-443. doi:10.1109/ICRTIT.2011.5972328.

[7] V. Utkin, Variable structure systems with sliding modes, IEEE Transactions on Automatic Control 22 (2) (1977) 212-222. doi:10.1109/TAC.1977.1101446.

[8] A. Varga, R. Hornig, An overview of the omnet++ simulation environment, in: Proceedings of the 1st international conference on Simulation tools and

[9] IEEE standard for local and metropolitan area networks--media access con-

[11] IEEE standard for local and metropolitan area networks-- virtual bridged local area networks amendment 13: Congestion notification, IEEE Std

[12] M. Alizadeh, B. Atikoglu, A. Kabbani, A. Lakshmikantha, R. Pan, B. Prabhakar, M. Seaman, Data center transport mechanisms: Congestion control theory and IEEE standardization, in: 2008 46th Annual Allerton Conference on Communication, Control, and Computing, 2008, pp. 1270-1277. doi:10.1109/ALLERTON.2008.4797706.

[13] A. Kabbani, M. Alizadeh, M. Yasuda, R. Pan, B. Prabhakar, AF-QCN: Approximate fairness with quantized congestion notification for multi-tenanted data centers, in: 2010 18th IEEE Symposium on High Performance Interconnects, 2010, pp. 58-65. doi:10.1109/HOTI.2010.26.

[14] Y. Zhang, N. Ansari, Fair quantized congestion notification in data center networks, IEEE Transactions on Communications 61 (11) (2013) 4690-4699. doi:10.1109/TCOMM.2013.102313.120809.

for limited queue fluctuation in data center networks, in: 2013 IEEE 2nd International Conference on Cloud Networking (CloudNet), 2013, pp. 42-49. doi:10.1109/CloudNet.2013.6710556.

URL http://dl.acm.org/citation.cfm?id=2228298.2228324

A. Vahdat, Y. Wang, D. Wetherall, D. Zats, TIMELY: RTT-based congestion control for the datacenter, SIGCOMM Comput. Commun. Rev. 45 (4) (2015) 537-550. doi:10.1145/2829988.2787510.

[21] C. So-In, R. Jain, J. Jiang, Enhanced forward explicit congestion notification (e-fecn) scheme for datacenter ethernet networks, in: 2008 International Symposium on Performance Evaluation of Computer and Telecommunication Systems, 2008, pp. 542-546.

[22] L. Jose, L. Yan, M. Alizadeh, G. Varghese, N. McKeown, S. Katti, High speed networks need proactive congestion control, in: Proceedings of the

doi:https://doi.org/10.1016/j.jnca.2017.05.008. URL http://www.sciencedirect.com/science/article/pii/S1084804517302060

[26] L. Tassiulas, A. Ephremides, Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks, IEEE Transactions on Automatic Control 37 (12) (1992) 1936-1948. doi:10.1109/9.182479.

[27] J. Liu, N. B. Shroff, C. H. Xia, H. D. Sherali, Joint congestion control and routing optimization: An efficient second-order distributed approach, IEEE/ACM Transactions on Networking 24 (3) (2016) 1404-1420. doi:10.1109/TNET.2015.2415734.

[28] S. Ekelin, M. Nilsson, E. Hartikainen, A. Johnsson, J. E. Mangs, B. Melander, M. Bjorkman, Real-time measurement of end-to-end available
doi:10.1109/NOMS.2006.1687540.

[29] A. Charny, D. D. Clark, R. Jain, Congestion control with explicit rate indi-

[30] G. Raina, D. Towsley, D. Wischik, Part II: Control theory for buffer sizing,

munication and Networks (ICCCN), 2015, pp. 1-8. doi:10.1109/ICCCN.2015.7288483.

[33] W. Jiang, F. Ren, C. Lin, Phase plane analysis of quantized congestion notification for data center ethernet, IEEE/ACM Transactions on Networking 23 (1) (2015) 1-14. doi:10.1109/TNET.2013.2292851.

[34] B. Hubert, et al., Linux advanced routing & traffic control howto.

[35] M. Devera, Hierarchical token bucket theory (2002). URL http://luxik.cdi.cz/~devik/qos/htb/manual/theory.htm

[37] R. D. Driver, Ordinary and delay differential equations, Vol. 20, Springer Science & Business Media, 2012.

doi:10.1109/TSMC.1977.4309773.
Author Biographies
Halima Elbiaze received the Master degree from the University of Versailles, France, in 1998, and the PhD from the University of Versailles in March 2002. She has been a professor at Université du Québec à Montréal since June 2003. In 2005, Dr. Elbiaze received the Canada Foundation for Innovation Award to build her IP over DWDM network laboratory. She is the author or coauthor of many journal and conference papers. Her research interests include intelligent optical networks, network performance evaluation, traffic engineering, quality of service management, wireless networks, and next-generation IP networks. She is a member of IEEE and OSA.
Bochra Boughzala received her engineering national diploma from INSAT, Tunisia, in 2011 and her Master degree in Computer Science from UQAM in 2013. In 2013, she joined the Ericsson Research group in Montreal, where she now works on network programmability and software defined networking (SDN), congestion control and traffic management, with new interest in 5G Ethernet fronthauling and information centric networking.
Highlights

- Proposing ECCP that controls transmission rate using estimated available bandwidth.
- Deducing the stability conditions of ECCP using the phase plane method.
- Validating the ECCP performance using simulation and testbed implementation.