predicting QoE factors-2018
Abstract—Classic network control techniques have as sole objective the fulfillment of Quality-of-Service (QoS) metrics, which are quantitative and network-centric. Nowadays, the research community envisions a paradigm shift that will put the emphasis on Quality of Experience (QoE) metrics, which relate directly to user satisfaction. Yet, assessing QoE from QoS measurements is a challenging task that powerful Software Defined Network controllers are now able to tackle via machine learning techniques. In this paper we focus on a few crucial QoE factors and first propose a Bayesian Network model to predict the re-buffering ratio. We then derive our own novel Neural Network search method to show that the BN correctly captures the discovered stalling data patterns. Finally, we show that hidden variable models and context information boost performance for all QoE-related measures.

Index Terms—Software Defined Networking, Quality of Experience, Bayesian Network, Neural Network Search Method, Graph Clustering, Hidden Variable Model

I. INTRODUCTION

According to a recent report [1], video traffic will steadily grow in the next years, representing 82% of all Internet traffic by 2021. Therefore, handling video traffic so as to maximize the quality perceived by final users is becoming critical both for content and network operators. To this end, Content Delivery Network (CDN) operators have adopted coordinated control planes [2] between routing and their streaming systems, following the recent trend of Software Defined Networks (SDN) [3], which has deeply transformed the way network architectures are designed and controlled. Nonetheless, Internet Service Providers (ISPs) can also contribute to improving the perceived quality of video traffic by optimizing network resources according to user needs. However, ISPs can only exploit coarse-grained information on video flows due to the end-to-end encryption that many Over-The-Top (OTT) operators like Facebook, Google, and Amazon employ [4]. ISPs are therefore calling for new methods for handling network resources in order to maximize the perceived quality of video services, which directly reflects the opinion customers have of the network infrastructure [5].

HTTP Adaptive Streaming (HAS), which has been standardized as MPEG Dynamic Adaptive Streaming over HTTP (DASH) [6], represents nowadays the pillar technology for video streaming over the Internet. Indeed, HAS connections can easily pass through intermediate services like NATs, gateways and proxies without the need for complex network configurations. Videos are split into temporal segments whose duration ranges from a couple of seconds up to hundreds of seconds. Each segment (also known as a chunk) is encoded at different qualities, resulting in different file sizes. The availability of multiple representations for the same video segment enables DASH clients to scale the video quality up or down by simply selecting the best segment to download according to the network status and the video player's buffer.

The way final users perceive the quality of a streamed video depends on several factors that cannot all be measured. This perceived quality is denoted as Quality of Experience (QoE). According to [7], the user experience highly depends on three crucial factors: (i) the visual quality and its variation, (ii) the frequency and duration of re-buffering events (i.e., stalls or interruptions), and (iii) the startup delay. While the visual quality and its variation can be measured using PSNR-based metrics when traffic is not encrypted, re-buffering events and start-up delay cannot be directly measured, but only predicted from classic QoS metrics [4]. This allows QoE factors to be inferred while still relying on legacy QoS monitoring systems.

Yet, the mapping between QoS and QoE metrics is highly complex, as they often lie in high-dimensional spaces and are subject to noise. As a consequence, a closed-form model and its experimental validation are not practical. We therefore resort to machine learning techniques to derive the complex relationships between QoS and QoE metrics.

On a data set produced with a high-fidelity and fully controllable simulation environment, we show that while local linear relationships hold between the video quality variation and network measurements, re-buffering events lie in high-dimensional clusters of QoS metrics. For re-buffering events, we present a Bayesian Network (BN) classifier based on two Logistic Regressions (LR) which better balances the class accuracies compared to the state-of-the-art method based on random forests [8]. Furthermore, we demonstrate that, due to the stochastic nature of re-buffering events, clusters partially overlap, hence increasing the inaccuracy of standard predictors. A pattern exploration model that we specifically design using a novel Neural Network (NN) search method confirms our intuition that other predictors incur the same or worse inaccuracy as BN-based methods. We then turn our attention to hidden variables, namely metrics that cannot be directly measured but can still be inferred from QoS metrics, and show that the use of predicted hidden variables as features can indeed improve accuracy for re-buffering events. Finally, we show that if we have access to information about network congestion (e.g., number of competing sessions, QoS measures on bottlenecks) and basic characteristics of video streams (e.g., type of device, content provider), all predictions of QoE factors can be further improved.
The paper is structured as follows. Sec. II discusses relevant related work. Sec. III describes the problem of predicting QoE from QoS measurements and the data set we produced. Sec. IV illustrates our method to classify and predict re-buffering events, while Sec. V presents results for the video quality and its variation. Finally, Sec. VI concludes our paper.

II. RELATED WORK

Quality of Experience (QoE) has recently gained momentum as a way to assess the user opinion of the network quality while watching videos. An additive log-logistic model that maps video quality, freezing (i.e., stall of the video session), and image artifacts due to compression and re-buffering events into a QoE score was first proposed in [9] and successively adopted by ITU in Recommendation P.1202.2 as a reference model for quantifying QoE [10]. The investigation performed in [11] on how a user perceives the video quality and the main factors that influence this perception resulted in the definition of eight mathematical models of QoE. Studies like the one presented in [12] provide quantitative methods to measure the distortion of the received bit-stream due to video quality and freezing. While different in the way they compute a score for measuring QoE, all these works agree on three main impairments that affect the QoE, namely re-buffering events, the video quality and its variation. Furthermore, due to the psychological effect known as the memory effect, the repetition of the same impairment during the video session, such as the experience of multiple video stalls due to re-buffering, strongly affects the quality perceived by the final user [13]. For this reason, both client-side and network-side mechanisms [7], [14], [15] have recently been proposed to prevent or at least minimize re-buffering events and video quality variations.

Existing client-side DASH adaptation policies base their decisions on several network performance metrics and the internal client state. Rate-Based (RB) policies base their decisions on the measured download throughput, whereas Buffer-Based (BB) [14] approaches use the level of the buffer containing the downloaded segments to decide the quality of the next chunk. A number of hybrid approaches also exist, where the explicit formulation of the optimization problem [7] enables the use of control-theoretic methods.

Machine learning has recently been used to predict QoE from network measurements [4], [16]. Dimopoulos et al. [4] show how the rebuffering ratio, the average video quality and its variation can be predicted using random forests. We consider this work as a starting point for our research and present two further contributions: 1) a Bayesian Network model to predict rebuffering events with a better balance in class accuracies, and 2) the evidence that additional context information on network congestion and basic characteristics of video streams improves predictions for all QoE factors.

III. FROM QOS TO QOE FACTORS

A. Problem Statement

We consider three main QoE factors which are commonly used to measure user-perceived video quality [5]:
• Average video bitrate of the downloaded segments.
• Average video bitrate variation: the standard deviation of the video bitrate. It quantifies quality changes over the different downloaded segments.
• Re-buffering ratio: freezing (or stalling) time over the duration of the video streaming session.
Our aim in this paper is to infer the three aforementioned QoE factors from the observable QoS metrics described in Tab. I using machine learning techniques.

B. Dataset Description

To build and evaluate QoS-to-QoE mapping functions, we have used a high-fidelity and fully controllable simulation environment at both network and streaming levels. The simulation platform is based on the Adaptive Multimedia Streaming Simulator Framework (AMust) [17] in ns-3, which implements an HTTP client and server for LibDASH, one of the reference software implementations of the ISO/IEC MPEG-DASH standard.

As streaming content, we have chosen 3 representative open movies1 commonly used for testing video codecs and streaming protocols: Big Buck Bunny (BBB), a cartoon with a mix of low and high motion scenes; Swiss Account (TSA), a sport documentary with regular motion scenes; and Red Bull Play Street (RBPS), a sport show with high motion scenes.

We have considered a star network with a bottleneck link, as shown in Fig. 1, on top of which we have simulated a large number of scenarios varying the number of nodes (from 1 to 100), the bottleneck capacity (from 500 kbps to 10 Mbps per stream), the bottleneck delay (from 10 ms to 100 ms), the bottleneck packet loss (from 0% to 3%), screen resolutions, and DASH policies (RB, BB and hybrid). After a month of simulations, we have obtained statistics for more than 69,000 video sessions with 50 associated variables from 4 categories: context information on network congestion and stream characteristics, QoS metrics, target QoE factors, and hidden QoE variables (see Tab. I for a complete list). The dataset is meant to become public.

Fig. 1: Simulation environment with AMust in ns-3.

Arguably, out of the 3 target variables we want to predict, RebufferingRatio is the most difficult, especially in its raw continuous form.

1 http://concert.itec.aau.at/SVCDataset/
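As a point of reference, the three target QoE factors of Sec. III-A can be computed directly from a playback trace. The sketch below assumes a hypothetical trace format (a per-segment bitrate list plus total stall and session durations); the field names are ours and do not correspond to the actual simulator output.

```python
from statistics import mean, pstdev

def qoe_factors(segment_bitrates_bps, stall_time_s, session_time_s):
    """Compute the three target QoE factors from a session trace.

    segment_bitrates_bps: bitrate of each downloaded segment (hypothetical field).
    stall_time_s, session_time_s: total freezing time and session duration.
    """
    return {
        # Average video bitrate of the downloaded segments
        "AvgVideoBitRate": mean(segment_bitrates_bps),
        # Standard deviation of the bitrate, i.e. the quality variation
        "AvgVideoQualityVariation": pstdev(segment_bitrates_bps),
        # Freezing (stalling) time over the duration of the session
        "RebufferingRatio": stall_time_s / session_time_s,
    }

factors = qoe_factors([2_000_000, 4_000_000], stall_time_s=3.0, session_time_s=60.0)
print(factors["RebufferingRatio"])  # 0.05
```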
To simplify our task, we take a similar approach as in [8]: the RebufferingRatio values are aggregated into 3 discrete values in a new variable StallLabel. Firstly, RebufferingRatio equal to 0 means that no stalling has occurred, hence we set StallLabel=NoStall. If it lies in (0, 0.1), then StallLabel=MildStall. Finally, if RebufferingRatio is above 0.1, then StallLabel is given the value SevereStall.

Fig. 2 shows the histograms of the 3 target variables. Similarly to the target variables, the distributions of all other variables follow an exponential pattern. For this reason, we initially apply a logarithmic transformation to the input data.

In the next sections, we use machine learning techniques to derive accurate QoS-QoE mappings given the available data.

Index     Name                                        Type        Description
1         RequestID                                   Context     Streaming session identifier
2         NbClients                                   Context     Maximum number of streams competing on the bottleneck
3         BottleneckBW                                Context     Capacity of the bottleneck
4         BottleneckDelay                             Context     Network delay on the bottleneck
5         BottleneckLoss                              Context     Packet loss on the bottleneck
6         DASHPolicy                                  Context     DASH policy (e.g., or name of content provider)
7         ClientResolution                            Context     Client screen resolution or device type (e.g., smartphone)
8         RequestDuration                             QoS metric  Duration of the stream
[9, 13]   TCPOut/InPacket                             QoS metric  Number of TCP packets (In and Out)
[10, 14]  TCPOut/InDelay                              QoS metric  Average delay experienced by TCP packets (In and Out)
[11, 15]  TCPOut/InJitter                             QoS metric  Average jitter experienced by TCP packets (In and Out)
[12, 16]  TCPOut/InPloss                              QoS metric  Packet loss rate experienced by TCP packets (In and Out)
17        TCPInputRetrans                             QoS metric  Packet retransmissions experienced by TCP
18        StdNetworkRate                              QoS metric  Standard deviation of the network rate
[19:27]   [0,5,10,25,50,75,90,95,100]NetworkRate      QoS metric  xth quantile for the network rate (measured in intervals of 2 s)
28        StdInterATimesReq                           QoS metric  Std. dev. of inter-arrival times of segment requests
[29:37]   [0,5,10,25,50,75,90,95,100]InterATimesReq   QoS metric  xth quantile for the inter-arrival times of segment requests
38        StartUpDelay                                Hidden      Initial time at the client to start playing the video
39        AvgVideoDownloadRate                        Hidden      Average downloading rate for video segments
40        StdVideoDownloadRate                        Hidden      Std. dev. of downloading rate for video segments
41        AvgVideoBufferLevel                         Hidden      Average video buffer length
42        StdVideoBufferLevel                         Hidden      Std. dev. of video buffer length
43        StallEvents                                 Hidden      Number of stall events
44        RebufferingRatio                            Target      Portion of time spent in stall events
45        StallLabel                                  Target      Discretization of the RebufferingRatio variable
46        TotalStallingTime                           Hidden      Total duration of stall events
47        AvgTimeStallingEvents                       Hidden      Average duration of stall events
48        AvgQualityIndex                             Hidden      Avg. normalized index of downloaded representations
49        AvgVideoBitRate                             Target      Average video bitrate consumed by the player
50        AvgVideoQualityVariation                    Target      Average variation of the video bitrate
51        AvgDownloadBitRate                          Hidden      Average download rate of video segments

TABLE I: Context information, QoS metrics, hidden variables. Target QoE factors, which we want to predict from all other variables, are highlighted in bold.

IV. STALLING PREDICTION

In this section we use a Bayesian Network (BN) [18] model to accurately predict the StallLabel variable from the QoS metrics listed in Tab. I. We then show that StallLabel is formed by a mixture of 2 distributions and that, if there were a model that accurately predicts the true distribution of a data point, we could get around 97% performance with the proposed BN model. Finally, we conjecture through a custom novel neural network search method that there is no such model, hence achieving higher performance with our dataset is unlikely.

As a benchmark model we use a Random Forest (RF), as done in [8]. An RF is a bagging of Decision Tree (DT) models, see [19]. At each leaf node, a DT greedily selects and splits an input variable into non-overlapping regions, so that the resulting new leaves gain predictive power. The bagging procedure essentially tries to minimize the effect of the local optimality that stems from the greedy split procedure. Table II shows the performance of an RF classifier on the StallLabel variable, pruned with a minimum leaf size of 50 to prevent over-fitting and with a training-to-validation size ratio of 4:1 (the same ratio is used throughout the paper).

Class        Training Accuracy   Validation Accuracy
NoStall      0.96178             0.95525
MildStall    0.7585              0.73587
SevereStall  0.43874             0.34211

TABLE II: RF class accuracies. Training to validation ratio is 4:1.

The results show that the RF makes accurate predictions on the NoStall class of StallLabel, while it has worse predictions on MildStall. The performance on the SevereStall class is practically unacceptable. There are two commonly occurring problems with RFs: 1) the RF greedy split procedure can result in a low-quality local optimum; 2) the RF's rectangular decision regions have boundaries parallel to the axes of the dimensions, which can fail to capture some dependencies among features.
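The StallLabel classes evaluated in Table II follow the discretization of RebufferingRatio described in Sec. III; a minimal sketch (we assign the boundary value 0.1 to SevereStall, a choice the text leaves implicit):

```python
def stall_label(rebuffering_ratio: float) -> str:
    """Discretize RebufferingRatio into the 3 StallLabel classes."""
    if rebuffering_ratio == 0:
        return "NoStall"      # no stalling occurred
    if rebuffering_ratio < 0.1:
        return "MildStall"    # ratio in (0, 0.1)
    return "SevereStall"      # ratio >= 0.1 (boundary assignment is our choice)

print([stall_label(r) for r in (0.0, 0.05, 0.25)])
# ['NoStall', 'MildStall', 'SevereStall']
```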
Fig. 2: Histograms for our 3 target variables. [Figure: StallLabel counts are NoStall 51155, MildStall 17180, SevereStall 794; the other two panels show histograms of AvgVideoBitRate [bps] and AvgVideoQualityVariation [bps].]

For these reasons, we turn our attention to Bayesian Networks based on Logistic Regression (LR) predictors. LR is a binary classification model that maximizes the likelihood L(θ) of a target vector of binary values Y given the input data X, for a given prior distribution P(θ) of the parameter set θ (assumed to be uniform in our experiments); see Eq. (1). The a-posteriori probability model is assumed to be P(Y_i | X, θ) = (σ[θ, X_i])^{Y_i} (1 − σ[θ, X_i])^{1−Y_i}, where σ is the sigmoid function σ[θ, x] = 1 / (1 + e^{−θ·x}). In Eq. (2) we report the gradient of the log-likelihood.

\max_{\theta} \; L(\theta) = \Big\{ \prod_{i=1}^{n} P(Y_i \mid X, \theta) \Big\} \, P(\theta)    (1)

\frac{\partial \log L(\theta)}{\partial \theta} = \Big\{ \sum_{i} X_i \big( \sigma[\theta X_i] - Y_i \big) \Big\} + \frac{\partial \log P(\theta)}{\partial \theta}    (2)

Using the LR probabilistic model, we next define the BN of Fig. 3, which we use to predict the StallLabel variable.
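A minimal sketch of the LR fit of Eqs. (1)-(2): with the uniform prior assumed in our experiments, the prior term of Eq. (2) vanishes and a plain gradient step on the log-likelihood suffices. The toy data, step size and iteration count below are our own illustrative choices, not values from the paper.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_lr(X, Y, steps=500, step_size=0.1):
    """Maximize the log-likelihood of Eq. (1) by gradient steps.

    Under a uniform prior the gradient of Eq. (2) reduces to
    sum_i X_i (Y_i - sigma(theta . X_i)), which we ascend.
    """
    theta = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for x, y in zip(X, Y):
            err = y - sigmoid(sum(t * xi for t, xi in zip(theta, x)))
            for j, xi in enumerate(x):
                grad[j] += err * xi
        theta = [t + step_size * g for t, g in zip(theta, grad)]
    return theta

# Toy separable data: first feature is a bias term, second decides the class.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
Y = [0, 0, 1, 1]
theta = fit_lr(X, Y)
probs = [sigmoid(sum(t * xi for t, xi in zip(theta, x))) for x in X]
```

In practice a regularizing (non-uniform) prior or early stopping is needed on separable data, since otherwise the unconstrained likelihood pushes θ toward infinity.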
Fig. 3: The proposed BN. Data X feeds a first LR, LR1: P(Y_i | X, θ_1). If P(Y_i | X, θ_1) ≥ α ∈ [0, 1], then Y=NoStall; otherwise a second LR, LR2: P{Y_i | X, θ_2, P(Y_i | X, θ_1) ≥ α}, sets Y=MildStall when this probability is at least β and Y=SevereStall otherwise. Note that θ_1 is not independent of θ_2 given MildStall, nor given SevereStall.

Input: Data X, initial parameters θ_1 = 0, θ_2 = 0;
Output: Prediction for Y = StallLabel

Set X := log(X); optimize the parameter θ_1 of the 1st LR by maximizing the log-likelihood via the gradient descent method, i.e.:

\theta_1^t(i) = \theta_1^{t-1}(i) - \frac{\partial \ln L(\theta_1^{t-1})}{\partial \theta_1^{t-1}(i)}

Optimize the decision boundary α of the 1st LR such that: