Monitoring VoIP Call Quality Using Improved Simplified E-Model
Abstract: ITU-T Recommendation G.107 introduced the E-model, a repeatable way to assess whether a network is prepared to carry a VoIP call. Various studies show that the E-model, with its many factors, is too complex to be used for monitoring purposes. Consequently, simplified versions of the E-model have been proposed to simplify the calculations and focus on the most important factors required for monitoring call quality. In this paper, we propose a simple correction to a simplified E-model; we show how to calculate the correction coefficients for 4 common codecs (G.711, G.723.1, G.726 and G.729A), and then we show that its predictions better match PESQ scores by implementing it in a monitoring application.

Keywords: VoIP; E-model; PESQ; Monitoring

I. INTRODUCTION

The evaluation of data networks depends on several factors. Thus, it is argued that it is not appropriate to use a single metric to evaluate the quality of data networks. Yet in the telephony world, a single number is typically given to rate call quality. This value is used as a basis for monitoring and tuning the network. Voice over Internet Protocol (VoIP) is an example of such a data network application [1].

In recent years, VoIP has become an important application and is expected to carry more and more voice traffic over TCP/IP networks. In real-time voice applications, speech quality is impaired by packet loss, jitter, delay and limited bandwidth. Consequently, VoIP applications require low delay, low packet loss rates, low jitter and sufficient bandwidth in order not to affect the interaction between call participants.

VoIP is based on IP networks; however, IP networks frequently provide only best-effort service and may not guarantee delay, packet loss, and jitter [2]. So, predicting voice quality in different environments and under different traffic loads may be an important part of network monitoring, in order to measure voice quality and prevent critical problems before they occur.

As measuring voice quality is important to service providers and end users, ITU-T provides two test methods: subjective and objective testing. Subjective testing was the earliest approach to evaluating speech quality, and it does so by giving Mean Opinion Scores (MOS). The MOS test is one of the most widely known and accepted tests that give a speech quality rating. ITU-T Rec. P.800 [3] presents the MOS test procedures, in which users rate the speech quality on a scale from 1 (Bad) to 5 (Excellent). Of course, the number of listeners is an important factor in obtaining accurate scores. Thus, subjective testing using MOS is time consuming and expensive, and it does not allow real-time measurement. Consequently, in recent years new methods were developed for measuring MOS scores in an objective way (without human perception): PESQ [4], the E-model [5] and several others.

PESQ, Perceptual Evaluation of Speech Quality, is an objective method for predicting speech quality. It is an intrusive testing method that takes two signals into account: the reference signal and the actual degraded signal. Both signals are passed through the PESQ algorithm, and the result is a PESQ score. Because it needs the reference signal, this approach cannot be used to monitor real-time calls.

More recently, ITU-T Recommendation G.107 [5] defined the E-model, a mathematical model that combines all the impairment factors that affect voice quality into a single metric called the R value, which is mapped to the MOS scale. The E-model was designed to provide an estimate of network quality and has been shown to be reasonably accurate for this purpose. However, it has not been accepted as a valid measurement tool for live networks. The ITU-T G.107 Recommendation [5] states at the beginning of the document that “it is considered only estimates for the transmission planning purposes and not for actual customer opinion prediction”, unlike PESQ [4], which was developed to model the subjective tests commonly used in telecommunications to assess voice quality as perceived by human beings.

Increasingly, and against ITU recommendations, the E-model is being used by industry and research as a live voice quality measurement tool. Thus, simplified versions of the E-model [1, 6] have been proposed to reduce the complexity of the original E-model [5] and focus on the most important factors that affect VoIP call quality.

The objective of our work is to provide a monitoring system using a simplified version of the E-model, corrected for 4 common codecs to better predict PESQ MOS scores, as PESQ is generally considered to provide more accurate predictions of user experience than the E-model.

This paper is organized as follows: Section II describes the proposed improved simplified E-model. In Section III we show how we derived the correction coefficients used in the improved simplified E-model. In Section IV we present our results using the derived model by implementing it in a monitoring application. Finally, we conclude and summarize the paper in Section V.
978-1-4673-5288-8/13/$31.00 ©2013 IEEE
II. IMPROVED SIMPLIFIED E-MODEL

In this section, we first give a brief description of the simplified E-model [6]. We then describe our proposed improvements to it, together with the method of calculating the parameters used in the model so that it can be applied for monitoring purposes.

A. Simplified E-Model

The original E-model is very complex [7] and involves many factors. Moreover, the voice processing is not related significantly to the instantaneous judgment of QoS. Thus, a simplified version of the E-model [6] was introduced to focus on the most important parts, and it was afterwards used in a monitoring system [2]. This model takes into account the codec and the present network conditions, which are the two main factors that affect voice quality. The simplified E-model calculates the evaluation value R as expressed by equation (1).

$R = R_0 - I_{codec} - I_{packetloss} - I_{delay}$ (1)

where $R_0$ represents the basic signal-to-noise ratio, $I_{delay}$ represents the delays introduced from end to end, $I_{codec}$ is the codec factor and $I_{packetloss}$ is the packet loss rate within a particular time. Finally, the R value is mapped to a MOS score.

B. Improved Simplified E-Model

The objective of this model is to determine the voice quality MOS rating using a modified version of the simplified E-model described above. The computational model consists of a mathematical function of the parameters of the transmission system. The computation itself can be split into several elements and is expressed by equation (2).

$R = R_y - I_{delay} + A$ (2)

where $R_y$ is a second-order function fitted to PESQ scores, PESQ being the standard objective method defined by ITU-T Recommendation P.862 [4]; $I_{delay}$ is the average delay time within a specified period; and $A$ is the expectation factor due to the communication system. The description and method of calculating the parameters ($R_y$, $I_{delay}$ and $A$) in (2) are as follows:

1) $R_y$:

As mentioned above, $R_y$ is a second-order function model corrected with PESQ scores to obtain more accurate results in our monitoring system. $R_y$ is expressed by equation (3).

$R_y = a R_x^2 + b R_x + c$ (3)

where $R_x$, the part of the simplified E-model (1) that is corrected with PESQ scores, is obtained by expression (4), and a, b, c are codec-specific coefficients shown in Table I and derived in Section III.

$R_x = R_0 - I_{codec} - I_{packetloss}$ (4)

TABLE I. CODECS SPECIAL COEFFICIENTS

Codec          a       b        c
G.711          0.18    -27.90   1126.62
G.723.1 5.3k   0.039   -4.2     166.61
G.726 24k      0.046   -4.53    168.09
G.729A         0.063   -8.08    311.72

1.1) $R_0$:

$R_0$ is the basic signal-to-noise ratio, including noise sources such as circuit and room noise. It is difficult to calculate directly, so ITU-T G.113 [8] provides a common value for $R_0$. The inherent degradation that occurs when converting spoken conversation to a network signal and back reduces the theoretical maximum R value with no impairments from 94.2 to 93.2 [5]. We therefore set $R_0$ to 93.2.

1.2) $I_{codec}$:

$I_{codec}$ is the equipment impairment (codec quality) factor as defined in [8] and [9]. It represents the codec distortion, that is, the voice distortion and impairments arising from signal conversions. Its value is determined by looking up the codec in ITU-T Recommendation G.113 [8], from which Table II is taken.

TABLE II. SOME CODING INFORMATION

Encoder Type   Reference   Bit Rate (kbit/s)   Ie value
PCM            G.711       64                  0
ACELP          G.723.1     5.3                 19
CS-ACELP       G.729A      8                   11

1.3) $I_{packetloss}$:

$I_{packetloss}$ is the packet loss percentage within a particular period, measured over a certain number of packets. A packet counts as lost when it is sent by the sender but not received by the receiver. The percentage is expressed by formula (5).

$I_{packetloss} = \frac{DS - N}{DS} \times 100\%$ (5)

where DS is the difference between the largest and smallest sequence numbers of the N packets. Real-time Transport Protocol (RTP) packet statistics can be used to calculate DS by expression (6).

$DS = LS - SS + 1$ (6)
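The calculation chain described so far can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation. It assumes that LS and SS in (6) are the largest and smallest RTP sequence numbers seen in the measurement window, that the loss percentage in (5) is (DS - N)/DS × 100, that the corrected value is the quadratic a·Rx² + b·Rx + c with Rx = R0 - Icodec - Ipacketloss, and that (2) combines the terms as R = Ry - Idelay + A. The function names are hypothetical, the delay and expectation terms are plain inputs, and 16-bit RTP sequence-number wraparound is ignored. The final R-to-MOS step uses the standard G.107 conversion, which may differ from the mapping used in the monitoring application. G.726 is left out because Table II lists no Ie value for it.

```python
R0 = 93.2  # basic signal-to-noise ratio adopted in the text

# Table I coefficients (a, b, c) joined with the Table II Ie values.
CODECS = {
    "G.711":   {"a": 0.18,  "b": -27.90, "c": 1126.62, "ie": 0},
    "G.723.1": {"a": 0.039, "b": -4.2,   "c": 166.61,  "ie": 19},
    "G.729A":  {"a": 0.063, "b": -8.08,  "c": 311.72,  "ie": 11},
}


def packet_loss_percent(largest_seq, smallest_seq, received):
    """Loss percentage from RTP sequence numbers, following (5) and (6)."""
    ds = largest_seq - smallest_seq + 1  # expected packets, expression (6)
    if ds <= 0:
        raise ValueError("largest_seq must be >= smallest_seq")
    if not 0 <= received <= ds:
        raise ValueError("received must lie between 0 and DS")
    return (ds - received) / ds * 100.0


def corrected_r(codec, loss_percent, i_delay=0.0, advantage=0.0):
    """R value with the codec-specific quadratic correction applied."""
    k = CODECS[codec]
    rx = R0 - k["ie"] - loss_percent              # assumed form of (4)
    ry = k["a"] * rx ** 2 + k["b"] * rx + k["c"]  # assumed form of (3)
    return ry - i_delay + advantage               # assumed form of (2)


def r_to_mos(r):
    """Standard G.107 conversion from an R value to an estimated MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


if __name__ == "__main__":
    loss = packet_loss_percent(largest_seq=1099, smallest_seq=1000, received=95)
    r = corrected_r("G.729A", loss)
    print(f"loss={loss:.1f}%  R={r:.2f}  MOS={r_to_mos(r):.2f}")
```

With no packet loss and the delay and expectation terms set to zero, this sketch gives a corrected value of about 89.86 for G.711 and about 73.22 for G.729A.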