Advances in Intelligent Systems and Computing 832

Sébastien Destercke · Thierry Denoeux · María Ángeles Gil · Przemyslaw Grzegorzewski · Olgierd Hryniewicz, Editors

Uncertainty Modelling in Data Science
Advances in Intelligent Systems and Computing
Volume 832
Series editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
e-mail: [email protected]
The series “Advances in Intelligent Systems and Computing” contains publications on theory,
applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all
disciplines such as engineering, natural sciences, computer and information science, ICT, economics,
business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the
areas of modern intelligent systems and computing such as: computational intelligence, soft computing
including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms,
social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and
society, cognitive science and systems, Perception and Vision, DNA and immune based systems,
self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric
computing, recommender systems, intelligent control, robotics and mechatronics including human-machine
teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis,
knowledge management, intelligent agents, intelligent decision making and support, intelligent network
security, trust management, interactive entertainment, Web intelligence and multimedia.
The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings
of important conferences, symposia and congresses. They cover significant recent developments in the
field, both of a foundational and applicable character. An important characteristic feature of the series is
the short publication time and world-wide distribution. This permits a rapid and broad dissemination of
research results.
Advisory Board
Chairman
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
e-mail: [email protected]
Members
Rafael Bello Perez, Universidad Central “Marta Abreu” de Las Villas, Santa Clara, Cuba
e-mail: [email protected]
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
e-mail: [email protected]
Hani Hagras, University of Essex, Colchester, UK
e-mail: [email protected]
László T. Kóczy, Széchenyi István University, Győr, Hungary
e-mail: [email protected]
Vladik Kreinovich, University of Texas at El Paso, El Paso, USA
e-mail: [email protected]
Chin-Teng Lin, National Chiao Tung University, Hsinchu, Taiwan
e-mail: [email protected]
Jie Lu, University of Technology, Sydney, Australia
e-mail: [email protected]
Patricia Melin, Tijuana Institute of Technology, Tijuana, Mexico
e-mail: [email protected]
Nadia Nedjah, State University of Rio de Janeiro, Rio de Janeiro, Brazil
e-mail: [email protected]
Ngoc Thanh Nguyen, Wroclaw University of Technology, Wroclaw, Poland
e-mail: [email protected]
Jun Wang, The Chinese University of Hong Kong, Shatin, Hong Kong
e-mail: [email protected]
Editors

Sébastien Destercke
CNRS, Heudiasyc, Sorbonne universités, Université de technologie de Compiègne, Compiègne, France

Przemyslaw Grzegorzewski
Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This volume contains the peer-reviewed papers presented at the 9th International
Conference on Soft Methods in Probability and Statistics (SMPS 2018), which
was held in conjunction with the 5th International Conference on Belief Functions
(BELIEF 2018) on 17–21 September 2018 in Compiègne, France. The series of
biennial International Conferences on Soft Methods in Probability and Statistics
started in Warsaw in 2002. It then successfully took place in Oviedo (2004), Bristol
(2006), Toulouse (2008), Oviedo/Mieres (2010), Konstanz (2012), Warsaw (2014)
and Rome (2016). SMPS and BELIEF 2018 were organized by the Heudiasyc
laboratory at the Université de Technologie de Compiègne.
Over the last decades, interest in extensions of and alternatives to probability
and statistics has grown significantly in areas as diverse as reliability,
decision-making, data mining and machine learning, and optimization. This interest
comes from the need to enrich existing models in order to include different facets
of uncertainty such as ignorance, vagueness, randomness, conflict or imprecision.
Frameworks such as rough sets, fuzzy sets, fuzzy random variables, random sets,
belief functions, possibility theory, imprecise probabilities, lower previsions
and desirable gambles all share this goal, but have emerged from different needs. By
putting together the BELIEF and SMPS conferences, we hope to increase the
interactions and discussions between the two communities and to converge towards
a more unified view of uncertainty theories.
We also think that the advances, results and tools presented in this volume are
important in the ubiquitous and fast-growing fields of data science, machine
learning and artificial intelligence. Indeed, an important aspect of learned
predictive models is the trust one places in them. Modelling the uncertainty
associated with the data and the models carefully and with principled methods is
one of the means to increase this trust, as the model will then be able to distinguish
reliable predictions from less reliable ones. In addition, extensions such as fuzzy
sets can be explicitly designed to provide interpretable predictive models,
facilitating user interaction and increasing user trust.
Imprecise Statistical Inference for Accelerated Life Testing Data

A. A. H. Ahmadini and F. P. A. Coolen
1 Introduction

Given $n$ ordered observed failure times $x_1 < \ldots < x_n$ (and setting $x_0 = 0$ and $x_{n+1} = \infty$), nonparametric predictive inference (NPI) [2] provides lower and upper survival functions for one future observation $X_{n+1}$; the upper survival function is

$$\overline{S}_{X_{n+1}}(t) = \frac{n+1-j}{n+1}, \quad \text{for } t \in (x_j, x_{j+1}),\ j = 0, \ldots, n. \qquad (3)$$

The difference between the upper and lower survival functions, called imprecision, is non-zero because of the limited inferential assumptions made, and reflects the amount of information in the data.
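As a quick illustration, the following minimal Python sketch evaluates these NPI bounds for one future observation from fully observed failure times. The lower bound uses the standard NPI form $(n-j)/(n+1)$ from the NPI literature [2], the upper bound is Eq. (3); the function name and toy data are illustrative only.

```python
import numpy as np

def npi_survival_bounds(times, t):
    """NPI lower/upper survival probabilities for one future observation,
    evaluated at time t, given n fully observed failure times.
    Lower bound: (n - j) / (n + 1) (standard NPI form); upper bound is
    Eq. (3): (n + 1 - j) / (n + 1), with j = number of observations <= t."""
    x = np.sort(np.asarray(times, dtype=float))
    n = x.size
    j = np.searchsorted(x, t, side="right")  # observations at or before t
    lower = (n - j) / (n + 1)
    upper = (n + 1 - j) / (n + 1)
    return lower, upper

# toy usage: ten failure times, survival bounds at t = 5000 hours
times = [1200, 2300, 3100, 4200, 4800, 5600, 6100, 7000, 8200, 9500]
print(npi_survival_bounds(times, 5000.0))
```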
This paper is organized as follows. Section 2 introduces the main idea of
imprecise predictive inference based on ALT data and the log-rank test. The main novelty
of our approach is that the imprecision results from a classical nonparametric
test, which is the log-rank test, integrated with the Arrhenius function to link
different stress levels. In Sect. 3 we explain why we do not use a single log-rank
test on all stress levels. In Sect. 4 our method is illustrated in two examples.
Section 5 presents some concluding remarks.
In this section we present new predictive inference based on ALT data and the log-rank test. The proposed new method consists of two steps. First, the pairwise log-rank test is used between stress level $K_i$ and $K_0$, to get the interval $[\underline{\gamma}_i, \overline{\gamma}_i]$ of values $\gamma_i$ for which we do not reject the null hypothesis that the data transformed from level $i$ to level $0$, and the original data from level $0$, come from the same underlying distribution, where $i = 1, \ldots, m$. With these $m$ pairs $(\underline{\gamma}_i, \overline{\gamma}_i)$, we define $\underline{\gamma} = \min_i \underline{\gamma}_i$ and $\overline{\gamma} = \max_i \overline{\gamma}_i$.

Second, we apply the data transformation using $\underline{\gamma}$ ($\overline{\gamma}$) for all levels to get transformed data at level $0$, which leads to the NPI lower (upper) survival function $\underline{S}$ ($\overline{S}$). Note that each observation at an increased stress level is transformed to an interval at level $0$, where the interval tends to be larger if a data point was originally from a higher stress level. If the model fits really well, we expect most $\underline{\gamma}_i$ to be quite similar, and also most $\overline{\gamma}_i$. The NPI lower survival function is attained when all data observations at increased stress levels are transformed to the normal stress level using $\underline{\gamma}$, and the NPI upper survival function results from the use of $\overline{\gamma}$. If the model fits poorly, the $\underline{\gamma}_i$ are likely to differ a lot, or the $\overline{\gamma}_i$ differ a lot, or both. Hence, in case of poor model fit, the resulting interval $[\underline{\gamma}, \overline{\gamma}]$ tends to be wider than in the case of good model fit. A main novelty of our method is that imprecision results from pairwise comparisons via a classical test; we comment further on this in the next section.
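A minimal sketch of the first step is given below. It assumes the Arrhenius-based time transformation $t \cdot \exp(\gamma/K_0 - \gamma/K_i)$ (consistent with the scale parameters quoted in Example 1), uses the `logrank_test` function from the `lifelines` package for the pairwise comparison, and finds $[\underline{\gamma}_i, \overline{\gamma}_i]$ by a simple grid search; the function names, grid and p-value threshold are illustrative choices, not part of the paper's own description.

```python
import numpy as np
from lifelines.statistics import logrank_test  # assumed available

def transform_to_level0(times, gamma, K_i, K_0):
    """Arrhenius-based transformation of failure times observed at stress
    level K_i (temperatures in Kelvin) to the normal stress level K_0."""
    return np.asarray(times, dtype=float) * np.exp(gamma / K_0 - gamma / K_i)

def gamma_interval(times_0, times_i, K_0, K_i, alpha=0.05,
                   grid=np.linspace(1000, 10000, 1801)):
    """Grid search for [gamma_low, gamma_up]: the gamma values for which the
    pairwise log-rank test does not reject (p-value >= alpha) that the
    transformed level-i data and the original level-0 data come from the
    same underlying distribution. All failure times are fully observed."""
    accepted = [g for g in grid
                if logrank_test(times_0,
                                transform_to_level0(times_i, g, K_i, K_0)).p_value >= alpha]
    return (min(accepted), max(accepted)) if accepted else None

# Second step: take the minimum of the lower endpoints and the maximum of the
# upper endpoints over the m pairwise comparisons, then transform all data with
# each of them to obtain the NPI lower and upper survival functions at level K_0.
```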
In our novel method discussed in Sect. 2, we use pairwise log-rank tests between stress levels $K_i$ and $K_0$. An alternative would be to use one log-rank test for the data at all stress levels combined. We now explain why this would not lead to a sensible method of imprecise statistical inference. Suppose we would test the null hypothesis that data from all stress levels, transformed using parameter value $\gamma_a$, originate from the same underlying distribution. Let $[\underline{\gamma}_a, \overline{\gamma}_a]$ be the interval of such values $\gamma_a$ for which this hypothesis is not rejected. If the model fits very well, we would expect $\underline{\gamma}_a$ to be close to the $\underline{\gamma}$ from Sect. 2 and also $\overline{\gamma}_a$ to be close to $\overline{\gamma}$. If, however, the model fits poorly, the interval $[\underline{\gamma}_a, \overline{\gamma}_a]$ may be very small or even empty. Therefore, this approach leads to less imprecision if the model fits poorly, and that is the reason why we compare the levels pairwise and take the minimum and the maximum of the $\underline{\gamma}_i$ and the $\overline{\gamma}_i$, respectively. We are interested in prediction of one future observation at the normal stress level $K_0$. So, using the observations transformed from the increased stress levels $K_1, \ldots, K_m$ as well as the original data obtained at the normal stress level $K_0$, we apply NPI to derive lower and upper survival functions for this future observation, as described in Sect. 1. The examples in Sect. 4 illustrate the proposed method of Sect. 2 as well as the problem that arises if we would use the combined approach for all levels.
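For comparison, the combined alternative discussed above can be sketched with the `multivariate_logrank_test` function from `lifelines` (assumed available); again the grid search and the names are illustrative only.

```python
import numpy as np
from lifelines.statistics import multivariate_logrank_test  # assumed available
# reuses transform_to_level0 from the previous sketch

def gamma_interval_joint(data_by_level, K_levels, K_0, alpha=0.05,
                         grid=np.linspace(1000, 10000, 1801)):
    """Interval of gamma_a values for which the joint log-rank test does not
    reject that the data from all stress levels, transformed to level K_0,
    come from the same underlying distribution."""
    accepted = []
    for g in grid:
        times, groups = [], []
        for label, (times_i, K_i) in enumerate(zip(data_by_level, K_levels)):
            times.extend(transform_to_level0(times_i, g, K_i, K_0))
            groups.extend([label] * len(times_i))
        if multivariate_logrank_test(times, groups).p_value >= alpha:
            accepted.append(g)
    return (min(accepted), max(accepted)) if accepted else None
```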
4 Examples
In this section we present two examples. In Example 1 we simulated data at all levels that correspond to the model for the link function we assume in the analysis. In Example 2 we change these data such that the assumed link function no longer provides a good fit. Together, these examples illustrate our novel imprecise method from Sect. 2, as well as the problem that could occur if we used the log-rank test on all stress levels combined, as discussed in Sect. 3.
4.1 Example 1
The method proposed in Sect. 2 is illustrated in an example based on a temperature-accelerated lifespan test. Data are simulated at three temperatures. The normal temperature condition was $K_0 = 283$ Kelvin and the increased temperature stress levels were $K_1 = 313$ and $K_2 = 353$ Kelvin. Ten observations were simulated from a fully specified model, using the Arrhenius link function in combination with a Weibull distribution at each temperature. The Arrhenius parameter $\gamma$ was set at 5200, and the Weibull distribution at $K_0$ had shape parameter 3 and scale parameter 7000. This model keeps the same shape parameter at each temperature, but the scale parameters are linked by the Arrhenius relation, which led to scale parameters 1202.942 at $K_1$ and 183.0914 at $K_2$.
Ten units were tested at each temperature, for a total of 30 units used in the
study. The failure times, in hours, are given in Table 1.
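The following sketch reproduces this simulation set-up (Weibull failure times with a common shape parameter and Arrhenius-linked scale parameters); the random seed and helper names are arbitrary, so the generated values will not match the exact failure times used in the paper.

```python
import numpy as np

rng = np.random.default_rng(2018)  # arbitrary seed, for reproducibility only

gamma_true = 5200.0
K0, K1, K2 = 283.0, 313.0, 353.0
shape, scale_K0 = 3.0, 7000.0

def arrhenius_scale(K, K_ref=K0, scale_ref=scale_K0, gamma=gamma_true):
    """Weibull scale parameter at temperature K, linked to the scale at K_ref
    through the Arrhenius relation (gives 1202.94 at K1 and 183.09 at K2)."""
    return scale_ref * np.exp(gamma / K - gamma / K_ref)

# ten failure times (in hours) per temperature, Weibull with common shape
data = {K: arrhenius_scale(K) * rng.weibull(shape, size=10) for K in (K0, K1, K2)}
for K, times in data.items():
    print(K, np.round(np.sort(times), 1))
```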
To illustrate the log-rank test method using these data, we assume the Arrhenius link function for the data. Note that our method does not assume a parametric distribution at each stress level. The pairwise log-rank test is used between $K_1$ and $K_0$ and between $K_2$ and $K_0$ to derive the intervals $[\underline{\gamma}_i, \overline{\gamma}_i]$ of values $\gamma_i$ for which we do not reject the null hypothesis with regard to the well-mixed data transformation. The resulting intervals $[\underline{\gamma}_i, \overline{\gamma}_i]$ are given in the first two rows of Table 2, for three test significance levels. Of course, for larger significance levels the intervals become wider.
According to the accepted intervals in Table 2, we can obtain the NPI lower and upper survival functions by always taking, over the pairwise comparisons of $K_1$ to $K_0$ and $K_2$ to $K_0$, the minimum of the $\underline{\gamma}_i$ and the maximum of the $\overline{\gamma}_i$, for significance levels 0.99, 0.95 and 0.90. So we take $\underline{\gamma} = \min_i \underline{\gamma}_i = 3901.267$ and $\overline{\gamma} = \max_i \overline{\gamma}_i = 6563.545$, from the pairwise comparison of $K_1$ and $K_0$, at the 0.99 significance level; $\underline{\gamma} = 4254.053$ and $\overline{\gamma} = 6251.168$, from the pairwise comparison of $K_1$ and $K_0$, at the 0.95 significance level; and $\underline{\gamma} = 4486.491$ and $\overline{\gamma} = 6017.435$, from the pairwise comparison of $K_1$ and $K_0$, at the 0.90 significance level; and then we transform the data to the normal stress level, see Fig. 1(a). In this figure, the lower survival function $\underline{S}$ is labelled as $\underline{S}(\underline{\gamma})$ and the upper survival function $\overline{S}$ is labelled as $\overline{S}(\overline{\gamma})$. The figure shows that higher significance levels lead to more imprecision for the NPI lower and upper survival functions.
To illustrate the effect of using the single log-rank test for all stress levels simultaneously, as discussed in Sect. 3, the final row in Table 2 provides the interval $[\underline{\gamma}_a, \overline{\gamma}_a]$ of values $\gamma_a$ for all the stress levels together. From this interval we can again obtain the lower and upper survival functions using NPI; these are presented in Fig. 1(b). In this example, the data were simulated precisely with the link function as assumed in our method, so there is not much difference between the lower and upper survival functions for corresponding significance levels in Figs. 1(a) and (b). Example 2 will illustrate what happens if the model does not fit well.
4.2 Example 2
To illustrate our method in case the model does not fit the data well, and also to show what would have happened if we had used the joint log-rank test in our method instead of the pairwise tests, we use the same data as in Example 1, but we change some of the values. In Scenario 1 (indicated as Ex 2-1 in Fig. 1), we multiply the data at level $K_1$ by 1.4. In Scenario 2 (Ex 2-2), we do the same and in addition multiply the data at level $K_2$ by 0.8. The resulting data values are given in the last two columns of Table 1.
For these two scenarios, we have repeated the analysis as described in Example 1. The resulting intervals of $\gamma$ values are given in Tables 3 and 4. Note that for significance level 0.90 in Scenario 2 the null hypothesis of the joint log-rank test would be rejected for all values $\gamma_a$, hence we report an empty interval, so
clearly our method would not work if we had used this joint test instead of the
pairwise tests.
The NPI lower and upper survival functions in Figs. 1(c) and (e), using our method as discussed in Sect. 2, have more imprecision. Note that the lower survival function is identical in both scenarios as the same $\underline{\gamma}$ is used; this is because the increased values at $K_1$ have resulted in smaller values for $\underline{\gamma}_1$ and $\overline{\gamma}_1$, and the $\underline{\gamma}$ in our method is equal to $\underline{\gamma}_1$ in these cases. In Scenario 2, the observations at level $K_2$ have decreased, leading to larger $\underline{\gamma}_2$ and $\overline{\gamma}_2$ values, and this leads to the upper survival functions increasing in comparison to Scenario 1.
If we would have used the joint log-rank test instead of the pairwise tests, as discussed in Sect. 3, then imprecision would have decreased in these two scenarios, as can be seen from Figs. 1(d) and (f). Note that in Fig. 1(f) there are no lower and upper survival functions corresponding to the use of the joint log-rank test for significance level 0.90, as this leads to an empty interval of $\gamma_a$ values. As mentioned in Sect. 3, if the model does not fit well, then we will reject the null hypothesis for all three levels together sooner, see Tables 3 and 4. So we have a smaller range of values for which we do not reject the null hypothesis. But if the model fits poorly, we actually want a larger range of values, so increased imprecision. This is achieved by taking the minimum of the $\underline{\gamma}_i$ and the maximum of the $\overline{\gamma}_i$ over the pairwise tests, hence this is our proposed method in Sect. 2. This is illustrated by Figs. 1(a), (c) and (e).
5 Concluding Remarks
This paper has presented an exploration of a novel statistical method providing imprecise semi-parametric inference for ALT data, where the imprecision is related to the log-rank test statistic. The proposed method uses the log-rank test to compare the survival distributions at pairs of stress levels, in combination with the Arrhenius model, to find the interval of $\gamma$ values for which the null hypothesis is not rejected. We explored imprecision through the use of a nonparametric test for the parameter of the link function between different stress levels, which enabled us to transform the observations at increased stress levels into interval-valued observations at the normal stress level and to achieve further robustness. We consider nonparametric predictive inference at the normal stress level combined with the Arrhenius model linking observations at different stress levels. We showed why, in our method, we use the imprecision from combined pairwise log-rank tests, and not from a single log-rank test on all stress levels together: the latter would lead to less imprecision if the model fits poorly, while our proposed method then leads to more imprecision. In this paper, to illustrate the basic idea of our novel method, we assumed that failure data are available at all stress levels including the normal stress level. This may not be realistic. If there are no failure data at the normal stress level, or only right-censored observations, then we can apply our method using a higher stress level as the basis for the combination, that is, transform data to that stress level. The combined data at that level could then be transformed all together to the normal stress level. The log-rank test in this approach could be replaced by other comparison tests, where even the use of tests based on imprecise probability theory [7] could be explored. This is left as an interesting topic for future research.
References
1. Augustin, T., Coolen, F., de Cooman, G., Troffaes, M.: Introduction to Imprecise
Probabilities. Wiley, Chichester (2014)
2. Coolen, F.: Nonparametric predictive inference. In: International Encyclopedia of
Statistical Science, pp. 968–970. Springer, Berlin (2011)
3. Gehan, E.: A generalized Wilcoxon test for comparing arbitrarily singly-censored
samples. Biometrika 52, 203–224 (1965)
4. Mantel, N.: Evaluation of survival data and two new rank order statistics arising in
its consideration. Cancer Chemother. Rep. 50, 163–170 (1966)
5. Nelson, W.: Accelerated Testing: Statistical Models, Test Plans, and Data Analysis.
Wiley, New Jersey (1990)
6. Peto, R., Peto, J.: Asymptotically efficient rank invariant test procedures. J. R.
Stat. Soc. Ser. A 135, 185–207 (1972)
7. Benavoli, A., Mangili, F., Corani, G., Zaffalon, M., Ruggeri, F.: A Bayesian Wilcoxon
signed-rank test based on the Dirichlet process. In: Proceedings of the 30th International Conference on Machine Learning (ICML 2014), pp. 1–9 (2014)
Descriptive Comparison of the Rating Scales Through Different Scale Estimates: Simulation-Based Analysis

I. Arellano et al.
1 Introduction
Likert-type scales are frequently used in designing questionnaires to rate characteristics or attributes that cannot be numerically measured (like satisfaction, perceived quality, perception, and so on). Although they are easy to answer and do not require special training, respondents often do not find answers that accurately reflect their assessment, and the statistical methodology available to analyze the data from these questionnaires is rather limited. This is mainly due to the fact that Likert scales are discrete with a very small number of responses to choose from for each item (often 4 to 7). To overcome this concern, Hesketh et al. [5] proposed the so-called fuzzy rating scale to allow complete freedom and expressiveness in responding, without respondents being constrained to choose among a few pre-specified responses.
By drawing the fuzzy number that best represents the respondent's valuation, the fuzzy rating scale captures the logical imprecision associated with such variables. Moreover, the fuzzy rating scale provides a rich continuous scale of measurement, unlike a posterior numerical or fuzzy encoding (the latter encoding Likert points with fuzzy numbers from a linguistic scale, and usually made by trained experts).

In previous studies (see Gil et al. [3], Lubiano et al. [6–8]) we have confirmed that the results obtained with fuzzy rating scales sometimes differ substantially from the conclusions drawn from numerically or fuzzy linguistically encoded Likert values.

As differences can often be even clearer from the dispersion perspective than from the location perspective, this paper aims to examine, by means of simulations, how location-based 'scale' estimates are affected by the considered scale of measurement.
2 Preliminaries
A (bounded) fuzzy number is a mapping $U : \mathbb{R} \to [0,1]$ such that for all $\alpha \in [0,1]$, the $\alpha$-level set $U_\alpha = \{x \in \mathbb{R} : U(x) \ge \alpha\}$ if $\alpha \in (0,1]$, and $U_0 = \mathrm{cl}\{x \in \mathbb{R} : U(x) > 0\}$ (with 'cl' denoting the closure of the set), is a nonempty compact interval.
In dealing with fuzzy number-valued data, distances will be computed by considering two different metrics introduced by Diamond and Kloeden [1]: the 2-norm metric $\rho_2$ and the 1-norm metric $\rho_1$, which for fuzzy numbers $U$ and $V$ are given by

$$\rho_2(U, V) = \sqrt{\frac{1}{2}\int_{[0,1]} \left[ (\inf U_\alpha - \inf V_\alpha)^2 + (\sup U_\alpha - \sup V_\alpha)^2 \right] d\alpha},$$

$$\rho_1(U, V) = \frac{1}{2}\int_{[0,1]} \left[ \,|\inf U_\alpha - \inf V_\alpha| + |\sup U_\alpha - \sup V_\alpha|\, \right] d\alpha.$$
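A minimal numerical sketch of these two metrics for trapezoidal fuzzy numbers Tra(a, b, c, d) (support [a, d], core [b, c]) is given below; the integrals over $\alpha$ are approximated on a uniform grid, and all function names are illustrative.

```python
import numpy as np

def tra_alpha_cut(a, b, c, d, alphas):
    """alpha-cuts [inf, sup] of the trapezoidal fuzzy number Tra(a, b, c, d)
    with support [a, d] and core [b, c]."""
    inf = a + alphas * (b - a)
    sup = d - alphas * (d - c)
    return inf, sup

def rho2(U, V, n_alpha=1001):
    """2-norm metric of Diamond and Kloeden, approximated on an alpha-grid."""
    alphas = np.linspace(0.0, 1.0, n_alpha)
    iU, sU = tra_alpha_cut(*U, alphas=alphas)
    iV, sV = tra_alpha_cut(*V, alphas=alphas)
    integrand = (iU - iV) ** 2 + (sU - sV) ** 2
    return np.sqrt(0.5 * np.trapz(integrand, alphas))

def rho1(U, V, n_alpha=1001):
    """1-norm metric of Diamond and Kloeden, approximated on an alpha-grid."""
    alphas = np.linspace(0.0, 1.0, n_alpha)
    iU, sU = tra_alpha_cut(*U, alphas=alphas)
    iV, sV = tra_alpha_cut(*V, alphas=alphas)
    integrand = np.abs(iU - iV) + np.abs(sU - sV)
    return 0.5 * np.trapz(integrand, alphas)

# toy usage with two trapezoidal responses on a [0, 100] scale
U, V = (20, 30, 40, 55), (25, 35, 45, 60)
print(rho2(U, V), rho1(U, V))
```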
The sample 1-norm median $\widetilde{\mathrm{Me}}(\tilde x_n)$ is the fuzzy number such that, for each $\alpha$, its $\alpha$-level is the interval $[\mathrm{Me}\{\inf(\tilde x_i)_\alpha\}, \mathrm{Me}\{\sup(\tilde x_i)_\alpha\}]$ (see Sinova et al. [10]).

In De la Rosa de Sáa et al. [2] one can find together the most commonly used location-based scale estimates, namely: the sample Fréchet-type $\rho_2$-Standard Deviation and, for $D \in \{\rho_1, \rho_2\}$ and $M \in \{\overline{\tilde x}_n, \widetilde{\mathrm{Me}}(\tilde x_n)\}$, the sample $D$-Average Distance Deviation and the sample $D$-Median Distance Deviation, which are respectively given by

$$\rho_2\text{-}\mathrm{SD}(\tilde x_n) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left[\rho_2(\tilde x_i, \overline{\tilde x}_n)\right]^2},$$

$$D\text{-}\mathrm{ADD}(\tilde x_n, M) = \frac{1}{n}\sum_{i=1}^{n} D(\tilde x_i, M), \qquad D\text{-}\mathrm{MDD}(\tilde x_n, M) = \mathrm{Me}_i\, D(\tilde x_i, M).$$
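Building on the previous sketch, these scale estimates can be computed as follows for trapezoidal data; here the location $M$ is taken as the Aumann-type sample mean (componentwise mean of the trapezoids), and the median-based choices of $M$ would be handled analogously, level-wise. Names are illustrative.

```python
import numpy as np
# assumes rho1, rho2 and tra_alpha_cut from the previous sketch

def sample_mean(data):
    """Aumann-type sample mean of trapezoidal fuzzy data: componentwise mean."""
    return tuple(np.mean(np.asarray(data, dtype=float), axis=0))

def rho2_SD(data):
    """Sample Frechet-type rho2-standard deviation."""
    m = sample_mean(data)
    return np.sqrt(np.mean([rho2(x, m) ** 2 for x in data]))

def D_ADD(data, M, D):
    """Sample D-average distance deviation about the location M."""
    return np.mean([D(x, M) for x in data])

def D_MDD(data, M, D):
    """Sample D-median distance deviation about the location M."""
    return np.median([D(x, M) for x in data])

# toy usage: three trapezoidal responses
data = [(10, 20, 30, 40), (25, 35, 45, 60), (50, 60, 70, 85)]
M = sample_mean(data)
print(rho2_SD(data), D_ADD(data, M, rho1), D_MDD(data, M, rho2))
```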
In fact, fuzzy data will be generated by simulating four real-valued random variables $X_1, X_2, X_3$ and $X_4$, so that the $\mathbb{R}\times[0,\infty)\times[0,\infty)\times[0,\infty)$-valued random vector $(X_1, X_2, X_3, X_4)$ provides the 4-tuples $(x_1, x_2, x_3, x_4)$, with $x_1$ = centre and $x_2$ = radius of the core, and $x_3$ = lower and $x_4$ = upper spread of the fuzzy number. To each generated 4-tuple $(x_1, x_2, x_3, x_4)$ we associate the trapezoidal fuzzy number Tra⟨$x_1, x_2, x_3, x_4$⟩.

According to the simulation procedure, data have been generated from random fuzzy numbers with a bounded reference set, abstracting and mimicking what we have observed in real-life examples employing the fuzzy rating scale (FRS). More concretely, fuzzy data have been generated such that
– 100·ω1% of the data have been obtained by first considering a simulation from a simple random sample of size 4 from a beta β(p, q) distribution, ordering the corresponding 4-tuple, and finally computing the values xi. The values of p and q vary in most cases to cover different distributions (namely, symmetrical weighting central values, symmetrical weighting extreme values, and asymmetric ones). In most of the comparative studies involving simulations, the values from the beta distribution are re-scaled and translated to an interval [l0, u0] different from [0, 1].
– 100·ω2% of the data have been obtained considering a simulation of four random variables Xi = (u0 − l0)·Yi + l0 as follows (a simulation sketch for this component is given after the list):
  Y1 ∼ β(p, q),
  Y2 ∼ Uniform(0, min{1/10, Y1, 1 − Y1}),
  Y3 ∼ Uniform(0, min{1/5, Y1 − Y2}),
  Y4 ∼ Uniform(0, min{1/5, 1 − Y1 − Y2}).
– 100·ω3% of the data have been obtained considering a simulation of four random variables Xi = (u0 − l0)·Yi + l0 as follows:
  Y1 ∼ β(p, q),
  Y2 ∼ Exp(200) if Y1 ∈ [0.25, 0.75], Y2 ∼ Exp(100 + 4Y1) if Y1 < 0.25, and Y2 ∼ Exp(500 − 4Y1) otherwise,
  Y3 ∼ γ(4, 100) if Y1 − Y2 ≥ 0.25, and Y3 ∼ γ(4, 100 + 4Y1) otherwise,
  Y4 ∼ γ(4, 100) if Y1 + Y2 ≥ 0.25, and Y4 ∼ γ(4, 500 − 4Y1) otherwise.
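Following the forward reference in the second item above, here is a sketch of that component. It rescales to the reference interval [0, 100] used later in the paper and maps each 4-tuple to a trapezoid with core [x1 − x2, x1 + x2] and support [x1 − x2 − x3, x1 + x2 + x4]; this explicit Tra⟨·⟩ construction is an assumption made for illustration, as are the function name and the seed.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

def simulate_frs_component2(n, p, q):
    """Simulate n trapezoidal FRS responses following the second mixture
    component: Y1 ~ beta(p, q), Y2 ~ U(0, min{1/10, Y1, 1 - Y1}),
    Y3 ~ U(0, min{1/5, Y1 - Y2}), Y4 ~ U(0, min{1/5, 1 - Y1 - Y2}),
    rescaled to the reference interval [0, 100]."""
    out = []
    for _ in range(n):
        y1 = rng.beta(p, q)
        y2 = rng.uniform(0.0, min(1 / 10, y1, 1 - y1))
        y3 = rng.uniform(0.0, min(1 / 5, y1 - y2))
        y4 = rng.uniform(0.0, min(1 / 5, 1 - y1 - y2))
        x1, x2, x3, x4 = (100.0 * y for y in (y1, y2, y3, y4))
        # assumed Tra construction: core [x1 - x2, x1 + x2],
        # support [x1 - x2 - x3, x1 + x2 + x4]
        out.append((x1 - x2 - x3, x1 - x2, x1 + x2, x1 + x2 + x4))
    return out

print(simulate_frs_component2(3, p=1, q=1))
```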
5 Results
First, FRS data will be simulated in accordance with the realistic simulation procedure described above. The fuzzy data based on the fuzzy rating scale can then be associated/classified in accordance with the labels of a Likert scale (more concretely, with their numerical encoding); this process will be called "Likertization". Furthermore, the associated Likert values can also be encoded afterwards by means of values from a fuzzy linguistic scale.
For carrying out the Likertization, the "minimum distance Likertization criterion" will be employed (see Fig. 2):

Fig. 2. Minimum distance criterion scheme when the reference interval equals [1, k]

In this way, if the considered Likert scale is a k-point one, given a metric $D$ between fuzzy data and $U$ the free fuzzy response to be classified, then $U$ is associated with the integer $\kappa(U)$ such that

$$\kappa(U) = \arg\min_{j \in \{1,\ldots,k\}} D(U, \mathbb{1}_{\{j\}}).$$
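A minimal sketch of this criterion, reusing the ρ1/ρ2 metrics from the earlier sketch and encoding each Likert point 1_{j} as the degenerate trapezoid (j, j, j, j), is given below; it assumes the free response is expressed on the reference interval [1, k], and the function name is illustrative.

```python
import numpy as np
# assumes rho1 (or rho2) from the earlier metric sketch

def likertize(U, k, D):
    """Minimum distance Likertization: map the free fuzzy response U
    (a trapezoid on the reference interval [1, k]) to the Likert point j
    minimizing D(U, 1_{j}), with 1_{j} encoded as the singleton (j, j, j, j)."""
    distances = [D(U, (j, j, j, j)) for j in range(1, k + 1)]
    return int(np.argmin(distances)) + 1

# toy usage on a 5-point scale
U = (1.8, 2.4, 3.0, 3.6)
print(likertize(U, k=5, D=rho1))
```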
Table 1. % of simulated samples of size n for which the Euclidean distance between the sample scale estimate D associated with the FRS and the one associated with either the NEL (numerically encoded Likert) or the FLS (fuzzy linguistic scale) with k = 4 different values is greater than ε ∈ {1, 5, 10, 15}, for (from top to bottom) β(p, q) ≡ β(1, 1), β(0.75, 0.75), β(4, 2) and β(6, 1)

[The numerical entries of Table 1 are not recoverable from this extraction. Each block (one per beta setting) reports, for n ∈ {10, 30, 100} and for each of the scale estimates ρ2-SD, ρ2-ADD (about the mean), ρ1-ADD (about the 1-norm median), ρ2-MDD (about the mean) and ρ1-MDD (about the 1-norm median), the percentages for ε = 1, 5, 10, 15, split into S = NEL and S = FLS.]
Table 2. % of simulated samples of size n for which the Euclidean distance between the sample scale estimate D associated with the FRS and the one associated with either the NEL (numerically encoded Likert) or the FLS (fuzzy linguistic scale) with k = 5 different values is greater than ε ∈ {1, 5, 10, 15}, for (from top to bottom) β(p, q) ≡ β(1, 1), β(0.75, 0.75), β(4, 2) and β(6, 1)

[The numerical entries of Table 2 are not recoverable from this extraction. Its structure mirrors that of Table 1, with k = 5 instead of k = 4.]
The percentages have been quantified over 1000 samples of n ∈ {10, 30, 100} simulated FRS data (with the different beta distributions) with reference interval [0, 100] (this last choice being irrelevant for the study). On the basis of Tables 1 and 2 we cannot draw very general conclusions, but we can definitely assert that the scale measures mostly vary more from the FRS-based data to the encoded Likert ones.
Furthermore, one can state some approximate behaviour patterns, such as:
– for almost all situations, the robust scale estimate (the last one) provides us with much higher percentages than the non-robust ones; more concretely, ρ1-MDD($\tilde x_n$, $\widetilde{\mathrm{Me}}(\tilde x_n)$) is generally the most sensitive to the change in the rating scale type; this is especially clear for small samples;
– distances are uniformly lower for k = 5 than for k = 4 when the midpoint of the 1-level is beta distributed with (p, q) ∈ {(1, 1), (0.75, 0.75), (4, 2)}; when (p, q) = (6, 1) such a conclusion is appropriate for robust estimates and ε ∈ {1, 5}, but there is no clear conclusion for non-robust estimates or greater values of ε.
Acknowledgements. The research in this paper has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness Grant MTM2015-63971-P. Its support is gratefully acknowledged.
References
1. Diamond, P., Kloeden, P.: Metric spaces of fuzzy sets. Fuzzy Sets Syst. 35, 241–249
(1990)
2. De la Rosa de Sáa, S., Lubiano, M.A., Sinova, B., Filzmoser, P.: Robust scale estimators for fuzzy data. Adv. Data Anal. Classif. 11, 731–758 (2017)
3. Gil, M.A., Lubiano, M.A., De la Rosa de Sáa, S., Sinova, B.: Analyzing data from
a fuzzy rating scale-based questionnaire: a case study. Psicothema 27, 182–191
(2015)
4. Herrera, F., Herrera-Viedma, E., Martínez, L.: A fuzzy linguistic methodology to
deal with unbalanced linguistic term sets. IEEE Trans. Fuzzy Syst. 16(2), 354–370
(2008)
5. Hesketh, T., Pryor, R., Hesketh, B.: An application of a computerized fuzzy graphic
rating scale to the psychological measurement of individual differences. Int. J. Man-
Mach. Stud. 29, 21–35 (1988)
6. Lubiano, M.A., De la Rosa de Sáa, S., Montenegro, M., Sinova, B., Gil, M.A.:
Descriptive analysis of responses to items in questionnaires. Why not using a fuzzy
rating scale? Inf. Sci. 360, 131–148 (2016)
7. Lubiano, M.A., Montenegro, M., Sinova, B., De la Rosa de Sáa, S., Gil, M.A.:
Hypothesis testing for means in connection with fuzzy rating scale-based data:
algorithms and applications. Eur. J. Oper. Res. 251, 918–929 (2016)
8. Lubiano, M.A., Salas, A., Gil, M.A.: A hypothesis testing-based discussion on the
sensitivity of means of fuzzy data with respect to data shape. Fuzzy Sets Syst.
328, 54–69 (2017)
9. Puri, M.L., Ralescu, D.A.: Fuzzy random variables. J. Math. Anal. Appl. 114,
409–422 (1986)
10. Sinova, B., Gil, M.A., Colubi, A., Van Aelst, S.: The median of a random fuzzy
number. The 1-norm distance approach. Fuzzy Sets Syst. 200, 99–115 (2012)
Central Moments of a Fuzzy Random Variable Using the Signed Distance: A Look Towards the Variance