
The Fuzzy Approach to Assessment of ANOVA Results

Jacek Pietraszek1*, Maciej Kołomycki1, Agnieszka Szczotok2, Renata Dwornicka1


1 Department of Software Engineering and Applied Statistics,
Cracow University of Technology,
Al. Jana Pawla II 37, 31-864 Kraków, Poland
{pmpietra, mkolomycki, renata.dwornicka}@gmail.com
2 Institute of Materials Science, Faculty of Materials Science and Metallurgy,
Silesian University of Technology,
ul. Krasinskiego 8, 40-019 Katowice, Poland
[email protected]

Abstract. Typically, the analysis of variance (ANOVA) is used to compare means in the subsets obtained by dividing a large numerical dataset according to the labels of a categorical variable. The test criterion for the decision 'all equal' vs. 'not all equal' is a comparison of the significance level described by the well-known p-value with the a priori assigned critical significance level α, usually 0.05. This comparison is usually treated strictly, based on the crisp values; however, it should not be, especially if the p-value is near α, because the certainty of the decision varies rather smoothly from 'strongly not' through 'no opinion' to 'strongly yes'. It is therefore interesting to analyze such results on the basis of fuzzy arithmetic, using a modified Buckley's fuzzy approach to statistics combined with the bootstrap, because it may be adapted to cases where subjective assessments are introduced as quasi-measurements.

Keywords: analysis of variance, ANOVA, alpha-cuts, fuzzy numbers, fuzzy statistics, materials science, bootstrap

1 Introduction

In general, uncertainty expresses our lack of knowledge about the future behavior and states of an investigated phenomenon. The oldest solution, provided by Pascal, uses a probabilistic approach based on the frequency of events; Kolmogorov later formalized it axiomatically, basing it on Borel's field of sets. Such an approach implicitly assumes that there is a possibility, at least a potential one, to replicate the test many times and experimentally determine the associated asymptotic frequency. The investigations of Poincaré [1] and Hadamard [2] at the end of the 19th century, and later of Lorenz [3] in 1963, revealed that a deterministic system with high sensitivity may lead to chaos, practically indistinguishable from a probabilistic random system.

* Corresponding author
A different approach to uncertainty was presented in 1965 by Zadeh [4]. He proposed a specific extension of set algebra: the membership function with a value varying from 0 to 1. This idea has been developed intensively since the 1960s, in particular the fuzzy arithmetic defined for real fuzzy numbers by Dubois and Prade [5].
In 1968, Zadeh [6] already considered the relation between the probabilistic and fuzzy descriptions of uncertainty; however, it was Buckley [7, 8] who in 2005 offered a consistent approach to fuzzy estimators related to random samples defined as crisp datasets. His concept of a fuzzy estimator is based on the mapping between a source pair (a significance level and its related confidence interval) and the resulting alpha-cut. Such an approach requires rather difficult analytical transformations and has proved rather impractical in more complicated analyses.
In 2006, Grzegorzewski [9] took decision theory as a starting point and developed a more general classification containing three elements: analyzed data, tested hypotheses, and additional assumptions/conditions. Each element may be considered fuzzy or non-fuzzy, which leads to many possible combinations. This is very interesting as a conceptual idea, but it lacks practical instructions for the evaluation of such tests.
Some elements of a fuzzy approach have been adopted in the design of experiments analysis [10], which, however, led to a very difficult inference. A similar attempt to use a neural network approximation [11] revealed a large instability of results affected by the random procedure of neural network identification. This may also be caused by an unconsidered correlation between variables, which imposes the selection of a particular pair of triangular norms [12].
In the further analysis, the authors adopted Buckley's approach, but his very complicated analytical transformations were replaced with a bootstrap approach [13], which also leads to empirical distributions. The distributions were used to construct the alpha-cuts related to fuzzy estimators.

2 Methods

2.1 Buckley’s fuzzy statistics


The fuzzy number is defined as a fuzzy subset of R [5] described by its membership function A:

A : R → [0,1] . (1)

The alpha-cut A[α] is defined as the non-fuzzy subset of R where the membership function is greater than or equal to α, i.e.:

A[α] = { x ∈ R : A(x) ≥ α } , α ∈ (0,1] . (2)

The value of the alpha-cut for α = 0 is specifically defined as the closed support of the membership function, i.e.:

A[0] = cl{ x ∈ R : A(x) ≠ 0 } . (3)

Buckley restricted the use of fuzzy numbers to the subtype of 'triangular shaped fuzzy numbers', which means that the membership function is a combination of two monotonic functions: the left one monotonically increasing and the right one monotonically decreasing. Buckley used their inverse forms to define the alpha-cut Q[α] of the fuzzy number Q as a closed, bounded interval for 0 ≤ α ≤ 1, i.e.:

Q[α] = [q1(α), q2(α)] , (4)

where q1(α) is an increasing function of α, q2(α) is a decreasing function of α and q1(1) = q2(1).
The key element of Buckley's approach is the identification of the fuzzy estimator's alpha-cuts with the confidence intervals, and of the alpha with the significance level. The confidence interval is denoted as:

[θ1(β), θ2(β)] , (5)

where β is used as the symbol of the significance level because the traditional symbol α collides with the argument of the alpha-cut. The values of β vary from something very small but different from zero to less than 1, e.g. 0.01 ≤ β < 1. The value of 1 is treated in a special manner because it results in a zero-length interval, i.e. a confidence interval of zero confidence.
Thus the fuzzy estimator θ̃ of the statistic θ is defined inversely through its alpha-cut θ̃[α]:

θ̃[α] = [θ1(α), θ2(α)] . (6)
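This identification can be sketched numerically. The following minimal Python example illustrates the idea for the mean of a sample; the function name and data are illustrative (not from the paper), and a normal approximation replaces the exact confidence interval:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def fuzzy_mean_alpha_cut(data, alpha):
    # Alpha-cut of a fuzzy estimator of the mean (Buckley's idea):
    # the cut at membership level alpha is identified with the
    # (1 - alpha)*100% confidence interval; at alpha = 1 the
    # interval degenerates to the crisp point estimate.
    n = len(data)
    m = mean(data)
    se = stdev(data) / sqrt(n)
    if alpha >= 1.0:
        return (m, m)  # zero-length interval: the core of the fuzzy number
    z = NormalDist().inv_cdf(1 - alpha / 2)  # normal approximation
    return (m - z * se, m + z * se)

cuts = {a: fuzzy_mean_alpha_cut([4.1, 3.9, 4.3, 4.0, 4.2], a)
        for a in (0.05, 0.5, 1.0)}
```

The cuts are nested: the interval at α = 0.05 contains the one at α = 0.5, which contains the point estimate at α = 1, exactly the triangular-shaped structure of Eq. 4.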

2.2 ANOVA analysis


The one-way analysis of variance (ANOVA) [14] compares the means of subsets created from a large dataset by a categorical classification. If the subdivided data are sampled as i.i.d. (independent and identically distributed) normal random variables with common variance, then the null hypothesis that the means are equal may be tested. The computational procedure is briefly described below according to the decomposition of the sum of squares as proposed by Fisher [15].
The raw dataset xij, i = 1,…,k; j = 1,…,ni, is divided into k subgroups of ni values each. The grand mean x̄ is evaluated

x̄ = (1/n) Σi=1..k Σj=1..ni xij ,  n = Σi=1..k ni , (7)

as are the subgroup means

x̄i = (1/ni) Σj=1..ni xij . (8)

Then the sum of squares between the subgroups is evaluated

SSFactor = Σi=1..k ni (x̄i − x̄)² , (9)

as well as the sum of squares within the subgroups

SSError = Σi=1..k Σj=1..ni (xij − x̄i)² . (10)

Next, the mean squares are evaluated

MSFactor = SSFactor / (k − 1) , (11)

MSError = SSError / (n − k) . (12)

Finally, the F statistic is evaluated

F = MSFactor / MSError . (13)

If the assumptions are met, the F statistic should follow Fisher's distribution with degrees of freedom f1 = k − 1 and f2 = n − k, respectively. The p-value is evaluated from the inverse cumulative distribution, and the null hypothesis is rejected if the p-value is smaller than the critical significance level α assigned a priori.
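The steps of Eqs. 7-13 can be sketched as a short Python function; this is a minimal stdlib-only illustration, and the final p-value step is omitted because it requires the F-distribution CDF (available, e.g., as scipy.stats.f.sf):

```python
from statistics import mean

def one_way_anova_F(groups):
    # groups: list of lists of values, one inner list per subgroup
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)          # Eq. 7
    ss_factor = sum(len(g) * (mean(g) - grand) ** 2
                    for g in groups)                    # Eqs. 8-9
    ss_error = sum((x - mean(g)) ** 2
                   for g in groups for x in g)          # Eq. 10
    ms_factor = ss_factor / (k - 1)                     # Eq. 11
    ms_error = ss_error / (n - k)                       # Eq. 12
    return ms_factor / ms_error                         # Eq. 13
```

The p-value then follows from the survival function of Fisher's distribution with f1 = k − 1 and f2 = n − k degrees of freedom.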

2.3 Bootstrap of the dataset


The idea of the bootstrap procedure is based on iteratively conducted re-sampling from a raw dataset, generating a pool of alternative datasets [13]. However, it should be noted that re-sampling relies on drawing from i.i.d. values, whereas the raw dataset does not contain such values. The key point is the linear model [16] underlying ANOVA:

zij = µ + ai + εij ,  εij ∼ N(0, σ²) , (14)

where ai is the subgroup effect deviating from the grand mean µ.
This model should be identified prior to the bootstrap procedure, giving µ, ai, and the residuals rij; the bootstrap random draws are then taken from the pool of residuals rij. Iteratively, new bootstrapped datasets are created based on Eq. 14 and the re-sampled residuals, and the ANOVA procedure is applied to each. The final results constitute a large set of evaluated p-values. This set is the source for the evaluation of confidence intervals and the related alpha-cuts.
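A minimal sketch of this re-sampling loop, with stdlib Python only (the function name is illustrative; each generated dataset would be passed to the ANOVA procedure of Sect. 2.2):

```python
import random
from statistics import mean

def residual_bootstrap(groups, n_boot, seed=0):
    # Identify the linear model of Eq. 14: the fitted value for
    # subgroup i is mu + a_i (estimated by the subgroup mean), and
    # the residuals r_ij are pooled across all subgroups so that
    # they can be drawn as i.i.d. values.
    rng = random.Random(seed)
    fitted = [mean(g) for g in groups]
    residuals = [x - mean(g) for g in groups for x in g]
    datasets = []
    for _ in range(n_boot):
        # rebuild each subgroup as fitted value + re-sampled residual
        datasets.append([[f + rng.choice(residuals) for _ in g]
                         for f, g in zip(fitted, groups)])
    return datasets
```

Applying the ANOVA procedure to every generated dataset yields the pool of bootstrapped F and p values.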

3 Materials

Nickel-based superalloys are used mainly in aircraft and power-generation turbines and are typically produced by an investment casting process, especially useful for making castings of complex and near-net-shape geometry. The studies were performed on the IN 713C superalloy [17]. The polycrystalline castings of IN 713C were produced in the investment casting process conducted by the Laboratory for Aerospace Materials at Rzeszow University of Technology in Poland. Finally, the castings were cut. The cross-sections were mounted and prepared as metallographic samples of the nickel-based superalloy. The microstructural investigations of the cross-sections of the casting were carried out by means of a Hitachi S-4200 scanning electron microscope. The recorded microphotographs were then subjected to a computer-aided image analysis by means of the Met-Ilo program in order to estimate quantitatively the main parameters describing the (γ+γ') eutectic islands that occur in the investigated superalloy. The data obtained from the analysis of the GK casting were processed in the bootstrap analysis in this paper. The data obtained from the image analysis were transformed by the logarithmic formula (Eq. 15) because the eutectic area cannot take negative values, while the ANOVA analysis requires the assumption of normally distributed noise without bound limitations:

z = ln( y ) , (15)

where y denotes the obtained eutectic area and z denotes the transformed value.
The source data were divided into 6 groups with 31, 49, 80, 75, 64 and 61 values,
respectively [17]. The technological aim was to check the homogeneity between the
cross-sections.

4 Results

At the beginning, the classic analysis led to the ANOVA table with crisp results (Tab. 1).
Table 1. ANOVA table for transformed data (source [17]).

Effect SS df MS F p
Trace 3.318 5 0.664 0.411 0.841
Error 572.019 354 1.616 – –
Total 575.338 359 1.603 – –
Such results do not warrant the rejection of the homogeneity hypothesis.
Next, the bootstrap procedure resulted in a dataset containing 10,000 records of bootstrapped F and p values. The dataset size (10,000) was selected for convenience: an easy selection of exact quantiles and, in reverse, an easy recalculation of the membership function values associated with confidence intervals.
The obtained values of F and p were sorted, associated with their 1/n-th of empirical probability (i.e. 1/10,000 for the mentioned case) and, through the inverse cumulative probability function, confidence intervals for different values of the significance level were evaluated.
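The quantile step can be sketched as follows. This is a simplified reading of the construction, assuming a one-sided interval anchored at zero (as used for the fuzzy estimators in Sect. 5); the function name is illustrative:

```python
def one_sided_alpha_cut(sorted_boot, beta):
    # The alpha-cut at membership level beta is identified with the
    # one-sided (1 - beta)*100% confidence interval [0, q(1 - beta)],
    # where q is the empirical quantile of the sorted bootstrap values.
    n = len(sorted_boot)
    idx = min(int(round((1 - beta) * n)), n - 1)
    return (0.0, sorted_boot[idx])
```

With 10,000 sorted records, the index arithmetic maps membership levels directly to exact empirical quantiles, which is why that dataset size is convenient.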

5 Analysis

The descriptive statistics for the bootstrapped F and p values are presented in Tab. 2.

Table 2. The descriptive statistics for the bootstrapped F and p values
(the mode is related to the highest histogram class)

Value mean median mode -95% +95%


Fboot 1.414 1.246 0.75…1.00 0.247 3.531
pboot 0.354 0.287 0.00…0.05 0.004 0.941

Fig. 1. Histogram of the bootstrapped F statistics


The distribution of the bootstrapped F values is presented in Fig. 1. As mentioned by Szczotok et al. [17], this distribution differs significantly from the theoretical F distribution. It may be caused both by the limited size of the source raw data and by the only asymptotic equality with the F distribution in Fisher's theorem.

Fig. 2. Fuzzy estimator of the bootstrapped F statistics

Fig. 3. Histogram of the bootstrapped p values

The fuzzy estimator of the bootstrapped F statistic, constructed on the basis of the one-sided confidence interval, is presented in Fig. 2. Buckley's procedure of recalculating confidence intervals into the alpha membership function imposes that the maximum is set at the zero value, i.e. where all means are mutually equal without any random noise inside the groups.
The distribution of the bootstrapped p values is presented in Fig. 3. The fuzzy estimator of the bootstrapped p values is presented in Fig. 4.
The obtained fuzzy estimator may be used to reject, or not, the null hypothesis of ANOVA: the simultaneous equality of all means. In the case considered here, this leads to the following fuzzy assessments:

• classic ANOVA F = 0.411 is related to the fuzzy assessment 0.922 (see Fig.2),
• classic ANOVA p = 0.841 is related to the fuzzy assessment 0.922 (see Fig.4).

Fig. 4. Fuzzy estimator of the bootstrapped p values
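Reading the fuzzy assessment of a crisp statistic from such an estimator amounts to evaluating its membership function at that value. A minimal sketch, assuming the membership function is available as sampled points (the sample data here are illustrative, not the paper's):

```python
import bisect

def membership_at(x, xs, mus):
    # Piecewise-linear interpolation of a membership function sampled
    # at sorted points xs with membership values mus; the membership
    # is zero outside the support.
    if x < xs[0] or x > xs[-1]:
        return 0.0
    i = bisect.bisect_right(xs, x)
    if i == len(xs):
        return mus[-1]
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return mus[i - 1] + t * (mus[i] - mus[i - 1])
```

In the paper's setting, xs would be the grid of bootstrapped quantiles and mus the associated membership levels, so evaluating the function at the classic crisp F or p value yields the fuzzy assessment.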

Firstly, it is worth noting that both assessments are equal, which means that such fuzzy estimators are consistent in decision making. Secondly, such a relatively large value means that the hypothesis of means equality may be treated with a high degree of certainty. These conclusions are slightly different from the statistical orthodoxy: 'not reject', which does not mean 'accept'.

6 Conclusion

The authors have proposed a bootstrap approach to obtain fuzzy estimators of the ANOVA key statistics: the F value and the p value. The aim was to provide support for decision making that would be more convenient than the typical flip-flop decision scheme of classic Neyman-Pearson hypothesis testing, based on a comparison of the p value with the critical significance level α assumed a priori. The classic decision scheme is especially difficult if the p value is near the α level.
The proposed scheme is based on Buckley's approach, modified for one-sided confidence intervals. It appeared to be effective and consistent with the classic decision scheme.
Further activities will be oriented to developing a consistent mathematical and algorithmic formalism for such a fuzzy-probabilistic mixture and, finally, to adopting this formalism in DoE (design of experiments) applications to analyze cases where subjective assessments are introduced as quasi-measurements.

References

1. Poincaré, J.H.: Sur le problème des trois corps et les équations de la dynamique.
Divergence des séries de M. Lindstedt. Acta Mathematica 13, 1-270 (1890)
2. Hadamard, J.: Les surfaces à courbures opposées et leurs lignes géodesiques.
Journal de Mathématiques Pures et Appliquées 4, 27-73 (1898)
3. Lorenz, E.N.: Deterministic non-periodic flow. Journal of the Atmospheric
Sciences 20, 130-141 (1963)
4. Zadeh, L.A.: Fuzzy sets. Information and Control 8, 338-353 (1965)
5. Dubois, D., Prade, H.: Fuzzy real algebra: some results. Fuzzy Sets and Systems 2,
327-348 (1979)
6. Zadeh, L.A.: Probability Measures of Fuzzy Events. Journal of Mathematical Analysis and Applications 23, 421-427 (1968)
7. Buckley, J.J.: Fuzzy statistics: hypothesis testing. Soft Computing 9, 512-518
(2005)
8. Buckley, J.J.: Fuzzy probability and statistics. Springer Verlag, Heidelberg (2006)
9. Grzegorzewski, P.: Decision support under uncertainty. Statistical methods for
imprecise data. EXIT, Warszawa (2006)
10. Pietraszek, J.: Fuzzy Regression Compared to Classical Experimental Design in
the Case of Flywheel Assembly. Lect Notes Artif Int 7267, 310-317 (2012)
11. Pietraszek, J., Gadek-Moszczak, A.: The Smooth Bootstrap Approach to the Distribution of a Shape in the Ferritic Stainless Steel AISI 434L Powders. Solid State Phenomena 197, 162-167 (2013)
12. Pietraszek, J.: The Modified Sequential-Binary Approach for Fuzzy Operations on
Correlated Assessments. Lect Notes Artif Int 7894, 353-364 (2013)
13. Shao, J., Tu, D.: The Jackknife and Bootstrap. Springer, New York (1995)
14. Davies, L., Gather, U.: Robust Statistics. In: Gentle, J.E., Hardle, W.K., Mori, Y.
(eds.) Handbook of Computational Statistics, vol. 2, pp. 711-749. Springer,
Heidelberg (2012)
15. Fisher, R.A.: Statistical Methods for Research Workers. Oliver and Boyd Press,
Edinburgh (1925)
16. Rutherford, A.: Introducing ANOVA and ANCOVA - A GLM Approach. SAGE
Publications Ltd., London, Thousand Oaks, New Delhi (2001)
17. Szczotok, A., Nawrocki, J., Gądek-Moszczak, A., Kołomycki, M.: The Bootstrap
Analysis of One-Way ANOVA Stability in the Case of the Ceramic Shell Mould
of Airfoil Blade Casting. Solid State Phenomena 235, 24-30 (2015)
