Abstract: The correct application of a statistical test is directly connected with information about
the distribution of the data. The Anderson–Darling test is one alternative used to test whether the
distribution of experimental data follows a theoretical distribution. The conclusion of the
Anderson–Darling test is usually drawn by comparing the obtained statistic with the available
critical value, an approach that gives no weight to the sample size. This study aimed to provide a
formula for calculating the p-value associated with the Anderson–Darling statistic that takes the
sample size into account. A Monte Carlo simulation study was conducted for sample sizes from 2 to
61, and, based on the obtained results, a formula able to give reliable probabilities associated with
the Anderson–Darling statistic is reported.
1. Introduction
Application of any statistical test is made under certain assumptions, and violation of these
assumptions could lead to misleading interpretations and unreliable results [1,2]. One main
assumption of several statistical tests relates to the distribution of the experimental or observed
data (H0 (null hypothesis): the data follow the specified distribution vs. H1 (alternative
hypothesis): the data do not follow the specified distribution). Different tests, generally called
“goodness-of-fit” tests, are used to assess whether a sample of observations can be considered a
sample from a given distribution. The most frequently used goodness-of-fit tests are
Kolmogorov–Smirnov [3,4], Anderson–Darling [5,6], Pearson’s chi-square [7], Cramér–von Mises [8,9],
Shapiro–Wilk [10], Jarque–Bera [11–13], D’Agostino–Pearson [14], and Lilliefors [15,16]. These
goodness-of-fit tests use different procedures (see Table 1). Alongside the well-known
goodness-of-fit tests, other methods based, for example, on entropy estimators [17–19], jackknife
empirical likelihood [20], or the prediction of residuals [21], or designed for testing multilevel
survival data [22] or multilevel models with binary outcomes [23], have been reported in the
scientific literature.
Tests used to assess the distribution of a dataset have received attention from many researchers
(for testing normal or other distributions) [24–27]. The normal distribution is of particular
importance, since the resulting information will lead the statistical analysis along the pathway of
parametric or non-parametric tests [28–33]. Different normality tests are implemented in various
statistical packages (e.g., Minitab—http://www.minitab.com/en-us/;
EasyFit—http://www.mathwave.com/easyfit-distribution-fitting.html; Develve—http://develve.net/;
the R package “nortest”—https://cran.r-project.org/web/packages/nortest/nortest.pdf; etc.).
Several studies have aimed to compare the performance of goodness-of-fit tests. In a Monte Carlo
simulation study conducted on the normal distribution, the Kolmogorov–Smirnov test was identified
as the least powerful test, while, at the opposite end, the Shapiro–Wilk test was identified as the
most powerful one [34]. Furthermore, the Anderson–Darling test was found to be the best option among
five normality tests whenever t-statistics were used [35]. The Anderson–Darling test gives more
weight to the tails than the Kolmogorov–Smirnov test [36]. Comparisons between different
goodness-of-fit tests are frequently conducted by comparing their power [37,38], with or without
confidence intervals [39], the distribution of p-values [40], or ROC (receiver operating
characteristic) analysis [32].
The interpretation of the Anderson–Darling test is frequently made by comparing the AD statistic
with the critical value for a particular significance level (e.g., 20%, 10%, 5%, 2.5%, or 1%), even
though it is known that the critical values depend on the sample size [41,42]. The main problem with
this approach is that critical values are available for only a few distributions (e.g., the normal
and Weibull distributions in Table 2 [43], the generalized extreme value and generalized logistic
distributions [44], etc.), although they can be obtained through Monte Carlo simulations [45]. The
primary advantage of the Anderson–Darling test is its applicability to testing the departure of
experimental data from different theoretical distributions, which is the reason why we decided to
identify a method able to calculate its associated p-value as a function of the sample size as well.
D’Agostino and Stephens provided different formulas for the calculation of p-values associated with
the Anderson–Darling statistic (AD), along with a correction for small sample sizes (AD*) [37]. Their
equations are independent of the tested theoretical distribution and highlight the importance of the
sample size (Table 3).
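Since Table 3 is not reproduced above, a minimal sketch of one commonly quoted piecewise form of
these p-value formulas (for the size-corrected statistic AD*) is given below; the numeric constants
are taken from the secondary literature on the D’Agostino–Stephens approach and should be treated as
an assumption here rather than as a verbatim copy of Table 3.

```python
import math

def d_agostino_stephens_p(ad_star: float) -> float:
    """Approximate p-value for the size-corrected statistic AD*.

    Piecewise formula commonly attributed to D'Agostino & Stephens; the
    constants are quoted from secondary sources for illustration only.
    """
    if ad_star >= 0.6:
        return math.exp(1.2937 - 5.709 * ad_star + 0.0186 * ad_star ** 2)
    if ad_star > 0.34:
        return math.exp(0.9177 - 4.279 * ad_star - 1.38 * ad_star ** 2)
    if ad_star > 0.2:
        return 1.0 - math.exp(-8.318 + 42.796 * ad_star - 59.938 * ad_star ** 2)
    return 1.0 - math.exp(-13.436 + 101.14 * ad_star - 223.73 * ad_star ** 2)
```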
Several Excel implementations of the Anderson–Darling statistic are freely available to assist
researchers in testing whether data follow the normal distribution [46–48]. Since almost all
distributions depend on at least two parameters, it is not expected that one goodness-of-fit test
will provide sufficient information regarding the risk of error, because using only one method (one
test) gives the expression of only one constraint between the parameters. In this regard, the
example provided in [49] is illustrative: it shows how the presence of a single outlier induces
complete disarray among the statistics, and even its removal does not lead to the same risk of error
across the different goodness-of-fit tests. Given this fact, calculation of the combined probability
of independent (e.g., independent of the tested distribution) goodness-of-fit tests [50,51] is
justified.
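As an illustration only of how p-values from independent tests can be combined (this is not the
specific procedure of [50,51]), the following sketch applies Fisher's classical combination rule;
the use of scipy is an assumption of the example.

```python
import math
from scipy import stats

def fisher_combined_p(p_values):
    """Combine independent p-values via Fisher's chi-square method.

    Under the joint null hypothesis, -2 * sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom (k = number of tests).
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return stats.chi2.sf(statistic, df=2 * k)

# Example: p-values from three independent goodness-of-fit tests
print(fisher_combined_p([0.08, 0.12, 0.20]))
```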
Good statistical practice guidelines require reporting the p-value associated with a test’s
statistic. The sample size influences the p-value of the statistic, so reporting it is mandatory to
assure proper interpretation of the statistical results. Our study aimed to identify, assess, and
implement an explicit function for the p-value associated with the Anderson–Darling statistic that
takes into consideration both the value of the statistic and the sample size.
$$AD^{*} = AD\left(1 + \frac{0.75}{n} + \frac{2.25}{n^{2}}\right); \qquad AD = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F(X_{i}) + \ln\left(1 - F(X_{n-i+1})\right)\right].$$
where H1 is the Shannon entropy of R expressed in nats (the natural unit of information or entropy), $H_1(R,n) = -\sum r_i \ln(r_i)$.
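A minimal sketch of how the AD and AD* formulas above can be computed is given below, here against
the standard normal distribution as an illustrative choice of F; the use of Python's
statistics.NormalDist is an assumption of this example, not part of the authors' implementation.

```python
import math
from statistics import NormalDist

def anderson_darling(data, cdf=NormalDist().cdf):
    """Anderson-Darling statistic of `data` against the distribution `cdf`.

    Implements the formula quoted above; by default the data are compared
    with the standard normal distribution, so real data would first be
    standardised (an assumption made for the sake of a short example).
    """
    x = sorted(data)
    n = len(x)
    s = sum((2 * i - 1) * (math.log(cdf(x[i - 1]))
                           + math.log(1.0 - cdf(x[n - i])))
            for i in range(1, n + 1))
    return -n - s / n

def anderson_darling_star(ad, n):
    """Small-sample correction AD* = AD * (1 + 0.75/n + 2.25/n^2)."""
    return ad * (1.0 + 0.75 / n + 2.25 / n ** 2)
```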
Equation (2) gives the smallest possible value for AD. The value of AD increases as the departure of
the observed distribution (P) from the perfect uniform distribution increases.
Class coding: “0” if ti < 0.5; “1” if ti ≥ 0.5

t1  t2  t3  Case
0   0   0   1
0   0   1   2
0   1   0   3
0   1   1   4
1   0   0   5
1   0   1   6
1   1   0   7
1   1   1   8
It is not a good idea to use the design presented in Table 4 in its crude form, since it turns into
a problem with exponential (2^n) complexity. The trick is to observe the pattern in Table 4. In fact,
for (n + 1) cases with different frequencies of occurrence following the model, the results are given
in Table 5. With the design presented in Table 5, the complexity of enumerating all the cases stays
of the same order of magnitude as n (we need to list only n + 1 cases instead of 2^n). The
frequencies listed in Table 5 are combinations of n objects split between the two intervals, so
instead of enumerating all 2^n cases, it is enough to record only n + 1 cases weighted by their
relative occurrence.
The effect of the pseudo-random generator is significantly decreased by drawing a stratified random
sample (the decrease is exactly one order of magnitude in the binary representation, one unit in the
log2 transformation: 1 = log2(2), for the (0, 0.5) and (0.5, 1) split).
The extraction of a number from (0, 0.5) and from (0.5, 1) was furthermore made in our experiment
with the Mersenne Twister random number generator (if x = Random() with 0 ≤ x < 1, then 0 ≤ x/2 < 0.5
and 0.5 ≤ 0.5 + x/2 < 1). Table 5 provides all the information needed for the design. For any n, for
k from 0 to n, exactly k numbers are generated as Random()/2 and sorted; furthermore, exactly n − k
numbers are generated as 0.5 + Random()/2, and the frequency associated with this pattern is
n!/(k!·(n − k)!). The combinations can also be calculated iteratively: cnk(n,0) = 1, and cnk(n,k) =
cnk(n,k − 1)·(n − k + 1)/k for successive 1 ≤ k ≤ n.
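A minimal sketch of the stratified design and the iterative computation of cnk described above is
given below; the use of Python's random module (which also implements the Mersenne Twister
generator) is an illustrative assumption, not a reproduction of the authors' code.

```python
import random

def stratified_patterns(n):
    """Enumerate the n + 1 stratification patterns with their weights.

    For each k (0..n), exactly k values are drawn from (0, 0.5) and n - k
    values from (0.5, 1); the pattern is weighted by C(n, k), computed
    iteratively as cnk(n, k) = cnk(n, k - 1) * (n - k + 1) / k.
    """
    weight = 1  # cnk(n, 0) = 1
    for k in range(n + 1):
        sample = sorted([random.random() / 2.0 for _ in range(k)] +
                        [0.5 + random.random() / 2.0 for _ in range(n - k)])
        yield k, weight, sample
        weight = weight * (n - k) // (k + 1)  # advance to cnk(n, k + 1)
```

Each yielded pattern can then be converted into an AD value, and the empirical distribution of AD
accumulated with the corresponding weight, instead of enumerating all 2^n cases.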
$$\hat{y} = a_0 + a_1 x^{1/4} + a_2 x^{2/4} + a_3 x^{3/4} + a_4 x \qquad (3)$$
The statistics associated with the proposed model for data presented in Figure 1 are given in
Table 6.
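For orientation, a least-squares fit of a model with the structure of Equation (3) could be sketched
as below. Judging from Equation (5), the response y is a transformation of the probability such as
α⁻¹ = (1 − p)⁻¹, but the exact choice follows the text and Table 6; the use of numpy here is only an
assumed tool, not the authors' implementation.

```python
import numpy as np

def fit_quarter_power_model(ad_values, y_values):
    """Least-squares fit of y = a0 + a1*x^(1/4) + a2*x^(2/4) + a3*x^(3/4) + a4*x.

    `ad_values` plays the role of x (the AD statistic) and `y_values` the
    role of the response variable; returns the coefficients a0..a4.
    """
    x = np.asarray(ad_values, dtype=float)
    y = np.asarray(y_values, dtype=float)
    design = np.column_stack([np.ones_like(x),
                              x ** 0.25, x ** 0.50, x ** 0.75, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef
```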
Figure 1. Probability as a function of the AD statistic for a selected case (n = 25) in the Monte
Carlo experiment: (a) p = p(AD); (b) p = p(e^AD); (c) α⁻¹ vs. e^AD; (d) −ln(α) vs. AD.
Table 6. Proposed model tested for the AD = AD(p) series for n = 25. SST: total sum of squares;
SSRes: residual sum of squares; SSE: sum of squared errors.
The analysis of the results presented in Table 6 showed that all coefficients are statistically
significant, and their significance increases from the coefficient of AD^(1/4) to the coefficient of
AD. Furthermore, the residual sum of squares of the regression is ten orders of magnitude smaller
than the total sum of squares (F value = 3.4 × 10^10). The adjusted determination coefficient has
eight consecutive nines. The model is not finished yet, because we need a model that also embeds the
sample size (n). Inverse powers of n are the best alternative, as already suggested in the
literature [43]. Therefore, for each coefficient (from a0 to a4), a function penalizing small samples
was used in a similar manner:
3. Simulation Results
Twenty-five coefficients were calculated for Equation (5) from 60 values associated to sample
sizes from 2 to 61, based on 500 values of p (0.500 ≤ p ≤ 0.999) and with a step of 0.001. The values of
the obtained coefficients along with the related Student t-statistic are given in Table 7.
Table 7. Coefficients of the proposed model and their Student t-values provided in round brackets.
[Figure 2 color legend: red (−0.07, −0.05]; light green (−0.05, −0.03]; blue (−0.03, −0.01]; green (−0.01, 0.01]; cyclamen (0.01, 0.03]; yellow (0.03, 0.05]; grey (0.05, 0.07]; orange (0.07, 0.09].]
Figure 2. The effect of the differences between classical and stratified random sampling on the
calculated AD statistic.
[Figure 3: histograms of the residuals — frequency (no) vs. value of residuals, in two panels.]
A sample of p ranging from 0.500 to 0.995 with a step of 0.005 (100 values), and of n over its whole
range (from 2 to 61; 60 values), was extracted from the whole pool of data, and a 3D mesh with 6000
grid points was constructed. Figure 4 represents the differences log10(p − p̂), where p̂ is calculated
with Equation (5) using the bi,j coefficients given in Table 7. For convenience, the equations for p̂
and α̂ (α ≡ 1 − p) are
$$\hat{p} = 1 - \left(\sum_{i=0}^{4}\sum_{j=0}^{4} b_{i,j}\, x^{i/4}\, n^{-j}\right)^{-1} \qquad (5)$$

$$\hat{\alpha} = \left(\sum_{i=0}^{4}\sum_{j=0}^{4} b_{i,j}\, x^{i/4}\, n^{-j}\right)^{-1}.$$
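Assuming the 5 × 5 coefficient matrix bi,j of Table 7 is at hand (it is not reproduced here),
Equation (5) can be evaluated as in the following sketch, with x taken as the AD statistic as in
Equation (3); numpy is only an assumed convenience.

```python
import numpy as np

def p_hat(ad, n, b):
    """Equation (5): p_hat = 1 - (sum_ij b[i,j] * ad^(i/4) * n^(-j))^(-1).

    `b` is the 5 x 5 coefficient matrix reported in Table 7 (not reproduced
    here); `ad` is the Anderson-Darling statistic and `n` the sample size.
    """
    b = np.asarray(b, dtype=float)
    total = sum(b[i, j] * ad ** (i / 4.0) * n ** (-j)
                for i in range(5) for j in range(5))
    return 1.0 - 1.0 / total

def alpha_hat(ad, n, b):
    """Companion form: alpha_hat = 1 - p_hat = (same double sum)^(-1)."""
    return 1.0 - p_hat(ad, n, b)
```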
Figure 4 reveals that the values calculated with Equation (5) and the expected values (from the MC
simulation for AD = AD(p,n)) differ by less than 1‰ (−3 at the top of the Z-axis). Moreover, moving
away from n = 2 and from p = 0.500 towards n = 61 or p = 0.999, the difference decreases
dramatically: to 10^−6 (visible on the Z-axis as −6 when moving from n = 2 to n = 61), to 10^−9
(visible on the plot on the X-axis as −9 when moving from p = 0.500 to p = 0.995), and even to 10^−15
(visible on the plot on the Z-axis as −15 when moving both from p = 0.500 to p = 0.995 and from n = 2
to n = 61). This behavior shows that the model was designed in such a way that the estimation error
(p − p̂) is minimal for small α (α close to 0; p close to 1). The regular half-circle-shaped pattern
depicted in Figure 4 suggests that a method even more precise than the one achieved by the proposed
model would have to rely on periodic functions.
Figure 4. 3D plot of the estimation error for data expressed on a logarithmic scale as a function of
p (ranging from 0.500 to 0.999) and n (ranging from 2 to 61).
Figure 5 illustrates this pattern more clearly, with the peak at n = 2 and p = 0.500.
Figure 5. 3D plot of the estimation error for untransformed data: the Z-axis shows 10^5·(p − p̂) as a
function of p (ranging from 0.500 to 0.999) and n (ranging from 2 to 61).
The median of the residuals expressed on a logarithmic scale indicates that half of the points agree
to exactly seven digits (e.g., 0.98900000 vs. 0.98900004). The cumulative frequencies of the
residuals represented on a logarithmic scale also show that 75% agree to exactly six digits, while
over 99% agree to exactly five digits. The agreement between the observed Monte Carlo values and the
regression model is excellent (r² (n = 30,000) = 0.99999), with a minimum value for the sum of
squares of the residuals (0.002485). These results sustain the validity of the proposed model.
4. Case Study
Twenty sets of experimental data (Table 9) were used to test the hypothesis of normal distribution:
H0: The distribution of the experimental data is not significantly different from the theoretical
normal distribution.
H1: The distribution of the experimental data is significantly different from the theoretical
normal distribution.
Set ID    What the Data Represent    Sample Size    Reference
1 Distance (m) on treadmill test, applied on subjects with peripheral arterial disease 24 [54]
2 Waist/hip ratio, determined in obese insulin-resistant patients 53 [55]
3 Insulin-like growth factor 2 (pg/mL) on newborns 60 [56]
4 Chitotriosidase activity (nmol/mL/h) on patients with critical limb ischemia 43 [57]
5 Chitotriosidase activity (nmol/mL/h) on patients with critical limb ischemia and on controls 86 [57]
6 Total antioxidative capacity (Eq/L) on the control group 10 [58]
7 Total antioxidative capacity (Eq/L) on the group with induced migraine 40 [53]
8 Mini mental state examination score (points) in elderly patients with cognitive dysfunction 163 [59]
9 Myoglobin difference (ng/mL) (postoperative–preoperative) in patients with total hip arthroplasty 70 [60]
10 The inverse of the molar concentration of carboquinone derivatives, expressed in logarithmic scale 37 [61]
11 Partition coefficient expressed in the logarithmic scale of flavonoids 40 [62]
12 Evolution of the determination coefficient in the identification of the optimal model for lipophilicity of polychlorinated biphenyls using a genetic algorithm 30 [63]
13 Follow-up days in the assessment of the clinical efficiency of a vaccine 31 [64]
14 Strain ratio elastography to cervical lymph nodes 50 [65]
15 Total strain energy (eV) of C42 fullerene isomers 45 [66]
16 Breslow index (mm) of melanoma lesions 29 [67]
17 Determination coefficient distribution in full factorial analysis on one-cage pentagonal face C40 congeners: dipole moment 44 [68]
18 The concentration of spermatozoids (millions/mL) in males with ankylosing spondylitis 60 [69]
19 The parameter of the Poisson distribution 31 [70]
20 Corolla diameter of Calendula officinalis L. for Bon-Bon Mix × Bon-Bon Orange 28 [71]
Experimental data were analyzed with EasyFit Professional (v. 5.2) [72], and the retrieved AD
statistic, along with the conclusion of the test (Reject H0?) at a significance level of 5%, was
recorded. The AD statistic and the sample size of each dataset were used to retrieve the p-value
calculated with our method. As a control method, the formulas presented in Table 3 [43], implemented
in an Excel file (SPC for Excel) [47], were used. The obtained results are presented in Table 10.
Perfect concordance was observed with regard to the statistical conclusion on the normal
distribution when our method was compared with the judgment retrieved by EasyFit. The concordance of
the results of SPC and EasyFit, respectively, with the proposed method was 60%, with discordant
results for both small (e.g., n = 24, set 1) and large (e.g., n = 70, set 9) sample sizes. Normal
probability plots (P–P) and quantile–quantile plots (Q–Q) of these sets show slight, but not
significant, deviations from the expected normal distribution (Figure 6).
Without exception, the p-values calculated by our implemented method were higher than the p-values
obtained by SPC for Excel. The most substantial difference is observed for the largest dataset
(set 8), while the smallest difference is noted for the set with 45 experimental values (set 15). The
lowest p-value was obtained by the implemented method for set 3 (see Table 10); SPC for Excel
retrieves, for this dataset, a value of 0.0000. The next smallest p-value was observed for set 8. For
both of these sets, agreement on the statistical decision was found (see Table 10).
Table 10. Anderson–Darling (AD) statistic, associated p-values, and test conclusion: comparisons.
Figure 6. Normal probability plots (P–P) and quantile–quantile plots (Q–Q) by example: graphs for
set 9 (n = 70) in the first row, and for set 11 (n = 40) in the second row.
Our team has previously investigated the effect of sample size on the probability of the
Anderson–Darling test, and the results are published online at
http://l.academicdirect.org/Statistics/tests/AD/. The method proposed in this manuscript, compared to
the previous one, assures a higher resolution, expressed by a lower unexplained variance between the
AD and the model, using a formula with a smaller number of coefficients. Furthermore, the unexplained
variance of the method presented in this manuscript has much less weight for large p-values and much
higher weight for small p-values, which means that it is more appropriate for low (e.g., p ~ 10^−5)
and very low (p ~ 10^−10) probabilities.
Further research could be done on both the extension of the proposed method and the evaluation of
its performance. The performance of the reported method could be evaluated for the whole range of
sample sizes if proper computational resources exist. Furthermore, the performance of the
implementation could be assessed using game theory and game experiments [73,74], with or without
diagnostic metrics (such as validation, confusion matrices, ROC analysis, analysis of errors,
etc.) [75,76].
The implemented method provides a solution for the calculation of the p-values associated with the
Anderson–Darling statistic, giving proper weight to the sample size of the investigated experimental
data. The advantage of the proposed estimation method, Equation (5), is its very low residual
(unexplained variance) and its very high estimation accuracy at convergence (with increasing n and
for p near 1). The main disadvantage is related to its out-of-range p-values for small AD values, but
an extensive simulation study could solve this issue. The worst performance of the implemented method
is observed when, simultaneously, n is very low (2 or 3) and p is near 0.5 (50–50%).
Author Contributions: L.J. and S.D.B. conceived and designed the experiments; L.J. performed the experiments;
L.J. and S.D.B. analyzed the data; S.D.B. wrote the paper and L.J. critically reviewed the manuscript.
Acknowledgments: No grants have been received in support of the research work reported in this manuscript.
No funds were received for covering the costs to publish in open access.
References
1. Nimon, K.F. Statistical assumptions of substantive analyses across the General Linear model: A Mini-
Review. Front. Psychol. 2012, 3, 322, doi:10.3389/fpsyg.2012.00322.
2. Hoekstra, R.; Kiers, H.A.; Johnson, A. Are assumptions of well-known statistical techniques checked, and
why (not)? Front. Psychol. 2012, 3, 137, doi:10.3389/fpsyg.2012.00137.
3. Kolmogorov, A. Sulla determinazione empirica di una legge di distribuzione. Giornale dell’Istituto Italiano
degli Attuari 1933, 4, 83–91.
4. Smirnov, N. Table for estimating the goodness of fit of empirical distributions. Ann. Math. Stat. 1948, 19,
279–281, doi:10.1214/aoms/1177730256.
5. Anderson, T.W.; Darling, D.A. Asymptotic theory of certain “goodness-of-fit” criteria based on stochastic
processes. Ann. Math. Stat. 1952, 23, 193–212, doi:10.1214/aoms/1177729437.
6. Anderson, T.W.; Darling, D.A. A Test of Goodness-of-Fit. J. Am. Stat. Assoc. 1954, 49, 765–769,
doi:10.2307/2281537.
7. Pearson, K. Contribution to the mathematical theory of evolution. II. Skew variation in homogenous
material. Philos. Trans. R. Soc. Lond. 1895, 91, 343–414, doi:10.1098/rsta.1895.0010.
8. Cramér, H. On the composition of elementary errors. Scand. Actuar. J. 1928, 1, 13–74,
doi:10.1080/03461238.1928.10416862.
9. Von Mises, R.E. Wahrscheinlichkeit, Statistik und Wahrheit; Julius Springer: Berlin, Germany, 1928.
10. Shapiro, S.S.; Wilk, M.B. An analysis of variance test for normality (complete samples). Biometrika 1965, 52,
591–611, doi:10.1093/biomet/52.3-4.591.
11. Jarque, C.M.; Bera, A.K. Efficient tests for normality, homoscedasticity and serial independence of
regression residuals. Econ. Lett. 1980, 6, 255–259, doi:10.1016/0165-1765(80)90024-5.
12. Jarque, C.M.; Bera, A.K. Efficient tests for normality, homoscedasticity and serial independence of
regression residuals: Monte Carlo evidence. Econ. Lett. 1981, 7, 313–318, doi:10.1016/0165-1765(81)90035-5.
13. Jarque, C.M.; Bera, A.K. A test for normality of observations and regression residuals. Int. Stat. Rev. 1987,
55, 163–172, doi:10.2307/1403192.
14. D’Agostino, R.B.; Belanger, A.; D’Agostino, R.B., Jr. A suggestion for using powerful and informative tests
of normality. Am. Stat. 1990, 44, 316–321, doi:10.2307/2684359.
15. Lilliefors, H.W. On the Kolmogorov-Smirnov test for normality with mean and variance unknown. J. Am.
Stat. Assoc. 1967, 62, 399–402, doi:10.2307/2283970.
16. Van Soest, J. Some experimental results concerning tests of normality. Stat. Neerl. 1967, 21, 91–97,
doi:10.1111/j.1467-9574.1967.tb00548.x.
17. Jäntschi, L.; Bolboacă, S.D. Performances of Shannon’s entropy statistic in assessment of distribution of
data. Ovidius Univ. Ann. Chem. 2017, 28, 30–42, doi:10.1515/auoc-2017-0006.
18. Noughabi, H.A. Two Powerful Tests for Normality. Ann. Data Sci. 2016, 3, 225–234, doi:10.1007/s40745-016-
0083-y.
19. Zamanzade, E.; Arghami, N.R. Testing normality based on new entropy estimators. J. Stat. Comput. Simul.
2012, 82, 1701–1713, doi:10.1080/00949655.2011.592984.
20. Peng, H.; Tan, F. Jackknife empirical likelihood goodness-of-fit tests for U-statistics based general
estimating equations. Bernoulli 2018, 24, 449–464, doi:10.3150/16-BEJ884.
21. Shah, R.D.; Bühlmann, P. Goodness-of-fit tests for high dimensional linear models. J. R. Stat.
Soc. Ser. B Stat. Methodol. 2018, 80, 113–135.
22. Balakrishnan, K.; Sooriyarachchi, M.R. A goodness of fit test for multilevel survival data. Commun. Stat.
Simul. Comput. 2018, 47, 30–47, doi:10.1080/03610918.2016.1186184.
23. Perera, A.A.P.N.M.; Sooriyarachchi, M.R.; Wickramasuriya, S.L. A Goodness of Fit Test for the Multilevel
Logistic Model. Commun. Stat. Simul. Comput. 2016, 45, 643–659, doi:10.1080/03610918.2013.868906.
24. Villaseñor, J.A.; González-Estrada, E.; Ochoa, A. On Testing the inverse Gaussian distribution hypothesis.
Sankhya B 2017, doi:10.1007/s13571-017-0148-8.
25. MacKenzie, D.W. Applying the Anderson-Darling test to suicide clusters: Evidence of contagion at U. S.
Universities? Crisis 2013, 34, 434–437, doi:10.1027/0227-5910/a000197.
26. Müller, C.; Kloft, H. Parameter estimation with the Anderson-Darling test on experiments on glass. Stahlbau
2015, 84, 229–240, doi:10.1002/stab.201590081.
27. İçen, D.; Bacanlı, S. Hypothesis testing for the mean of inverse Gaussian distribution using α-cuts. Soft
Comput. 2015, 19, 113–119, doi: 10.1007/s00500-014-1235-7.
28. Ghasemi, A.; Zahediasl, S. Normality tests for statistical analysis: A guide for non-statisticians. Int. J.
Endocrinol. Metab. 2012, 10, 486–489, doi:10.5812/ijem.3505.
29. Hwe, E.K.; Mohd Yusoh, Z.I. Validation guideline for small scale dataset classification result in medical
domain. Adv. Intell. Syst. Comput. 2018, 734, 272–281, doi:10.1007/978-3-319-76351-4_28.
30. Ruxton, G.D.; Wilkinson, D.M.; Neuhäuser, M. Advice on testing the null hypothesis that a sample is drawn
from a normal distribution. Anim. Behav. 2015, 107, 249–252, doi:10.1016/j.anbehav.2015.07.006.
31. Lang, T.A.; Altman, D.G. Basic statistical reporting for articles published in biomedical journals: The
“Statistical Analyses and Methods in the Published Literature” or The SAMPL Guidelines. In Science
Editors’ Handbook; European Association of Science Editors, Smart, P., Maisonneuve, H., Polderman, A.,
Eds.; EASE: Paris, France, 2013. Available online: http://www.equator-network.org/wp-
content/uploads/2013/07/SAMPL-Guidelines-6-27-13.pdf (accessed on 3 January 2018).
32. Curran-Everett, D.; Benos, D.J.; American Physiological Society. Guidelines for reporting statistics in
journals published by the American Physiological Society. J. Appl. Physiol. 2004, 97, 457–459,
doi:10.1152/japplphysiol.00513.2004.
33. Curran-Everett, D.; Benos, D.J. Guidelines for reporting statistics in journals published by the American
Physiological Society: The sequel. Adv. Physiol. Educ. 2007, 31, 295–298, doi: 10.1152/advan.00022.2007.
34. Razali, N.M.; Wah, Y.B. Power comparison of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and
Anderson-Darling tests. J. Stat. Model. Anal. 2011, 2, 21–33.
35. Tui, I. Normality Testing—A New Direction. Int. J. Bus. Soc. Sci. 2011, 2, 115–118.
36. Saculinggan, M.; Balase, E.A. Empirical Power Comparison of Goodness of Fit Tests for Normality in the
Presence of Outliers. J. Phys. Conf. Ser. 2013, 435, 012041.
37. Sánchez-Espigares, J.A.; Grima, P.; Marco-Almagro, L. Visualizing type II error in normality tests. Am. Stat.
2017, doi:10.1080/00031305.2016.1278035.
38. Yap, B.W.; Sim, S.H. Comparisons of various types of normality tests. J. Stat. Comput. Simul. 2011, 81, 2141–
2155, doi:10.1080/00949655.2010.520163.
39. Patrício, M.; Ferreira, F.; Oliveiros, B.; Caramelo, F. Comparing the performance of normality tests with
ROC analysis and confidence intervals. Commun. Stat. Simul. Comput. 2017, 46, 7535–7551,
doi:10.1080/03610918.2016.1241410.
40. Mbah, A.K.; Paothong, A. Shapiro-Francia test compared to other normality test using expected p-value. J.
Stat. Comput. Simul. 2015, 85, 3002–3016, doi:10.1080/00949655.2014.947986.
41. Arshad, M.; Rasool, M.T.; Ahmad, M.I. Anderson Darling and Modified Anderson Darling Tests for
Generalized Pareto Distribution. Pak. J. Appl. Sci. 2003, 3, 85–88.
42. Stephens, M.A. Goodness of fit for the extreme value distribution. Biometrika 1977, 64, 585–588.
43. D’Agostino, R.B.; Stephens, M.A. Goodness-of-Fit Techniques; Marcel-Dekker: New York, NY, USA, 1986; pp.
123, 146.
44. Shin, H.; Jung, Y.; Jeong, C.; Heo, J.-H. Assessment of modified Anderson–Darling test statistics for the
generalized extreme value and generalized logistic distributions. Stoch. Environ. Res. Risk Assess. 2012, 26,
105–114, doi:10.1007/s00477-011-0463-y.
45. De Micheaux, P.L.; Tran, V.A. PoweR: A Reproducible Research Tool to Ease Monte Carlo Power
Simulation Studies for Goodness-of-fit Tests in R. J. Stat. Softw. 2016, 69. Available online:
https://www.jstatsoft.org/article/view/v069i03 (accessed on 10 April 2018).
46. 6ixSigma.org—Anderson Darling Test. Available online:
http://6ixsigma.org/SharedFiles/Download.aspx?pageid=14&mid=35&fileid=147 (accessed on 2 June 2017).
47. Spcforexcel. Anderson-Darling Test for Normality. 2011. Available online:
http://www.spcforexcel.com/knowledge/basic-statistics/anderson-darling-test-for-normality (accessed on
2 June 2017).
48. Qimacros—Data Normality Tests Using p and Critical Values in QI Macros. © 2015 KnowWare
International Inc. Available online: http://www.qimacros.com/hypothesis-testing//data-normality-
test/#anderson (accessed on 2 June 2017).
49. Jäntschi, L.; Bolboacă, S.D. Distribution Fitting 2. Pearson-Fisher, Kolmogorov-Smirnov, Anderson-
Darling, Wilks-Shapiro, Kramer-von-Misses and Jarque-Bera statistics. Bull. Univ. Agric. Sci. Vet. Med. Cluj-
Napoca Hortic. 2009, 66, 691–697.
50. Mosteller, F. Questions and Answers—Combining independent tests of significance. Am. Stat. 1948, 2, 30–
31, doi:10.1080/00031305.1948.10483405.
51. Bolboacă, S.D.; Jäntschi, L.; Sestraş, A.F.; Sestraş, R.E.; Pamfil, D.C. Pearson-Fisher Chi-Square Statistic
Revisited. Information 2011, 2, 528–545, doi:10.3390/info2030528.
52. Rahman, M.; Pearson, L.M.; Heien, H.C. A Modified Anderson-Darling Test for Uniformity. Bull. Malays.
Math. Sci. Soc. 2006, 29, 11–16.
53. Matsumoto, M.; Nishimura, T. Mersenne twister: A 623-dimensionally equidistributed uniform pseudo-
random number generator (PDF). ACM Trans. Model. Comput. Simul. 1998, 8, 3–30,
doi:10.1145/272991.272995.
54. Ciocan, A.; Ciocan, R.A.; Gherman, C.D.; Bolboacă, S.D. Evaluation of Patients with Lower Extremity
Peripheral Artery Disease by Walking Tests: A Pilot Study. Not. Sci. Biol. 2017, 9, 473–479,
doi:10.15835/nsb9410168.
55. Răcătăianu, N.; Bolboacă, S.D.; Sitar-Tăut, A.-V.; Marza, S.; Moga, D.; Valea, A.; Ghervan, C. The effect of
Metformin treatment in obese insulin-resistant patients with euthyroid goiter. Acta Clin. Belg. Int. J. Clin.
Lab. Med. 2018, doi:10.1080/17843286.2018.1439273.
56. Hășmășanu, M.G.; Baizat, M.; Procopciuc, L.M.; Blaga, L.; Văleanu, M.A.; Drugan, T.C.; Zaharie, G.C.;
Bolboacă, S.D. Serum levels and ApaI polymorphism of insulin-like growth factor 2 on intrauterine growth
restriction infants. J. Matern.-Fetal Neonatal Med. 2018, 31, 1470–1476, doi:10.1080/14767058.2017.1319921.
57. Ciocan, R.A.; Drugan, C.; Gherman, C.D.; Cătană, C.-S.; Ciocan, A.; Drugan, T.C.; Bolboacă, S.D. Evaluation
of Chitotriosidase as a Marker of Inflammatory Status in Critical Limb Ischemia. Ann. Clin. Lab. Sci. 2017,
47, 713–719.
58. Bulboacă, A.E.; Bolboacă, S.D.; Stănescu, I.C.; Sfrângeu, C.-A.; Bulboacă, A.C. Preemptive Analgesic and
Anti-Oxidative Effect of Curcumin for Experimental Migraine. BioMed Res. Int. 2017, 2017, 4754701,
doi:10.1155/2017/4754701.
59. Bulboacă, A.E.; Bolboacă, S.D.; Bulboacă, A.C.; Prodan, C.I. Association between low thyroid-stimulating
hormone, posterior cortical atrophy and nitro-oxidative stress in elderly patients with cognitive
dysfunction. Arch. Med. Sci. 2017, 13, 1160–1167, doi:10.5114/aoms.2016.60129.
60. Nistor, D.-V.; Caterev, S.; Bolboacă, S.D.; Cosma, D.; Lucaciu, D.O.G.; Todor, A. Transitioning to the direct
anterior approach in total hip arthroplasty. Is it a true muscle sparing approach when performed by a low
volume hip replacement surgeon? Int. Orthopt. 2017, 41, 2245–2252, doi:10.1007/s00264-017-3480-8.
61. Bolboacă, S.D.; Jäntschi, L. Comparison of QSAR Performances on Carboquinone Derivatives. Sci. World J.
2009, 9, 1148–1166, doi:10.1100/tsw.2009.131.
62. Harsa, A.M.; Harsa, T.E.; Bolboacă, S.D.; Diudea, M.V. QSAR in Flavonoids by Similarity Cluster
Prediction. Curr. Comput.-Aided Drug Des. 2014, 10, 115–128, doi:10.2174/1573409910666140410104542.
63. Jäntschi, L.; Bolboacă, S.D.; Sestraş, R.E. A Study of Genetic Algorithm Evolution on the Lipophilicity of
Polychlorinated Biphenyls. Chem. Biodivers. 2010, 7, 1978–1989, doi:10.1002/cbdv.200900356.
64. Chirilă, M.; Bolboacă, S.D. Clinical efficiency of quadrivalent HPV (types 6/11/16/18) vaccine in patients
with recurrent respiratory papillomatosis. Eur. Arch. Oto-Rhino-Laryngol. 2014, 271, 1135–1142,
doi:10.1007/s00405-013-2755-y.
65. Lenghel, L.M.; Botar-Jid, C.; Bolboacă, S.D.; Ciortea, C.; Vasilescu, D.; Băciuț, G.; Dudea, S.M. Comparative
study of three sonoelastographic scores for differentiation between benign and malignant cervical lymph
nodes. Eur. J. Radiol. 2015, 84, 1075–1082, doi:10.1016/j.ejrad.2015.02.017.
66. Bolboacă, S.D.; Jäntschi, L. Nano-quantitative structure-property relationship modeling on C42 fullerene
isomers. J. Chem. 2016, 2016, 1791756, doi:10.1155/2016/1791756.
67. Botar-Jid, C.; Cosgarea, R.; Bolboacă, S.D.; Șenilă, S.; Lenghel, M.L.; Rogojan, L.; Dudea, S.M. Assessment
of Cutaneous Melanoma by Use of Very- High-Frequency Ultrasound and Real-Time Elastography. Am. J.
Roentgenol. 2016, 206, 699–704, doi:10.2214/AJR.15.15182.
68. Jäntschi, L.; Balint, D.; Pruteanu, L.L.; Bolboacă, S.D. Elemental factorial study on one-cage pentagonal face
nanostructure congeners. Mater. Discov. 2016, 5, 14–21, doi:10.1016/j.md.2016.12.001.
69. Micu, M.C.; Micu, R.; Surd, S.; Girlovanu, M.; Bolboacă, S.D.; Ostensen, M. TNF-a inhibitors do not impair
sperm quality in males with ankylosing spondylitis after short-term or long-term treatment. Rheumatology
2014, 53, 1250–1255, doi:10.1093/rheumatology/keu007.
70. Sestraş, R.E.; Jäntschi, L.; Bolboacă, S.D. Poisson Parameters of Antimicrobial Activity: A Quantitative
Structure-Activity Approach. Int. J. Mol. Sci. 2012, 13, 5207–5229, doi:10.3390/ijms13045207.
71. Bolboacă, S.D.; Jäntschi, L.; Baciu, A.D.; Sestraş, R.E. Griffing’s Experimental Method II: Step-By-Step
Descriptive and Inferential Analysis of Variances. JP J. Biostat. 2011, 6, 31–52.
72. EasyFit. MathWave Technologies. Available online: http://www.mathwave.com (accessed on 25 March
2018).
73. Arena, P.; Fazzino, S.; Fortuna, L.; Maniscalco, P. Game theory and non-linear dynamics: The Parrondo
Paradox case study. Chaos Solitons Fractals 2003, 17, 545–555, doi:10.1016/S0960-0779(02)00397-1.
74. Ergün, S.; Aydoğan, T.; Alparslan Gök, S.Z. A Study on Performance Evaluation of Some Routing
Algorithms Modeled by Game Theory Approach. AKU J. Sci. Eng. 2016, 16, 170–176.
75. Hossin, M.; Sulaiman, M.N. A review on evaluation metrics for data classification evaluations. Int. J. Data
Min. Knowl. Manag. Process 2015, 5, 1–11, doi:10.5121/ijdkp.2015.5201.
76. Gopalakrishna, A.K.; Ozcelebi, T.; Liotta A.; Lukkien, J.J. Relevance as a Metric for Evaluating Machine
Learning Algorithms. In Machine Learning and Data Mining in Pattern Recognition; Perner, P., Eds.; Lecture
Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2013; Volume 7988.
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).