g08cbc cl06
1 Purpose
nag_1_sample_ks_test (g08cbc) performs the one-sample Kolmogorov-Smirnov test, using one of the
standard distributions provided.
2 Specification
#include <nag.h>
#include <nagg08.h>
3 Description
The data consist of a single sample of $n$ observations denoted by $x_1, x_2, \ldots, x_n$. Let $S_n(x_{(i)})$ and $F_0(x_{(i)})$
represent the sample cumulative distribution function and the theoretical (null) cumulative distribution
function respectively at the point $x_{(i)}$, where $x_{(i)}$ is the $i$th smallest sample observation.
The Kolmogorov-Smirnov test provides a test of the null hypothesis $H_0$: the data are a random sample of
observations from a theoretical distribution specified by the user, against one of the following alternative
hypotheses:
(i) $H_1$: the data cannot be considered to be a random sample from the specified null distribution.
(ii) $H_2$: the data arise from a distribution which dominates the specified null distribution. In practical
terms, this would be demonstrated if the values of the sample cumulative distribution function $S_n(x)$
tended to exceed the corresponding values of the theoretical cumulative distribution function $F_0(x)$.
(iii) $H_3$: the data arise from a distribution which is dominated by the specified null distribution. In practical
terms, this would be demonstrated if the values of the theoretical cumulative distribution function
$F_0(x)$ tended to exceed the corresponding values of the sample cumulative distribution function
$S_n(x)$.
One of the following test statistics is computed, depending on the particular alternative hypothesis
specified (see the description of the parameter dtype in Section 4).
For the alternative hypothesis $H_1$:
$D_n$: the largest absolute deviation between the sample cumulative distribution function and the
theoretical cumulative distribution function. Formally, $D_n = \max\{D_n^+, D_n^-\}$.
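As an illustration only, the three statistics can be sketched in plain C without the NAG library. The sketch assumes the usual textbook definitions, $D_n^+ = \max_i \{ i/n - F_0(x_{(i)}), 0 \}$ and $D_n^- = \max_i \{ F_0(x_{(i)}) - (i-1)/n, 0 \}$, and a uniform null distribution; the names ks_statistics and uniform_cdf are hypothetical and are not part of the library.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison callback for qsort: ascending order of doubles. */
static int cmp_double(const void *a, const void *b)
{
    double da = *(const double *)a, db = *(const double *)b;
    return (da > db) - (da < db);
}

/* Hypothetical null CDF F_0: uniform distribution on [lo, hi]. */
static double uniform_cdf(double x, double lo, double hi)
{
    if (x <= lo) return 0.0;
    if (x >= hi) return 1.0;
    return (x - lo) / (hi - lo);
}

/* Sorts x in place, then returns D_n^+, D_n^- and D_n = max(D_n^+, D_n^-)
 * through the output pointers (assumed textbook definitions). */
static void ks_statistics(double *x, int n, double lo, double hi,
                          double *d_pos, double *d_neg, double *d_two)
{
    double dp = 0.0, dm = 0.0;
    int i;

    assert(n > 0 && lo < hi);
    qsort(x, (size_t)n, sizeof(double), cmp_double);
    for (i = 0; i < n; ++i) {
        double f = uniform_cdf(x[i], lo, hi);
        double up = (double)(i + 1) / n - f; /* S_n(x_(i)) - F_0(x_(i))   */
        double down = f - (double)i / n;     /* F_0(x_(i)) - S_n(x_(i-1)) */
        if (up > dp) dp = up;
        if (down > dm) dm = down;
    }
    *d_pos = dp;
    *d_neg = dm;
    *d_two = dp > dm ? dp : dm;
}
```

For the sample 0.9, 0.1, 0.5, 0.3 tested against the uniform distribution on [0, 1], this sketch gives $D_n^+ = 0.25$, $D_n^- = 0.15$ and hence $D_n = 0.25$.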
[NP3491/6] g08cbc.1
g08cbc NAG C Library Manual
by Kolmogorov (1933), and then tabulated by Smirnov (1948). The asymptotic distributions for the one-
sided statistics were obtained by Smirnov (1933).
The probability, under the null hypothesis, of obtaining a value of the test statistic as extreme as that
observed is computed. If $n \le 100$, an exact method given by Conover (1980) is used. Note that the
method used is only exact for continuous theoretical distributions and does not include Conover's
modification for discrete distributions. This method computes the one-sided probabilities. The two-sided
probabilities are estimated by doubling the one-sided probability. This is a good estimate for small $p$, that
is $p \le 0.10$, but it becomes very poor for larger $p$. If $n > 100$ then $p$ is computed using the
Kolmogorov-Smirnov limiting distributions; see Feller (1948), Kendall and Stuart (1973), Kolmogorov (1933),
Smirnov (1933) and Smirnov (1948).
4 Parameters
1: n - Integer Input
On entry: the number of observations in the sample, $n$.
Constraint: $n \ge 3$.
g08 - Nonparametric Statistics g08cbc
if a binomial distribution is used, then par[0] and par[1] must contain the parameters m and
p respectively;
if an exponential distribution is used, then par[0] must contain the parameter $\lambda$;
if a Poisson distribution is used, then par[0] must contain the parameter $\mu$;
if estima = Nag_ParaEstimated, par need not be set except when the null distribution
requested is the binomial distribution, in which case par[0] must contain the parameter $m$.
On exit: if estima = Nag_ParaSupplied, par is unchanged. If estima = Nag_ParaEstimated, then
par[0] and par[1] are set to values as estimated from the data.
Constraints:
if dist = Nag_Uniform, par[0] < par[1],
if dist = Nag_Normal, par[1] > 0.0,
if dist = Nag_Gamma, par[0] > 0.0 and par[1] > 0.0,
if dist = Nag_Beta, par[0] > 0.0 and par[1] > 0.0, and par[0] $\le 10^6$ and par[1] $\le 10^6$,
if dist = Nag_Binomial, par[0] $\ge$ 1.0 and 0.0 < par[1] < 1.0, and par[0] $\times$ par[1] $\times$
(1.0 $-$ par[1]) $\le 10^6$ and par[0] < 1/eps, where eps = the machine precision, see
nag_machine_precision (X02AJC),
if dist = Nag_Exponential, par[0] > 0.0,
if dist = Nag_Poisson, par[0] > 0.0 and par[0] $\le 10^6$.
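A caller may wish to check supplied parameters before invoking the function. The sketch below encodes the constraints above in plain C. The enum Dist and the function par_in_range are hypothetical stand-ins rather than library types, the relational operators follow the reading of the constraints given above, and DBL_EPSILON is only assumed to be a reasonable substitute for the value returned by nag_machine_precision.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical stand-in for the library's distribution selector. */
typedef enum {
    DIST_UNIFORM, DIST_NORMAL, DIST_GAMMA, DIST_BETA,
    DIST_BINOMIAL, DIST_EXPONENTIAL, DIST_POISSON
} Dist;

/* Returns 1 if par satisfies the documented constraints for dist, else 0.
 * Only par[0] is read for the one-parameter distributions. */
static int par_in_range(Dist dist, const double *par)
{
    assert(par != NULL);
    switch (dist) {
    case DIST_UNIFORM:
        return par[0] < par[1];
    case DIST_NORMAL:
        return par[1] > 0.0;
    case DIST_GAMMA:
        return par[0] > 0.0 && par[1] > 0.0;
    case DIST_BETA:
        return par[0] > 0.0 && par[1] > 0.0 &&
               par[0] <= 1.0e6 && par[1] <= 1.0e6;
    case DIST_BINOMIAL:
        /* m >= 1, 0 < p < 1, variance m*p*(1-p) <= 1e6, m < 1/eps
         * (DBL_EPSILON is an assumed proxy for the machine precision). */
        return par[0] >= 1.0 && par[1] > 0.0 && par[1] < 1.0 &&
               par[0] * par[1] * (1.0 - par[1]) <= 1.0e6 &&
               par[0] < 1.0 / DBL_EPSILON;
    case DIST_EXPONENTIAL:
        return par[0] > 0.0;
    case DIST_POISSON:
        return par[0] > 0.0 && par[0] <= 1.0e6;
    }
    return 0; /* unknown distribution code */
}
```

For example, the pair (0.0, 2.0) is accepted for the uniform distribution, whereas a binomial with $m = 10^7$ and $p = 0.5$ is rejected because its variance, $2.5 \times 10^6$, exceeds the limit.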
7: d - double * Output
On exit: the Kolmogorov-Smirnov test statistic ($D_n$, $D_n^+$ or $D_n^-$ according to the value of dtype).
8: z - double * Output
On exit: a standardized value, $Z$, of the test statistic, $D$, without any correction for continuity.
9: p - double * Output
On exit: the probability, $p$, associated with the observed value of $D$, where $D$ may be $D_n$, $D_n^+$ or $D_n^-$
depending on the value of dtype (see Section 3).
NE_BAD_PARAM
On entry, parameter dist had an illegal value.
On entry, parameter estima had an illegal value.
On entry, parameter dtype had an illegal value.
NE_G08CB_PARAM
On entry, the parameters supplied for the specified null distribution are out of range. This error will
only occur if estima = Nag_ParaSupplied.
NE_G08CB_DATA
The data supplied in x could not arise from the chosen null distribution, as specified by the
parameters dist and par.
NE_G08CB_SAMPLE
The whole sample is constant, i.e., the variance is zero. This error may only occur if dist =
Nag_Uniform, Nag_Normal, Nag_Gamma or Nag_Beta, and estima = Nag_ParaEstimated.
NE_G08CB_VARIANCE
The variance of the binomial distribution (dist = Nag_Binomial) is too large. That is, $mp(1 - p) >$
1.0e6.
NE_G08CB_INCOMP_GAMMA
When dist = Nag_Gamma, in the computation of the incomplete gamma function by
nag_incomplete_gamma (s14bac) the convergence of the Taylor series or Legendre continued
fraction fails within 600 iterations.
NE_ALLOC_FAIL
Memory allocation failed.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the
call is correct then please consult NAG for assistance.
6 Further Comments
The time taken by the routine increases with n until n > 100 at which point it drops and then increases
slowly with n. The time may also depend on the choice of null distribution and on whether or not the
parameters are to be estimated.
The data supplied in the parameter x must be consistent with the chosen null distribution as follows:
when dist = Nag_Uniform, then par[0] $\le x_i \le$ par[1], for $i = 1, 2, \ldots, n$;
when dist = Nag_Normal, then there are no constraints on the $x_i$'s;
when dist = Nag_Gamma, then $x_i \ge 0.0$, for $i = 1, 2, \ldots, n$;
6.1 Accuracy
The approximation for $p$, given when $n > 100$, has a relative error of at most 2.5% for most cases. The
two-sided probability is approximated by doubling the one-sided probability. This is only good for small $p$,
i.e., $p < 0.10$, and is very poor for large $p$. The error is always on the conservative side, that is, the tail
probability, $p$, is overestimated.
6.2 References
Conover W J (1980) Practical Nonparametric Statistics Wiley
Feller W (1948) On the Kolmogorov-Smirnov limit theorems for empirical distributions Ann. Math.
Statist. 19 179-181
Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) Griffin (3rd Edition)
Kolmogorov A N (1933) Sulla determinazione empirica di una legge di distribuzione Giornale
dell'Istituto Italiano degli Attuari 4 83-91
Smirnov N (1933) Estimate of deviation between empirical distribution functions in two independent
samples Bull. Moscow Univ. 2 (2) 3-16
Smirnov N (1948) Table for estimating the goodness of fit of empirical distributions Ann. Math. Statist. 19
279-281
Siegel S (1956) Non-parametric Statistics for the Behavioral Sciences McGraw-Hill
7 See Also
None.
8 Example
The following example program reads in a set of data consisting of 30 observations. The
Kolmogorov-Smirnov test is then applied twice, firstly to test whether the sample is taken from a uniform
distribution, $U(0, 2)$, and secondly to test whether the sample is taken from a Normal distribution where the
mean and variance are estimated from the data. In both cases we are testing against $H_1$; that is, we are doing a
two-tailed test. The values of d, z and p are printed for each case.
#include <stdio.h>
#include <nag.h>
#include <nag_stdlib.h>
#include <nagg08.h>
int main(void)
{
double d, p, *par=0, *x=0, z;
Integer i, n, np, ntype;
Integer exit_status=0;
Nag_TestStatistics ntype_enum;
NagError fail;
INIT_FAIL(fail);
Vprintf("g08cbc Example Program Results\n");
Vscanf("%ld", &n);
if (!(x = NAG_ALLOC(n, double)))
{
Vprintf("Allocation failure\n");
exit_status = -1;
goto END;
}
Vprintf("\n");
for (i = 1; i <= n; ++i)
Vscanf("%lf", &x[i - 1]);
Vscanf("%ld", &np);
if (!(par = NAG_ALLOC(np, double)))
{
Vprintf("Allocation failure\n");
exit_status = -1;
goto END;
}
else if (ntype == 2)
ntype_enum = Nag_TestStatisticsDPos;
else if (ntype == 3)
ntype_enum = Nag_TestStatisticsDNeg;
Test against Normal distribution with parameters estimated from the data