g08 - Nonparametric Statistics g08cbc

NAG C Library Function Document


nag_1_sample_ks_test (g08cbc)

1 Purpose
nag_1_sample_ks_test (g08cbc) performs the one sample Kolmogorov-Smirnov test, using one of the
standard distributions provided.

2 Specification
#include <nag.h>
#include <nagg08.h>

void nag_1_sample_ks_test (Integer n, const double x[],
        Nag_Distributions dist, double par[], Nag_ParaEstimates estima,
        Nag_TestStatistics dtype, double *d, double *z, double *p,
        NagError *fail)

3 Description
The data consist of a single sample of $n$ observations denoted by $x_1, x_2, \ldots, x_n$. Let $S_n(x_{(i)})$ and
$F_0(x_{(i)})$ represent the sample cumulative distribution function and the theoretical (null) cumulative
distribution function respectively at the point $x_{(i)}$, where $x_{(i)}$ is the $i$th smallest sample observation.
The Kolmogorov-Smirnov test provides a test of the null hypothesis $H_0$: the data are a random sample of
observations from a theoretical distribution specified by the user, against one of the following alternative
hypotheses:
(i) $H_1$: the data cannot be considered to be a random sample from the specified null distribution.
(ii) $H_2$: the data arise from a distribution which dominates the specified null distribution. In practical
terms, this would be demonstrated if the values of the sample cumulative distribution function $S_n(x)$
tended to exceed the corresponding values of the theoretical cumulative distribution function $F_0(x)$.
(iii) $H_3$: the data arise from a distribution which is dominated by the specified null distribution. In practical
terms, this would be demonstrated if the values of the theoretical cumulative distribution function
$F_0(x)$ tended to exceed the corresponding values of the sample cumulative distribution function
$S_n(x)$.
One of the following test statistics is computed depending on the particular alternative hypothesis
specified (see the description of the parameter dtype in Section 4).
For the alternative hypothesis $H_1$.
$D_n$ - the largest absolute deviation between the sample cumulative distribution function and the
theoretical cumulative distribution function. Formally $D_n = \max\{D_n^+, D_n^-\}$.

For the alternative hypothesis $H_2$.
$D_n^+$ - the largest positive deviation between the sample cumulative distribution function and the
theoretical cumulative distribution function. Formally $D_n^+ = \max\{S_n(x_{(i)}) - F_0(x_{(i)}), 0\}$ for both
discrete and continuous null distributions.

For the alternative hypothesis $H_3$.
$D_n^-$ - the largest positive deviation between the theoretical cumulative distribution function and the
sample cumulative distribution function. Formally, if the null distribution is discrete then
$D_n^- = \max\{F_0(x_{(i)}) - S_n(x_{(i)}), 0\}$, and if the null distribution is continuous then
$D_n^- = \max\{F_0(x_{(i)}) - S_n(x_{(i-1)}), 0\}$.
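As a rough illustration of the three definitions above, the following C sketch computes $D_n^+$, $D_n^-$ and $D_n$ for a continuous null distribution. It is not the library's implementation: the names ks_result, ks_statistics, uniform02_cdf and example_x are hypothetical, the null cumulative distribution function is passed as a callback, and the sample is the data of Section 8.2 tested against $U(0, 2)$.

```c
#include <stdlib.h>

/* Hypothetical container for the three statistics of Section 3. */
typedef struct { double d_plus, d_minus, d_abs; } ks_result;

/* Sample taken from Section 8.2 (30 observations). */
static const double example_x[30] = {
    0.01, 0.30, 0.20, 0.90, 1.20, 0.09, 1.30, 0.18, 0.90, 0.48,
    1.98, 0.03, 0.50, 0.07, 0.70, 0.60, 0.95, 1.00, 0.31, 1.45,
    1.04, 1.25, 0.15, 0.75, 0.85, 0.22, 1.56, 0.81, 0.57, 0.55
};

/* Null CDF of U(0, 2), as used in the first test of Section 8. */
double uniform02_cdf(double x)
{
    if (x <= 0.0) return 0.0;
    if (x >= 2.0) return 1.0;
    return x / 2.0;
}

/* Ascending comparison for qsort. */
static int cmp_double(const void *a, const void *b)
{
    double u = *(const double *)a, v = *(const double *)b;
    return (u > v) - (u < v);
}

/* Continuous-null statistics: D+ uses S_n(x_(i)) = i/n and
   D- uses S_n(x_(i-1)) = (i-1)/n.  Returns 0 on success, -1 on failure. */
int ks_statistics(const double *x, size_t n, double (*cdf)(double),
                  ks_result *out)
{
    double *s;
    size_t i;

    if (n == 0 || !(s = malloc(n * sizeof *s))) return -1;
    for (i = 0; i < n; ++i) s[i] = x[i];
    qsort(s, n, sizeof *s, cmp_double);

    out->d_plus = out->d_minus = 0.0;          /* the 0 inside max{..., 0} */
    for (i = 0; i < n; ++i) {
        double f = cdf(s[i]);
        double up = (double)(i + 1) / (double)n - f;   /* S_n - F_0 */
        double down = f - (double)i / (double)n;       /* F_0 - S_n */
        if (up > out->d_plus) out->d_plus = up;
        if (down > out->d_minus) out->d_minus = down;
    }
    out->d_abs = out->d_plus > out->d_minus ? out->d_plus : out->d_minus;
    free(s);
    return 0;
}
```

Calling ks_statistics(example_x, 30, uniform02_cdf, &r) gives $D_n^+ = 0.28$ (attained at the observation 1.04), $D_n^- \approx 0.0233$ and hence $D_n = 0.28$, which agrees with the test statistic printed in Section 8.3.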
The standardized statistic $Z = D \times \sqrt{n}$ is also computed, where $D$ may be $D_n$, $D_n^+$ or $D_n^-$ depending on
the choice of the alternative hypothesis. This is the standardized value of $D$ with no correction for
continuity applied, and the distribution of $Z$ converges asymptotically to a limiting distribution, first derived
by Kolmogorov (1933), and then tabulated by Smirnov (1948). The asymptotic distributions for the
one-sided statistics were obtained by Smirnov (1933).
The probability, under the null hypothesis, of obtaining a value of the test statistic as extreme as that
observed, is computed. If $n \le 100$ an exact method given by Conover (1980) is used. Note that the
method used is only exact for continuous theoretical distributions and does not include Conover's
modification for discrete distributions. This method computes the one-sided probabilities. The two-sided
probabilities are estimated by doubling the one-sided probability. This is a good estimate for small $p$, that
is $p \le 0.10$, but it becomes very poor for larger $p$. If $n > 100$ then $p$ is computed using the
Kolmogorov-Smirnov limiting distributions, see Feller (1948), Kendall and Stuart (1973), Kolmogorov (1933),
Smirnov (1933) and Smirnov (1948).

4 Parameters
1: n - Integer Input
On entry: the number of observations in the sample, $n$.
Constraint: $n \ge 3$.

2: x[n] - const double Input
On entry: the sample observations $x_1, x_2, \ldots, x_n$.
Constraint: the sample observations supplied must be consistent, in the usual manner, with the null
distribution chosen, as specified by the parameters dist and par. For further details see Section 6.

3: dist - Nag_Distributions Input
On entry: the theoretical (null) distribution from which it is suspected the data may arise, as follows:
dist = Nag_Uniform, uniform distribution over $(a, b)$, $U(a, b)$.
dist = Nag_Normal, Normal distribution with mean $\mu$ and variance $\sigma^2$, $N(\mu, \sigma^2)$.
dist = Nag_Gamma, gamma distribution with shape parameter $\alpha$ and scale parameter $\beta$,
where the mean $= \alpha\beta$.
dist = Nag_Beta, beta distribution with shape parameters $\alpha$ and $\beta$, where the mean
$= \alpha/(\alpha + \beta)$.
dist = Nag_Binomial, binomial distribution with the number of trials, $m$, and the probability
of a success, $p$.
dist = Nag_Exponential, exponential distribution with parameter $\lambda$, where the mean $= 1/\lambda$.
dist = Nag_Poisson, Poisson distribution with parameter $\mu$, where the mean $= \mu$.
Constraint: dist = Nag_Uniform, Nag_Normal, Nag_Gamma, Nag_Beta, Nag_Binomial,
Nag_Exponential or Nag_Poisson.

4: par[2] - double Input/Output
On entry: if estima = Nag_ParaSupplied, par must contain the known values of the parameter(s)
of the null distribution as follows:
if a uniform distribution is used, then par[0] and par[1] must contain the boundaries $a$ and $b$
respectively;
if a Normal distribution is used, then par[0] and par[1] must contain the mean, $\mu$, and the
variance, $\sigma^2$, respectively;
if a gamma distribution is used, then par[0] and par[1] must contain the parameters $\alpha$ and $\beta$
respectively;
if a beta distribution is used, then par[0] and par[1] must contain the parameters $\alpha$ and $\beta$
respectively;


if a binomial distribution is used, then par[0] and par[1] must contain the parameters $m$ and
$p$ respectively;
if an exponential distribution is used, then par[0] must contain the parameter $\lambda$;
if a Poisson distribution is used, then par[0] must contain the parameter $\mu$.
If estima = Nag_ParaEstimated, par need not be set except when the null distribution
requested is the binomial distribution, in which case par[0] must contain the parameter $m$.
On exit: if estima = Nag_ParaSupplied, par is unchanged. If estima = Nag_ParaEstimated, then
par[0] and par[1] are set to values as estimated from the data.
Constraints:
if dist = Nag_Uniform, par[0] < par[1],
if dist = Nag_Normal, par[1] > 0.0,
if dist = Nag_Gamma, par[0] > 0.0 and par[1] > 0.0,
if dist = Nag_Beta, par[0] > 0.0 and par[1] > 0.0, and par[0] $\le 10^6$ and par[1] $\le 10^6$,
if dist = Nag_Binomial, par[0] $\ge$ 1.0 and 0.0 < par[1] < 1.0, and par[0] $\times$ par[1] $\times$
(1.0 $-$ par[1]) $\le 10^6$ and par[0] < 1/eps, where eps = the machine precision, see
nag_machine_precision (X02AJC),
if dist = Nag_Exponential, par[0] > 0.0,
if dist = Nag_Poisson, par[0] > 0.0 and par[0] $\le 10^6$.
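The constraints above can be checked before calling the function. The following C sketch is one way to do so under stated assumptions: dist_kind is a hypothetical stand-in for Nag_Distributions (it is not the library's enum), par_in_range is an illustrative name, and DBL_EPSILON is used in place of the value returned by nag_machine_precision (X02AJC), which may differ.

```c
#include <float.h>

/* Hypothetical stand-in for Nag_Distributions; not the library's enum. */
typedef enum {
    DIST_UNIFORM, DIST_NORMAL, DIST_GAMMA, DIST_BETA,
    DIST_BINOMIAL, DIST_EXPONENTIAL, DIST_POISSON
} dist_kind;

/* Returns 1 if par[] satisfies the constraints listed for dist, else 0. */
int par_in_range(dist_kind dist, const double par[2])
{
    const double big = 1.0e6;

    switch (dist) {
    case DIST_UNIFORM:
        return par[0] < par[1];
    case DIST_NORMAL:
        return par[1] > 0.0;
    case DIST_GAMMA:
        return par[0] > 0.0 && par[1] > 0.0;
    case DIST_BETA:
        return par[0] > 0.0 && par[1] > 0.0
            && par[0] <= big && par[1] <= big;
    case DIST_BINOMIAL:
        /* DBL_EPSILON is an assumption standing in for the
           machine precision reported by the library. */
        return par[0] >= 1.0 && par[1] > 0.0 && par[1] < 1.0
            && par[0] * par[1] * (1.0 - par[1]) <= big
            && par[0] < 1.0 / DBL_EPSILON;
    case DIST_EXPONENTIAL:
        return par[0] > 0.0;              /* par[1] is not used */
    case DIST_POISSON:
        return par[0] > 0.0 && par[0] <= big;
    }
    return 0;                             /* unknown distribution */
}
```

For example, {0.0, 2.0} passes for the uniform distribution, while {1.0e7, 0.5} fails for the binomial distribution because $mp(1 - p) = 2.5 \times 10^6$ exceeds $10^6$.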

5: estima - Nag_ParaEstimates Input
On entry: estima must specify whether values of the parameters of the null distribution are known
or are to be estimated from the data:
if estima = Nag_ParaSupplied, values of the parameters will be supplied in the array par
described above;
if estima = Nag_ParaEstimated, parameters are to be estimated from the data except when
the null distribution requested is the binomial distribution, in which case the first parameter,
$m$, must be supplied in par[0] and only the second parameter, $p$, is estimated from the data.
Constraint: estima = Nag_ParaSupplied or Nag_ParaEstimated.
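This document does not state which estimators are used when estima = Nag_ParaEstimated. For the Normal case, the values printed in Section 8.3 are consistent with the sample mean and the variance with divisor $n - 1$; the C sketch below computes those two quantities for the data of Section 8.2. The function name, the array name and the two-pass formulation are illustrative choices, not the library's code.

```c
#include <stddef.h>

/* Sample taken from Section 8.2 (30 observations). */
static const double example_x[30] = {
    0.01, 0.30, 0.20, 0.90, 1.20, 0.09, 1.30, 0.18, 0.90, 0.48,
    1.98, 0.03, 0.50, 0.07, 0.70, 0.60, 0.95, 1.00, 0.31, 1.45,
    1.04, 1.25, 0.15, 0.75, 0.85, 0.22, 1.56, 0.81, 0.57, 0.55
};

/* Writes the sample mean to par[0] and the variance with divisor
   n - 1 to par[1].  Returns 0 on success, -1 if n < 2. */
int estimate_normal_par(const double *x, size_t n, double par[2])
{
    double mean = 0.0, ss = 0.0;
    size_t i;

    if (n < 2) return -1;
    for (i = 0; i < n; ++i) mean += x[i];
    mean /= (double)n;
    /* Second pass: sum of squared deviations about the mean. */
    for (i = 0; i < n; ++i) ss += (x[i] - mean) * (x[i] - mean);
    par[0] = mean;
    par[1] = ss / (double)(n - 1);
    return 0;
}
```

For the example data this gives a mean of about 0.6967 and a variance of about 0.2564, the values reported in Section 8.3.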

6: dtype - Nag_TestStatistics Input
On entry: the test statistic to be calculated, i.e., the choice of alternative hypothesis.
dtype = Nag_TestStatisticsDAbs: computes $D_n$, to test $H_0$ against $H_1$,
dtype = Nag_TestStatisticsDPos: computes $D_n^+$, to test $H_0$ against $H_2$,
dtype = Nag_TestStatisticsDNeg: computes $D_n^-$, to test $H_0$ against $H_3$.
Constraint: dtype = Nag_TestStatisticsDAbs, Nag_TestStatisticsDPos or
Nag_TestStatisticsDNeg.

7: d - double * Output
On exit: the Kolmogorov-Smirnov test statistic ($D_n$, $D_n^+$ or $D_n^-$ according to the value of dtype).

8: z - double * Output
On exit: a standardized value, $Z$, of the test statistic, $D$, without any correction for continuity.

9: p - double * Output
On exit: the probability, $p$, associated with the observed value of $D$, where $D$ may be $D_n$, $D_n^+$ or $D_n^-$
depending on the value of dtype (see Section 3).


10: fail - NagError * Input/Output
The NAG error parameter (see the Essential Introduction).

5 Error Indicators and Warnings


NE_INT_ARG_LT
On entry, n must not be less than 3: n = <value>.

NE_BAD_PARAM
On entry, parameter dist had an illegal value.
On entry, parameter estima had an illegal value.
On entry, parameter dtype had an illegal value.

NE_G08CB_PARAM
On entry, the parameters supplied for the specified null distribution are out of range. This error will
only occur if estima = Nag_ParaSupplied.

NE_G08CB_DATA
The data supplied in x could not arise from the chosen null distribution, as speci®ed by the
parameters dist and par.

NE_G08CB_SAMPLE
The whole sample is constant, i.e., the variance is zero. This error may only occur if (dist =
Nag_Uniform, Nag_Normal, Nag_Gamma or Nag_Beta) and estima = Nag_ParaEstimated.

NE_G08CB_VARIANCE
The variance of the binomial distribution (dist = Nag_Binomial) is too large. That is, $mp(1 - p)$ >
1.0e6.

NE_G08CB_INCOMP_GAMMA
When dist = Nag_Gamma, in the computation of the incomplete gamma function by
nag_incomplete_gamma (s14bac) the convergence of the Taylor series or Legendre continued
fraction fails within 600 iterations.

NE_ALLOC_FAIL
Memory allocation failed.

NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the
call is correct then please consult NAG for assistance.

6 Further Comments
The time taken by the routine increases with $n$ until $n > 100$, at which point it drops and then increases
slowly with $n$. The time may also depend on the choice of null distribution and on whether or not the
parameters are to be estimated.
The data supplied in the parameter x must be consistent with the chosen null distribution as follows:
when dist = Nag_Uniform, then par[0] $\le x_i \le$ par[1], for $i = 1, 2, \ldots, n$;
when dist = Nag_Normal, then there are no constraints on the $x_i$'s;
when dist = Nag_Gamma, then $x_i \ge 0.0$, for $i = 1, 2, \ldots, n$;


when dist = Nag_Beta, then $0.0 \le x_i \le 1.0$, for $i = 1, 2, \ldots, n$;
when dist = Nag_Binomial, then 0.0 $\le x_i \le$ par[0], for $i = 1, 2, \ldots, n$;
when dist = Nag_Exponential, then $x_i \ge 0.0$, for $i = 1, 2, \ldots, n$;
when dist = Nag_Poisson, then $x_i \ge 0.0$, for $i = 1, 2, \ldots, n$.
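The consistency conditions listed above can be pre-checked along the following lines. This is a sketch under assumptions: dist_kind is a hypothetical stand-in for Nag_Distributions, data_consistent is an illustrative name, and par is assumed to hold supplied values wherever the condition refers to it. Only the bounds stated in this section are tested; nothing further (for instance integrality for the discrete distributions) is assumed.

```c
#include <stddef.h>

/* Hypothetical stand-in for Nag_Distributions; not the library's enum. */
typedef enum {
    DIST_UNIFORM, DIST_NORMAL, DIST_GAMMA, DIST_BETA,
    DIST_BINOMIAL, DIST_EXPONENTIAL, DIST_POISSON
} dist_kind;

/* Returns 1 if every x[i] satisfies the bound listed for dist, else 0. */
int data_consistent(dist_kind dist, const double par[2],
                    const double *x, size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i) {
        int ok;
        switch (dist) {
        case DIST_UNIFORM:
            ok = x[i] >= par[0] && x[i] <= par[1];
            break;
        case DIST_NORMAL:
            ok = 1;                       /* no constraint */
            break;
        case DIST_BETA:
            ok = x[i] >= 0.0 && x[i] <= 1.0;
            break;
        case DIST_BINOMIAL:
            ok = x[i] >= 0.0 && x[i] <= par[0];
            break;
        case DIST_GAMMA:
        case DIST_EXPONENTIAL:
        case DIST_POISSON:
            ok = x[i] >= 0.0;
            break;
        default:
            ok = 0;                       /* unknown distribution */
            break;
        }
        if (!ok) return 0;
    }
    return 1;
}
```

A sample containing 2.5 would be rejected for $U(0, 2)$ but accepted for the Normal distribution, which places no constraint on the observations.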

6.1 Accuracy
The approximation for $p$, given when $n > 100$, has a relative error of at most 2.5% for most cases. The
two-sided probability is approximated by doubling the one-sided probability. This is a good approximation only
for small $p$, i.e., $p < 0.10$, and becomes very poor for large $p$. The error is always on the conservative side,
that is, the tail probability, $p$, is overestimated.
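The direction of the doubling error can be seen in the limiting case. The following worked comparison uses the textbook limiting distributions (one-sided tail $e^{-2z^2}$, two-sided Kolmogorov series) and is offered only as an analogy: the doubling described above is applied to the exact one-sided probabilities, for which this document gives no closed form.

```latex
\begin{align*}
p_{\text{doubled}}(z) &= 2e^{-2z^2}, \\
p_{\text{two-sided}}(z) &= 2\sum_{k=1}^{\infty} (-1)^{k-1} e^{-2k^2z^2}, \\
p_{\text{doubled}}(z) - p_{\text{two-sided}}(z)
  &= 2e^{-8z^2} - 2e^{-18z^2} + \cdots \;\in\; \bigl[0,\; 2e^{-8z^2}\bigr]
  \qquad (z > 0).
\end{align*}
```

Writing $p = p_{\text{doubled}}$, the bound $2e^{-8z^2}$ equals $p^4/8$. At $p = 0.10$ the overestimate is therefore at most about $1.25 \times 10^{-5}$, whereas at $p = 1$ it is about 0.12, which matches the qualitative behaviour described above: negligible for small $p$, conservative and increasingly inaccurate as $p$ grows.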

6.2 References
Conover W J (1980) Practical Nonparametric Statistics Wiley
Feller W (1948) On the Kolmogorov-Smirnov limit theorems for empirical distributions Ann. Math.
Statist. 19 179-181
Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) Griffin (3rd Edition)
Kolmogorov A N (1933) Sulla determinazione empirica di una legge di distribuzione Giornale
dell'Istituto Italiano degli Attuari 4 83-91
Smirnov N (1933) Estimate of deviation between empirical distribution functions in two independent
samples Bull. Moscow Univ. 2 (2) 3-16
Smirnov N (1948) Table for estimating the goodness of fit of empirical distributions Ann. Math. Statist. 19
279-281
Siegel S (1956) Non-parametric Statistics for the Behavioral Sciences McGraw-Hill

7 See Also
None.

8 Example
The following example program reads in a set of data consisting of 30 observations. The Kolmogorov-Smirnov
test is then applied twice, firstly to test whether the sample is taken from a uniform distribution,
$U(0, 2)$, and secondly to test whether the sample is taken from a Normal distribution where the mean and
variance are estimated from the data. In both cases we are testing against $H_1$, that is, we are doing a
two-tailed test. The values of d, z and p are printed for each case.

8.1 Program Text


/* nag_1_sample_ks_test (g08cbc) Example Program.
 *
 * Copyright 2000 Numerical Algorithms Group.
 *
 * Mark 6, 2000.
 */

#include <stdio.h>
#include <nag.h>
#include <nag_stdlib.h>
#include <nagg08.h>

int main (void)
{
  double d, p, *par=0, *x=0, z;
  Integer i, n, np, ntype;
  Integer exit_status=0;
  Nag_TestStatistics ntype_enum;
  NagError fail;

  INIT_FAIL(fail);
  Vprintf("g08cbc Example Program Results\n");

  /* Skip heading in data file */
  Vscanf("%*[^\n]");

  /* Read the sample size and allocate the sample array */
  Vscanf("%ld", &n);
  if (!(x = NAG_ALLOC(n, double)))
    {
      Vprintf("Allocation failure\n");
      exit_status = -1;
      goto END;
    }

  Vprintf("\n");
  /* Read the observations */
  for (i = 1; i <= n; ++i)
    Vscanf("%lf", &x[i - 1]);

  /* Read the parameters and the test type for the uniform test */
  Vscanf("%ld", &np);
  if (!(par = NAG_ALLOC(np, double)))
    {
      Vprintf("Allocation failure\n");
      exit_status = -1;
      goto END;
    }
  for (i = 1; i <= np; ++i)
    Vscanf("%lf", &par[i - 1]);
  Vscanf("%ld", &ntype);
  if (ntype == 1)
    ntype_enum = Nag_TestStatisticsDAbs;
  else if (ntype == 2)
    ntype_enum = Nag_TestStatisticsDPos;
  else if (ntype == 3)
    ntype_enum = Nag_TestStatisticsDNeg;
  else
    ntype_enum = (Nag_TestStatistics)-999;

  /* Test against the uniform distribution with supplied parameters */
  g08cbc(n, x, Nag_Uniform, par, Nag_ParaSupplied, ntype_enum, &d, &z, &p,
         &fail);
  if (fail.code != NE_NOERROR)
    {
      Vprintf("Error from g08cbc.\n%s\n", fail.message);
      exit_status = 1;
      goto END;
    }
  Vprintf("Test against uniform distribution on (0,2)\n");
  Vprintf("\n");
  Vprintf("Test statistic D = %8.4f\n", d);
  Vprintf("Z statistic = %8.4f\n", z);
  Vprintf("Tail probability = %8.4f\n", p);
  Vprintf("\n");

  /* Read the parameters and the test type for the Normal test */
  Vscanf("%ld", &np);
  for (i = 1; i <= np; ++i)
    Vscanf("%lf", &par[i - 1]);
  Vscanf("%ld", &ntype);
  if (ntype == 1)
    ntype_enum = Nag_TestStatisticsDAbs;
  else if (ntype == 2)
    ntype_enum = Nag_TestStatisticsDPos;
  else if (ntype == 3)
    ntype_enum = Nag_TestStatisticsDNeg;
  else
    /* As in the first test: pass an invalid value rather than reuse the old one */
    ntype_enum = (Nag_TestStatistics)-999;

  /* Test against the Normal distribution with estimated parameters */
  g08cbc(n, x, Nag_Normal, par, Nag_ParaEstimated, ntype_enum, &d, &z, &p,
         &fail);
  if (fail.code != NE_NOERROR)
    {
      Vprintf("Error from g08cbc.\n%s\n", fail.message);
      exit_status = 1;
      goto END;
    }

  Vprintf("Test against Normal distribution with parameters estimated from the data\n");
  Vprintf("\n");
  Vprintf("%s%6.4f%s%6.4f\n", "Mean = ", par[0], " and variance = ", par[1]);
  Vprintf("Test statistic D = %8.4f\n", d);
  Vprintf("Z statistic = %8.4f\n", z);
  Vprintf("Tail probability = %8.4f\n", p);
 END:
  if (x) NAG_FREE(x);
  if (par) NAG_FREE(par);
  return exit_status;
}

8.2 Program Data


g08cbc Example Program Data
30
0.01 0.30 0.20 0.90 1.20 0.09 1.30 0.18 0.90 0.48
1.98 0.03 0.50 0.07 0.70 0.60 0.95 1.00 0.31 1.45
1.04 1.25 0.15 0.75 0.85 0.22 1.56 0.81 0.57 0.55
2 0.0 2.0 1
2 0.0 1.0 1

8.3 Program Results


g08cbc Example Program Results

Test against uniform distribution on (0,2)

Test statistic D = 0.2800
Z statistic = 1.5336
Tail probability = 0.0143

Test against Normal distribution with parameters estimated from the data

Mean = 0.6967 and variance = 0.2564
Test statistic D = 0.1108
Z statistic = 0.6068
Tail probability = 0.8925

[NP3491/6]
