
Kolmogorov–Smirnov Test: Overview

By Vance W. Berger¹ and YanYan Zhou²


Keywords: distribution function, exact test, nonparametric test, ranks

Abstract: While often confused, the Kolmogorov–Smirnov test and the Smirnov test are
actually distinct. Specifically, the Kolmogorov–Smirnov test is used to test the goodness
of fit of a given set of data to a theoretical distribution, making this a one-sample test. In
contrast, the Smirnov test is a two-sample test, used to determine if two samples appear
to follow the same distribution. The intuition behind the two tests is the same, however,
in that both compare cumulative distribution functions, either two empirical cumulative
distribution functions for the two-sample Smirnov test, or one empirical cumulative dis-
tribution function and one known cumulative distribution function for the one-sample
Kolmogorov–Smirnov test.

The Kolmogorov–Smirnov and Smirnov Tests

The Kolmogorov–Smirnov test is used to test the goodness of fit of a given set of data to a theoretical
distribution, while the Smirnov test is used to determine if two samples appear to follow the same
distribution. Both tests compare cumulative distribution functions (cdfs). The problem of determining
whether a given set of data appear to have been drawn from a known distribution is often of interest
only because it has implications for the subsequent statistical analysis. For example, one may test a given
distribution for normality, and if one fails to reject this hypothesis, then one may proceed as if normality
were proven and use a parametric analysis that relies on normality for validity. This two-stage approach is
problematic, because it confuses failure to reject a null hypothesis (in this case, normality) with proving
its truth [10]. The consequence of this error is that the true Type I error rate may then exceed the nominal Type I error rate [1, 6], which is often taken to be 0.05.
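To make the concern concrete, here is a minimal simulation sketch in Python (all names and settings are illustrative, not from the article; the Shapiro-Wilk test stands in for the normality pretest) that estimates the conditional Type I error rate of such a two-stage procedure: pretest both samples for normality, and run the two-sample t-test only when neither pretest rejects. With skewed data, the conditional rate need not equal the nominal 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_sims, n = 20_000, 15
passed, rejected = 0, 0

for _ in range(n_sims):
    # The null hypothesis is true: both samples come from the same
    # (skewed, non-normal) exponential distribution.
    x = rng.exponential(scale=1.0, size=n)
    y = rng.exponential(scale=1.0, size=n)
    # Stage 1: pretest each sample for normality.
    if stats.shapiro(x).pvalue > alpha and stats.shapiro(y).pvalue > alpha:
        passed += 1
        # Stage 2: parametric analysis, used only because the pretest
        # "failed to reject" normality.
        if stats.ttest_ind(x, y).pvalue <= alpha:
            rejected += 1

# Conditional Type I error rate among datasets that passed the pretest
# (guard against the unlikely case that no dataset passed).
print(f"passed: {passed}, conditional rate: {rejected / max(passed, 1):.3f}")
```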

¹ University of Maryland, Baltimore, MD, USA
² Florida International University, Miami, FL, USA

This article was originally published online in 2005 in Encyclopedia of Statistics in Behavioral Science, © John Wiley & Sons, Ltd and republished in Wiley StatsRef: Statistics Reference Online, 2014.


The Smirnov Test


The problem of determining whether two sets of data are drawn from the same distribution function arises
naturally in many areas of research and, in contrast with the problem of fitting a set of data to a known
theoretical distribution, is often of intrinsic interest. For example, one may ask if scores on a standardized
test are the same in two different states or in two different counties. Or one may ask if within a family
the incidence of chicken pox is the same for the first-born child and the second-born child. Formally, we
can state the problem as follows:
Let x = (x_1, x_2, …, x_m) and y = (y_1, y_2, …, y_n) be independent random samples of size m and n, respectively, from continuous or ordered categorical populations with cdfs F and G, respectively. We wish to test the null hypothesis of equality of distribution functions:

H_0: F(t) = G(t) for every t.  (1)
This null hypothesis can be tested against the omnibus two-sided alternative hypothesis:

H_a: F(t) ≠ G(t) for at least one value of t,  (2)

or it can be tested against a one-sided alternative hypothesis:

H_a: F(t) ≥ G(t) for all values of t, strictly greater for at least one value of t,  (3)

or

H_a: F(t) ≤ G(t) for all values of t, strictly smaller for at least one value of t.  (4)
To compute the Smirnov test statistic, we first need to obtain the empirical cdfs for the x and y samples. These are defined by

F_m(t) = (number of sample x's ≤ t) / m  (5)

and

G_n(t) = (number of sample y's ≤ t) / n.  (6)
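As a concrete illustration of definitions (5) and (6), the following is a minimal Python sketch (the helper name ecdf is mine, not the article's) that evaluates an empirical cdf at a point t.

```python
import numpy as np

def ecdf(sample, t):
    """Empirical cdf as in equations (5) and (6): the fraction of
    sample values that are less than or equal to t."""
    sample = np.asarray(sample)
    return np.count_nonzero(sample <= t) / sample.size

x = [1.2, 3.4, 2.2, 5.1, 0.7]
print(ecdf(x, 2.2))  # 3 of the 5 values are <= 2.2, so this prints 0.6
```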
Note that these empirical cdfs take the value 0 for all values of t below the smallest value in the combined
samples and the value 1 for all values of t at or above the largest value in the combined samples. Thus, it
is the behavior between the smallest and largest sample values that distinguishes the empirical cdfs. For
any value of t in this range, the difference between the two distributions can be measured by the signed differences, [F_m(t) − G_n(t)] or [G_n(t) − F_m(t)], or by the absolute value of the difference, |F_m(t) − G_n(t)|. The absolute difference would be the appropriate choice where the alternative hypothesis is nondirectional, F(t) ≠ G(t), while the choice between the two signed differences depends on the direction of the alternative hypothesis. The value of the difference or absolute difference may change with the value of t, so comparing the distributions at only one value of t would entail a potentially huge loss of information [3, 14, 17].
The Smirnov test resolves this problem by employing as its test statistic, D, the maximum value of the selected difference between the two empirical cumulative distribution functions:

D = max |F_m(t) − G_n(t)|, min(x, y) ≤ t ≤ max(x, y),  (7)

where the alternative hypothesis is that F(t) ≠ G(t);

D^+ = max [F_m(t) − G_n(t)], min(x, y) ≤ t ≤ max(x, y),  (8)

where the alternative hypothesis is that F(t) > G(t) for some value(s) of t; and

D^− = max [G_n(t) − F_m(t)], min(x, y) ≤ t ≤ max(x, y),  (9)

where the alternative hypothesis is that F(t) < G(t) for some value(s) of t. Note, again, that the difference in empirical cdfs need be evaluated only at the set of unique values in the samples x and y.
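Because the empirical cdfs can change only at observed sample values, the statistics in (7) to (9) can be computed by evaluating F_m and G_n at the pooled unique values. A minimal Python sketch (the function name is mine, not the article's); the two-sided D computed this way should agree with the statistic returned by scipy.stats.ks_2samp:

```python
import numpy as np

def smirnov_statistics(x, y):
    """Compute D, D^+, and D^- from equations (7)-(9) by scanning
    the pooled unique values, where the empirical cdfs can jump."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    ts = np.unique(np.concatenate([x, y]))
    # searchsorted with side="right" counts sample values <= t.
    f = np.searchsorted(np.sort(x), ts, side="right") / x.size  # F_m(t)
    g = np.searchsorted(np.sort(y), ts, side="right") / y.size  # G_n(t)
    d_plus = np.max(f - g)    # one-sided statistic for F(t) > G(t)
    d_minus = np.max(g - f)   # one-sided statistic for F(t) < G(t)
    d = max(d_plus, d_minus)  # two-sided statistic, max |F_m - G_n|
    return d, d_plus, d_minus
```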
To test H_0 at the α level of significance, one would reject H_0 if the test statistic is large, specifically, if it is equal to or larger than D_α. The Smirnov test has been discussed by several authors, among them Hilton, Mehta, and Patel [11], Nikiforov [15], and Berger, Permutt, and Ivanova [7]. One key point is that the exact permutation reference distribution (see Exact Methods for Categorical Data) should be used instead of an approximation, because the approximation is often quite poor [1, 6]. In fact, in analyzing a real set of data, Berger [1] found enormous discrepancies between the exact and approximate Smirnov tests for both the one-sided and the two-sided P values. Because of the discreteness of the distribution of sample data, it generally will not be possible to choose D_α to create a test whose significance level is exactly equal to a prespecified value of α. Thus, D_α should be chosen from the permutation reference distribution so as to make the Type I error probability no greater than α.
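For small samples, the exact permutation reference distribution can be enumerated directly: under H_0, every split of the pooled data into groups of sizes m and n is equally likely. A minimal sketch, assuming the smirnov_statistics helper above (full enumeration is my own illustration and is feasible only for small m and n; scipy.stats.ks_2samp with method="exact" is a packaged alternative):

```python
from itertools import combinations
import numpy as np

def exact_smirnov_pvalue(x, y):
    """Exact permutation P value for the two-sided Smirnov D: the
    proportion of equally likely m/n splits of the pooled data whose
    D is at least as large as the observed D."""
    pooled = np.concatenate([np.asarray(x, float), np.asarray(y, float)])
    m = len(x)
    d_obs = smirnov_statistics(x, y)[0]
    count = total = 0
    for idx in combinations(range(len(pooled)), m):
        mask = np.zeros(len(pooled), dtype=bool)
        mask[list(idx)] = True
        d = smirnov_statistics(pooled[mask], pooled[~mask])[0]
        count += d >= d_obs - 1e-12  # tolerance for floating-point ties
        total += 1
    return count / total
```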
For any value of t, the two-sample data can be fit to a 2 × 2 table of frequencies. The rows of the
table correspond to the two populations sampled, X and Y, while the columns correspond to response
magnitudes – responses that are no larger than t in one column and responses that exceed t in a second
column. Taken together, these tables are referred to as the Lancaster decomposition [3, 16]. Such a set of tables
is unwieldy where the response is continuously measured, but may be of interest where the response is
one of a small number of ordered categories, for example, unimproved, slightly improved, and markedly
improved. In this case, the Smirnov D can be viewed as the maximized Fisher exact test statistic: the largest difference in success proportions across the two groups, as the definition of success varies over the set of Lancaster decomposition tables [16]. With the three categories above, for example, D is the larger of the two differences in success proportions obtained when success is defined as at least slightly improved and when it is defined as markedly improved.
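As an illustration of this view, the following sketch (with hypothetical counts for three ordered categories) recovers D as the largest difference in success proportions over the Lancaster decomposition tables:

```python
import numpy as np

# Hypothetical counts per ordered category for two groups:
# [unimproved, slightly improved, markedly improved]
x_counts = np.array([10, 6, 4])   # group X, m = 20
y_counts = np.array([4, 6, 10])   # group Y, n = 20
m, n = x_counts.sum(), y_counts.sum()

# Each cut point dichotomizes the categories into "success" (above the
# cut) versus "failure" (at or below it), giving one Lancaster
# decomposition 2x2 table per cut.
diffs = []
for cut in range(1, len(x_counts)):
    success_x = x_counts[cut:].sum() / m
    success_y = y_counts[cut:].sum() / n
    diffs.append(abs(success_x - success_y))

# The Smirnov D is the maximized difference in success proportions.
print(max(diffs))
```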
The row margins (the sample sizes) are fixed by the design of the study. The column margins are complete and sufficient statistics under the null hypothesis of equality. Hence, the exact analysis would condition on the two sets of margins [8]. The effect of conditioning on the margins is that the permutation sample space
is reduced compared to what it would be with an unconditional approach. Berger and Ivanova [4] provide
S-PLUS code for the exact conditional computations; see also [13, 18, 19]. The Smirnov test is invariant under monotone transformation of the data. That is, the maximum difference D will not change if x undergoes a strictly increasing transformation, so the same test statistic D, and hence the same P value, results whether one analyzes x, log(x), or, for positive data, some other increasing transformation such as the square of x.
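This invariance is easy to check numerically; a small sketch using scipy.stats.ks_2samp with arbitrary illustrative data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.lognormal(size=30)            # positive data, so log and square
y = rng.lognormal(mean=0.5, size=30)  # are strictly increasing transforms

# D is unchanged under strictly increasing transformations of both samples,
# because the ordering of the pooled data is preserved.
d_raw = stats.ks_2samp(x, y).statistic
d_log = stats.ks_2samp(np.log(x), np.log(y)).statistic
d_sq = stats.ks_2samp(x**2, y**2).statistic
print(d_raw, d_log, d_sq)  # all three agree
```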
The Smirnov test tends to be most sensitive around the median value where the cumulative probability
equals 0.5. As a result, the Smirnov test is good at detecting a location shift, especially a change in the median, but it is not always as good at detecting a change in scale (spread), which affects the tails of the probability distribution more and may leave the median unchanged. The power of the Smirnov
test to detect specific alternatives can be improved in any of several ways. For example, the power can be
improved by refining the Smirnov test so that any ties are broken. Ties result from different tables in the
permutation reference distribution having the same value of the test statistic.
There are several approaches to breaking the ties, so that at least some of the tables that had been tied
are now assigned distinct values of the test statistic [12, 16]. Only one of these approaches represents a true
improvement in the sense that even the exact randomized version of the test becomes uniformly more
powerful [8] . Another modification of the Smirnov test is based on recognizing its test statistic to be a
maximized difference of a constrained set of linear rank test statistics, and then removing the constraint,
so that a wider class of linear rank test statistics is considered in the maximization [5] . The power of this
adaptive omnibus test is excellent.
While there do exist tests that are generally or in some cases even uniformly more powerful than the
Smirnov test, the Smirnov test still retains its appeal as probably the best among the simple tests that
are available in standard software packages (the exact Smirnov test is easily conducted in StatXact). One
could manage the conservatism of the Smirnov test by reporting not just its P value but also the entire
P value interval [2], whose upper end point is the usual P value but whose lower end point is what the P
value would have been without any conservatism.

The Kolmogorov–Smirnov Test

As noted above, the Kolmogorov–Smirnov test assesses whether a single sample could have been
sampled from a specified probability distribution. Letting G_n(t) be the empirical cdf for the single sample and F(t) the theoretical cdf, for example, normal with mean μ and variance σ², the Kolmogorov–Smirnov test statistic takes one of these forms:

D_k = max |F(t) − G_n(t)|, min(x) ≤ t ≤ max(x),  (10)

where the alternative hypothesis is that F(t) ≠ G(t);

D_k^+ = max [F(t) − G_n(t)], min(x) ≤ t ≤ max(x),  (11)

where the alternative hypothesis is that F(t) > G(t) for some value(s) of t; and

D_k^− = max [G_n(t) − F(t)], min(x) ≤ t ≤ max(x),  (12)

where the alternative hypothesis is that F(t) < G(t) for some value(s) of t.
The Kolmogorov–Smirnov test has an exact null distribution for the two directional alternatives but
the distribution must be approximated for the nondirectional case [9] . Regardless of the alternative, the
test is less accurate if the parameters of the theoretical distribution have been estimated from the sample.
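In practice, the one-sample test is available in standard software; a minimal sketch using scipy.stats.kstest against a fully specified normal distribution (the parameters below are assumed known a priori rather than estimated from the sample, in line with the caveat above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sample = rng.normal(loc=0.0, scale=1.0, size=50)

# Two-sided one-sample Kolmogorov-Smirnov test against N(0, 1); args
# passes (mu, sigma) to the normal cdf. Estimating mu and sigma from
# this same sample would make the reported P value inaccurate, as the
# caveat above notes.
result = stats.kstest(sample, "norm", args=(0.0, 1.0))
print(result.statistic, result.pvalue)
```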

References
[1] Berger, V. W. (2000). Pros and cons of permutation tests in clinical trials, Statistics in Medicine 19, 1319–1328.
[2] Berger, V. W. (2001). The p-value interval as an inferential tool, Journal of the Royal Statistical Society, Series D (The Statistician) 50, 79–85.
[3] Berger, V. W. (2002). Improving the information content of categorical clinical trial endpoints, Controlled Clinical Trials
23, 502–514.
[4] Berger, V. W. & Ivanova, A. (2001). Permutation tests for phase III clinical trials. Chapter 14, in Applied Statistics in the
Pharmaceutical Industry with Case Studies Using S-PLUS, S. P. Millard & A. Krause, eds, Springer Verlag, New York.
[5] Berger, V. W. & Ivanova, A. (2002). Adaptive tests for ordered categorical data, Journal of Modern Applied Statistical
Methods 1, 269–280.
[6] Berger, V. W., Lunneborg, C., Ernst, M. D. & Levine, J. G. (2002). Parametric analyses in randomized clinical trials,
Journal of Modern Applied Statistical Methods 1, 74–82.
[7] Berger, V. W., Permutt, T. & Ivanova, A. (1998). The convex hull test for ordered categorical data, Biometrics 54,
1541–1550.
[8] Berger, V. & Sackrowitz, H. (1997). Improving tests for superior treatment in contingency tables, Journal of the American
Statistical Association 92, 700–705.
[9] Conover, W. J. (1999). Practical Nonparametric Statistics, Wiley, New York.
[10] Greene, W. L., Concato, J. & Feinstein, A. R. (2000). Claims of equivalence in medical research: are they supported by
the evidence? Annals of Internal Medicine 132, 715–722.
[11] Hilton, J. F., Mehta, C. R. & Patel, N. R. (1994). An algorithm for conducting exact Smirnov tests, Computational Statistics
and Data Analysis 17, 351–361.
[12] Ivanova, A. & Berger, V. W. (2001). Drawbacks to integer scoring for ordered categorical data, Biometrics 57, 567–570.

[13] Kim, P. J. & Jennrich, R. I. (1973). Tables of the exact sampling distribution of the two-sample Kolmogorov–Smirnov
criterion, in Selected Tables in Mathematical Statistics, Vol. I, H. L. Harter & D. B. Owen, eds, American Mathematical
Society, Providence.
[14] Moses, L. E., Emerson, J. D. & Hosseini, H. (1984). Analyzing data from ordered categories, New England Journal of
Medicine 311, 442–448.
[15] Nikiforov, A. M. (1994). Exact Smirnov two-sample tests for arbitrary distributions, Applied Statistics 43, 265–284.
[16] Permutt, T. & Berger, V. W. (2000). A new look at rank tests in ordered 2 × k contingency tables, Communications in
Statistics – Theory and Methods 29, 989–1003.
[17] Rahlfs, V. W. & Zimmermann, H. (1993). Scores: ordinal data with few categories – how should they be analyzed? Drug
Information Journal 27, 1227–1240.
[18] Schroer, G. & Trenkler, D. (1995). Exact and randomization distributions of Kolmogorov–Smirnov tests for two or three
samples, Computational Statistics and Data Analysis 20, 185–202.
[19] Wilcox, R. R. (1997). Introduction to Robust Estimation and Hypothesis Testing, Academic Press, San Diego.
