Lecture 4_250311_143137

The document discusses non-parametric statistics, focusing on paired sample tests such as the Wilcoxon signed-rank test and the Sign test. It outlines the procedures, hypotheses, and assumptions for these tests, providing examples to illustrate their application. The document serves as a guide for conducting statistical tests without assuming a specific distribution for the data.

Non Parametric Statistics

Dr. S.K. Appiah

May 10, 2022


Outline:

Paired Sample Tests:
i. Wilcoxon signed-rank test
ii. Sign test
Kolmogorov-Smirnov (K-S) Tests:
i. K-S one-sample test for goodness-of-fit
ii. K-S two-sample test

Dr. S.K. Appiah Non Parametric Statistics May 10, 2022 2 / 71


1. PAIRED SAMPLES TESTS:
1.1 Wilcoxon Signed–Rank Test
It is used to compare two probability distributions by considering the
ranks of the absolute differences between the paired observations.
The test proceeds as follows:
i. Take the difference for each sample pair, (X1 − X2). Differences equal
to zero are eliminated, thus reducing the number of pairs, n.
ii. Rank the absolute differences from lowest to highest. Tied
observations are assigned the average of the ranks.
iii. Determine the sum of ranks R+ for the positive differences and
R− for the negative differences.

iv. The hypotheses are:
H0: The two populations have identical distributions.
H1: The two populations are not identical (two-tailed test); or the
probability distribution of population 1 is shifted to the right (or to the
left) of population 2 (one-tailed test), at the α level of significance.
v. Test statistic (for small sample pairs, n < 25):
Take R = min(R+, R−) for a two-tailed test. Take R− if probability
distribution 1 is shifted to the right of 2, or take R+ if probability
distribution 1 is shifted to the left of 2.

The decision rule is to reject H0 if R ≤ R0, where R0 is the critical
value from the Wilcoxon signed-rank table for a two-tailed test at
significance level α and n untied pairs; or reject H0 if
R− (or R+) ≤ R0, where R0 is obtained from the Wilcoxon signed-rank
table for a one-tailed test at significance level α and n untied pairs.
vi. Test statistic (for large sample pairs, n ≥ 25): The sampling distribution
of the signed-rank statistic is approximately normal when the number of
paired observations is large.
The advantage of this approximation is that p-values are much
easier to determine.
The p-value also serves as a measure of the strength of the evidence
against H0. That is, for large n ≥ 25 paired observations:

$$Z = \frac{R^{+} - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}$$

which is compared with Zα or Zα/2.
vii. Assumptions:
Each pair of observations is randomly selected.
The absolute differences in the paired observations can be ranked.
The paired sample size may be small (n < 25) or large (n ≥ 25).
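
The ranking steps of the signed-rank procedure (differences, average ranks for ties, rank sums by sign) can be sketched in Python. This is only a minimal illustration under stated assumptions: the function name `signed_rank_sums` and the two short lists of ratings are hypothetical and do not come from the lecture.

```python
def signed_rank_sums(x1, x2):
    """Return (R_plus, R_minus, n) for paired samples x1 and x2."""
    # Step i: paired differences, with zero differences eliminated.
    diffs = [a - b for a, b in zip(x1, x2) if a != b]
    n = len(diffs)

    # Step ii: rank the absolute differences; ties share the average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while (end + 1 < n
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        avg_rank = (start + end + 2) / 2  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1

    # Step iii: sum the ranks of positive and of negative differences.
    r_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    r_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return r_plus, r_minus, n


# Hypothetical ratings, for illustration only.
product_a = [8, 5, 7, 6, 9]
product_b = [6, 5, 8, 3, 5]
print(signed_rank_sums(product_a, product_b))  # (9.0, 1.0, 4)
```

For a two-tailed test the smaller of the two sums would then be compared with the tabulated critical value R0 for the returned n.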

Example 5-1.16:
Suppose 10 judges are given samples of two paper products that a
company wants to compare. Each judge rates the softness of each product
on a scale from 1 to 10, with a higher rating implying a softer product. The
results are shown in the table below. Test at α = 0.05 and at α = 0.10.

ii. Sum of positive and negative ranks,
R+ : 5+7.5+2+7.5+9+5+10 = 46
R− : 2+5+2 = 9
iii. The hypotheses are:
H0: The probability distributions of ratings for products A and B are
identical.
H1: The two probability distributions differ, at α = 0.05 or 0.10.

iv. The test statistic is R = min(R+, R−) = min(46, 9) = 9.
From the Wilcoxon signed-rank table with n = 10 untied pairs,
R0 = 8 for α = 0.05 (two-tailed). Since R = 9 > R0 = 8, we fail to
reject H0: the experiment does not provide sufficient evidence to
indicate that the two paper products differ with respect to their
softness ratings.
If α = 0.10, we have R0 = 11. Since R = 9 ≤ R0 = 11, we reject H0
and conclude that the samples provide sufficient evidence that the two
probability distributions differ.

Example 5-1.17:
A paired-difference experiment with 30 paired observations yielded the
rank sum R+ = 345.
(i) Test whether the probability distribution for population A is located to
the right of that for population B at α = 0.05. What is the approximate
p-value of the test?
(ii) What assumptions are necessary to ensure the validity of the test?

Solution
(i) The null and alternative hypotheses are:
H0: The probability distributions for the two populations are identical.
H1: The probability distribution for population A is located to the
right of that for population B, at α = 0.05.

vi. p-value = P(Z > 2.31) = 1 − P(Z ≤ 2.31)
= 1 − 0.9896 = 0.0104
vii. The required assumptions are:
The paired observations are randomly selected.
The probability distribution of the paired differences is continuous.
The sample size must be large, that is, n ≥ 25.
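
The value Z = 2.31 used for the p-value can be reproduced with a short Python sketch. It assumes the usual normal approximation for the signed-rank sum, with mean n(n+1)/4 and variance n(n+1)(2n+1)/24; the function name `wilcoxon_large_sample` is hypothetical.

```python
import math
from statistics import NormalDist


def wilcoxon_large_sample(r_plus, n):
    """Normal approximation for the signed-rank sum R+ with n untied pairs."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (r_plus - mean) / sd
    # Upper-tail p-value, appropriate when H1 says population A lies to the right.
    p_upper = 1 - NormalDist().cdf(z)
    return z, p_upper


z, p_value = wilcoxon_large_sample(r_plus=345, n=30)
print(round(z, 2))  # 2.31
print(p_value)      # close to the tabulated 0.0104; differs slightly because z is not rounded
```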

Example 5-1.18:
The Director of a corporation must choose between two plans for
improving employee safety. To aid in reaching a decision, both plans, A
and B, are examined by 10 safety experts, each of whom rates the plans on
a scale from one (1) to ten (10), with high ratings implying a better plan.
The corporation will adopt plan B, which is more expensive, only if the
data provide evidence that the safety experts rate plan B higher than plan A.
The results are shown in the table below. Do the data provide evidence at
α = 0.05 that the distribution of the ratings for plan B lies above that of
plan A?

i. Sum of ranks:
R+ : 2 + 4.5 + 9 = 15.5
R− : 4.5 + 2 + 6 + 7.5 + 2 + 7.5 = 29.5
ii. The hypotheses are:
H0: The probability distributions of the ratings of plans A and B are
identical.
H1: The probability distribution of the ratings of Plan B is shifted to
the right of that of Plan A. That is, the ratings of the more expensive
Plan B tend to exceed those of Plan A.

iii. The test statistic is R = min(R+, R−) = min(15.5, 29.5) = 15.5,
which is compared with the critical value R0 = 8 at α = 0.05.
iv. Conclusion: We fail to reject H0 since R > R0, and conclude that the
sample data provide insufficient evidence at α = 0.05 to support the
research hypothesis (i.e. the Director does not have sufficient evidence to
conclude that Plan B is rated higher than Plan A).

Example 1.5:
1. In an attempt to judge the effectiveness of anti-smoking campaign, a
researcher exposed 12 smokers to a series of presentations. The numbers
of cigarettes the smokers smoked the week before and after the
presentations were obtained as follows:

Apply the Wilcoxon signed-rank test to test the effectiveness of the
anti-smoking campaign using a 5% level of significance.

2. The production of 12 workers was observed during one week when they
were under strict supervision and during another week when they were not.
The following scores provide a measure of their productivity. Do the data
indicate that the strict supervision improves productivity? Use α = 0.05.

1.2 Sign Test
The Sign test is one of the simplest non-parametric tests for
comparing two populations using paired data.
It is appropriately named sign test because it is based on plus (+) and
minus (-) signs of the response differences of the two samples.
Test Procedure:
a. Let the scores of the two samples be X1 and X2. Take the differences
between the paired observations, noting the number of plus signs (n+),
where X1 > X2, and the number of minus signs (n−), where X1 < X2. Discard
zero differences and reduce the paired sample size by the number of zeros
recorded.

b. The hypotheses are:
H0: The two population distributions are identical.
H1: The two population distributions are not identical, at the α level of
significance.
Equivalently,
H0: P(n+) = P(n−)
H1: P(n+) ≠ P(n−), P(n+) > P(n−) or P(n+) < P(n−), at the α level of
significance.
If H0 is true, then we expect as many plus (+) signs as minus (−) signs.
Hence, writing p for the probability of a plus sign, the hypotheses may be
stated as:
H0: P(n+) = P(n−), i.e. p = 1/2
H1: P(n+) ≠ P(n−), i.e. p ≠ 1/2, p > 1/2 or p < 1/2.

c. Test statistic:
The test-statistic is either the observed number of plus (n+ ) or minus signs
(n− ). The nature of H1 determines which of the statistics is appropriate:
If H1: p ≠ 1/2, then either a sufficiently small number of plus signs
or a sufficiently small number of minus signs causes rejection of H0. In
this case we may take the test statistic to be the number of the less
frequently occurring sign.
If H1: p > 1/2, a sufficiently small number of minus (−) signs causes
rejection of H0. The test statistic is the number of minus (−) signs.
Similarly, if H1: p < 1/2, a sufficiently small number of plus (+) signs
causes rejection of H0. The test statistic is taken to be the number of
plus (+) signs.

d. Distribution of the test statistic:
Let y be the test statistic (n+ or n−). We consider two cases,
small and large sample sizes:
i. Small sample size: When the sample size is small, the sampling
distribution of y is the binomial distribution with parameter p = 1/2,
if H0 is true. That is, y ∼ B(n, 1/2), and the decision rule is to reject
H0 if P(Y ≤ y) ≤ α for a one-tailed test, or if P(Y ≤ y) ≤ α/2 for a
two-tailed test.

ii. Large sample size: If n is large such that np = n(1 − p) > 5, the normal
approximation to the binomial is used. The sampling distribution of y
becomes approximately normal with mean µy = np = n/2 and standard
deviation σy = √(np(1 − p)) = (1/2)√n.

The test statistic becomes:

$$Z = \frac{y - \mu_y}{\sigma_y} = \frac{y - 0.5n}{0.5\sqrt{n}}$$

which is compared with Zα or Zα/2.
Taking the continuity correction into account, the test statistic
becomes

$$Z = \frac{(y \pm 0.5) - 0.5n}{0.5\sqrt{n}}$$

where (y + 0.5) is used when y < 0.5n and
(y − 0.5) is used when y > 0.5n.

Example 1:
The salaries paid to 6 men and 6 women (in dollars) in similar positions
were as shown in the table below. Use the Sign test to test if there is a
significant difference in salary between the males and females at the 5%
level of significance.

Solution:
Computing the differences between the salaries, Xm − Xf, and recording
their signs gives the number of plus (+) signs n+ = 5 and the
number of minus (−) signs n− = 1.

i. The hypotheses are:
H0: The salary distributions of men and women are identical, p = 1/2.
H1: The salary distributions of men and women are not identical, p ≠ 1/2,
at the α = 0.05 level of significance.
ii. Test statistic: Taking y = n− = 1, the less frequent sign,

$$y \sim B(n, p) = B\left(6, \tfrac{1}{2}\right), \qquad P(Y = y) = \binom{6}{y}\left(\tfrac{1}{2}\right)^{y}\left(\tfrac{1}{2}\right)^{6-y}$$

$$P(Y \le 1) = \sum_{y=0}^{1}\binom{6}{y}\left(\tfrac{1}{2}\right)^{y}\left(\tfrac{1}{2}\right)^{6-y}
= \binom{6}{0}\left(\tfrac{1}{2}\right)^{0}\left(\tfrac{1}{2}\right)^{6} + \binom{6}{1}\left(\tfrac{1}{2}\right)^{1}\left(\tfrac{1}{2}\right)^{5}
= \left(\tfrac{1}{2}\right)^{6} + 6\left(\tfrac{1}{2}\right)^{6} = 7\left(\tfrac{1}{2}\right)^{6} = 0.109375$$

which is compared with α/2 = 0.025.
iii. Conclusion: We fail to reject H0 at α = 0.05 since 0.109375 > 0.025, and
conclude that there is no significant difference in salary between the
females and males.
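
The binomial tail probability in this example is easy to check numerically. The sketch below uses only the Python standard library; the helper name `binomial_lower_tail` is hypothetical.

```python
from math import comb


def binomial_lower_tail(y, n, p=0.5):
    """P(Y <= y) for Y ~ B(n, p), summed term by term."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(y + 1))


# One minus sign out of six non-zero differences, as in the salary example.
tail = binomial_lower_tail(y=1, n=6)
print(tail)            # 0.109375
print(tail <= 0.025)   # False, so H0 is not rejected at alpha = 0.05 (two-tailed)
```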

Example 2:
Two presidential candidates A and B were rated by 18 randomly picked
voters on a scale of 0 to 5. The results are provided in the table below.
Use the Sign test to test the null hypothesis that there is no significant
preference for one candidate over the other. Use α = 0.10.

Solution:
The numbers of plus (+) signs (for RA > RB) and minus (−) signs (for
RA < RB) are n+ = 13 and n− = 4 respectively. The sample size is n = 17
(since there is one tie).

i. The hypotheses:
H0: The two candidates are equally preferred by the voters, p = 1/2.
H1: There is a significant preference for one candidate over the other,
p ≠ 1/2, at the α = 0.10 significance level.
ii. Test statistic: Using the normal approximation, since np = 17(0.5) =
8.5 > 5, we have

µy = np = 17(0.5) = 8.5
σy = √(np(1 − p)) = √(17(0.5)(0.5)) = 2.062

Taking y = n− = 4, the less frequent sign, and noting that y < µy so that
the continuity correction adds 0.5, the test statistic is

$$Z = \frac{(y + 0.5) - \mu_y}{\sigma_y} = \frac{4 + 0.5 - 8.5}{2.062} = -1.940$$

which is compared with the critical value Zα/2 = Z0.05 = 1.65.
iii. Conclusion:
Reject H0 since Z = −1.940 < −Zα/2 = −1.65.
Conclude that there is a significant difference in terms of preference for
the two candidates by the voters.
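
The large-sample sign-test statistic can be computed with a few lines of Python. This is a sketch under stated assumptions: the helper name `sign_test_z` is hypothetical, and the continuity correction is applied in the direction given in the procedure (add 0.5 when y < 0.5n, subtract 0.5 when y > 0.5n), with y = 4 and n = 17 taken from the example.

```python
import math


def sign_test_z(y, n):
    """Continuity-corrected normal approximation for the sign test (p = 1/2)."""
    mu = 0.5 * n
    sigma = 0.5 * math.sqrt(n)
    if y < mu:
        corrected = y + 0.5   # move towards the mean from below
    elif y > mu:
        corrected = y - 0.5   # move towards the mean from above
    else:
        corrected = y
    return (corrected - mu) / sigma


z = sign_test_z(y=4, n=17)
critical = 1.645  # approximate two-tailed critical value for alpha = 0.10
print(round(z, 3), abs(z) > critical)
```

Using the number of plus signs instead (y = 13) gives the same magnitude with the opposite sign, so the two-tailed decision does not depend on which sign is counted.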

Exercise
1. A physical instructor believes his planned exercise programme is
effective in weight-reduction. To investigate this claim, a random sample
of 7 persons enrolled in the programme was selected. All the 7 persons
were weighed before they started the programme and again after
completing it. The following results were produced.

Person 1 2 3 4 5 6 7
Before 192 198 170 130 113 165 168
After 185 185 162 118 118 159 164

Use the Sign test to determine whether the weight-reduction programme is
effective. Use α = 0.10 and let a weight reduction be denoted “+” and an
increase “−”.

2. The productivity of 12 workers was observed during one week when they
were under strict supervision and during another week when they were not.
The following scores provide a measure of their productivity. Do the data
indicate that the strict supervision improves productivity? Use α = 0.05.

Spearman’s Rank Correlation Test
The Spearman's rank correlation test is used to test for association
between bivariate data. The test uses the ranks of the paired data (xi, yi)
to establish whether a monotonic relationship exists between them (that
is, a linear relationship between their ranks). The test proceeds as
follows:
i. Hypotheses
H0: The paired data are independent (i.e. there is no association between
the paired data, ρ = 0).
H1: The paired data are dependent (i.e. there is an association, ρ ≠ 0;
ρ > 0 if there is a positive association, or ρ < 0 if there is a negative
association).

ii. Test statistic (small n ≤ 30):
Rank the paired data and compute the differences between ranks (di).
Compute the test statistic, which is the sample correlation coefficient of
the ranks, called Spearman's rank correlation coefficient,

$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n(n^{2} - 1)}$$

where di = rx − ry is the difference between the ranks of xi and yi.

iii. Decision Rule
At a given level of significance (α) and number of paired observations
(n), the critical values for Spearman’s rank correlation can be
obtained from the Spearman rank table.
The decision rule is to reject H0 if the test statistic is at least as
extreme as the critical value; that is, reject H0 if |rs| ≥ rα/2(n) for a
two-tailed test, or if rs ≥ rα(n) or rs ≤ −rα(n) for a one-tailed test.

Assumptions
The n paired observations are randomly sampled from the populations.
The probability distributions of the paired variables (xi, yi) are
continuous.
iv. Test statistic (large n > 30)
If the paired samples are large (n > 30), the sampling distribution of rs is
approximately normal with mean 0 and variance 1/(n − 1). The
test statistic becomes

$$Z = \frac{r_s - E(r_s)}{\sqrt{\mathrm{Var}(r_s)}} = r_s\sqrt{n - 1}$$

which is compared with the critical value Zα or Zα/2.
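
As a small numerical illustration of the large-sample statistic, the sketch below evaluates Z = rs√(n − 1) and a two-sided p-value. The function name `spearman_large_sample` and the values rs = 0.4, n = 50 are hypothetical, chosen only to show the arithmetic.

```python
import math
from statistics import NormalDist


def spearman_large_sample(r_s, n):
    """Return (Z, two-sided p-value) using Var(r_s) = 1 / (n - 1) under H0."""
    z = r_s * math.sqrt(n - 1)
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical inputs: r_s = 0.4 from n = 50 pairs gives Z = 0.4 * 7 = 2.8.
z, p_value = spearman_large_sample(r_s=0.4, n=50)
print(z, p_value)
```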

Examples
1. Manufacturers of perishable foods often use preservatives to retard
spoilage. One concern is that using too much preservative will change the
flavour of the food. Suppose an experiment is conducted using samples of
a food with preservative added. The length of time until the food shows
signs of spoiling and a taste rating are recorded for each sample; the
measurements are shown in the table below:

Sample (i)          1     2     3     4     5     6     7     8     9    10    11    12
Time (days), xi    30    47    26    94    67    83    36    77    43   109    56    70
Taste rating, yi  4.3   3.6   4.5   2.8   3.3   2.7   4.2   3.9   3.6   2.2   3.1   2.9
Rank of xi (rx)     2     5     1    11     7    10     3     9     4    12     6     8
Rank of yi (ry)    11   7.5    12     3     6     2    10     9   7.5     1     5     4
di = rx − ry       −9  −2.5   −11     8     1     8    −7     0  −3.5    11     1     4

Use the Spearman's rank correlation test to find whether spoilage times
and taste ratings are correlated. Use α = 0.05.
Solution
i. Computing the sum of squared differences between the ranks of xi and yi:

$$\sum_{i=1}^{n} d_i^{2} = 81 + 6.25 + 121 + \cdots + 121 + 1 + 16 = 536.5$$

ii. Hypotheses:
H0: ρ = 0 (Spoilage times do not correlate with taste ratings)
H1: ρ ≠ 0 (Spoilage times correlate with taste ratings), at α = 0.05.

iii. Test statistic

$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n(n^{2} - 1)} = 1 - \frac{6(536.5)}{12(12^{2} - 1)} = 1 - 1.876 = -0.876$$

which is compared with rα/2(n) = r0.025(12) = 0.587.
iv. Test decision
H0 is rejected since |rs| = 0.876 > r0.025(12) = 0.587, and we conclude
that spoilage times and taste ratings are correlated; that is, the
preservative does affect the taste of the food.
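
The ranking and the rank-difference formula for this example can be reproduced from the raw data with a short Python sketch. The helper names `average_ranks` and `spearman_rs` are hypothetical; tied values receive the average of their ranks, as in the table, and the coefficient is computed with the rank-difference formula rather than a tie-corrected version.

```python
def average_ranks(values):
    """Ranks from 1..n in ascending order; ties share the average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end + 2) / 2  # mean of 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return ranks


def spearman_rs(x, y):
    """Return (r_s, sum of squared rank differences)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n**2 - 1)), sum_d2


days = [30, 47, 26, 94, 67, 83, 36, 77, 43, 109, 56, 70]
taste = [4.3, 3.6, 4.5, 2.8, 3.3, 2.7, 4.2, 3.9, 3.6, 2.2, 3.1, 2.9]
r_s, sum_d2 = spearman_rs(days, taste)
print(sum_d2, round(r_s, 3))
```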

2. For a certain factory job that requires great skill, it is thought that
productivity on the job should increase with an increase in years of
experience. Ten employees were randomly selected from among those who
hold this type of job. Data on years of experience and a measure of
productivity were found to be as follows:

Employee 1 2 3 4 5 6 7 8 9 10
Years of Exp. 4 6 10 2 12 6 5 10 13 9
Productivity 80 82 88 81 92 85 83 86 91 90

Do the data support the conjecture that years of experience is positively
correlated with productivity? Take α = 0.10.

3. Many large businesses send representatives to college campuses to
conduct job interviews. To aid the interview, one company decides to
study the correlation between the strength of an applicant’s references and
the performance of the applicant on the job. Eight (8) recently hired
employees are sampled and independent evaluation of both references and
job performance are made on a scale from 1 to 20. The scores are given in
the table below:
Employee 1 2 3 4 5 6 7 8
References 18 14 19 13 16 11 20 9
Job Performance 20 13 16 9 15 18 15 12

a. Compute the Spearman's rank correlation coefficient for the data.
b. Is there evidence that the strength of references and job performance
are positively correlated? Use α = 0.05.
4. Two students were asked to rate 8 different textbooks for a specific
course on a scale from 0 to 20 points. Points were assigned for each of
several categories such as reading level, use of illustrations and use of
colour. At α = 0.05, test the hypothesis that there is a significant linear
relationship between the two students' ratings.

Textbook A B C D E F G H
Student 1's Rating 4 10 18 20 12 2 5 9
Student 2's Rating 4 6 20 14 16 8 11 7

2. Kolmogorov-Smirnov (K-S) Tests
There are two types of K-S tests:
K-S one-sample test: Designed for goodness-of-fit tests for continuous
distributions. It involves specifying the cumulative frequency
distribution that would occur under the theoretical distribution and
comparing it with the observed (empirical) cumulative frequency
distribution.
K-S two-sample test: Tests the hypothesis that the distributions of
the two sampled populations are identical.

2.1 Kolmogorov-Smirnov (K-S) Two-Sample Test
The K-S two-sample test determines whether two independent
samples have been drawn from the same population.
If the two samples are drawn from the same population, the
cumulative distributions of both samples are expected to be fairly
close to each other.
Test Procedure:
(i) Arrange the data of each of the two samples to obtain the cumulative
frequency distributions Sn1(x) and Sn2(x), where necessary using the same
class intervals for both distributions.
(ii) Compute the absolute difference |Sn1(x) − Sn2(x)| between the two
sample cumulative distributions at each listed point/interval x, where
Sn1(x) = k1/n1, Sn2(x) = k2/n2, and k1 and k2 are the numbers of scores
in each sample equal to or less than x.

(iii) Determine the largest of the differences in (ii), since the test
focuses on the largest of these deviations:

D = max |Sn1(x) − Sn2(x)|,

for a two- or one-tailed test.
(iv) The significance of the observed D value depends on the sizes of the
samples, n1 and n2, and the nature of H1.

Small Samples: When n1 = n2 ≤ 40
For small samples, the hypotheses are:
H0: The two samples are drawn from the same population distribution,
against
H1: The two samples are drawn from different population distributions.
From the small-samples table, we obtain the critical value KD, defined as
the numerator of D = max |Sn1(x) − Sn2(x)|, for various levels of
significance for both one- and two-tailed tests.

Large Samples: When n1 and n2 are both larger than 40.
For a two-tailed test, regardless of whether or not n1 = n2, the K-S
two-sample test table for large samples is used.
The critical values of D for any given large n1 and n2 may be
computed by the expressions given at various levels of significance in
the table.
For a one-tailed test when n1 and n2 are large, we determine the
test statistic: D = max |Sn1(x) − Sn2(x)|
The significance of the observed value D may also, or usually, be
determined by:

$$X^{2} = 4D^{2}\left(\frac{n_1 n_2}{n_1 + n_2}\right) \sim \chi^{2}_{\alpha}(2)$$

If the observed value of X² is equal to or greater than the chi-square
critical value χ²α(2), with 2 degrees of freedom, then H0 may be rejected
at that level of significance.
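
The two-sample statistic D and the chi-square approximation can be sketched in Python as follows. This is an illustration only: the function name `ks_two_sample` and the two tiny samples are hypothetical, the cumulative proportions are evaluated at every observed value rather than at class intervals, and the chi-square form is intended for large samples, so the toy data show the arithmetic and nothing more.

```python
def ks_two_sample(sample1, sample2):
    """Return (D, X2) with D = max |S_n1(x) - S_n2(x)| over all observed x."""
    n1, n2 = len(sample1), len(sample2)
    d = 0.0
    for x in sorted(set(sample1) | set(sample2)):
        s1 = sum(v <= x for v in sample1) / n1  # cumulative proportion, sample 1
        s2 = sum(v <= x for v in sample2) / n2  # cumulative proportion, sample 2
        d = max(d, abs(s1 - s2))
    x2 = 4 * d**2 * (n1 * n2) / (n1 + n2)       # compared with chi-square, 2 d.f.
    return d, x2


# Hypothetical scores, for illustration only.
group_1 = [1, 2, 3, 4]
group_2 = [3, 4, 5, 6]
print(ks_two_sample(group_1, group_2))  # (0.5, 2.0)
```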
Examples 3.1.5.1
1. The data below are the learning errors made (in percentages) by
two groups of pupils in memorizing text material.

(i) Hypotheses: Test the hypothesis
H0: There is no difference in the proportion of errors made in memorizing
the text material between the two groups of pupils, against
H1: The older group of pupils make proportionally fewer errors than the
younger group of pupils.
(ii) Let α = 0.01 and n1 = n2 = n = 10, the number of pupils in each
group. The K-S two-sample test table for small samples is used to find
the critical value KD for n1 = n2 ≤ 40.
(iii) Computing the test statistic, we have:

D = max |Sn1(x) − Sn2(x)| = 7/10,
from which we have KD = 7.
From the K-S two-sample table for small samples, the critical value of KD
is 7 at α = 0.01.
(iv) Decision: Reject H0 since KD = 7 is equal to the critical value,
and conclude that older pupils make proportionally fewer errors than the
younger pupils.

2.2 Kolmogorov-Smirnov (K-S) One -Sample Test
The K-S one-sample test provides an alternative to the chi-square
goodness-of-fit test.
It is preferably used for continuous probability distributions.
The test involves specifying the cumulative frequency distribution that
would occur under the theoretical distribution and comparing it with the
observed (empirical) cumulative frequency distribution Fn(y).

Test Procedure:
Let y be a continuous random variable with CDF F(y), and let
y1, y2, · · · , yn be the observations of a random sample of size n,
ordered such that y1 ≤ y2 ≤ · · · ≤ yn.
(i) Define the cumulative distribution functions:
P(Y ≤ y) = F(y) = FY(y), the specified (theoretical) continuous
cumulative distribution, and the empirical distribution function,
Fn(y) = i/n for yi ≤ y < yi+1, so that
Fn(yi) = i/n and Fn(yi−1) = (i − 1)/n.

(ii) The hypotheses:

H0: F(y) = FT(y) versus H1: F(y) ≠ FT(y)

at the α level of significance, where y is a continuous random variable
having a specified theoretical cumulative distribution FT(y) for all y
from −∞ to ∞.
(iii) The K-S test statistic: D = max |FT(y) − Fn(y)|.
The test statistic D is thus computed by

D = max(D−, D+), where

$$D^{-} = \max_{1 \le i \le n} \left|F_T(y_i) - F_n(y_{i-1})\right| \quad \text{and} \quad D^{+} = \max_{1 \le i \le n} \left|F_n(y_i) - F_T(y_i)\right|$$

(iv) Decision Rule: Reject H0 if D > the critical value obtained from the
K-S one-sample test table at the α level of significance and sample size n.
(v) Assumptions:
The sample is randomly selected.
The hypothesized distribution FT is continuous.
Remark:
(i) If the theoretical distribution FT (y ) has some parameters unspecified,
then these unknown parameters are estimated using the sample data
before the test is carried out.
(ii) The decision to reject or not reject H0 could be taken using the
modified form of the K-S statistic given by Stephens (1974) for three
cases in the table below:

The three cases being:
(i) A fully specified F(y)
(ii) A normal distribution F(y) with unknown mean and variance.
(iii) An exponential F(y) with unknown mean.
The advantages and disadvantages of the K-S goodness-of-fit test in
comparison with the chi-square test are given below:
Advantages:
1. The K-S test does not require that the observations be grouped as in
the case of the chi-square test. The consequence of this difference is that
the K-S test makes use of all the information present in a set of data.
2. The K-S test can be used with any sample size. It will be recalled that
certain minimum sample sizes are required for the use of the chi-square
test.

Disadvantage:
1. The K-S test with the standard tabulated critical values is not
applicable when parameters have to be estimated from the sample. The
chi-square test may be used in these situations by reducing the degrees
of freedom by 1 for each parameter estimated.
Example 2.1:
The times (in seconds) between vehicle arrivals at a certain intersection
were measured for a certain time period, with the following results:

9.0, 10.1, 10.2, 9.3, 9.5, 9.8, 14.2, 16.1.

Use an appropriate test technique to test the hypothesis that these data
are from an exponential distribution. Use α = 0.05.

Solution: Let y have an exponential distribution with

f(y) = (1/θ) e^(−y/θ), y ≥ 0, where F(y) = 1 − e^(−y/θ), y ≥ 0.
(i) Hypotheses:
H0: F(y) = FT(y), the specified cumulative exponential distribution.
H1: F(y) ≠ FT(y), at the α = 0.05 level of significance.
(ii) Test statistic: We estimate θ by

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{88.2}{8} = 11.025$$

$$D^{-} = \max_{1 \le i \le n} \left|F_T(y_i) - F_n(y_{i-1})\right| = 0.5579$$

$$D^{+} = \max_{1 \le i \le n} \left|F_n(y_i) - F_T(y_i)\right| = 0.4329$$

Hence the test statistic,

D = max(D−, D+) = max(0.5579, 0.4329) = 0.5579,

which is compared with the critical value 0.454, obtained from the K-S
one-sample test table at α = 0.05.

(iii) Conclusion:
Reject H0 since D is greater than the critical value and conclude that the
data do not conform to the exponential distribution.
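
The values of D− and D+ in this example can be checked with a short Python sketch. The helper name `ks_one_sample` is hypothetical; the exponential mean θ is estimated by the sample mean, as in the solution, and the critical value 0.454 is the tabulated value quoted in the example.

```python
import math


def ks_one_sample(data, cdf):
    """Return (D_minus, D_plus, D) for a sample and a theoretical CDF."""
    ys = sorted(data)
    n = len(ys)
    # enumerate starts at 0, so i / n is F_n(y_{i-1}) and (i + 1) / n is F_n(y_i).
    d_minus = max(abs(cdf(y) - i / n) for i, y in enumerate(ys))
    d_plus = max(abs((i + 1) / n - cdf(y)) for i, y in enumerate(ys))
    return d_minus, d_plus, max(d_minus, d_plus)


times = [9.0, 10.1, 10.2, 9.3, 9.5, 9.8, 14.2, 16.1]
theta_hat = sum(times) / len(times)  # sample mean, about 11.025


def exponential_cdf(y):
    return 1 - math.exp(-y / theta_hat)


d_minus, d_plus, d = ks_one_sample(times, exponential_cdf)
print(round(d_minus, 4), round(d_plus, 4), d > 0.454)
```

Both maxima occur at the smallest observation, 9.0, where the fitted exponential CDF is already about 0.558 while the empirical distribution has only just left zero.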

Exercise
1. A random sample of size 10 gives the following data:
0.621, 0.503, 0.203, 0.477, 0.710, 0.581, 0.329, 0.480, 0.554 and 0.382.
Test the hypothesis that the observations are from the uniform distribution
over the interval [0,1]. Take α = 0.10.
2. The following values are realizations of a random variable Y: 100, 150,
225, 290, 300 and 500. You want to test whether the data come from a
Pareto distribution with parameters α = 3 and θ = 600. What is the value
of the K-S test statistic? State the conclusion of the test at α = 0.05.
Hint: $F(y) = 1 - \left(\frac{600}{y + 600}\right)^{3}$

3. A study was conducted to determine whether persons in high authority
would show a greater tendency to possess stereotypes about members of
various ethnic groups than would those in low authority. The hypothesis
was tested (using α = 0.01) with a group of 100.
Test the hypothesis that X has a binomial distribution with n = 4 and p =
1⁄2, using α = 0.05 on the basis of the following results:

X 0 1 2 3 4
f 6 38 58 47 11

Thank You
For any concerns, please contact
[email protected]
[email protected]
0322 191132.
