Non Parametric Method
Nonparametric (Distribution-Free) Statistical Methods
Many of the inferential techniques presented in earlier
chapters required specific assumptions about the shape of
the relevant population or treatment distributions. For
example, when sample sizes are small, the validity of the
two-sample t test and of the ANOVA F test rests on the
assumption of normal population distributions. This
chapter introduces some inferential procedures that do not
depend on specific assumptions about the population
distributions, such as normality. Such methods are described
as nonparametric or distribution-free.
The two population or treatment response distributions have the same shape
and spread. The only possible difference between the distributions is that one
may be shifted to one side or the other.
Distributions consistent with these assumptions are shown in Figure 16.1(a). The
distributions shown in Figure 16.1(b) have different shapes and spreads and so the
methods of this section would not be appropriate in this case. Inferences that involve
comparing distributions with different shapes and spreads can be quite complicated;
if you encounter that situation, good advice from a statistician is particularly
important.
FIGURE 16.1 Two possible population distribution pairs: (a) same shape and spread,
differing only in location; (b) very different shape and spread.
Procedures that do not require any overly specific assumptions about the population
distributions are said to be distribution-free or nonparametric. The two-sample
t test with small samples is not distribution-free because its appropriate use
depends on the specific assumption of (at least approximate) normality.
Inferences about μ1 − μ2 are made using information from two independent
random samples, one consisting of n1 observations from the first population and the
other consisting of n2 observations from the second population. Suppose that the two
population distributions are in fact identical (so that μ1 = μ2). In this case, each of
the n1 + n2 observations is actually drawn from the same population distribution.
The distribution-free procedure presented here is based on regarding the n1 + n2
observations as a single data set and assigning ranks to the ordered values. The
assignment is easiest when there are no ties among the n1 + n2 values (each observation is
different from every one of the other observations), so assume for the moment that
this is the case. Then the smallest among the n1 + n2 values receives rank 1, the
second smallest rank 2, and so on, until finally the largest value is assigned rank n1 + n2.
This procedure is illustrated in Example 16.1.
[Dotplot for Example 16.1: the Sample 1 and Sample 2 observations plotted on a common
axis, with ranks 1 through 10 marked above the points.]
The ranks of the five observations in the first sample are 3, 6, 8, 9, and 10. If
these five observations had all been larger than every value in the second sample, the
corresponding ranks would have been 6, 7, 8, 9, and 10. On the other hand, if all five
Sample 1 observations had been less than each value in the second sample, the ranks
would have been 1, 2, 3, 4, and 5. The ranks of the five observations in the first
sample might be any set of five numbers from among 1, 2, 3, ..., 9, 10; there are
actually 252 possibilities.
Testing Hypotheses
Let's first consider testing
H0: μ1 − μ2 = 0 (μ1 = μ2) versus Ha: μ1 − μ2 > 0 (μ1 > μ2)
If H0 is true, all n1 + n2 observations in the two samples are actually drawn from
identical population distributions. We would then expect that the observations in
the first sample would be intermingled with those of the second sample when
plotted along the number line. In this case, the ranks of the observations should also
be intermingled. For example, with n1 = 5 and n2 = 5, the set of Sample 1 ranks
2, 3, 5, 8, 10 would be consistent with μ1 = μ2, as would the set 1, 4, 7, 8, 9.
However, when μ1 = μ2, it would be quite unusual for all five values from Sample
1 to be larger than every value in Sample 2, resulting in the set 6, 7, 8, 9, 10 of
Sample 1 ranks.
A convenient measure of the extent to which the ranks are intermingled is the
sum of the Sample 1 ranks. These ranks in Example 16.1 were 3, 6, 8, 9, and 10, so
rank sum = 3 + 6 + 8 + 9 + 10 = 36
When H0 is true, each of the 252 possible sets of Sample 1 ranks is equally likely,
and 12 of them have a rank sum of at least 36, so the proportion is 12/252 = .0476.
That is, when H0 is true, a rank sum at least as large as 36 would be observed only
about 4.76% of the time. Thus, a test of H0: μ1 − μ2 = 0 versus Ha: μ1 − μ2 > 0,
based on n1 = 5 and n2 = 5 and a rank sum of 36, would have an associated P-value
of .0476. It is this type of reasoning that allows us to reach a conclusion about
whether or not to reject H0.
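The counting argument behind this P-value can be checked by brute force. The following is a minimal sketch, assuming no ties and using only the Python standard library; the variable names are illustrative and not part of the text.

```python
from itertools import combinations

n1, n2 = 5, 5
observed_rank_sum = 36

# Under H0, every set of n1 ranks chosen from 1..(n1 + n2) is equally likely.
rank_sets = list(combinations(range(1, n1 + n2 + 1), n1))

# Count the sets whose rank sum is at least as large as the observed value.
extreme = sum(1 for ranks in rank_sets if sum(ranks) >= observed_rank_sum)

p_value = extreme / len(rank_sets)
print(len(rank_sets), extreme, round(p_value, 4))  # 252 12 0.0476
```

Full enumeration is practical only for small samples, which is one reason tabled values and software are used in practice.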
The process of looking at different rank-sum sets to carry out a test can be quite
tedious. Fortunately, information about both one- and two-tailed P-values associated
with values of the rank-sum statistic has been tabulated. Chapter 16 Appendix
Table 1 provides information on P-values for selected values of n1 and n2. For
example, when n1 and n2 are both 5, Chapter 16 Appendix Table 1 tells us that, for an
upper-tailed test with a rank-sum statistic value of 36, P-value < .05. This is
consistent with the value of .0476 computed previously. Had the rank sum been 40, using
the table, we would have concluded that the P-value was less than .01 (because 40 is
greater than the tabled value of 39). The use of this appendix table is further
illustrated in the examples that follow.
*This procedure is often called the Wilcoxon rank-sum test or the Mann-Whitney test, after the
statisticians who developed it. Some sources use a slightly different (but equivalent) test statistic formula.
Many statistical computer packages can perform the rank-sum test and give exact
P-values. Partial Minitab output for the data of Example 16.2 follows (Minitab uses
the symbol W to denote the rank-sum statistic and the terms ETA1 and ETA2 in
place of μ1 and μ2, but the test statistic value and the associated P-values are the same
as for the test presented here):
Unexposed   N = 7   Median = 14.00
Exposed     N = 8   Median = 110.00
Point estimate for ETA1 - ETA2 is -79.00
95.7 Percent CI for ETA1 - ETA2 is (-156.00, -23.99)
W = 33.0
Test of ETA1 = ETA2 vs ETA1 < ETA2 is significant at 0.0046
The output indicates that W = 33 and that the test is significant at .0046.
These two values, 33 and .0046, are the rank-sum statistic and the P-value,
respectively. In Example 16.2, we used Chapter 16 Appendix Table 1 to determine that
P-value < .01. This statement is consistent with the actual P-value given in the
Minitab output.
The test procedure just described can be easily modified to handle a hypothesized
value other than 0. Consider as an example testing H0: μ1 − μ2 = 5. This hypothesis
is equivalent to (μ1 − 5) − μ2 = 0. That is, if 5 is subtracted from each Population
1 value, then according to H0, the distribution of the resulting values coincides with
the Population 2 distribution. This suggests that, if the hypothesized value of 5 is first
subtracted from each Sample 1 observation, the test can then be carried out as
before.
To test H0: μ1 − μ2 = hypothesized value, subtract the hypothesized value from each
observation in the first sample and then determine the ranks when these modified
sample 1 values are combined with the n2 observations from the second sample.
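The shift-then-rank recipe can be sketched in a few lines. This is an illustrative sketch, not a reference implementation: the function names `midranks` and `rank_sum` and the data values are hypothetical, invented only to show the mechanics.

```python
def midranks(values):
    """Rank the values 1..n, giving tied values the mean of their ranks."""
    ranks = []
    for v in values:
        smaller = sum(1 for w in values if w < v)
        tied = sum(1 for w in values if w == v)  # includes v itself
        ranks.append(smaller + (tied + 1) / 2)
    return ranks


def rank_sum(sample1, sample2, hypothesized=0.0):
    """Rank sum of sample 1 after subtracting the hypothesized difference."""
    shifted = [x - hypothesized for x in sample1]
    ranks = midranks(shifted + list(sample2))
    return sum(ranks[:len(shifted)])


# Hypothetical data, invented for illustration only.
sample1 = [12.1, 15.3, 14.8]
sample2 = [9.0, 10.2, 8.7]

print(rank_sum(sample1, sample2))                  # ranks without any shift
print(rank_sum(sample1, sample2, hypothesized=5))  # ranks after subtracting 5
```

With no shift, every Sample 1 value exceeds every Sample 2 value; after subtracting 5 the two samples intermingle, so the rank sum drops.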
Frequently the n1 + n2 observations in the two samples are not all different from one
another. When this occurs, the rank assigned to each observation in a tied group is
the mean of the ranks that would be assigned if the values in the group all differed
slightly from one another. Consider, for example, the 10 ordered values
5.6 6.0 6.0 6.3 6.8 7.1 7.1 7.1 7.9 8.2
If the two 6.0 values differed slightly from each other, they would be assigned ranks
2 and 3. Therefore, each one is assigned rank (2 + 3)/2 = 2.5. If the three 7.1
observations were all slightly different, they would receive ranks 6, 7, and 8, so each of the
three is assigned rank (6 + 7 + 8)/3 = 7. The ranks for the above 10 observations
are then
1 2.5 2.5 4 5 7 7 7 9 10
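The tie rule above amounts to averaging the ranks a tied group would occupy. A minimal sketch follows; the helper name `midranks` is an illustrative choice, and the quadratic-time counting is chosen for clarity rather than speed.

```python
def midranks(values):
    """Assign each value the mean of the ranks its tied group would occupy."""
    ranks = []
    for v in values:
        smaller = sum(1 for w in values if w < v)
        tied = sum(1 for w in values if w == v)  # includes v itself
        # The tied group occupies ranks smaller+1 .. smaller+tied; take their mean.
        ranks.append(smaller + (tied + 1) / 2)
    return ranks


values = [5.6, 6.0, 6.0, 6.3, 6.8, 7.1, 7.1, 7.1, 7.9, 8.2]
print(midranks(values))
# [1.0, 2.5, 2.5, 4.0, 5.0, 7.0, 7.0, 7.0, 9.0, 10.0]
```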
If the proportion of tied values is quite large, it is recommended that the
rank-sum statistic be multiplied by a correction factor. Consult the references by Conover,
Daniel, or Mosteller and Rourke for additional information.
Chapter 16 Appendix Table 1 contains information about P-values for the
rank-sum test when n1 ≤ 8 and n2 ≤ 8. More extensive tables exist for other combinations
of sample size, but with larger sample sizes, you may want to use a statistical software
package to compute the value of the test statistic and the associated P-value. There is
also a test procedure based on using a normal distribution to approximate the
sampling distribution of the rank-sum statistic. This alternative procedure is often used
when the two sample sizes are larger than 8.
The rank-sum confidence interval for μ1 − μ2 is the interval consisting of all
hypothesized values for which
H0: μ1 − μ2 = hypothesized value
cannot be rejected when using a two-tailed test.
A 95% confidence interval consists of those hypothesized values for which the previous
null hypothesis is not rejected by a test with significance level α = .05. A 99%
confidence interval is associated with a level α = .01 test, and a 90% confidence interval
is associated with a level α = .10 test.
Thus, if H0: μ1 − μ2 = 100 cannot be rejected at level .05, then 100 is included
in the 95% confidence interval for μ1 − μ2.
Remember that Minitab uses ETA1 − ETA2 in place of μ1 − μ2. Also note that
it was not possible to construct an interval for an exact 95% confidence level. Minitab
calculated a 96.4% confidence interval. The reported interval is (4730, 8480). Based
on the sample information, we estimate that the difference in mean crushing strength
for epoxy-impregnated bark board and board impregnated with a different polymer
is between 4730 and 8480 psi.
EXERCISES 16.1 - 16.7
16.1 Urinary fluoride concentration (in parts per million) was measured for both a
sample of livestock that had been grazing in an area previously exposed to fluoride
pollution and a similar sample of livestock that had grazed in an unpolluted region.
Do the accompanying data indicate strongly that the mean fluoride concentration for
livestock grazing in the polluted region is larger than that for livestock grazing in
the unpolluted region? Assume that the distributions of urinary fluoride concentration
for both grazing areas have the same shape and spread, and use a level .05 rank-sum test.
Polluted     21.3  18.7  23.0  17.1  16.8  20.9  19.7
Unpolluted   14.2  18.3  17.2  18.4  20.0
16.2 A modification has been made to the process for producing a certain type of film.
Because the modification involves extra cost, it will be incorporated only if sample
data strongly indicate that the modification decreases mean developing time by more
than 1 second. Assuming that the developing-time distributions have the same shape and
spread, use the rank-sum test at level .05 and the following data to test the
appropriate hypotheses:
Original process   8.6  5.1  4.5  5.4  6.3  6.6  5.7  8.5
Modified process   5.5  4.0  3.8  6.0  5.8  4.9  7.0  5.7
16.3 The study reported in "Gait Patterns During Free Choice Ladder Ascents" (Human
Movement Science [1983]: 187-195) was motivated by publicity concerning the increased
accident rate for individuals climbing ladders. A number of different gait patterns
were used by subjects climbing a portable straight ladder according to specified
instructions. The following data consist of climbing times for seven subjects who used
a lateral gait and six subjects who used a four-beat diagonal gait:
Lateral gait    0.86  1.31  1.64  1.51  1.53  1.39  1.09
Diagonal gait   1.27  1.82  1.66  0.85  1.45  1.24
a. Use the rank-sum test to decide whether the data suggest a difference in the mean
climbing times for the two gaits.
b. Interpret the 95% confidence interval for the difference between the mean climbing
times given in the following Minitab output:
lateral    N = 7   Median = 1.3900
diagonal   N = 6   Median = 1.3600
Point estimate for ETA1 - ETA2 is 0.0400
96.2 percent C.I. for ETA1 - ETA2 is (-0.4300, 0.3697)
16.4 A blood lead level of 70 mg/ml has been commonly accepted as safe. However,
researchers have noted that some neurophysiological symptoms of lead poisoning appear
in people whose blood lead levels are below 70 mg/ml. The article "Subclinical
Neuropathy at Safe Levels of Lead Exposure" (Archives of Environmental Health [1975]:
180-183) gave the following nerve-conduction velocities for a group of workers who were
exposed to lead in the workplace but whose blood lead levels were below 70 mg/ml and
for a group of controls who had no exposure to lead:
Exposed to lead   46  46.5  43  41  38  36  31
Control           54  50.5  46  45  44  42  41
Use a level .05 rank-sum test to determine whether there is a significant difference in
mean conduction velocity between workers exposed to lead and those not exposed to lead.
16.5 The effectiveness of antidepressants in treating the eating disorder bulimia was
examined in the article "Bulimia Treated with Imipramine: A Placebo-Controlled
Double-Blind Study" (American Journal of Psychology [1983]: 554-558). A group of
patients diagnosed with bulimia were randomly assigned to one of two treatment groups,
one receiving imipramine and the other a placebo. One of the variables recorded was
binge frequency. The authors chose to analyze the data using a rank-sum test because it
makes no assumption of normality. They stated that because of the wide range of some
measures, such as frequency of binges, the rank sum is more appropriate and somewhat
more conservative. Data on number of binges during one week that are consistent with
the findings of the article are given in the following table:
Placebo      8  3  15  3  4  10  6  4
Imipramine   2  1  2   7  3  12  1  5
Do these data strongly suggest that imipramine is effective in reducing the mean number
of binges per week? Use a level .05 rank-sum test.
16.6 In an experiment to compare the bond strength of two different adhesives, each
adhesive was used in five bondings of two surfaces, and the force necessary to separate
the surfaces was determined for each bonding. For Adhesive 1, the resulting values were
229, 286, 245, 299, and 259, whereas the Adhesive 2 observations were 213, 179, 163,
247, and 225. Let μ1 and μ2 denote the mean bond strengths of Adhesives 1 and 2,
respectively. Interpret the 90% distribution-free confidence interval estimate of
μ1 − μ2 given in the Minitab output shown here:
adhes. 1   N = 5   Median = 259.00
adhes. 2   N = 5   Median = 213.00
Point estimate for ETA1 - ETA2 is 61.00
90.5 Percent C.I. for ETA1 - ETA2 is (16.00, 95.98)
16.7 The article "A Study of Wood Stove Particulate Emissions" (Journal of the Air
Pollution Control Association [1979]: 724-728) reported the following data on burn time
(in hours) for samples of oak and pine:
Oak    1.72  0.67  1.55  1.56  1.42  1.23  1.77  0.48
Pine   0.98  1.40  1.33  1.52  0.73  1.20
An estimate of the difference between mean burn time for oak and mean burn time for
pine is desired. Interpret the interval given in the following Minitab output:
Oak    N = 8   Median = 1.4850
Pine   N = 6   Median = 1.2650
Point estimate for ETA1 - ETA2 is 0.2100
95.5 Percent C.I. for ETA1 - ETA2 is (-0.4998, 0.5699)
If there are ties in the differences, the average of the appropriate ranks is assigned,
as was the case with the rank-sum test in Section 16.1.
The signed-rank test statistic for testing H0: μd = 0 is the signed-rank sum,
which is the sum of the signed ranks. A large positive sum suggests that μd > 0, since,
if this were the case, most differences would be positive and larger in magnitude than
the few negative differences; most of the ranks, and especially the larger ones, would
then be positively signed. Similarly, a large negative sum would suggest μd < 0. A
signed-rank sum near zero would be compatible with H0: μd = 0.
Patient          1    2    3    4    5
Before Surgery   107  102  95   106  112
After Surgery    87   97   101  113  80
Difference       20   5    −6   −7   32
We can determine whether the mean blood pressure before surgery exceeds the
mean blood pressure 2 months after surgery by testing
H0: μ1 − μ2 = 0 versus Ha: μ1 − μ2 > 0
where μ1 denotes the mean diastolic blood pressure for patients in renal failure and
μ2 denotes the mean blood pressure for patients 2 months after surgery (equivalent
hypotheses are H0: μd = 0 and Ha: μd > 0, where μd is the mean difference in blood
pressure).
A normal probability plot for this set of differences follows. Since the plot appears
to be more S-shaped than linear, the assumption of a normal difference distribution
is questionable. If it is reasonable to assume that the difference distribution is
symmetric, a test based on the signed ranks can be used.
[Normal probability plot of the differences: Difference (vertical axis) versus Normal
score (horizontal axis).]
The absolute values of the differences and the corresponding ranks are as follows.
Absolute Difference   5  6  7  20  32
Rank                  1  2  3  4   5
Associating the appropriate sign with each rank then yields signed ranks 1, −2,
−3, 4, and 5, and a signed-rank sum of 1 − 2 − 3 + 4 + 5 = 5.
The largest possible value for this sum would be 15, occurring only when all
differences are positive. There are 32 possible ways to associate signs with ranks 1, 2, 3, 4, and
5, and 10 of them have rank sums of at least 5. When the null hypothesis H0: μd = 0 is
true, each of the 32 possible assignments is equally likely to occur, and so
P(signed-rank sum ≥ 5 when H0 is true) = 10/32 = .3125
Therefore, the observed sum of 5 is compatible with H0; it does not provide
evidence that H0 should be rejected.
Signed-rank sum                    −15  −13  −11  −9  −7  −5  −3  −1
Number of rankings yielding sum    1    1    1    2   2   3   3   3
Signed-rank sum                    1    3    5    7   9   11  13  15
Number of rankings yielding sum    3    3    3    2   2   1   1   1
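The tabulated counts can be reproduced by listing all 32 ways of attaching signs to the ranks 1 through 5. The sketch below uses only the Python standard library and assumes no ties and no zero differences; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

n = 5
observed = 5

# Under H0, each rank 1..n is equally likely to carry a plus or a minus sign.
sums = [
    sum(sign * rank for sign, rank in zip(signs, range(1, n + 1)))
    for signs in product((-1, 1), repeat=n)
]

counts = Counter(sums)  # how many sign assignments give each signed-rank sum
p_upper = sum(1 for s in sums if s >= observed) / len(sums)

print(len(sums))               # 32
print(sorted(counts.items()))  # (sum, number of assignments) pairs
print(p_upper)                 # 0.3125
```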
Swimmer      1     2     3     4     5     6     7     8     9     10
Flat Entry   1.13  1.11  1.18  1.26  1.16  1.41  1.43  1.25  1.33  1.36
Hole Entry   1.07  1.03  1.21  1.24  1.33  1.42  1.35  1.32  1.31  1.33
Difference   .06   .08   −.03  .02   −.17  −.01  .08   −.07  .02   .03
The authors of the paper used a signed-rank test with a .05 significance level to
determine if there is a difference between the mean time to water entry for the two entry
methods. Ordering the absolute differences results in the following assignment of
signed ranks.
Difference    −.01  .02  .02  −.03  .03  .06  −.07  .08  .08  −.17
Signed Rank   −1    2.5  2.5  −4.5  4.5  6    −7    8.5  8.5  −10
1. Let μd denote the mean difference in time to water entry for flat and hole
entry.
2. H0: μd = 0
3. Ha: μd ≠ 0
4. Test statistic: signed-rank sum
5. With n = 10 and α = .05, Chapter 16 Appendix Table 2 gives 39 as the critical
value for a two-tailed test. Therefore, H0 will be rejected if either the signed-rank
sum ≥ 39 or signed-rank sum ≤ −39.
6. Signed-rank sum = −1 + 2.5 + ... + (−10) = 10
7. Since 10 does not fall in the rejection region, we do not reject H0. There is not
sufficient evidence to indicate that the mean time to water entry differs for the
two methods.
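The calculation in steps 5 through 7 can be sketched as follows. This is an illustrative sketch rather than a reference implementation: the function name `signed_rank_sum` is hypothetical, and the differences are rounded to two decimals (an assumption about the precision of the recorded times) so that floating-point error does not break ties.

```python
def signed_rank_sum(differences):
    """Sum of the signed mid-ranks of the nonzero differences."""
    nonzero = [d for d in differences if d != 0]
    total = 0.0
    for d in nonzero:
        smaller = sum(1 for e in nonzero if abs(e) < abs(d))
        tied = sum(1 for e in nonzero if abs(e) == abs(d))  # includes d itself
        rank = smaller + (tied + 1) / 2  # mean rank of the tied group
        total += rank if d > 0 else -rank
    return total


flat = [1.13, 1.11, 1.18, 1.26, 1.16, 1.41, 1.43, 1.25, 1.33, 1.36]
hole = [1.07, 1.03, 1.21, 1.24, 1.33, 1.42, 1.35, 1.32, 1.31, 1.33]

# Round so that equal differences compare as ties despite floating-point error.
differences = [round(f - h, 2) for f, h in zip(flat, hole)]

statistic = signed_rank_sum(differences)
critical_value = 39  # two-tailed, n = 10, level .05, as quoted in the text
print(statistic, abs(statistic) >= critical_value)  # 10.0 False
```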
Example 16.7 illustrates how zero differences are handled when performing a
signed-rank test. Since zero is considered to be neither positive nor negative, zero
values are generally excluded from a signed-rank analysis, and the sample size is re-
duced accordingly.
samples were taken from 15 healthy adults, and, for each blood sample, the B12 level
was determined using both methods. The resulting data are given here.
Subject      1    2    3    4    5    6    7    8
Method 1     204  238  209  277  197  227  207  205
Method 2     204  238  198  253  180  209  217  204
Difference   0    0    11   24   17   18   −10  1
Subject      9    10   11   12   13   14   15
Method 1     131  282  76   194  120  92   114
Method 2     137  250  82   165  79   100  107
Difference   −6   32   −6   29   41   −8   7
We assume that the difference distribution is symmetric and proceed with a
signed-rank test to determine whether there is a significant difference between the two
methods for measuring B12 content. A significance level of .05 will be used.
Two of the observed differences are zero. Eliminating the two zeros reduces the
sample size from 15 to 13. Ordering the nonzero absolute differences results in the
following assignment of signed ranks.
1. Let μd denote the mean difference in B12 determination for the two methods.
2. H0: μd = 0
3. Ha: μd ≠ 0
4. Test statistic: signed-rank sum
5. The form of Ha indicates that a two-tailed test should be used. With n = 13 and
α = .05, Chapter 16 Appendix Table 2 gives a critical value of 57 (corresponding
to an actual significance level of .048). Therefore, H0 will be rejected if either the
signed-rank sum ≥ 57 or signed-rank sum ≤ −57.
6. Signed-rank sum = 1 + (−2.5) + (−2.5) + ... + 13 = 59
7. Since 59 falls in the rejection region, H0 is rejected in favor of Ha. We conclude
that there is a significant difference in measured B12 levels for the two assay
methods.
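The handling of zero differences can be made concrete with the B12 data. In this minimal sketch, the function name `signed_ranks` is an illustrative choice; it drops zeros first, then assigns signed mid-ranks to what remains.

```python
def signed_ranks(differences):
    """Pair each nonzero difference with its signed mid-rank (zeros are dropped)."""
    nonzero = [d for d in differences if d != 0]
    pairs = []
    for d in nonzero:
        smaller = sum(1 for e in nonzero if abs(e) < abs(d))
        tied = sum(1 for e in nonzero if abs(e) == abs(d))  # includes d itself
        rank = smaller + (tied + 1) / 2
        pairs.append((d, rank if d > 0 else -rank))
    return pairs


method1 = [204, 238, 209, 277, 197, 227, 207, 205, 131, 282, 76, 194, 120, 92, 114]
method2 = [204, 238, 198, 253, 180, 209, 217, 204, 137, 250, 82, 165, 79, 100, 107]
differences = [a - b for a, b in zip(method1, method2)]

pairs = signed_ranks(differences)
n = len(pairs)                        # sample size after removing zeros
statistic = sum(r for _, r in pairs)  # signed-rank sum

# Print the signed ranks in order of absolute difference.
for d, r in sorted(pairs, key=lambda p: abs(p[0])):
    print(d, r)
print(n, statistic, abs(statistic) >= 57)  # 13 59.0 True
```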
The procedure for testing H0: μd = 0 just described can be easily adapted to
test H0: μd = hypothesized value, where the hypothesized value is something other
than zero.
To test H0: μd = hypothesized value, subtract the hypothesized value from each
difference prior to assigning signed ranks.
TSI Scores
Patient 1 2 3 4 5 6 7
Patient 8 9 10 11 12 13 14
1. Let μd denote the mean difference in TSI score for Deanol and the placebo
treatment.
2. H0: μd = 1
3. Ha: μd > 1
4. Test statistic: signed-rank sum
5. The form of Ha indicates that an upper-tailed test should be used. With n = 14
and α = .01, Chapter 16 Appendix Table 2 gives a critical value of 73.
Therefore, H0 will be rejected if the signed-rank sum equals or exceeds 73.
6. Subtracting 1 from each difference results in the following set of values.
2.2 4.4 .6 .5 .7 2.4 1.3 1 2.5 4.7 2.7 2.9 .6 2.2
Ordering these values and associating signed ranks yields:
Sign
Absolute Difference .5 .6 .6 .7 1 1.3 2.2
Signed Rank 1 2.5 2.5 4 5 6 7.5
Sign
Absolute Difference 2.2 2.4 2.5 2.7 2.9 4.4 4.7
Signed Rank 7.5 9 10 11 12 13 14
Then the signed-rank sum = −1 + (−2.5) + 2.5 + ... + (−14) = −4.
7. Since −4 < 73, we fail to reject H0. There is not sufficient evidence to indicate
that the mean TSI score for the drug Deanol exceeds the mean TSI score for a
placebo treatment by more than 1.
A Normal Approximation
Signed-rank critical values for sample sizes up to 20 are given in Chapter 16 Appendix
Table 2. For larger sample sizes, the distribution of the signed-rank statistic when H0
is true can be approximated by a normal distribution.
If n > 20, the distribution of the signed-rank sum when H0 is true is well approximated
by the normal distribution with mean 0 and standard deviation √(n(n + 1)(2n + 1)/6).
This implies that the standardized statistic
\[ z = \frac{\text{signed-rank sum}}{\sqrt{n(n+1)(2n+1)/6}} \]
has approximately a standard normal distribution. This z statistic can be used as a test
statistic and the associated P-value can be determined using the z table.
Patient       1    2    3    4     5     6    7     8     9     10   11
Condition 1   62   57   56   55    50.5  50   47.2  43.5  40    40   41
Condition 2   52   46   51   52.4  55    51   43    40    34.2  34   33
Difference    10   11   5    2.6   −4.5  −1   4.2   3.5   5.8   6    8
Patient       12   13   14   15    16    17   18    19    20    21
Condition 1   33   31   28   27.1  27.5  27   25    19.2  17.5  12
Condition 2   32   38   26   28    28    18   21    18    16    15
Difference    1    −7   2    −.9   −.5   9    4     1.2   1.5   −3
Do these data suggest that the mean ventilation is different for the two
experimental conditions? Let's analyze the data using a level .05 signed-rank test.
1. Let μd denote the mean difference in ventilation between experimental
conditions 1 and 2.
2. H0: μd = 0
3. Ha: μd ≠ 0
4. α = .05
5. Test statistic:
\[ z = \frac{\text{signed-rank sum}}{\sqrt{n(n+1)(2n+1)/6}} \]
Ordering the absolute differences gives the following signed ranks:
−1   −2   −3.5  3.5  5   6    7   8   −9  10  11
12   −13  14    15   16  −17  18  19  20  21
The signed-rank sum is 140 and n = 21, so
\[ z = \frac{140}{\sqrt{(21)(22)(43)/6}} = \frac{140}{57.54} = 2.43 \]
8. Using Appendix Table 2, P-value = 2P(z > 2.43) = 2(.0075) = .015
9. Since P-value ≤ α, we reject H0 in favor of Ha. The sample data do suggest that
the mean ventilation rate differs for the two experimental conditions.
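The large-sample calculation can be checked numerically. The sketch below uses only the Python standard library; the tail area comes from `math.erfc` rather than a z table, and the differences are rounded to one decimal (an assumption about the recorded precision) so that ties are detected reliably.

```python
import math

cond1 = [62, 57, 56, 55, 50.5, 50, 47.2, 43.5, 40, 40, 41,
         33, 31, 28, 27.1, 27.5, 27, 25, 19.2, 17.5, 12]
cond2 = [52, 46, 51, 52.4, 55, 51, 43, 40, 34.2, 34, 33,
         32, 38, 26, 28, 28, 18, 21, 18, 16, 15]

differences = [round(a - b, 1) for a, b in zip(cond1, cond2)]
nonzero = [d for d in differences if d != 0]
n = len(nonzero)

# Signed-rank sum with mid-ranks for tied absolute differences.
signed_rank_sum = 0.0
for d in nonzero:
    smaller = sum(1 for e in nonzero if abs(e) < abs(d))
    tied = sum(1 for e in nonzero if abs(e) == abs(d))  # includes d itself
    rank = smaller + (tied + 1) / 2
    signed_rank_sum += rank if d > 0 else -rank

sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 6)
z = signed_rank_sum / sd

# Two-tailed P-value: 2 * P(Z > |z|) equals erfc(|z| / sqrt(2)) for a standard normal Z.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(n, signed_rank_sum, round(sd, 2), round(z, 2), round(p_value, 3))
```

The printed values agree with the hand calculation above (sum 140, standard deviation 57.54, z = 2.43, P-value about .015).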
Patient        1   2   3   4   5
Conventional   10  16  17  20  10
Pump           9   7   8   8   6
Difference     1   9   9   12  4
Pairwise averages of the differences:
Difference   1    4    9    9    12
1            1    2.5  5    5    6.5
4                 4    6.5  6.5  8
9                      9    9    10.5
9                           9    10.5
12                               12
As you can see, the calculations required to obtain the pairwise averages can be
tedious, especially for larger sample sizes. Fortunately, many of the standard computer
packages calculate both the signed-rank sum and the signed-rank confidence interval.
An approximate 90% signed-rank confidence interval from Minitab is as follows:
Wilcoxon Signed Rank CI
     Estimated   Achieved
N    Median      Confidence   Confidence Interval
5    6.50        89.4         (2.50, 10.50)
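The pairwise averages are easy to generate by machine. The following minimal sketch uses the Python standard library and forms the average of every pair of differences, including each difference paired with itself; the variable names are illustrative.

```python
from itertools import combinations_with_replacement
from statistics import median

differences = [1, 9, 9, 12, 4]

# All n(n + 1)/2 pairwise averages, each difference also paired with itself.
averages = sorted(
    (a + b) / 2 for a, b in combinations_with_replacement(differences, 2)
)

print(len(averages))   # 15
print(averages)
print(median(averages))  # 6.5
```

For these data the median of the averages is 6.5, matching the estimated median in the output, and the reported endpoints 2.50 and 10.50 are the second smallest and second largest of the 15 averages.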
EXERCISES 16.8 - 16.18
16.8 The effect of a restricted diet in the treatment of autistic children was examined
in the paper "Gluten, Milk Proteins, and Autism: Dietary Intervention Effects on
Behavior and Peptide Secretion" (Journal of Applied Nutrition [1991]: 1-8). Ten
children with autistic syndrome participated in the study. Peptide secretion was
measured before diet restrictions and again after a period of restricted diet. The
resulting data follow. Do these data suggest that the restricted diet was successful in
reducing mean peptide secretion? Use the signed-rank test.
Subject  Before  After    Subject  Before  After
1        25      10       6        50      19
2        22      9        7        15      8
3        84      29       8        41      19
4        84      7        9        19      14
5        60      2        10       27      11
16.9 Peak force (N) on the hand was measured just prior to impact and just after impact
on a backhand drive for six advanced tennis players. The resulting data, from the paper
"Forces on the Hand in the Tennis One-Handed Backhand" (International Journal of Sport
Biomechanics [1991]: 282-292), are given in the accompanying table. Use the signed-rank
test to determine if the mean postimpact force is greater than the mean preimpact force
by more than 6.
Player  Preimpact  Postimpact
1       26.7       38.2
2       44.3       47.2
3       53.9       61.0
4       26.4       34.3
5       47.6       64.9
6       43.1       44.2
16.10 In an experiment to study the way in which different anesthetics affected plasma
epinephrine concentration, 10 dogs were selected and concentration was measured while
they were under the influence of the anesthetics isoflurane and halothane
("Sympathoadrenal and Hemodynamic Effects of Isoflurane, Halothane, and Cyclopropane in
Dogs" Anesthesiology [1974]: 465-470). The resulting data are as follows.
Dog          1    2     3    4    5    6    7    8    9    10
Isoflurane   .51  1.00  .39  .29  .36  .32  .69  .17  .33  .28
Halothane    .30  .39   .63  .38  .21  .88  .39  .51  .32  .42
Use a level .05 signed-rank test to see whether the mean epinephrine concentration
differs for the two anesthetics. What assumption must be made about the epinephrine
concentration distributions?
16.11 The accompanying data refer to the concentration of the radioactive isotope
strontium-90 in samples of nonfat and 2% fat milk from five dairies. Do the data
strongly support the hypothesis that mean strontium-90 concentration is higher for 2%
fat milk than for nonfat milk? Use a level .05 signed-rank test.
Dairy    1    2    3     4     5
Nonfat   6.4  5.8  6.5   7.7   6.1
2% fat   7.1  9.9  11.2  10.5  8.8
and during growth hormone therapy for 14 children with hypopituitarism.
Child    1    2     3    4     5    6    7
Before   5.3  3.8   5.6  2.0   3.5  1.7  2.6
During   8.0  11.4  7.6  6.9   7.0  9.4  7.9
Child    8    9     10   11    12   13   14
Before   2.1  3.0   5.5  5.4   2.1  3.0  2.4
During   7.4  7.4   7.5  11.8  6.4  8.8  5.0
Initial Velocity
Swimmer  1     2     3     4     5     6     7     8     9     10
Hole     24.0  22.5  21.6  21.4  20.9  20.8  22.4  22.9  23.3  20.7
Flat     25.1  22.4  24.0  22.4  23.9  21.7  23.8  22.9  25.0  19.5
16.15 The paper "Effects of a Rice-rich versus Potato-rich Diet on Glucose,
Lipoprotein, and Cholesterol Metabolism in Noninsulin-Dependent Diabetics" (American
Journal of Clinical Nutrition [1984]: 598-606) gave the data below on cholesterol
synthesis rate for eight diabetic subjects. Subjects were fed a standardized diet with
potato or rice as the major carbohydrate source. Participants received both diets for
specified periods of time, with cholesterol synthesis rate (mmol/day) measured at the
end of each dietary period. The analysis presented in this paper used the signed-rank
test. Use a test with significance level .05 to determine whether the mean cholesterol
synthesis rate differs significantly for the two sources of carbohydrates.
16.16 The data below on pre- and postoperative lung capacities for 22 patients who
underwent surgery as treatment for tuberculosis kyphosis of the spine appeared in the
paper "Tuberculosis Kyphosis, Correction with Spinal Osteotomy, Halo-Pelvic Distractor,
and Anterior and Posterior Fusion" (Journal of Bone Joint Surgery [1974]: 1419-1434).
Do the data suggest that surgery increases the mean lung capacity? Use a level .05
large-sample signed-rank test.
Patient         12    13    14    15    16    17    18    19    20    21    22
Preoperative    1440  1770  2850  2860  1530  3770  2260  3370  2570  2810  2990
Postoperative   1680  1750  3730  3430  1570  3750  2840  3500  2640  3260  3100
16.17 Using the data of Exercise 16.13, estimate the mean difference in height velocity
before and during growth hormone therapy with a 90% distribution-free confidence
interval.
16.18 The signed-rank test can be adapted for use in testing H0: μ = hypothesized
value, where μ is the mean of a single population (see the last part of this section).
Suppose that the time required to process a request at a bank's automated teller
machine is recorded for each of 10 randomly selected transactions, resulting in the
following times (in minutes): 1.4, 2.1, 1.9, 1.7, 2.4, 2.9, 1.8, 1.9, 2.6, 2.2. Use
the one-sample version of the signed-rank test and a .05 significance level to decide
if the data indicate that the mean processing time exceeds 2 minutes.
Basic Assumption
The k population or treatment distributions all have the same shape and spread.
We assign rank 1 to the smallest observation among all N in the k samples, rank
2 to the next smallest, and so on (for the moment let's assume that there are no tied
observations). The average of all ranks assigned is
\[ \frac{1 + 2 + 3 + \cdots + N}{N} = \frac{N + 1}{2} \]
If all μ's are equal, the average of the ranks for each of the k samples should be
reasonably close to (N + 1)/2 (since the observations will typically be intermingled, their
ranks will be also). On the other hand, large differences between some of the μ's
will usually result in some samples having average ranks much below (N + 1)/2 (those
samples that contain mostly small observations), whereas others will have average
ranks considerably exceeding (N + 1)/2. The KW statistic measures the discrepancy
between the average rank in each of the k samples and the overall average (N + 1)/2.
DEFINITION
Let r̄1 denote the average of the ranks for observations in the first sample, r̄2 denote
the average rank for observations in the second sample, and let r̄3, ..., r̄k denote the
analogous rank averages for samples 3, ..., k. Then the KW statistic is
\[ KW = \frac{12}{N(N+1)} \left[ n_1\left(\bar{r}_1 - \frac{N+1}{2}\right)^2 + n_2\left(\bar{r}_2 - \frac{N+1}{2}\right)^2 + \cdots + n_k\left(\bar{r}_k - \frac{N+1}{2}\right)^2 \right] \]
and the other rank averages are r̄2 = 12.7, r̄3 = 10.5, and r̄4 = 7.4. The average
of all ranks assigned is
\[ \frac{N+1}{2} = \frac{23}{2} = 11.5 \]
so
\[ KW = \frac{12}{(22)(23)}(174.74) = 4.14 \]
1. Let μ1, μ2, μ3, and μ4 denote the mean starting salaries for all graduates in each
of the four disciplines, respectively.
2. H0: μ1 = μ2 = μ3 = μ4
3. Ha: At least two of the four μ's are different
4. Test statistic:
\[ KW = \frac{12}{N(N+1)} \left[ n_1\left(\bar{r}_1 - \frac{N+1}{2}\right)^2 + n_2\left(\bar{r}_2 - \frac{N+1}{2}\right)^2 + \cdots + n_4\left(\bar{r}_4 - \frac{N+1}{2}\right)^2 \right] \]
5. Rejection region: The number of df for the chi-squared approximation is
k − 1 = 3. For α = .05, Chapter 16 Appendix Table 4 gives 7.82 as the critical
value. H0 will be rejected if KW > 7.82.
6. We previously computed KW as 4.14.
7. The computed KW value 4.14 does not exceed the critical value 7.82, so H0
should not be rejected. The data do not provide enough evidence to conclude
that the mean starting salaries for the four disciplines are different.
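The KW formula translates directly into code. The sketch below is illustrative only: the function names are hypothetical, the three small samples are invented to show the mechanics, and the last lines simply redo the arithmetic quoted in the text (N = 22 and a bracketed sum of 174.74).

```python
def midranks(values):
    """Rank the values 1..N, giving tied values the mean of their ranks."""
    ranks = []
    for v in values:
        smaller = sum(1 for w in values if w < v)
        tied = sum(1 for w in values if w == v)  # includes v itself
        ranks.append(smaller + (tied + 1) / 2)
    return ranks


def kw_statistic(samples):
    """KW = 12 / (N(N+1)) * sum of n_i * (rbar_i - (N+1)/2)^2 over the k samples."""
    combined = [x for sample in samples for x in sample]
    N = len(combined)
    ranks = midranks(combined)
    overall = (N + 1) / 2
    total = 0.0
    start = 0
    for sample in samples:
        n_i = len(sample)
        r_bar = sum(ranks[start:start + n_i]) / n_i  # average rank in this sample
        total += n_i * (r_bar - overall) ** 2
        start += n_i
    return 12 / (N * (N + 1)) * total


# Hypothetical data, invented for illustration only.
samples = [[27, 31], [35, 38], [42, 45]]
print(round(kw_statistic(samples), 3))

# Arithmetic check of the value quoted in the text.
print(round(12 / (22 * 23) * 174.74, 2))  # 4.14
```

The resulting value would then be compared with a chi-squared critical value based on k − 1 df, as in step 5.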
When there are tied values in the data set, ranks are determined as they were for
the rank-sum test: by assigning each tied observation in a group the average of the
ranks they would receive if they all differed slightly from one another.
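A short sketch of this tie-handling rule follows. The function name `average_ranks` and the data values are hypothetical and chosen only to show one two-way tie and one three-way tie.

```python
def average_ranks(values):
    """Rank values from 1 to len(values), giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the group while the next sorted value equals the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end (0-based) correspond to ranks start+1..end+1.
        shared = (start + 1 + end + 1) / 2
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


# Hypothetical pooled observations with a two-way tie and a three-way tie.
print(average_ranks([3.1, 4.7, 4.7, 5.0, 4.7, 3.1, 6.2]))
# [1.5, 4.0, 4.0, 6.0, 4.0, 1.5, 7.0]
```

The two observations equal to 3.1 would have received ranks 1 and 2, so each gets 1.5; the three observations equal to 4.7 would have received ranks 3, 4, and 5, so each gets 4.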
Rejection of H0 by the KW test can be followed by the use of an appropriate
multiple comparison procedure. Also, the most widely used statistical computer pack-
ages will perform a KW test.
The KW test does not require normality, but it does require equal population or
treatment-response distribution variances (all distributions must have the same
spread). If you encounter a data set for which variances appear to be quite different,
you should consult a statistician for advice.
Basic Assumption
Observations in the experiment are assumed to have been selected from distri-
butions having exactly the same shape and spread, but the mean value may
depend separately both on the treatment applied and on the block.
are k observations in any block). Then the rank averages $\bar{r}_1, \bar{r}_2, \ldots, \bar{r}_k$ for treatments 1,
2, ..., k, respectively, are computed. When H0 is false, some treatments will tend to
receive small ranks in most blocks, whereas other treatments will tend to
receive mostly large ranks. In this case the $\bar{r}$'s will tend to be rather different. On
the other hand, when H0 is true, all the $\bar{r}$'s will tend to be close to the same value
$(k + 1)/2$, the average of the ranks 1, 2, ..., k. The test statistic measures the
discrepancy between the $\bar{r}$'s and $(k + 1)/2$. A large discrepancy suggests that H0 is false.
Friedman's Test
After ranking observations separately from 1 to k within each of the l blocks, let
$\bar{r}_1, \bar{r}_2, \ldots, \bar{r}_k$ denote the resulting rank averages for the k treatments. The test
statistic is
$$F_r = \frac{12l}{k(k + 1)} \left[ \left( \bar{r}_1 - \frac{k + 1}{2} \right)^2 + \left( \bar{r}_2 - \frac{k + 1}{2} \right)^2 + \cdots + \left( \bar{r}_k - \frac{k + 1}{2} \right)^2 \right]$$
As long as l is not too small, when H0 is true $F_r$ has approximately a chi-squared
distribution based on $k - 1$ df. The rejection region for a test that has approximate
level of significance $\alpha$ is then $F_r$ > chi-square critical value.
Cancellation rate, by year and salesperson

                            Salesperson
Year     1     2     3     4     5     6     7     8     9
1973    2.8   5.9   3.3   4.4   1.7   3.8   6.6   3.1   0.0
1974    3.6   1.7   5.1   2.2   2.1   4.1   4.7   2.7   1.3
1975    1.4    .9   1.1   3.2    .8   1.5   2.8   1.4    .5
1976    2.0   2.2    .9   1.1    .5   1.2   1.4   3.5   1.2

Rank within each salesperson (block), with the rank average for each year

                            Salesperson
Year     1     2     3     4     5     6     7     8     9    Rank average
1973     3     4     3     4     3     3     4     3     1        3.11
1974     4     2     4     2     4     4     3     2     4        3.22
1975     1     1     2     3     2     2     2     1     2        1.78
1976     2     3     1     1     1     1     1     4     3        1.89
Data set available online
H0: mean cancellation rate is the same for all four years
Ha: mean cancellation rates differ for at least two of the years
Test statistic: $F_r$
Rejection region: With $\alpha = .05$ and $k - 1 = 3$, chi-square critical value = 7.82.
H0 will be rejected at level of significance .05 if $F_r$ > 7.82.
Computations: Using $(k + 1)/2 = 2.5$, with l = 9 blocks (salespeople) and k = 4 treatments (years),
$$F_r = \frac{(12)(9)}{(4)(5)} \left[ (3.11 - 2.5)^2 + (3.22 - 2.5)^2 + (1.78 - 2.5)^2 + (1.89 - 2.5)^2 \right] = 9.62$$
Conclusion: Since 9.62 > 7.82, H0 is rejected in favor of Ha. Mean cancellation rate
is not the same for all four years.
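The computations in this example can be reproduced with a short script. This is a sketch under stated assumptions: the data are the cancellation rates tabulated above, the names `rates` and `friedman` are hypothetical, and the ranking step relies on there being no ties within any salesperson's four values (which holds for these data).

```python
# Cancellation rates from the table: one list per year (treatment),
# one position per salesperson (block).
rates = {
    1973: [2.8, 5.9, 3.3, 4.4, 1.7, 3.8, 6.6, 3.1, 0.0],
    1974: [3.6, 1.7, 5.1, 2.2, 2.1, 4.1, 4.7, 2.7, 1.3],
    1975: [1.4, 0.9, 1.1, 3.2, 0.8, 1.5, 2.8, 1.4, 0.5],
    1976: [2.0, 2.2, 0.9, 1.1, 0.5, 1.2, 1.4, 3.5, 1.2],
}
years = list(rates)
k = len(years)            # number of treatments (years)
l = len(rates[years[0]])  # number of blocks (salespeople)

# Rank the k observations within each block and accumulate rank sums per year.
rank_sums = {year: 0 for year in years}
for b in range(l):
    ordered = sorted(years, key=lambda year: rates[year][b])
    for rank, year in enumerate(ordered, start=1):
        rank_sums[year] += rank


def friedman(rank_averages, k, l):
    """Friedman statistic computed from the k rank averages and l blocks."""
    centre = (k + 1) / 2
    return 12 * l / (k * (k + 1)) * sum((r - centre) ** 2 for r in rank_averages)


exact = [rank_sums[year] / l for year in years]
rounded = [round(r, 2) for r in exact]
fr_rounded = friedman(rounded, k, l)
fr_exact = friedman(exact, k, l)

print(rounded)               # [3.11, 3.22, 1.78, 1.89]
print(round(fr_rounded, 2))  # 9.62, matching the hand computation
print(round(fr_exact, 2))    # 9.67, using unrounded rank averages
```

The small difference between 9.62 and 9.67 comes only from rounding the rank averages to two decimal places before squaring; either value exceeds 7.82, so the conclusion is unchanged.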
EXERCISES 16.19 - 16.25
16.19 The paper "The Effect of Social Class on Brand and Price Consciousness for
Supermarket Products" (Journal of Retailing [1978]: 33–42) used the Kruskal-Wallis
test to determine if social class (lower, middle, and upper) influenced the importance
(scored on a scale of 1 to 7) attached to a brand name when purchasing paper towels.
The reported value of the KW statistic was .17. Use a .05 significance level to test the
null hypothesis of no difference in the mean importance score for the three social classes.

16.20 Protoporphyrin levels were determined for three groups of people: a control
group of normal workers, a group of alcoholics with sideroblasts in their bone marrow,
and a group of alcoholics without sideroblasts. The given data appeared in the paper
"Erythrocyte Coproporphyrin and Protoporphyrin in Ethanol-Induced Sideroblastic
Erythropoiesis" (Blood [1974]: 291–295). Do the data (see page 16-28) suggest that
normal workers and alcoholics with and without sideroblasts differ with respect to
mean protoporphyrin level? Use the KW test with a .05 significance level.

16.21 The given data on phosphorus concentration in topsoil for four different soil
treatments appeared in the article "Fertilisers for Lotus and Clover Establishments
on a Sequence of Acid Soils on the East Otago Uplands" (New Zealand Journal of
Experimental Agriculture [1984]: 119–129). Use the KW test and a .01 significance
level to test the null hypothesis of no difference in true mean phosphorus concentration
for the four soil treatments.

Treatment    Concentration (mg/g)
I             8.1    5.9    7.0    8.0    9.0
II           11.5   10.9   12.1   10.3   11.9
III          15.3   17.4   16.4   15.8   16.0
IV           23.0   33.0   28.4   24.6   27.7

16.22 The paper "Physiological Effects During Hypnotically Requested Emotions"
(Psychosomatic Medicine [1963]: 334–343) reported data (see page 16-28) on skin
potential (mV) when the emotions of fear, happiness, depression, and calmness were
requested from each of eight subjects. Do the data suggest that the mean skin potential
differs for the emotions tested? Use a significance level of .05.

16.23 In a test to determine if soil pretreated with small amounts of Basic-H makes
the soil more permeable to water, soil samples were divided into blocks and each
block received all four treatments under study. The treatments were (1) water with
.001% Basic-H on untreated soil; (2) water without Basic-H on untreated soil; (3) water
with Basic-H on soil pretreated with Basic-H; and (4) water without Basic-H on soil
pretreated with Basic-H. Using a significance level of .01, determine if mean
permeability differs for the four treatments. (Data on page 16-28)
Bold exercises answered in back Data set available online Video Solution available
16.24 The following data on amount of food consumed (g) by eight rats after 0, 24,
and 72 hours of food deprivation appeared in the paper "The Relation Between
Differences in Level of Food Deprivation and Dominance in Food Getting in the Rat"
(Psychological Science [1972]: 297–298). Do the data indicate a difference in the mean
food consumption for the three experimental conditions? Use $\alpha = .01$.

                              Rat
Hours     1      2      3      4      5      6      7      8
0        3.5    3.7    1.6    2.5    2.8    2.0    5.9    2.5
24       5.9    8.1    8.1    8.6    8.1    5.9    9.5    7.9
72      13.9   12.6    8.1    6.8   14.3    4.2   14.5    7.9

16.25 The article "Effect of Storage Temperature on the Viability and Fertility of
Bovine Sperm Diluted and Stored in Caprogen" (New Zealand Journal of Agricultural
Research [1984]: 173–177) examined the effect of temperature on sperm survival.
Survival data for various storage times are given below. Use Friedman's test with a
.05 significance level to determine if survival is related to storage temperature.
Regard time as the blocking factor.

Storage                 Storage Time (hours)
Temperature (°C)      6      24     48     120    168
15.6                 61.9   59.6   57.0   58.8   53.7
21.1                 62.5   60.0   57.4   59.3   54.9
26.7                 60.7   55.5   54.5   53.3   45.3
32.2                 59.9   48.6   42.6   36.6   24.8
TABLE 1 (continued)

        Upper-tailed test               Lower-tailed test               Two-tailed test
        P-value < .05   P-value < .01   P-value < .05   P-value < .01   P-value < .05   P-value < .01
        if rank sum is  if rank sum is  if rank sum is  if rank sum is  if rank sum is  if rank sum is
        greater than    greater than    less than or    less than or    not between*    not between
n1  n2  or equal to     or equal to     equal to        equal to
 8   3  57              59              39              37              58, 38          60, 36
 8   4  62              66              42              38              64, 40          67, 37
 8   5  68              72              44              40              70, 42          73, 39
 8   6  73              78              47              42              76, 44          80, 40
 8   7  79              84              49              44              81, 47          86, 42
 8   8  84              90              52              46              87, 49          92, 44

*Including endpoints. For example, when $n_1 = 3$ and $n_2 = 4$, P-value > .05 if 6 ≤ rank sum ≤ 18.
TABLE 2 (continued)
Columns, in order: n; Significance Level for One-Tailed Test; Significance Level for
Two-Tailed Test; Critical Value. The value of n is shown only on the first row of each group.
12 .010 .020 58
.026 .052 50
.046 .092 44
.102 .204 34
13 .011 .022 65
.024 .048 57
.047 .094 49
.095 .190 39
14 .010 .020 73
.025 .050 63
.052 .104 53
.097 .194 43
15 .011 .022 80
.024 .048 70
.047 .094 60
.104 .208 46
16 .011 .022 88
.025 .050 76
.052 .104 64
.096 .192 52
17 .010 .020 97
.025 .050 83
.049 .098 71
.103 .206 55
18 .010 .020 105
.025 .050 91
.049 .098 77
.098 .196 61
19 .010 .020 114
.025 .050 98
.052 .104 82
.098 .196 66
20 .010 .020 124
.024 .048 106
.049 .098 90
.101 .202 70
13 99.0 11
95.2 18
90.6 22