Session 8 DEN1015H 2012 Lecture Notes & Review Problems With Solutions
Prepared by Dr. Herenia P. Lawrence
Table 1. Summary of the main non-parametric methods.
Mann-Whitney U test        Alternative to the Wilcoxon rank sum test, which gives identical results (two-sample t test)
Kendall’s S test           Alternative to the Wilcoxon rank sum test, which gives identical results (two-sample t test)
χ² goodness of fit test    Comparison of an observed frequency distribution with a theoretical one
Kolmogorov-Smirnov tests
  One-sample               Alternative to the χ² goodness of fit test
  Two-sample               Comparison of two frequency distributions
NON-PARAMETRIC TESTS FOR PAIRED DATA
Example: A study was designed to evaluate a new pain relieving medication used in endodontic
therapy. The study was based on 9 patients’ reports regarding the degree of pain or discomfort
(scale ranging from 0 to 3, with 0 = no pain and 3 = severe pain) during treatment using the new
medication compared with the discomfort during a separate, similar occasion using the regular
control medication. The results are as follows:
N=8, because the one tied pair is ignored. Six patients felt greater pain with the control
medication as compared with the test medication (T=6). The null hypothesis is that control and
test medications are equally effective. If there is no effect of medication, you would expect that,
on average, half the signs would be + and half -. The statistical question is, “What is the
likelihood that a 2/6 split of pluses and minuses could occur by chance?”
CRITICAL VALUES OF T FOR THE EXACT SIGN TEST

        Two-sided 0.10      Two-sided 0.05
  N     (one-sided 0.05)    (one-sided 0.025)
  1     -                   -
  2     -                   -
  3     -                   -
  4     -                   -
  5     0, 5                -
  6     0, 6                0, 6
  7     0, 7                0, 7
  8     1, 7                0, 8
  9     1, 8                1, 8
 10     1, 9                1, 9
 11     2, 9                1, 10
 12     2, 10               2, 10
 13     3, 10               2, 11
 14     3, 11               2, 12
 15     3, 12               3, 12
 16     4, 12               3, 13
 17     4, 13               4, 13
 18     5, 13               4, 14
 19     5, 14               4, 15
 20     5, 15               5, 15

(A dash marks an N at which significance cannot be reached at that level.)
Source: Weintraub JA, Douglass CW, Gillings DB. Biostats. Data analysis for dental health care
professionals (2nd ed.). Research Triangle Park, NC: CAVCO Inc., 1994.
For N=8 there must be at least 7 positive differences (or, conversely, at most 1 negative
difference) to obtain one-sided significance at the 0.05 level. To obtain two-sided significance at
the 0.05 level, all eight differences would have to have the same sign. Here T=6, so the conclusion
is not to reject the null hypothesis that the test medication and the control medication have equal
probability of alleviating discomfort.
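The question of how likely a 2/6 split is under the null hypothesis can also be answered directly from the binomial distribution with probability ½ for each sign. The sketch below is illustrative only; the function name is an assumption and is not part of the original notes.

```python
from math import comb


def sign_test_p_two_sided(n, t):
    """Exact two-sided sign test P value.

    n: number of non-tied pairs; t: number of pairs with one particular sign.
    Under the null hypothesis each sign has probability 1/2.
    """
    extreme = max(t, n - t)  # count in the more frequent sign
    upper_tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# Pain medication example: N = 8 non-tied pairs, T = 6 in one direction.
p = sign_test_p_two_sided(8, 6)
print(f"two-sided P = {p:.3f}")  # two-sided P = 0.289
```

A P value of about 0.29 agrees with the table lookup: a 2/6 split is not unusual enough to reject the null hypothesis.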
Wilcoxon Signed-Rank Test
1. Exclude any differences that are zero. Put the remaining differences in ascending order of
magnitude, ignoring their signs, and give them ranks 1, 2, 3, etc. If any differences are equal,
average their ranks.
2. Add up the ranks of the positive differences and of the negative differences and denote these
sums by T+ and T-, respectively.
3. If there were no differences in the two groups then the sums T+ and T- would be similar. If
there were a difference then one sum would be much smaller and one sum would be much
larger than expected. Denote the smaller sum by T.
T = smaller of T+ and T-
4. The Wilcoxon Signed-rank test is based on assessing whether T, the smaller of the observed
sums, T+ and T-, is smaller than could be expected by chance. Compare the value obtained with
the critical values for the 5%, 2% and 1% significance levels given in Table A7 (see attached).
Note that the appropriate sample size, N, is the number of differences that were ranked rather
than the total number of differences, and does not therefore include the zero differences.
In contrast to the usual situation, a result is significant if T is smaller than the critical value
shown. Thus, if T is less than the critical value, then reject the null hypothesis.
In the example below, the sample size is 9 and the 5%, 2%, and 1% two-sided percentage
points are 6, 3, and 2 respectively. The result is therefore significant at the 5% level, since 5.5
is smaller than 6, indicating that the fluoride toothpaste was more effective than the non-
fluoride toothpaste.
Example: In a trial to compare a fluoride and a non-fluoride toothpaste, pairs of children were
matched for sex, age, social class and numbers of surfaces exposed. One child in each pair was
given each type of paste. Five years later the children were examined and the numbers of decayed
surfaces were counted:
The smaller value is the sum of the ranks of the negative differences, 5.5. Looking in the table of
critical values for the Wilcoxon Signed-rank test (Table A7 attached), 5.5 is less than the value 6
for 9 pairs (N), so p<0.05 (5%) for a two-tailed test. We therefore reject the null hypothesis and
conclude that the non-fluoride toothpaste group has significantly more decay than the fluoride
matched-pair group. How does this result compare with that obtained using a paired t test?
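Steps 1 to 3 of the signed-rank procedure (drop zeros, rank the absolute differences with averaged ties, sum the ranks by sign) can be sketched as follows. The differences used here are hypothetical, chosen only to show a zero and a tie; they are not the toothpaste data, and the function names are assumptions.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end + 2) / 2  # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def signed_rank_sums(differences):
    """Return (N, T+, T-) after excluding zero differences."""
    nonzero = [d for d in differences if d != 0]
    ranks = average_ranks([abs(d) for d in nonzero])
    t_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)
    return len(nonzero), t_plus, t_minus


# Hypothetical paired differences (NOT the data from the notes).
hypothetical_differences = [4, -1, 3, 0, 2, -2, 6, 5]
n, t_plus, t_minus = signed_rank_sums(hypothetical_differences)
t = min(t_plus, t_minus)
print(n, t_plus, t_minus, t)  # 7 24.5 3.5 3.5
```

T would then be compared with the critical value for N ranked differences in Table A7. A useful check is that T+ and T- always add up to N(N+1)/2, here 28.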
NON-PARAMETRIC TESTS FOR UNPAIRED DATA
Wilcoxon Rank Sum Test
1. Rank the observations from both groups together in ascending order of magnitude, as shown in
the example below. If any of the values are equal, average their ranks.
2. Add up the ranks in the group with the smaller sample size. In this case it is the ‘patients’, and
their ranks add up to 116.5. If the two groups are of the same size, either one may be picked.
3. Compare this sum with the critical ranges given in Table A8 (see attached), which is arranged
somewhat differently from the tables for other significance tests. Look up the row corresponding to
the sample sizes of the two groups, in this case row 9, 11. The range shown for the 5%
significance level is 68 to 121; sums strictly inside this range are non-significant. In other words,
sums of 68 and below or 121 and above are significant at the 5% level.
Example: To compare systolic blood pressures taken from 9 ‘patients’ and 11 ‘normal’ people.
The null hypothesis is that there is no difference in systolic blood pressure of the ‘patients’ and
‘normal’ people.
‘Patients’    ‘Normal’
132           139
160           107
145           98
114           140
125           115
128           136
154           123
134           129
123           126
              110
              105
To do the test, list all the observations in ascending order:
The value 123 appears twice and occupies the 7th and 8th positions, so the two ranks are averaged:
each 123 is given the rank (7 + 8)/2 = 7.5. If 123 had appeared three times, each would be given the
rank (7 + 8 + 9)/3 = 8.
Look at the table of critical values for the Wilcoxon rank sum test for the figures
corresponding to 9 and 11 observations. Under the null hypothesis, the rank sum in the smaller
(n=9) of the two groups would be 68 or below, or 121 or above, on no more than 5% of occasions.
Since 68 < 116.5 < 121, a difference of this size could easily have occurred by chance (although a
real difference might have been detected with more people in the sample). So there is no reason to
reject the null hypothesis. Had the rank sum been 132, a difference in ranks that large would have
been unlikely to occur by chance.
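The rank sum of 116.5 can be reproduced from the blood pressure values listed above. This is a minimal sketch: the helper name `midrank` is an assumption, and the limits 68 and 121 are simply the values the notes quote from Table A8.

```python
def midrank(value, pooled):
    """Rank of value in the pooled sample, averaging ranks over ties."""
    below = sum(1 for x in pooled if x < value)
    equal = sum(1 for x in pooled if x == value)
    return below + (equal + 1) / 2


patients = [132, 160, 145, 114, 125, 128, 154, 134, 123]
normal = [139, 107, 98, 140, 115, 136, 123, 129, 126, 110, 105]
pooled = patients + normal

print(midrank(123, pooled))  # 7.5 (tied 7th and 8th)

# Step 2: add up the ranks in the smaller group (the 9 'patients').
rank_sum = sum(midrank(v, pooled) for v in patients)
print(rank_sum)  # 116.5

# Step 3: compare with the 5% range quoted in the notes for sample sizes 9 and 11.
lower, upper = 68, 121
significant = rank_sum <= lower or rank_sum >= upper
print(significant)  # False
```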
Mann-Whitney U Test
The Mann-Whitney test is another commonly used alternative to the two independent-samples t
test. It gives identical results to the Wilcoxon rank-sum test. For the actual computation of the
test, you rank the combined data values for the two groups. Then you find the average rank in
each group. The statistic U is more complicated to compute than Wilcoxon's T, being calculated as
\[ U = n_1 n_2 + \tfrac{1}{2} n_1 (n_1 + 1) - T \]
where n_1 is the size of the group whose ranks were summed to give T (the smaller group) and n_2 is
the size of the other group.
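As an illustration of the formula, the sketch below converts the two rank sums that appear in these notes into U. The function name is an assumption. Statistical packages commonly report the smaller of U and n1·n2 − U, which is why the second value printed for the bacteroides data matches the Mann-Whitney U of 38.500 in the SPSS output in the solutions.

```python
def mann_whitney_u(n1, n2, t):
    """U computed from the rank sum T of the group of size n1."""
    return n1 * n2 + n1 * (n1 + 1) / 2 - t


# Blood pressure example: 9 'patients', 11 'normal', T = 116.5.
u_bp = mann_whitney_u(9, 11, 116.5)
print(u_bp, min(u_bp, 9 * 11 - u_bp))  # 27.5 27.5

# Review problem 2: 9 healthy, 15 gingivitis, rank sum W = 83.5.
u_bact = mann_whitney_u(9, 15, 83.5)
print(u_bact, min(u_bact, 9 * 15 - u_bact))  # 96.5 38.5
```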
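The Kruskal-Wallis Test
The Kruskal-Wallis test is a non-parametric alternative to one-way analysis of variance (One-way
ANOVA). It is computed like the Mann-Whitney U test, by ranking the combined data and comparing
the mean ranks, except that there are more than two groups. The null hypothesis for the test is that
all groups come from the same population distribution. Interpreting a significant result as a
difference in location (means or medians) assumes that the distributions have the same shape, and
hence the same variance, in every group.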
Example: Consider the relationship between cigarette smoking and education. The table below
shows the average ranks for the number of cigarettes smoked for three groups of people: grammar
school education only, high school only, and some college. You see that the mean ranks are
similar for the three groups. The observed significance level is large (0.78), so you do not reject
the null hypothesis that the distributions are the same for the three groups.
Ranks: No of Cigarettes per Day in 1958

Highest Level of Schooling    N      Mean Rank
Grammar School                32     101.44
High School                   112    108.57
College                       67     103.89
Total                         211

Test Statistics (a, b): No of Cigarettes per Day in 1958

Chi-Square     .505
df             2
Asymp. Sig.    .777

a. Kruskal Wallis Test
b. Grouping Variable: Highest Level of Schooling
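To show how the mean ranks feed into the test statistic, the sketch below computes the Kruskal-Wallis H from the group sizes and mean ranks in the table, using H = 12/(N(N+1)) · Σ n_i (mean rank_i − (N+1)/2)². This is the form without a correction for ties. It gives about 0.457, somewhat below the .505 reported by SPSS; the gap is presumably due to a tie correction, which needs the raw data and is not reproduced here. For 2 degrees of freedom the chi-square upper-tail probability is exp(−x/2), which recovers the reported significance level. Function names are assumptions.

```python
from math import exp


def kruskal_wallis_h(group_sizes, mean_ranks):
    """Kruskal-Wallis H from group sizes and mean ranks (no tie correction)."""
    n_total = sum(group_sizes)
    grand_mean_rank = (n_total + 1) / 2
    spread = sum(n * (r - grand_mean_rank) ** 2
                 for n, r in zip(group_sizes, mean_ranks))
    return 12 / (n_total * (n_total + 1)) * spread


def chi_square_p_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return exp(-x / 2)


h = kruskal_wallis_h([32, 112, 67], [101.44, 108.57, 103.89])
print(round(h, 3))  # 0.457 (uncorrected for ties)

p = chi_square_p_df2(0.505)  # chi-square value taken from the SPSS output
print(round(p, 3))  # 0.777
```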
DEN 1015H Review Problems Session 8
1. Twenty subjects each receive two restorations using two different materials. After 3 months,
the cold sensitivity of the two teeth is compared by asking each subject to indicate which tooth
causes more pain when exposed to cold water. Based on observing that 18 of the 20 subjects
reported more sensitivity with a particular one of the materials, a highly significant difference is
declared. This is an example of the use of which statistical test?
2. The following data show the bacteroides levels (possible values: 1, 13, 14, 15, 16, 17) in nine
healthy subjects and fifteen with gingivitis.
Use Wilcoxon’s rank-sum test to compare the levels of bacteroides of the two groups.
Compare your conclusion with that reached using the two independent-samples t test.
3. The following observations are of anxiety scores recorded on ten patients receiving midazolam
premedication before dental surgery under general anesthesia and no premedication in random
order. The two responses are recorded on the same person, meaning that the person is his/her own
control. Use Wilcoxon’s signed-rank test to evaluate whether there is any difference in the effect of
the premedication on the anxiety scores. Compare your conclusion with that reached using the
matched-pairs t test.
4. Perform a Kruskal-Wallis test on the data from question 4 of the review problems provided to
you during Session 7, in which an anthropologist measured mean upper central incisor width
(mm) in individuals from four different aboriginal groups. Do these data suggest that incisor
width may vary significantly between these groups? How does the p-value from the Kruskal-
Wallis test compare with the p-value from the One-way ANOVA test? I would like you to enter
the data in SPSS (or other statistical package) in order to run the analysis. Then cut and paste
your output into Word, with your conclusion, to hand in at the next class.
Enter the data using the following format:
Analyze > Nonparametric Tests > K Independent Samples...
The Tests for Several Independent Samples dialog box will appear.
DEN 1015H Solutions to Review Problems Session 8
1. Twenty subjects each receive two restorations using two different materials. After 3 months,
the cold sensitivity of the two teeth is compared by asking each subject to indicate which tooth
causes more pain when exposed to cold water. Based on observing that 18 of the 20 subjects
reported more sensitivity with a particular one of the materials, a highly significant difference is
declared. This is an example of the use of which statistical test?
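Preference counts of this kind (each subject simply indicates which of two paired teeth is worse) are the setting of the sign test described earlier in these notes. Assuming that test is the one intended, a minimal sketch of the calculation for 18 of 20 subjects is given below; the limits 5 and 15 are the N = 20 entries for two-sided 0.05 in the table of critical values for the exact sign test.

```python
from math import comb

n, t = 20, 18  # 20 subjects, 18 report more sensitivity with one material

# Exact two-sided P value under the null hypothesis of a 50/50 split.
extreme = max(t, n - t)
p_two_sided = 2 * sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
print(f"{p_two_sided:.4f}")  # 0.0004

# Table lookup: N = 20, two-sided 0.05 critical values are 5 and 15.
lower, upper = 5, 15
print(t <= lower or t >= upper)  # True
```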
2. The following data show the bacteroides levels (possible values: 1, 13, 14, 15, 16, 17) in nine
healthy subjects and fifteen with gingivitis.
Use Wilcoxon’s rank-sum test to compare the levels of bacteroides of the two groups.
Compare your conclusion with that reached using the two independent-samples t test.
Report
Levels of Bacteroides
Group         N     Mean     Std. Deviation    Minimum    Maximum
Healthy       9     4.00     5.96              1          15
Gingivitis    15    10.00    6.64              1          17
Total         24    7.75     6.93              1          17

Ranks
Levels of Bacteroides
Group         N     Mean Rank    Sum of Ranks
Healthy       9     9.28         83.50
Gingivitis    15    14.43        216.50
Total         24
Test Statistics (b)
                                  Levels of Bacteroides
Mann-Whitney U                    38.500
Wilcoxon W                        83.500
Z                                 -1.869
Asymp. Sig. (2-tailed)            .062
Exact Sig. [2*(1-tailed Sig.)]    .084 (a)
a. Not corrected for ties.
b. Grouping Variable: Group
Combine the data from both groups and arrange the values in increasing order and assign ranks....
Healthy: 6.5, 6.5, 6.5, 6.5, 6.5, 6.5, 6.5, 16.5, 21.5
Gingivitis: 6.5, 6.5, 6.5, 6.5, 6.5, 13, 16.5, 16.5, 16.5, 16.5, 16.5, 21.5, 21.5, 21.5, 24
The rank-sum statistic, T or W = sum of the ranks in group with smaller sample size
So for these data, the rank-sum statistic is W = 83.5. In the table of critical ranges for the
Wilcoxon rank sum test with 9 and 15 observations, the 5% (two-sided) range is 79 to 146: under
the null hypothesis, W falls on or outside these limits with probability no more than 0.05. Since
79 < 83.5 < 146, we do not reject the null hypothesis at significance level 0.05. Using the
independent-samples t test, we would reject the null hypothesis at significance level 0.05 (see
below).
Levels of Bacteroides
                                                              Equal variances    Equal variances
                                                              assumed            not assumed
Levene's Test for Equality of Variances    F                  1.325
                                           Sig.               .262
t-test for Equality of Means               t                  -2.222             -2.286
                                           df                 22                 18.504
                                           Sig. (2-tailed)    .037               .034
                                           Mean Difference    -6.00              -6.00
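The Z and Asymp. Sig. values in the SPSS output for the rank-sum test can be approximately reproduced from the ranks listed above, using the normal approximation to the rank sum with the usual adjustment of the variance for ties. This is a sketch under that assumption, not a description of SPSS internals.

```python
from collections import Counter
from math import erfc, sqrt

# Ranks as listed in the solution for the two groups.
healthy = [6.5] * 7 + [16.5, 21.5]
gingivitis = [6.5] * 5 + [13] + [16.5] * 5 + [21.5] * 3 + [24]

n1, n2 = len(healthy), len(gingivitis)
n = n1 + n2
w = sum(healthy)  # rank sum in the smaller group

expected_w = n1 * (n + 1) / 2
# Each run of t tied observations reduces the variance through (t^3 - t).
tie_sizes = Counter(healthy + gingivitis).values()
tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
variance_w = n1 * n2 / 12 * ((n + 1) - tie_term)

z = (w - expected_w) / sqrt(variance_w)
p_two_sided = erfc(abs(z) / sqrt(2))  # two-sided normal tail area
print(w, round(z, 3), round(p_two_sided, 3))  # 83.5 -1.869 0.062
```

Both routes agree that the rank-sum test falls short of significance at the 5% level, whereas the t test does not.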
3. The following observations are of anxiety scores recorded on ten patients receiving midazolam
premedication before dental surgery under general anesthesia and no premedication in random
order. The two responses are recorded on the same person, meaning that the person is his/her own
control. Use Wilcoxon’s signed-rank test to evaluate whether there is any difference in the effect of
the premedication on the anxiety scores. Compare your conclusion with that reached using the
matched-pairs t test.
The null hypothesis is that there is no difference in anxiety scores between drug and placebo.
Sum of ranks of negative differences = 6.5 + 8.0 + 6.5 + 5.0 + 2.5 + 9.5 = 38
T = smaller of T+ and T- = 17
Looking at the table for the critical values for the Wilcoxon signed-rank test, the smaller value, 17,
is bigger than any of the values for 10 pairs of observations. There is, therefore, no reason to
reject the null hypothesis of no difference between the drug and the placebo. Similar conclusions
are reached using the paired t test, as shown below:
\[ t = \frac{\bar{d} - 0}{\mathrm{SE}(\bar{d})} = \frac{\bar{d}}{s / \sqrt{n}} \]
t = -1.3/1.44 = -0.90
Looking at the t table, the value corresponding to 9 (n-1) degrees of freedom and a two-tailed
probability of 0.05 is t(0.05, 9) = 2.26. Since |t| = 0.90 is smaller than 2.26, P > 0.05 for a
two-tailed test (P = 0.39), so we do not reject the null hypothesis and conclude that there is not a
statistically significant difference in anxiety scores due to the effect of the drug.
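The arithmetic of the paired t statistic can be checked from the summary figures in the SPSS paired-samples output (mean difference -1.30, standard deviation 4.55, n = 10). The variable names below are illustrative, and the critical value 2.26 is the one quoted from the t table.

```python
from math import sqrt

mean_diff = -1.30  # mean of the paired differences (Drug - Placebo)
sd_diff = 4.55     # standard deviation of the differences
n = 10             # number of pairs

se = sd_diff / sqrt(n)  # standard error of the mean difference
t = mean_diff / se
print(round(se, 2), round(t, 2))  # 1.44 -0.9

critical = 2.26  # t(0.05, 9) as quoted from the t table
print(abs(t) > critical)  # False: do not reject the null hypothesis
```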
SPSS OUTPUT for the Wilcoxon Signed Ranks test and the Paired t test

Ranks
                                    N        Mean Rank    Sum of Ranks
Placebo - Drug    Negative Ranks    4 (a)    4.25         17.00
                  Positive Ranks    6 (b)    6.33         38.00
                  Ties              0 (c)
                  Total             10
a. Placebo < Drug
b. Placebo > Drug
c. Drug = Placebo

Test Statistics (b)
                          Placebo - Drug
Z                         -1.079 (a)
Asymp. Sig. (2-tailed)    .281
a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test
T-Test
Paired Samples Test
Pair 1: Drug - Placebo
Paired Differences    Mean               -1.30
                      Std. Deviation     4.55
                      Std. Error Mean    1.44
4. Kruskal-Wallis Test

Ranks
         GROUP    N     Mean Rank
WIDTH    1        5     10.40
         2        4     6.50
         3        5     15.00
         4        4     4.50
         Total    18

Test Statistics (a, b)
               WIDTH
Chi-Square     10.295
df             3
Asymp. Sig.    .016
a. Kruskal Wallis Test
b. Grouping Variable: GROUP
There is an overall difference in mean upper central incisor width among the four aboriginal
groups (Kruskal-Wallis test, 3 d.f., P=0.016, 2-tailed), which is the same conclusion reached using
a one-way ANOVA model (F=7.289, d.f.1=3, d.f.2=14, P=0.004). To find out which groups are
significantly different from the others, you will need to carry out pairwise comparisons using the
Wilcoxon rank sum test or the Mann-Whitney U test, since there is not a non-parametric
equivalent to a Multiple Comparison Test (“post-hoc” test). Using Wilcoxon rank sum tests in
SPSS, it was evident that group 1’s mean incisor width is different from those of groups 3 and 4,
and that group 3’s mean incisor width is different from that of group 4. None of the other pairwise
comparisons was significant at the 5% level.
Suggested readings:
Norman GR, Streiner DL. Biostatistics. The bare essentials (2nd ed.). Hamilton, ON: B.C.
Decker Inc., 2000. Chapters 22 and 23.
Weintraub JA, Douglass CW, Gillings DB. Biostats. Data analysis for dental health care
professionals (2nd ed.). Research Triangle Park, NC: CAVCO Inc., 1985. Chapter 17.
Kim JS, Dailey RJ. Biostatistics for oral healthcare (1st ed.). Ames, IA: Blackwell Pub.
Professional, 2008. Chapter 14.