When we analyse data, we may have both qualitative and quantitative
data. Some data are summarized as frequencies (a rate or a proportion),
so hypothesis tests can be based on those frequencies.
or = P( X xmax ) , p P0
p = sample proportion
P0 = hypothesized (constant) value of the proportion
n = sample size
1.2 Comparing the distribution of sample data with constant proportion
(Goodness of Fit Test) by Chi square Test or Kolmogorov Smirnov Test
(K.S. Test)
Ho : The data follow the hypothesized distribution.
H1 : The data do not follow the hypothesized distribution.
or $\chi^2 = \sum_{i=1}^{r} \frac{O_i^2}{E_i} - n$ ; df = (r – 1)
Oi = Observed frequencies
Ei = Expected frequencies
r = The number of categories of the variable
Criteria :
1. Ideally, all Ei ≥ 5.
2. Otherwise, the number of cells with Ei < 5 must not be more than 20%
of all category cells.
(2) Kolmogorov Smirnov Test (K.S. Test)
D = Max | S – St |
S = The relative cumulative frequency of the observed data
St = The relative cumulative probability of the hypothesized
distribution
- By the Chi-square test or the K.S. test we obtain the p-value
- Set the significance level α
- Conclusion
p-value < α , reject Ho
p-value > α , fail to reject Ho
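$Z = \frac{p_1 - p_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
$\bar{p}$ = pooled proportion of the two groups combined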
p1 , p 2 = proportion/rate of group 1 and 2
n1 , n2 = sample size of group 1 and 2
For 2 x 2 table
Variable Group 1 Group 2 Total
Positive a b R1
Negative c d R2
Total c1 c2 n
$\chi^2 = \frac{n(ad - bc)^2}{R_1 \cdot R_2 \cdot C_1 \cdot C_2}$ ; df = 1
or $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{O_{ij}^2}{E_{ij}} - n$ ; df = ( r-1 )( c-1 )
$K = \frac{\text{Observed} - \text{Expected}}{1 - \text{Expected}}$
$K = \frac{O - E}{1 - E}$
O = Observed agreement
E = Expected agreement
Assumption
1. Related samples
2.The data are nominal scale (dichotomous)
                 Group/Sample
Case     1     2     3    ...    k     Total
1       x11   x12   x13   ...   x1k     R1
2       x21   x22   x23   ...   x2k     R2
3       x31   x32   x33   ...   x3k     R3
.
.
.
r       xr1   xr2   xr3   ...   xrk     Rr
Total   C1    C2    C3    ...   Ck      n
xij = 0, 1
The test statistic: $\chi^2 = \frac{(k-1)\left[k\sum C_j^2 - \left(\sum C_j\right)^2\right]}{k\sum R_i - \sum R_i^2}$ , d.f. = k – 1
$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
For 2 x 2 table
                 Variable B
Variable A   Positive   Negative   Total
Positive       O11        O12       R1
Negative       O21        O22       R2
Total          C1         C2        n
                 Variable A
Variable B   Positive   Negative   Total
Positive        a          b        R1
Negative        c          d        R2
Total           C1         C2       n
$\chi^2 = \frac{n\left(|ad - bc| - \frac{n}{2}\right)^2}{R_1 \cdot R_2 \cdot C_1 \cdot C_2}$ ; df = 1
Prob. = $\frac{R_1!\, R_2!\, C_1!\, C_2!}{n!\, a!\, b!\, c!\, d!}$
Example 2 Suppose that 130 deaths have occurred among males and that the
cause of death was cancer in 40 of them. Based on vital statistics
reports, approximately 20% of all deaths can be attributed to some
form of cancer. Is this result significantly greater than the reported
figure? (α = 0.05)
Ho : P ≤ 0.20
H1 : P > 0.20
$p = \frac{40}{130} = 0.3077 \approx 0.31$
n > 25 ; Z test
$Z = \frac{p - P_0}{\sqrt{\frac{P_0(1 - P_0)}{n}}} = \frac{0.3077 - 0.20}{\sqrt{\frac{0.2(0.8)}{130}}} = 3.07$
$\chi^2 = \sum_{i=1}^{r} \frac{(O_i - E_i)^2}{E_i}$ ; df = (r – 1)
$\chi^2$ = 9.423 ; df = 2-1 = 1
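As a rough check on Example 2, the sketch below recomputes the Z statistic from the raw counts and the chi-square statistic from the observed and expected frequencies. The function names `one_proportion_z` and `gof_chi_square` are illustrative helpers, not part of the original notes. For a variable with two categories, the square of Z should match the chi-square value (about 9.423).

```python
import math

def one_proportion_z(x, n, p0):
    """Z statistic for H0: P = p0 (normal approximation, unrounded p)."""
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se

def gof_chi_square(observed, expected):
    """Pearson chi-square: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Example 2: 40 cancer deaths out of 130, hypothesized proportion 0.20
z = one_proportion_z(40, 130, 0.20)
chi2 = gof_chi_square([40, 90], [130 * 0.20, 130 * 0.80])
print(round(z, 2), round(chi2, 3))
```

Working with the unrounded proportion 40/130 keeps the Z test and the chi-square test numerically consistent with each other.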
$\chi^2 = \sum_{i=1}^{r} \frac{(O_i - E_i)^2}{E_i}$ = 2.0 , df = 3
p-value = 0.5799
at α = 0.05, p-value > α
Fail to reject Ho
Blood Type   Number (Oi)    pi      Ei = npi   (Oi – Ei)²/Ei
O                43        .448       55.6         2.86
A                66        .397       49.2         5.74
B                11        .113       14.0         0.64
AB                4        .042        5.2         0.28
Total           124       1.000      124.0         9.52
$\chi^2 = \sum_{i=1}^{r} \frac{(O_i - E_i)^2}{E_i}$ , df = ( r-1 )
$\chi^2$ = 9.52 , d.f. = 4 - 1 = 3
p-value = 0.0237
at α = 0.05 ; Reject Ho
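The blood type calculation can be sketched in a few lines of Python. The helper `chi2_sf` is a hypothetical name introduced here; it evaluates the upper-tail chi-square probability for integer degrees of freedom with a closed-form series, so no external library is assumed. Because the expected counts are not rounded, the statistic comes out slightly below the 9.52 in the table.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable, integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_j (x/2)^(j + 1/2) / Gamma(j + 3/2)
    total = math.erfc(math.sqrt(half))
    for j in range((df - 1) // 2):
        total += math.exp(-half) * half ** (j + 0.5) / math.gamma(j + 1.5)
    return total

observed = [43, 66, 11, 4]            # O, A, B, AB
probs = [0.448, 0.397, 0.113, 0.042]  # hypothesized proportions
n = sum(observed)
expected = [n * p for p in probs]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(chi2, df)
print(round(chi2, 2), df, round(p_value, 4))
```

The decision rule is unchanged: reject Ho when `p_value` is below the chosen α.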
$\chi^2 = \frac{8^2}{10.74} + \frac{25^2}{26.84} + \dots + \frac{1^2}{3.28} - 100$
$\chi^2$ = 3.4221 , d.f. = 6 – 1 – 1 = 4
p-value = 0.4924
at α = 0.05 , Fail to reject Ho
The data follow a binomial distribution with p = 0.20.
Example 6 For the number of emergency cases admitted to the hospital per
day, do the data follow a Poisson distribution with an average of 3 cases
per day? The data were collected over 90 days. (α = 0.05)
By chi square test; $\chi^2 = \sum_{i=1}^{r} \frac{O_i^2}{E_i} - n$ , df = r – m - 1
$\chi^2 = \frac{5^2}{4.5} + \frac{14^2}{13.41} + \dots + \frac{8^2}{7.56} - 90$ = 1.88; df = 7 – 1 – 1 = 5
p-value = 0.8618
at α = 0.05; p-value > α , Fail to reject Ho
The data follow a Poisson distribution with λ = 3.
Example 7 Do the data follow a normal distribution with mean 113.3 and
variance 277.22 at α = 0.05 ?
Data Oi Pi Ei = npi Zi
< 90 8 .0808 8.08 < -1.4
90 - 100 15 .1311 13.11 -1.4 – (-0.8)
100 - 110 21 .2088 20.88 -0.8 – (-0.2)
110 - 120 23 .2347 23.47 -0.2 – 0.4
120 - 130 16 .1859 18.59 0.4 – 1.0
130 - 140 9 .1039 10.39 1.0 – 1.6
140+ 8 .0548 5.48 1.6+
Ho : X ~ N (113.3, 277.22)
H1 : X ≁ N (113.3, 277.22)
By chi square test; $\chi^2 = \sum_{i=1}^{r} \frac{O_i^2}{E_i} - n$ , df = r – m - 1
$\chi^2 = \frac{8^2}{8.08} + \frac{15^2}{13.11} + \dots + \frac{8^2}{5.48} - 100$ = 1.99, d.f. = 7 – 2 - 1 = 4
p-value = 0.7383
at α = 0.05; Fail to reject Ho
The data follow a normal distribution with mean = 113.3 and
variance = 277.22.
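A minimal sketch of the Example 7 calculation follows. It takes the class boundaries in standard units (the Zi column) as given, obtains each class probability from the standard normal CDF in Python's `statistics.NormalDist`, and applies the shortcut formula with m = 2 estimated parameters. The variable names are illustrative.

```python
from statistics import NormalDist

observed = [8, 15, 21, 23, 16, 9, 8]
z_cuts = [-1.4, -0.8, -0.2, 0.4, 1.0, 1.6]  # class boundaries from the Zi column

std_normal = NormalDist()
cdf = [0.0] + [std_normal.cdf(z) for z in z_cuts] + [1.0]
probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]  # probability of each class

n = sum(observed)
expected = [n * p for p in probs]

# Shortcut form: chi-square = sum(O^2 / E) - n
chi2 = sum(o * o / e for o, e in zip(observed, expected)) - n
df = len(observed) - 2 - 1  # r - m - 1, with mean and variance estimated
print([round(e, 2) for e in expected], round(chi2, 2), df)
```

The expected counts reproduce the Ei column of the table to two decimals, and the statistic agrees with the hand calculation up to rounding.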
Data Oi F S pi St |S – St|
$\bar{X}$ = 113.3, S² = 277.22
D = Max |S – St| = 0.0252
Blood sugar   fi    Fi      S        Zi       St     |S – St|
68             2     2    .0556    -2.00    .0228     .0328
72             2     4    .1111    -1.33    .0918     .0193
75             2     6    .1667    -0.83    .2033     .0366
76             2     8    .2222    -0.67    .2517     .0292
77             6    14    .3889    -0.50    .3085     .0804
78             3    17    .4722    -0.33    .3707     .1015
80             6    23    .6389     0.00    .5000     .1389
81             3    26    .7222     0.17    .5675     .1547*
84             2    28    .7778     0.67    .7486     .0292
86             2    30    .8333     1.00    .8413     .0080
87             2    32    .8889     1.17    .8790     .0099
92             4    36   1.0000     2.00    .9772     .0228
Total         36
$\bar{X}$ = 80, S = 6
Ho : The data follow a normal distribution.
H1 : The data do not follow a normal distribution.
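The K.S. statistic for the blood sugar table can be reproduced with a short sketch. It follows the table's procedure, comparing the relative cumulative frequency S with the normal cumulative probability St at each observed value, and assumes mean 80 and standard deviation 6 as stated. Since Z is not rounded to two decimals here, D differs from the tabled .1547 in the third decimal place.

```python
from statistics import NormalDist

values = [68, 72, 75, 76, 77, 78, 80, 81, 84, 86, 87, 92]
freqs = [2, 2, 2, 2, 6, 3, 6, 3, 2, 2, 2, 4]
model = NormalDist(mu=80, sigma=6)  # hypothesized distribution

n = sum(freqs)
cum = 0
d_max, d_at = 0.0, None
for x, f in zip(values, freqs):
    cum += f
    s_obs = cum / n          # S: relative cumulative frequency
    s_theo = model.cdf(x)    # St: cumulative probability under Ho
    diff = abs(s_obs - s_theo)
    if diff > d_max:
        d_max, d_at = diff, x

print(d_at, round(d_max, 4))
```

The largest difference occurs at the value 81, the row marked with an asterisk in the table.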
OC users ; n1 = 5000, p1 = 13/5000 = 0.0026
Non-oc ; n2 = 10000, p2 = 7/10,000 = 0.0007
Ho : P1 = P2
H1 : P1 ≠ P2
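$\bar{p} = \frac{13 + 7}{15000} = \frac{20}{15000} = 0.00133$
$Z = \frac{p_1 - p_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
$Z = \frac{.0026 - .0007}{\sqrt{(.00133)(.99867)(1/5000 + 1/10000)}}$
Z = 3.01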
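Ho : P1 = P2
H1 : P1 ≠ P2
By chi square test;
$\chi^2 = \frac{n(ad - bc)^2}{R_1 \cdot R_2 \cdot C_1 \cdot C_2}$ , df = 1
$\chi^2 = \frac{15000\,(13 \cdot 9993 - 7 \cdot 4987)^2}{20 \cdot 14980 \cdot 5000 \cdot 10000}$ , df = 1
$\chi^2$ = 9.037 , df = 1
p-value < 0.005
At α = 0.05
Reject Ho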
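The two approaches to the OC example can be compared directly. The sketch below computes the pooled Z statistic from the counts and the 2 x 2 chi-square from the cell frequencies; the cells c = 4987 and d = 9993 are derived from the group sizes minus the cases. The function names are illustrative only.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: P1 = P2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def chi2_2x2(a, b, c, d):
    """Chi-square for a 2 x 2 table: n(ad - bc)^2 / (R1 R2 C1 C2)."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    return n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)

# Rows: cases / non-cases; columns: OC users / non-users
z = two_proportion_z(13, 5000, 7, 10000)
chi2 = chi2_2x2(13, 7, 5000 - 13, 10000 - 7)
print(round(z, 3), round(chi2, 3))
```

For a 2 x 2 table the uncorrected chi-square equals the square of the pooled Z, so both tests lead to the same decision.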
Total 5 37 42
Ho : P1 = P2
H1 : P1 ≠ P2
There are two values of Eij < 5 ; use the Fisher Exact Test
prob. = $\frac{R_1!\, R_2!\, C_1!\, C_2!}{n!\, a!\, b!\, c!\, d!}$
prob. = $\frac{20!\, 22!\, 5!\, 37!}{42!\, 4!\, 16!\, 1!\, 21!}$ = 0.1253
and, for the more extreme table

 5   15   20
 0   22   22
 5   37   42

prob. = $\frac{20!\, 22!\, 5!\, 37!}{42!\, 5!\, 15!\, 0!\, 22!}$ = 0.0182
prob. one side = .1253 + .0182 = 0.1435
p-value = 2(0.1435) = 0.2870
at α = 0.05 ; Fail to reject Ho
There is no significant difference in the proportions of tooth alignment
of children between the two feeding methods at α = 0.05
( p-value = 0.2870 )
a = 0             a = 1             a = 2
 0  20  20         1  19  20         2  18  20
 5  17  22         4  18  22         3  19  22
 5  37  42         5  37  42         5  37  42

a = 3             a = 4             a = 5
 3  17  20         4  16  20         5  15  20
 2  20  22         1  21  22         0  22  22
 5  37  42         5  37  42         5  37  42
a Probability
0 0.0310
1 0.1720
2 0.3440
3 0.3096
4 0.1253
5 0.0182
Total 1.0000
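The probability table above can be regenerated with binomial coefficients, since R1! R2! C1! C2! / (n! a! b! c! d!) is the same quantity as C(R1, a) C(R2, C1 − a) / C(n, C1). The following sketch assumes the margins R1 = 20, R2 = 22, C1 = 5 from the example; `table_prob` is a hypothetical helper name.

```python
from math import comb

def table_prob(a, r1, r2, c1):
    """Probability of the 2 x 2 table with top-left cell a and fixed margins."""
    return comb(r1, a) * comb(r2, c1 - a) / comb(r1 + r2, c1)

r1, r2, c1 = 20, 22, 5
probs = {a: table_prob(a, r1, r2, c1) for a in range(0, min(r1, c1) + 1)}

for a, p in probs.items():
    print(a, round(p, 4))

# Observed table has a = 4; the only more extreme table has a = 5
one_sided = probs[4] + probs[5]
p_value = 2 * one_sided  # doubling rule used in the worked example
```

Doubling the one-sided probability is the convention followed in the example; other definitions of the two-sided p-value exist and can give a different number.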
                 Method
Not good    1 (3.2)    7 (4.8)     8
Good        5 (2.8)    2 (4.2)     7
Total       6          9          15
H 0 : P1 P2
H1 : P1 P2
All Eij < 5 ; Fisher Exact Test; prob. = $\frac{R_1!\, R_2!\, C_1!\, C_2!}{n!\, a!\, b!\, c!\, d!}$
prob. = $\frac{8!\, 7!\, 6!\, 9!}{15!\, 5!\, 2!\, 1!\, 7!}$ = 0.0336
and, for the more extreme table

 0   8    8
 6   1    7
 6   9   15

prob. = $\frac{8!\, 7!\, 6!\, 9!}{15!\, 6!\, 1!\, 0!\, 8!}$ = 0.0014
prob. one side = 0.0336 + 0.0014 = 0.0350
p-value = 0.0350
at α = 0.05; Reject Ho
$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$ ; df = (r – 1)(c - 1)
or $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{O_{ij}^2}{E_{ij}} - n$ ; df = ( r-1 )( c-1 )
$\chi^2$ = 2.40516 , df = 3
p-value = 0.4949
At α = 0.05; Fail to reject Ho
$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$ ; df = (r – 1)(c - 1)
or $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{O_{ij}^2}{E_{ij}} - n$ ; df = ( r - 1 )( c - 1 )
$\chi^2$ = 26.03 ; df = 3
p-value < 0.0005
At α = 0.05 ; Reject Ho
$\chi^2 = \frac{(|48 - 23| - 1)^2}{48 + 23}$ = 8.1127 , df = 1
$K = \frac{O - E}{1 - E} = \frac{0.5896 - 0.5497}{1 - 0.5497} = 0.0886$ ; Poor Agreement
The two skin test reactions, dinitrochlorobenzene (DNCB) and croton oil,
show poor agreement.
$E = \frac{8}{23} \cdot \frac{10}{23} + \frac{15}{23} \cdot \frac{13}{23} = 0.5198$
$K = \frac{O - E}{1 - E} = \frac{0.7391 - 0.5198}{1 - 0.5198} = 0.4567$ ; Fair Agreement
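The kappa arithmetic in both examples can be sketched as follows. The expected agreement is built from the row and column totals (8, 15 and 10, 13 out of 23) quoted in the calculation, and the observed agreement 0.7391 is taken as given. The helper names are illustrative.

```python
def expected_agreement(row_totals, col_totals):
    """Chance agreement: sum over categories of (row total x column total) / n^2."""
    n = sum(row_totals)
    return sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2

def kappa(observed, expected):
    """Kappa = (O - E) / (1 - E)."""
    return (observed - expected) / (1 - expected)

e = expected_agreement([8, 15], [10, 13])
k_fair = kappa(0.7391, e)
k_poor = kappa(0.5896, 0.5497)  # the DNCB and croton oil example
print(round(e, 4), round(k_fair, 4), round(k_poor, 4))
```

The descriptive labels (poor, fair) come from the notes; the code only reproduces the coefficient.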
Hospital
Housewife A B C Ri
1 0 0 0 0
2 1 1 0 2
3 0 1 0 1
4 0 0 0 0
5 1 0 0 1
6 1 1 0 2
7 1 1 0 2
8 0 1 0 1
9 1 0 0 1
10 0 0 0 0
11 1 1 1 3
12 1 1 1 3
13 1 1 0 2
14 1 1 0 2
15 1 1 0 2
16 1 1 1 3
17 1 1 0 2
18 1 1 0 2
Cj 13 13 3 29
Preference : yes = 1 ; no = 0
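$\sum C_j$ = 13 + 13 + 3 = 29 , $\sum C_j^2$ = 13² + 13² + 3² = 347
$\sum R_i$ = 29 , $\sum R_i^2$ = 0² + 2² + 1² + . . . + 2² + 2² = 63
$\chi^2 = \frac{(k-1)\left[k\sum C_j^2 - \left(\sum C_j\right)^2\right]}{k\sum R_i - \sum R_i^2}$ , d.f. = k – 1
$\chi^2 = \frac{(3-1)\left[3(347) - (29)^2\right]}{3(29) - 63}$ = 16.7 , d.f. = 2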
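The statistic for the housewife preference data can be recomputed from the 0/1 table itself. This is a minimal sketch of the formula above; `cochran_q` is a hypothetical function name, and the rows are copied from the Hospital A, B, C columns.

```python
def cochran_q(rows):
    """Test statistic for k related dichotomous samples; returns (statistic, df)."""
    k = len(rows[0])
    col_totals = [sum(r[j] for r in rows) for j in range(k)]
    row_totals = [sum(r) for r in rows]
    num = (k - 1) * (k * sum(c * c for c in col_totals) - sum(col_totals) ** 2)
    den = k * sum(row_totals) - sum(r * r for r in row_totals)
    return num / den, k - 1

# Preference of 18 housewives for hospitals A, B, C (yes = 1, no = 0)
housewives = [
    (0, 0, 0), (1, 1, 0), (0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0),
    (1, 1, 0), (0, 1, 0), (1, 0, 0), (0, 0, 0), (1, 1, 1), (1, 1, 1),
    (1, 1, 0), (1, 1, 0), (1, 1, 0), (1, 1, 1), (1, 1, 0), (1, 1, 0),
]
q, df = cochran_q(housewives)
print(round(q, 1), df)
```

Rows that are all 0 or all 1 do not change the statistic, which is a quick sanity check when entering data.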
Computer
Case Lab A B C Ri
1 1 0 0 0 1
2 1 1 1 1 4
3 0 0 0 0 0
4 0 1 1 1 3
5 1 1 1 1 4
6 1 0 0 1 2
7 1 0 1 1 3
8 1 0 0 1 2
9 1 0 0 0 1
10 1 0 0 0 1
11 1 1 1 1 4
Cj 9 4 5 7 25
$\sum C_j$ = 25
$\sum C_j^2$ = 9² + 4² + 5² + 7² = 171
$\sum R_i$ = 25
$\sum R_i^2$ = 1² + 4² + 0² + . . . + 4² = 77
$\chi^2 = \frac{(k-1)\left[k\sum C_j^2 - \left(\sum C_j\right)^2\right]}{k\sum R_i - \sum R_i^2}$ , d.f. = k – 1
$\chi^2 = \frac{(4-1)\left[4(171) - (25)^2\right]}{4(25) - 77}$ = 7.6957 , d.f. = 3
p-value = 0.0537
Example 18 A study of the relationship between age at first birth and the
development of breast cancer. (α = 0.05)
                         Age at first birth
Status      < 20       20-24      25-29      30-34      ≥ 35      Total
Cancer       320       1206       1011        463        220       3220
           (416.6)   (1348.2)    (933.6)    (371.9)    (149.7)
Normal      1422       4432       2893       1092        406      10,245
          (1325.4)   (4289.8)   (2970.4)   (1183.1)    (476.3)
$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$ ; df = (r – 1)(c - 1)
or $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{O_{ij}^2}{E_{ij}} - n$ ; df = ( r - 1 )( c - 1 )
$\chi^2$ = 130.33 , df = 4
p-value < 0.001
At α = 0.05; Reject Ho
$\chi^2 = \frac{14^2}{12.4} + \frac{45^2}{45.9} + \dots + \frac{37^2}{42.4} - 546$ = 4.4942 ; d.f. = 4
p-value = 0.3603
$\chi^2 = \frac{56\left(|14(24) - 8(10)| - 56/2\right)^2}{24 \cdot 32 \cdot 22 \cdot 34}$ , df = 1
$\chi^2$ = 5.0675
p-value = 0.024
α = 0.05; Reject Ho
 4   10   14
11   12   23

 8    1    9
 3   11   14     prob = $\frac{9!\, 14!\, 11!\, 12!}{23!\, 8!\, 1!\, 3!\, 11!}$ = 0.0024
11   12   23

 9    0    9
 2   12   14     prob = $\frac{9!\, 14!\, 11!\, 12!}{23!\, 9!\, 0!\, 2!\, 12!}$ = 0.0001
11   12   23
0 9 9 1 8 9 2 7 9
11 3 14 10 4 14 9 5 14
11 12 23 11 12 23 11 12 23
3 6 9 4 5 9 5 4 9
8 6 14 7 7 14 6 8 14
11 12 23 11 12 23 11 12 23
6 3 9 7 2 9 8 1 9
5 9 14 4 10 14 3 11 14
11 12 23 11 12 23 11 12 23
9 0 9
2 12 14
11 12 23
prob. = $\frac{R_1!\, R_2!\, C_1!\, C_2!}{n!\, a!\, b!\, c!\, d!} = \frac{9!\, 14!\, 11!\, 12!}{23!\, a!\, b!\, c!\, d!}$
a Probability
0 .0003
1 .0067
2 .0533
3 .1866
4 .3198
5 .2798
6 .1244
7 .0266
8 .0024
9 .0001
Total 1.000
Total 6 9 15
prob. = $\frac{R_1!\, R_2!\, C_1!\, C_2!}{n!\, a!\, b!\, c!\, d!}$
prob. = $\frac{8!\, 7!\, 6!\, 9!}{15!\, 5!\, 2!\, 1!\, 7!}$ = 0.0336
and, for the more extreme table

 0   8    8
 6   1    7
 6   9   15

prob. = $\frac{8!\, 7!\, 6!\, 9!}{15!\, 6!\, 1!\, 0!\, 8!}$ = 0.0014
p-value = .0336 + .0014 = 0.0350
at α = 0.05, Reject Ho
There is a significant relationship between the radiation treatment and
the result at α = 0.05 ( p-value = 0.0350 )
For quantitative data that are normally distributed, the descriptive
statistics we can use are the sample mean and the sample standard
deviation.
(5) Conclusion
or $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ , df = n – 1 , when $\sigma^2$ is unknown
$\mu_0$ = hypothesized (constant) mean
$\sigma_0^2$ = hypothesized (constant) variance
$\bar{x}$ = sample mean
S² = sample variance
σ = standard deviation of population
S = standard deviation of sample
n = sample size
$F = \frac{S_1^2}{S_2^2}$ ; $S_1^2 > S_2^2$ ; df = n1 – 1 , n2 – 1
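(2) t-test ; $t = \frac{\bar{x}_1 - \bar{x}_2}{S_p}$ ; $\sigma_1^2, \sigma_2^2$ unknown
when $\sigma_1^2 \ne \sigma_2^2$ ; $S_p^2 = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}$
$df = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{1}{n_1 + 1}\left(\frac{S_1^2}{n_1}\right)^2 + \frac{1}{n_2 + 1}\left(\frac{S_2^2}{n_2}\right)^2} - 2$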
$\bar{x}_1, \bar{x}_2$ = sample means
$S_1^2, S_2^2$ = sample variances
n1, n2 = sample sizes
$\sigma_1^2, \sigma_2^2$ = population variances
Ho : $\mu_d = 0$, $\mu_d \le 0$, $\mu_d \ge 0$
H1 : $\mu_d \ne 0$, $\mu_d > 0$, $\mu_d < 0$
paired t test; $t = \frac{\bar{d}}{S_d/\sqrt{n}}$ , df = n – 1
The variance of the differences $d_i$ ; $S_d^2 = \frac{\sum_{i=1}^{n}(d_i - \bar{d})^2}{n - 1}$
Ho : μ = 120
H1 : μ ≠ 120
n = 16, $\bar{x}$ = 110, S = 35
t-test; $t = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}$ , d.f. = n – 1
$t = \frac{110 - 120}{35/\sqrt{16}}$ = -1.1429 , d.f. = 15
Let α = 0.05
p-value > α ; fail to reject Ho
The serum amylase of diabetic patients is not significantly different
from 120 units/100 ml at α = 0.05 (p-value = 0.2742).
Ho : μ ≥ 50
H1 : μ < 50
t-test; $t = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}$ , d.f. = n – 1
$t = \frac{42 - 50}{11.9/\sqrt{12}}$ = - 2.33 , d.f. = 11
p-value = 0.0213
Let α = 0.05
p-value < α
Reject Ho : μ ≥ 50
The new method screens significantly faster than the standard method
at the 0.05 significance level. (p-value = 0.0213)
62 62 68 48 51 60 51 57
57 41 62 50 53 34 62 61
$t = \frac{54.94 - 50}{8.87/\sqrt{16}}$ = 2.228 , d.f. = 15
p-value = 0.0219
Reject Ho : μ ≤ 50
H1 : σ² > 36
$\chi^2 = \frac{(n - 1)S^2}{\sigma_0^2}$ , df = n – 1
$\chi^2 = \frac{(16 - 1)(8.87)^2}{36}$ ; df = 16 – 1
$\chi^2$ = 32.782; df = 15
p-value = 0.005
at α = 0.05; Reject Ho : σ² ≤ 36
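Both statistics in this example can be recomputed from the 16 raw observations. This is a sketch using the standard library's `statistics` module, with the hypothesized mean 50 and hypothesized variance 36 taken from the example. Because the standard deviation is not rounded to 8.87, the results differ from the hand calculation in the second decimal place.

```python
import math
from statistics import mean, stdev

data = [62, 62, 68, 48, 51, 60, 51, 57,
        57, 41, 62, 50, 53, 34, 62, 61]

n = len(data)
x_bar = mean(data)
s = stdev(data)  # sample standard deviation (divisor n - 1)

t = (x_bar - 50) / (s / math.sqrt(n))  # test of the mean against 50
chi2 = (n - 1) * s ** 2 / 36           # test of the variance against 36
print(round(x_bar, 2), round(s, 2), round(t, 3), round(chi2, 2))
```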
Gender n x S
Male 13 584 225
Female 16 695 185
Ho : $\sigma_1^2 = \sigma_2^2$
H1 : $\sigma_1^2 \ne \sigma_2^2$
$F = \frac{s_1^2}{s_2^2} = \frac{(225)^2}{(185)^2}$ = 1.4792 ; d.f. = 12, 15
$t = \frac{\bar{x}_1 - \bar{x}_2}{S_p}$ , d.f. = n1 + n2 - 2
$S_p^2 = \left(\frac{1}{13} + \frac{1}{16}\right)\frac{12(225)^2 + 15(185)^2}{13 + 16 - 2}$
= 5787.9941
Sp = 76.0789
$t = \frac{584 - 695}{76.0789}$ = -1.459 , d.f. = 27
Fail to reject Ho : μ1 = μ2
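The equal-variance t test for the male and female groups can be sketched from the summary table. Following the notation of the worked example, the value called Sp is the standard error of the difference between the two means. The function name `pooled_t` is illustrative.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t test assuming equal variances; returns (t, Sp, df)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    sp = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # SE of the mean difference
    return (mean1 - mean2) / sp, sp, df

# Male: n = 13, mean 584, S = 225; Female: n = 16, mean 695, S = 185
t, sp, df = pooled_t(13, 584, 225, 16, 695, 185)
print(round(sp, 4), round(t, 3), df)
```

The preceding F test supports the equal-variance assumption here, which is why the pooled form is used.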
Ho : S NS
H1 : S NS
$t = \frac{\bar{x}_1 - \bar{x}_2}{S_p}$ , d.f. = n1 + n2 - 2
$S_p^2 = \left(\frac{1}{9} + \frac{1}{16}\right)\frac{8(4.85)^2 + 15(4.48)^2}{9 + 16 - 2}$ = 3.6875
Sp = 1.9203
$t = \frac{17.54 - 12.43}{1.9203}$ = 2.661, d.f. = 9 + 16 – 2 = 23
Sample n X s
1 10 435 65
2 12 645 80
Ho : 2 1
H1 : 2 1
$t = \frac{\bar{x}_1 - \bar{x}_2}{S_p}$ , d.f. = n1 + n2 - 2
$S_p^2 = \left(\frac{1}{10} + \frac{1}{12}\right)\frac{9(65)^2 + 11(80)^2}{10 + 12 - 2}$ = 993.8958
Sp = 31.526
$t = \frac{645 - 435}{31.526}$ = 6.661, d.f. = 20
at = 0.05; Reject Ho : 2 1
Ho : C G
H1 : C G
Sp = 2.0897
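$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 + 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 + 1}\left(\frac{s_2^2}{n_2}\right)^2} - 2$
$df = \frac{(4.3667)^2}{\frac{1}{21}\left(\frac{4}{20}\right)^2 + \frac{1}{25}\left(\frac{100}{24}\right)^2} - 2$
= 25.38 ~ 25
$t = \frac{\bar{x}_1 - \bar{x}_2}{S_p}$
$t = \frac{36 - 7}{2.0897}$ = 13.8776 , df = 25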
Reject Ho : C G
cholesterol
Case Before After di d i2
1 201 200 -1 1
2 231 236 5 25
3 221 216 -5 25
4 260 233 -27 729
5 228 224 -4 16
6 237 216 -21 441
7 326 296 -30 900
8 235 195 -40 1600
9 240 207 -33 1089
10 267 247 -20 400
11 284 210 -74 5476
12 201 209 8 64
Total -242 10766
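$\bar{d} = \frac{\sum d_i}{n} = \frac{-242}{12}$ = -20.17
$S_d^2 = \frac{1}{n - 1}\left[\sum d_i^2 - \frac{\left(\sum d_i\right)^2}{n}\right] = \frac{1}{11}\left[10766 - \frac{(-242)^2}{12}\right]$
$S_d^2$ = 535.06 , Sd = 23.13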
Ho : $\mu_d \ge 0$
H1 : $\mu_d < 0$
$t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{-20.17}{23.13/\sqrt{12}}$ = -3.02, df = 11
p-value = 0.006
α = 0.05
p-value < α
Reject Ho : $\mu_d \ge 0$
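The paired t test on the cholesterol data can be recomputed from the before and after columns. The sketch below forms the differences as after minus before, matching the di column, and uses the sample standard deviation of the differences. Variable names are illustrative.

```python
import math
from statistics import mean, stdev

before = [201, 231, 221, 260, 228, 237, 326, 235, 240, 267, 284, 201]
after = [200, 236, 216, 233, 224, 216, 296, 195, 207, 247, 210, 209]

diffs = [a - b for a, b in zip(after, before)]  # d_i = after - before
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)  # divisor n - 1

t = d_bar / (s_d / math.sqrt(n))
print(sum(diffs), round(d_bar, 2), round(s_d, 2), round(t, 2), n - 1)
```

A negative t indicates that cholesterol after treatment is lower on average than before.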
Blood pressure
Case Before After di d i2
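$\bar{d} = \frac{\sum d_i}{n} = \frac{48}{10} = 4.8$
$S_d^2 = \frac{1}{n - 1}\left[\sum d_i^2 - \frac{\left(\sum d_i\right)^2}{n}\right] = \frac{1}{9}\left[418 - \frac{(48)^2}{10}\right] = 20.8444$
Sd = 4.57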
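Ho : $\mu_d \le 0$
H1 : $\mu_d > 0$
$t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{4.8}{4.57/\sqrt{10}} = 3.321$ , df = 9
p-value = 0.0042
At α = 0.05; Reject Ho : $\mu_d \le 0$
95% confidence interval for $\mu_d$ : 4.8 ± 3.269 = (1.531, 8.069)
$1.531 \le \mu_d \le 8.069$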
EXAMPLE 10 The following data come from a study that examines the
efficacy of saliva cotinine as an indicator of exposure to tobacco smoke.
In one part of the study, seven subjects, none of whom were heavy
smokers and all of whom had abstained from smoking for at least one week
prior to the study, were each required to smoke a single cigarette.
Samples of saliva were taken from all individuals 2, 12, 24, and 48 hours
after smoking the cigarette. The cotinine levels at 12 hours and at 24
hours are shown below. Can we conclude that the cotinine level 24 hours
after smoking is lower than the cotinine level 12 hours after
smoking, at the 95% confidence level ?
Subject   12 hours   24 hours    di
1            73         24       -49
2            58         27       -31
3            67         49       -18
4            93         59       -34
5            33          0       -33
6            18         11        -7
7           147         43      -104
Ho : $\mu_d \ge 0$
H1 : $\mu_d < 0$
n = 7, $\bar{d}$ = -39.429; Sd = 31.395
$t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{-39.429}{31.395/\sqrt{7}}$ = -3.323 , df = 6
p-value = 0.0084
At α = 0.05; Reject Ho