The document provides a comprehensive overview of statistical analysis methods for frequencies, including hypothesis testing for one or more groups using tests like Z-test, Chi-square test, and Fisher Exact Test. It outlines the procedures for comparing proportions, distributions, and associations between variables, along with examples illustrating the application of these tests. Additionally, it includes guidelines for interpreting statistical results and determining significance levels.

Uploaded by

erfanriahi80
Copyright
© © All Rights Reserved
Statistical Analysis for Frequencies

Data to be analysed may be qualitative or quantitative. Some data are
presented as frequencies, that is, as rates or proportions, and these
frequencies can be used for hypothesis testing.

1. For one group of data


1.1 Comparing a sample proportion with a constant proportion:
Z-test for proportion or Binomial Test.
Ho : P = Po, P ≥ Po, P ≤ Po
H1 : P ≠ Po, P < Po, P > Po

n ≤ 25 : use the Binomial Test

One-sided probability = $P(X \le x_{\min})$ when $\hat{p} < P_0$
or = $P(X \ge x_{\max})$ when $\hat{p} > P_0$

read from the binomial table, where X is the number of events of interest.

n > 25 : use the Z-test for proportion

$$Z = \frac{\hat{p} - P_0}{\sqrt{\dfrac{P_0 (1 - P_0)}{n}}}$$

$\hat{p}$ = sample proportion
Po = constant value of proportion
n = sample size
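
The two cases in 1.1 can be sketched in Python with the standard library only. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the exact binomial tail stands in for the binomial table, and the normal tail stands in for the Z table. The input counts are those of Examples 1 and 2.

```python
from math import comb, erfc, sqrt


def binomial_upper_tail(x, n, p0):
    """P(X >= x) for X ~ Binomial(n, p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(x, n + 1))


def binomial_lower_tail(x, n, p0):
    """P(X <= x) for X ~ Binomial(n, p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(0, x + 1))


def z_for_proportion(x, n, p0):
    """Z statistic comparing the sample proportion x/n with the constant p0."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))


# Small sample (n <= 25): two-sided exact p-value, 5 events in 13 trials, P0 = 0.20
p_small = 2 * binomial_upper_tail(5, 13, 0.20)

# Large sample (n > 25): one-sided Z-test, 40 events in 130 trials, P0 = 0.20
z_large = z_for_proportion(40, 130, 0.20)
p_large = normal_upper_tail(z_large)
```

Doubling the one-sided tail follows the two-sided convention used in the examples; other two-sided conventions for the exact binomial test exist.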
1.2 Comparing the distribution of sample data with constant proportions
(Goodness of Fit Test) by the Chi-square Test or the Kolmogorov-Smirnov Test
(K.S. Test)
Ho : The data follow the hypothesized distribution.
H1 : The data do not follow the hypothesized distribution.

The test statistic

(1) Chi-square test:

$$\chi^2 = \sum_{i=1}^{r} \frac{(O_i - E_i)^2}{E_i}, \quad df = r - 1$$

or, equivalently,

$$\chi^2 = \sum_{i=1}^{r} \frac{O_i^2}{E_i} - n, \quad df = r - 1$$

Oi = Observed frequencies
Ei = Expected frequencies
r = The number of categories of the variable
Criteria :
1. All Ei ≥ 5, or
2. The number of categories with Ei < 5 is not more than 20% of all
categories.

(2) Kolmogorov-Smirnov Test (K.S. Test)

$$D = \max |S - S_t|$$

S = The relative cumulative frequencies of the observed data
St = The relative cumulative probabilities of the hypothesized distribution

- The Chi-square test or the K.S. test gives the p-value
- Set the significance level α
- Conclusion
p-value < α , reject Ho
p-value > α , fail to reject Ho
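
As a hedged sketch of the goodness-of-fit computation, the following Python code evaluates both forms of the chi-square statistic and an upper-tail p-value. The function names are hypothetical. `chi2_sf` uses the closed-form series for the chi-square tail with integer degrees of freedom, so it replaces a chi-square table only under that assumption. The counts are those of the TV channel survey in Example 3.

```python
from math import erfc, exp, gamma, sqrt


def chi_square_gof(observed, probs):
    """Return (chi-square, df) using sum((O - E)^2 / E) with E = n * p."""
    n = sum(observed)
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


def chi_square_gof_shortcut(observed, probs):
    """Same statistic from the shortcut form sum(O^2 / E) - n."""
    n = sum(observed)
    return sum(o * o / (n * p) for o, p in zip(observed, probs)) - n


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square_df >= x) for integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum
        return exp(-half) * sum(half**k / gamma(k + 1) for k in range(df // 2))
    # Odd df: normal tail term plus a finite sum of half-integer powers
    extra = sum(half ** (k - 0.5) / gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1))
    return erfc(sqrt(half)) + exp(-half) * extra


# Four TV channels, hypothesized equal preference (Example 3 counts)
stat, df = chi_square_gof([520, 510, 490, 480], [0.25] * 4)
p_value = chi2_sf(stat, df)
```

The decision rule is then the one listed above: reject Ho when `p_value` is below α.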

2. For two and more than two groups of data


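2.1 To compare proportions of two independent groups
2.1.1 Z-test for proportion
Ho : P1 = P2, P1 ≥ P2, P1 ≤ P2
H1 : P1 ≠ P2, P1 < P2, P1 > P2

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

$\hat{p}_1, \hat{p}_2$ = proportion/rate of group 1 and 2
$\bar{p}$ = pooled proportion of both groups combined (total events divided by $n_1 + n_2$, as in Example 9)
n1 , n2 = sample size of group 1 and 2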

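2.1.2 Chi-square test or Fisher Exact Test

Ho : P1 = P2
H1 : P1 ≠ P2

For a 2 x 2 table of observed frequencies

Variable    Group 1   Group 2   Total
Positive    O11       O12       R1
Negative    O21       O22       R2
Total       C1        C2        n

$$\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad df = 1$$

where $E_{ij} = R_i C_j / n$.

Writing the cells of the same 2 x 2 table as a, b, c, d

Variable    Group 1   Group 2   Total
Positive    a         b         R1
Negative    c         d         R2
Total       C1        C2        n

$$\chi^2 = \frac{n(ad - bc)^2}{R_1 R_2 C_1 C_2}, \quad df = 1$$

If any Eij < 5, use the Fisher exact test. The probability of a table with
the given margins is

$$\text{Prob.} = \frac{R_1!\, R_2!\, C_1!\, C_2!}{n!\, a!\, b!\, c!\, d!}$$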
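
The two 2 x 2 formulas above can be checked numerically with a short Python sketch. The function names are hypothetical. The Fisher probability is written with binomial coefficients, which is algebraically the same as the factorial formula but avoids very large factorials. The counts come from Examples 9 and 10.

```python
from math import comb


def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square n(ad - bc)^2 / (R1 R2 C1 C2) for a 2 x 2 table, df = 1."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    return n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)


def fisher_table_probability(a, b, c, d):
    """Probability of one 2 x 2 table given its margins.

    Equal to R1! R2! C1! C2! / (n! a! b! c! d!).
    """
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)


# MI by OC use: rows Yes/No, columns OC users/non-users (Example 9 counts)
chi2_mi = chi_square_2x2(13, 7, 4987, 9993)

# Tooth alignment by feeding method (Example 10 observed table)
prob_observed = fisher_table_probability(4, 16, 1, 21)
```

A Fisher p-value adds the probability of the observed table to the probabilities of all more extreme tables with the same margins.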

2.2 To compare the distributions of many (c) independent groups
(Test of Homogeneity) : Chi-square test
Ho : There is no difference between the proportions/distributions
of the c groups.
H1 : At least two groups are different.
For an r x c table ;

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad df = (r-1)(c-1)$$

or

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{O_{ij}^2}{E_{ij}} - n, \quad df = (r-1)(c-1)$$
2.3 To compare the distribution of dependent/related groups

2.3.1 For two groups , McNemar Test or Binomial Test

                 Experiment gr.
Control gr.      +       -       Total
+                a       b       R1
-                c       d       R2
Total            C1      C2      n

b + c ≤ 25 ; use the Binomial Test: binomial probability with n = b + c , p = 0.5

$$p\text{-value} = 2P(X \le X_{\min})$$

or

$$p\text{-value} = 2P(X \ge X_{\max})$$

b + c > 25 ; use the McNemar Test

$$\chi^2 = \frac{(|b - c| - 1)^2}{b + c}, \quad df = 1$$
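
A minimal Python sketch of both branches of 2.3.1, using only the discordant counts b and c. The function names are hypothetical, and the cap at 1.0 on the doubled binomial tail is an added safeguard rather than part of the notes. The counts are taken from Examples 14 and 15.

```python
from math import comb


def mcnemar_statistic(b, c):
    """Continuity-corrected McNemar chi-square, df = 1 (intended for b + c > 25)."""
    return (abs(b - c) - 1) ** 2 / (b + c)


def mcnemar_exact_p(b, c):
    """Two-sided binomial p-value with n = b + c and p = 0.5 (for b + c <= 25)."""
    n = b + c
    x_min = min(b, c)
    lower_tail = sum(comb(n, k) for k in range(x_min + 1)) / 2**n
    return min(1.0, 2 * lower_tail)


# Large number of discordant pairs (Example 14: b = 48, c = 23)
chi2_skin_test = mcnemar_statistic(48, 23)

# Few discordant pairs (Example 15: b = 2, c = 4)
p_radiologists = mcnemar_exact_p(2, 4)
```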

2.3.2 Measuring agreement between two methods by the Kappa statistic.

The statistic used to measure agreement between two observers on a
binary variable is Kappa (K), defined as the agreement beyond chance
divided by the amount of possible agreement beyond chance.

$$K = \frac{\text{Observed} - \text{Expected}}{1 - \text{Expected}} = \frac{O - E}{1 - E}$$

O = Observed agreement
E = Expected agreement

Guidelines for interpreting kappa (K) :

0.93-1.00   Excellent agreement
0.81-0.92   Very good agreement
0.61-0.80   Good agreement
0.41-0.60   Fair agreement
0.21-0.40   Slight agreement
0.01-0.20   Poor agreement
≤ 0.00      No agreement

When K is zero, agreement is only at the level expected by chance. When
K is negative, the observed level of agreement is less than we would
expect by chance alone.
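
The Kappa calculation can be sketched in Python as follows. The function name is hypothetical. Expected agreement is computed from the row and column totals of the 2 x 2 table, as in Examples 14 and 15, whose counts are used here.

```python
def cohen_kappa(a, b, c, d):
    """Kappa for a 2 x 2 agreement table with cells a (+,+), b (+,-), c (-,+), d (-,-)."""
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement: product of marginal proportions, summed over both categories
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (observed - expected) / (1 - expected)


# Two skin tests (Example 14) and two radiologists (Example 15)
kappa_skin_test = cohen_kappa(81, 48, 23, 21)
kappa_radiologists = cohen_kappa(6, 2, 4, 11)
```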

2.3.3 To compare more than two dependent / related groups

Cochran's Q Test for Related Observations

Assumptions

1. Related samples
2. The data are on a nominal scale (dichotomous)

          Group/Sample
Case      1      2      3      ...    k      Total
1         x11    x12    x13    ...    x1k    R1
2         x21    x22    x23    ...    x2k    R2
3         x31    x32    x33    ...    x3k    R3
.
.
.
r         xr1    xr2    xr3    ...    xrk    Rr
Total     C1     C2     C3     ...    Ck     n

xij = 0, 1

Ho : There is no difference between the proportions of the k groups.
H1 : At least two proportions are different.

The test statistic;

$$\chi^2 = \frac{(k - 1)\left[k \sum C_j^2 - \left(\sum C_j\right)^2\right]}{k \sum R_i - \sum R_i^2}, \quad d.f. = k - 1$$
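
A hedged Python sketch of Cochran's Q, taking the 0/1 data matrix directly (one row per case, one column per group). The function name is hypothetical. The data are the hypothyroid screening results of Example 17, and the sketch assumes the denominator is non-zero, that is, not every case has all 0s or all 1s.

```python
def cochran_q(rows):
    """Return (Q, df) for a list of cases, each a list of k dichotomous (0/1) results."""
    k = len(rows[0])
    col_totals = [sum(row[j] for row in rows) for j in range(k)]
    row_totals = [sum(row) for row in rows]
    grand_total = sum(col_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - grand_total**2)
    denominator = k * grand_total - sum(r * r for r in row_totals)
    return numerator / denominator, k - 1


# Columns: Lab, Computer A, Computer B, Computer C (Example 17 data)
screening = [
    [1, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 1, 1, 1],
    [1, 1, 1, 1], [1, 0, 0, 1], [1, 0, 1, 1], [1, 0, 0, 1],
    [1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1],
]
q_stat, q_df = cochran_q(screening)
```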

3. To test association between two variables.

By the Chi-square Test or the Fisher Exact Test
Ho : There is no association between two variables.
H1 : There is association between two variables.

3.1 r x c table ;

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad df = (r - 1)(c - 1)$$

For a 2 x 2 table

               Variable B
Variable A     Positive   Negative   Total
Positive       O11        O12        R1
Negative       O21        O22        R2
Total          C1         C2         n

3.2 For a 2 x 2 table ; Chi-square corrected for continuity

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{\left(|O_{ij} - E_{ij}| - 0.5\right)^2}{E_{ij}}, \quad df = 1$$

               Variable B
Variable A     Positive   Negative   Total
Positive       a          b          R1
Negative       c          d          R2
Total          C1         C2         n

In terms of the cell counts a, b, c, d, the continuity-corrected chi-square is

$$\chi^2 = \frac{n\left(|ad - bc| - n/2\right)^2}{R_1 R_2 C_1 C_2}, \quad df = 1$$
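
The continuity-corrected shortcut formula can be sketched in Python as below. The function name is hypothetical, and the counts are those of the smoking and lung cancer table in Example 20.

```python
def chi_square_yates(a, b, c, d):
    """Continuity-corrected chi-square n(|ad - bc| - n/2)^2 / (R1 R2 C1 C2), df = 1."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    return n * (abs(a * d - b * c) - n / 2) ** 2 / (r1 * r2 * c1 * c2)


# Rows: smoker / nonsmoker; columns: lung cancer + / - (Example 20 counts)
chi2_corrected = chi_square_yates(14, 10, 8, 24)
```

The correction always gives a value no larger than the uncorrected 2 x 2 statistic when |ad - bc| is at least n/2, so it is the more conservative choice.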

3.3 Fisher Exact Test

If any Eij < 5, use the Fisher exact test

$$\text{Prob.} = \frac{R_1!\, R_2!\, C_1!\, C_2!}{n!\, a!\, b!\, c!\, d!}$$

Example 1 Suppose that 13 deaths have occurred among 55-64-year-old
male workers in a nuclear power plant and that the cause of death was
cancer in 5 of them. Based on vital statistics reports, approximately
20% of all deaths can be attributed to some form of cancer. Is this result
significant? (α = 0.05)
Ho : P = 0.20
H1 : P ≠ 0.20
P0 = 0.20, $\hat{p} = 5/13 = 0.38$ , n = 13 ; $\hat{p} > P_0$

n < 25 : use the Binomial Test; p-value = $2P(X \ge x_{\max})$

From the binomial table ; n = 13 , P = 0.2
p-value = $2P(X \ge 5)$ = 2(0.0991) = 0.1982
α = 0.05
Fail to reject Ho : P = 0.20

The proportion of deaths due to cancer is not significantly different
from 20% at α = 0.05 (p-value = 0.1982).

Example 2 Suppose that 130 deaths have occurred among males and the
cause of death was cancer in 40 of them. Based on vital statistics
reports, approximately 20% of all deaths can be attributed to some
form of cancer. Is this proportion significantly greater than the
reported value? (α = 0.05)

Ho : P ≤ 0.20
H1 : P > 0.20
$\hat{p} = 40/130 = 0.3077$

n > 25 ; Z-test

$$Z = \frac{\hat{p} - P_0}{\sqrt{\dfrac{P_0(1 - P_0)}{n}}} = \frac{0.3077 - 0.20}{\sqrt{\dfrac{0.2(0.8)}{130}}} = 3.07$$

p-value = 0.0011

at α = 0.05 ; Reject Ho : P ≤ 0.20

The proportion of deaths due to cancer is significantly greater than 20%
at α = 0.05 (p-value = 0.0011).

By the Chi-square test

Cause of death   Oi     pi     Ei     (Oi - Ei)²/Ei
Cancer           40     0.2    26     7.538
Not cancer       90     0.8    104    1.885
Total            130    1.0    130    9.423

Ho : Cancer : Not cancer = 20 : 80
H1 : Cancer : Not cancer ≠ 20 : 80

$$\chi^2 = \sum_{i=1}^{r} \frac{(O_i - E_i)^2}{E_i}, \quad df = r - 1$$

$\chi^2$ = 9.423 ; df = 2 - 1 = 1

p-value < 0.005

at α = 0.05, Reject Ho : Cancer : Not cancer = 20 : 80

The proportion of deaths due to cancer is significantly different from
20% at α = 0.05 (p-value < 0.005).

Example 3 In a survey in metropolitan Bangkok, we want to know whether
four TV channels are equally preferred for watching the news.
We survey 2000 households and record which of the four TV channels each
household watches for news. (α = 0.05)

TV Channel   Oi      Ei      (Oi - Ei)²/Ei
A            520     500     0.8
B            510     500     0.2
C            490     500     0.2
D            480     500     0.8
Total        2000    2000    2.0

Ho : There is no difference in preference between the 4 TV channels.
H1 : There is a difference in preference between the 4 TV channels.

$$\chi^2 = \sum_{i=1}^{r} \frac{(O_i - E_i)^2}{E_i} = 2.0, \quad df = 3$$

p-value = 0.5799
at α = 0.05, p-value > α
Fail to reject Ho

There is no significant difference in preference between the four TV
channels in metropolitan Bangkok at α = 0.05 (p-value = 0.5799).

Example 4 Among cancer patients, we want to know whether the
distribution of blood types is the same as in normal persons.
[Among normal persons the distribution of types O, A, B and AB is
44.8%, 39.7%, 11.3% and 4.2% respectively.] (α = 0.05)

Blood Type   Oi     pi       Ei = npi   (Oi - Ei)²/Ei
O            43     .448     55.6       2.86
A            66     .397     49.2       5.74
B            11     .113     14.0       0.64
AB           4      .042     5.2        0.28
Total        124    1.000    124.0      9.52

Ho : The distribution O : A : B : AB = 44.8 : 39.7 : 11.3 : 4.2
H1 : The distribution O : A : B : AB ≠ 44.8 : 39.7 : 11.3 : 4.2

$$\chi^2 = \sum_{i=1}^{r} \frac{(O_i - E_i)^2}{E_i}, \quad df = r - 1$$

$\chi^2$ = 9.52 , d.f. = 4 - 1 = 3

p-value = 0.0237
at α = 0.05 ; Reject Ho

The distribution of blood types among cancer patients is significantly
different from that of normal persons at α = 0.05 (p-value = 0.0237).

Example 5 In a school dental screening, 10 students were randomly
sampled from each of 100 classrooms, and the number of sampled students
with a dental problem was counted in each classroom, as shown in the
table below. Do these data follow a binomial distribution with
p = 0.2 at α = 0.05 ?

Number of students with    Number of classrooms
dental problem             Oi       Pi        Ei = npi
0                          8        .1074     10.74
1                          25       .2684     26.84
2                          32       .3020     30.20
3                          24       .2013     20.13
4                          10       .0881     8.81
5+                         1        .0328     3.28
Total                      100      1.0000    100.00

Ho: The data follow a binomial distribution with p = 0.20.
H1: The data do not follow a binomial distribution with p = 0.20.

By the Chi-square test;

$$\chi^2 = \sum_{i=1}^{r} \frac{O_i^2}{E_i} - n, \quad df = r - m - 1$$

$$\chi^2 = \left(\frac{8^2}{10.74} + \frac{25^2}{26.84} + \dots + \frac{1^2}{3.28}\right) - 100$$

$\chi^2$ = 3.4221 , d.f. = 6 - 1 - 1 = 4
p-value = 0.4924
at α = 0.05 , Fail to reject Ho
The data are consistent with a binomial distribution with p = 0.20.

Example 6 For the number of emergency cases admitted to the hospital per
day, do the data follow a Poisson distribution with an average of 3 cases
per day? Data were collected for 90 days. (α = 0.05)

Emergency cases   Number of days
per day           Oi                Pi       Ei
0                 5                 .050     4.50
1                 14                .149     13.41
2                 15                .224     20.16
3                 23                .224     20.16
4                 16                .168     15.12
5                 9                 .101     9.09
6                 3                 .050     4.50
7                 3                 .022     1.98
8                 1                 .008     0.72
9                 1                 .003     0.27
10+               0                 .001     0.09
Total             90                1.000    90.00

The categories 6, 7, 8, 9 and 10+ are combined into a single category
with O = 8 and E = 7.56, giving r = 7 categories.

Ho: The data follow a Poisson distribution with λ = 3
H1: The data do not follow a Poisson distribution with λ = 3

By the Chi-square test;

$$\chi^2 = \sum_{i=1}^{r} \frac{O_i^2}{E_i} - n, \quad df = r - m - 1$$

$$\chi^2 = \left(\frac{5^2}{4.5} + \frac{14^2}{13.41} + \dots + \frac{8^2}{7.56}\right) - 90 = 1.88, \quad df = 7 - 1 - 1 = 5$$

p-value = 0.8618
at α = 0.05; p-value > α , Fail to reject Ho
The data are consistent with a Poisson distribution with λ = 3.

Example 7 Do the data follow a normal distribution with mean 113.3 and
variance 277.22 at α = 0.05 ?

Data         Oi     Pi        Ei = npi   Zi
< 90         8      .0808     8.08       < -1.4
90 - 100     15     .1311     13.11      -1.4 to -0.8
100 - 110    21     .2088     20.88      -0.8 to -0.2
110 - 120    23     .2347     23.47      -0.2 to 0.4
120 - 130    16     .1859     18.59      0.4 to 1.0
130 - 140    9      .1039     10.39      1.0 to 1.6
140+         8      .0548     5.48       1.6+
Total        100    1.0000    100.00

Ho : X ~ N (113.3, 277.22)
H1 : X ≁ N (113.3, 277.22)

By the Chi-square test;

$$\chi^2 = \sum_{i=1}^{r} \frac{O_i^2}{E_i} - n, \quad df = r - m - 1$$

$$\chi^2 = \left(\frac{8^2}{8.08} + \frac{15^2}{13.11} + \dots + \frac{8^2}{5.48}\right) - 100 = 1.99, \quad d.f. = 7 - 2 - 1 = 4$$

p-value = 0.7383
at α = 0.05; Fail to reject Ho

The data are consistent with a normal distribution with mean = 113.3 and
variance = 277.22.

The same data by the Kolmogorov-Smirnov test (F = cumulative frequency,
S = F/n, St = cumulative probability):

Data       Oi     F      S       pi        St        |S - St|
< 90       8      8      .08     .0808     .0808     .0008
90-100     15     23     .23     .1311     .2119     .0181
100-110    21     44     .44     .2088     .4207     .0193
110-120    23     67     .67     .2347     .6554     .0146
120-130    16     83     .83     .1859     .8413     .0113
130-140    9      92     .92     .1039     .9452     .0252*
140+       8      100    1.00    .0548     1.0000    .0000
Total      100                   1.0000

$\bar{X}$ = 113.3, $S^2$ = 277.22

Ho : The data follow a normal distribution.
H1 : The data do not follow a normal distribution.

By the Kolmogorov-Smirnov test;

D = Max |S - St| = 0.0252

p-value > 0.20

at α = 0.05; p-value > α , Fail to reject Ho

The data are consistent with a normal distribution.
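
The K.S. statistic for grouped data, as tabulated in Example 7, can be sketched in Python as follows. The function name is hypothetical. The sketch only computes D from class counts and hypothesized class probabilities; the p-value would still come from a K.S. table.

```python
def ks_statistic_grouped(observed, probs):
    """D = max |S - St| over classes, from class counts and hypothesized class probabilities."""
    n = sum(observed)
    cum_observed = 0.0   # S: relative cumulative frequency
    cum_expected = 0.0   # St: cumulative hypothesized probability
    d = 0.0
    for count, prob in zip(observed, probs):
        cum_observed += count / n
        cum_expected += prob
        d = max(d, abs(cum_observed - cum_expected))
    return d


# Class counts and normal-distribution class probabilities from Example 7
counts = [8, 15, 21, 23, 16, 9, 8]
class_probs = [0.0808, 0.1311, 0.2088, 0.2347, 0.1859, 0.1039, 0.0548]
d_stat = ks_statistic_grouped(counts, class_probs)
```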

Example 8 Do the blood sugar values of 36 males follow a normal
distribution? (α = 0.05)

Blood sugar   fi    Fi    S         Zi       St       |S - St|
68            2     2     .0556     -2.00    .0228    .0328
72            2     4     .1111     -1.33    .0918    .0193
75            2     6     .1667     -0.83    .2033    .0366
76            2     8     .2222     -0.67    .2517    .0292
77            6     14    .3889     -0.50    .3085    .0804
78            3     17    .4722     -0.33    .3707    .1015
80            6     23    .6389     0.00     .5000    .1389
81            3     26    .7222     0.17     .5675    .1547*
84            2     28    .7778     0.67     .7486    .0292
86            2     30    .8333     1.00     .8413    .0080
87            2     32    .8889     1.17     .8790    .0099
92            4     36    1.0000    2.00     .9772    .0228
Total         36

$\bar{X}$ = 80, S = 6
Ho : The data follow a normal distribution.
H1 : The data do not follow a normal distribution.

By the Kolmogorov-Smirnov Test

D = Max |S - St| = 0.1547
p-value > 0.20
At α = 0.05
Fail to reject Ho
The blood sugar data are consistent with a normal distribution.

Example 9 A study was conducted to look at the effects of oral
contraceptives (OC) on heart disease in women 40-44 years of age. It is
found that among 5000 current OC users at baseline, 13 women develop a
myocardial infarction (MI), and among 10,000 non-OC users, 7 women
develop MI. Is there a significant difference in the proportion
developing MI between OC users and non-users? (α = 0.05)

OC users ; n1 = 5000, $\hat{p}_1$ = 13/5000 = 0.0026
Non-OC users ; n2 = 10,000, $\hat{p}_2$ = 7/10,000 = 0.0007
Ho : P1 = P2
H1 : P1 ≠ P2

$$\bar{p} = \frac{13 + 7}{15{,}000} = \frac{20}{15{,}000} = 0.00133$$

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.0026 - 0.0007}{\sqrt{(0.00133)(0.99867)\left(\dfrac{1}{5000} + \dfrac{1}{10{,}000}\right)}}$$

Z = 3.02

p-value = 2(0.0013) = 0.0026

At α = 0.05
Reject Ho

There is a significant difference between the MI incidence rates of OC
users and non-OC users at α = 0.05 (p-value = 0.0026).

The same data by the Chi-square test (expected frequencies in parentheses):

MI Status   OC users          Non-OC users      Total
Yes         13 (6.67)         7 (13.33)         20
No          4987 (4993.33)    9993 (9986.67)    14,980
Total       5000              10,000            15,000

Ho : P1 = P2
H1 : P1 ≠ P2

$$\chi^2 = \frac{n(ad - bc)^2}{R_1 R_2 C_1 C_2}, \quad df = 1$$

$$\chi^2 = \frac{15{,}000\,[13(9993) - 7(4987)]^2}{20 \times 14{,}980 \times 5{,}000 \times 10{,}000}, \quad df = 1$$

$\chi^2$ = 9.037 , df = 1
p-value < 0.005
At α = 0.05
Reject Ho

The proportions of MI among OC users and non-OC users are significantly
different at α = 0.05 (p-value < 0.005).

Example 10 To compare the proportions of good tooth alignment in children
between two methods of milk feeding (expected frequencies in parentheses).

Feeding method   Tooth alignment
                 Good        Not good      Total
Breast           4 (2.38)    16 (17.62)    20
Bottle           1 (2.62)    21 (19.38)    22
Total            5           37            42

Ho : P1 = P2
H1 : P1 ≠ P2

There are two values of Eij < 5 ; use the Fisher Exact Test

$$\text{prob.} = \frac{R_1!\, R_2!\, C_1!\, C_2!}{n!\, a!\, b!\, c!\, d!}$$

Observed table (a = 4, b = 16, c = 1, d = 21):

$$\text{prob.} = \frac{20!\, 22!\, 5!\, 37!}{42!\, 4!\, 16!\, 1!\, 21!} = 0.1253$$

More extreme table with the same margins (a = 5, b = 15, c = 0, d = 22):

5     15    20
0     22    22
5     37    42

$$\text{prob.} = \frac{20!\, 22!\, 5!\, 37!}{42!\, 5!\, 15!\, 0!\, 22!} = 0.0182$$

prob. one side = 0.1253 + 0.0182 = 0.1435
p-value = 2(0.1435) = 0.2870
at α = 0.05 ; Fail to reject Ho
There is no significant difference in the proportions of good tooth
alignment between the two feeding methods at α = 0.05
(p-value = 0.2870).

All possible tables with the same margins (rows: Breast, Bottle, Total)

a = 0           a = 1           a = 2
0   20   20     1   19   20     2   18   20
5   17   22     4   18   22     3   19   22
5   37   42     5   37   42     5   37   42

a = 3           a = 4           a = 5
3   17   20     4   16   20     5   15   20
2   20   22     1   21   22     0   22   22
5   37   42     5   37   42     5   37   42

$$\text{prob.} = \frac{R_1!\, R_2!\, C_1!\, C_2!}{n!\, a!\, b!\, c!\, d!} = \frac{20!\, 22!\, 5!\, 37!}{42!\, a!\, b!\, c!\, d!}$$

a       Probability
0       0.0310
1       0.1720
2       0.3440
3       0.3096
4       0.1253
5       0.0182
Total   1.0000
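
The table of all possible probabilities can be reproduced with a short Python sketch that enumerates every value of a compatible with the fixed margins. The function names are hypothetical. The margins used are those of Example 10 (R1 = 20, R2 = 22, C1 = 5).

```python
from math import comb


def fisher_distribution(r1, r2, c1):
    """Map each feasible cell count a to the probability of its 2 x 2 table, margins fixed."""
    n = r1 + r2
    a_min = max(0, c1 - r2)
    a_max = min(r1, c1)
    return {a: comb(r1, a) * comb(r2, c1 - a) / comb(n, c1) for a in range(a_min, a_max + 1)}


def upper_tail(dist, a_observed):
    """One-sided probability of the observed table or any table with a larger a."""
    return sum(p for a, p in dist.items() if a >= a_observed)


dist = fisher_distribution(20, 22, 5)
one_sided = upper_tail(dist, 4)
```

The probabilities over all feasible tables sum to 1, which is a useful check on a hand calculation.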

Example 11 To compare two methods of cancer treatment
(expected frequencies in parentheses).

            Method
Result      Radiation   No radiation   Total
Not good    1 (3.2)     7 (4.8)        8
Good        5 (2.8)     2 (4.2)        7
Total       6           9              15

To compare the proportions of good results:

Radiation method; n1 = 6 , $\hat{p}_1$ = 5/6 = 0.833
No radiation method; n2 = 9 , $\hat{p}_2$ = 2/9 = 0.222

Ho : P1 ≤ P2
H1 : P1 > P2

All Eij < 5 ; use the Fisher Exact Test;

$$\text{prob.} = \frac{R_1!\, R_2!\, C_1!\, C_2!}{n!\, a!\, b!\, c!\, d!}$$

Observed table (a = 1, b = 7, c = 5, d = 2):

$$\text{prob.} = \frac{8!\, 7!\, 6!\, 9!}{15!\, 1!\, 7!\, 5!\, 2!} = 0.0336$$

More extreme table with the same margins:

0     8     8
6     1     7
6     9     15

$$\text{prob.} = \frac{8!\, 7!\, 6!\, 9!}{15!\, 0!\, 8!\, 6!\, 1!} = 0.0014$$

prob. one side = 0.0336 + 0.0014 = 0.0350

p-value = 0.0350

at α = 0.05; Reject Ho

The proportion of good results with the radiation method is significantly
greater than with the no-radiation method at α = 0.05 (p-value = 0.0350).

Example 12 To compare the distribution of blood types between chronic
carriers and noncarriers (expected frequencies in parentheses). (α = 0.05)

Blood type   Carriers      Noncarriers     Total
O            72 (69.69)    230 (232.31)    302
A            54 (56.77)    192 (189.23)    246
B            16 (18.23)    63 (60.77)      79
AB           8 (5.31)      15 (17.69)      23
Total        150           500             650

Ho : There is no difference between the distributions of blood types
of the carrier and noncarrier groups.
H1 : The distributions are different.

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad df = (r - 1)(c - 1)$$

or

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{O_{ij}^2}{E_{ij}} - n, \quad df = (r - 1)(c - 1)$$

$$\chi^2 = \left(\frac{72^2}{69.69} + \frac{230^2}{232.31} + \frac{54^2}{56.77} + \dots + \frac{15^2}{17.69}\right) - 650, \quad df = (4-1)(2-1)$$

$\chi^2$ = 2.40516 , df = 3

p-value = 0.4949
At α = 0.05; Fail to reject Ho

The distributions of blood types of carriers and noncarriers are not
significantly different at α = 0.05 (p-value = 0.4949).

Example 13 To compare the proportion of hypertension in 4 regions of
the country (expected frequencies in parentheses). (α = 0.05)

Hypertension   North       Northeast   Central     South       Total
Have           20          50          75          15          160
               (35.56)     (44.44)     (53.33)     (26.67)
Not            180         200         225         135         740
               (164.44)    (205.56)    (246.67)    (123.33)
Total          200         250         300         150         900

Ho : There is no difference between the 4 regions.
H1 : The regions are different.

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad df = (r - 1)(c - 1)$$

or

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{O_{ij}^2}{E_{ij}} - n, \quad df = (r - 1)(c - 1)$$

$$\chi^2 = \left(\frac{20^2}{35.56} + \frac{50^2}{44.44} + \frac{75^2}{53.33} + \dots + \frac{135^2}{123.33}\right) - 900, \quad df = (2-1)(4-1)$$

$\chi^2$ = 26.03 ; df = 3
p-value < 0.0005
At α = 0.05 ; Reject Ho

The proportion of hypertension differs significantly between the 4 regions
at α = 0.05 (p-value < 0.0005).

Example 14 To compare two methods of skin test reaction,
dinitrochlorobenzene (DNCB) and croton oil, in people with cancer.
(α = 0.05)

              DNCB
Croton oil    +      -      Total
+             81     48     129
-             23     21     44
Total         104    69     173

Ho : The rates of change of the two tests are equal.
H1 : The rates of change of the two tests are not equal.
b + c > 25, McNemar's test for change

$$\chi^2 = \frac{(|b - c| - 1)^2}{b + c}, \quad df = 1$$

$$\chi^2 = \frac{(|48 - 23| - 1)^2}{48 + 23} = 8.1127, \quad df = 1$$

p-value < 0.005

at α = 0.05 , Reject Ho
The two skin test methods give significantly different results at α = 0.05
(p-value < 0.005).

To find the agreement between the two methods of skin test,
by the Kappa statistic ;

$$K = \frac{O - E}{1 - E}$$

$$O = \frac{81 + 21}{173} = 0.5896$$

$$E = \left(\frac{129}{173} \times \frac{104}{173}\right) + \left(\frac{44}{173} \times \frac{69}{173}\right) = 0.5497$$

$$K = \frac{O - E}{1 - E} = \frac{0.5896 - 0.5497}{1 - 0.5497} = 0.0886 \quad \text{; Poor Agreement}$$

The two methods of skin test reaction, DNCB and croton oil, have poor
agreement.

Example 15 To compare the readings of x-ray films by two radiologists.
(α = 0.05)

                 Radiologist B
Radiologist A    +     -     Total
+                6     2     8
-                4     11    15
Total            10    13    23

Ho : The rates of change of the two diagnoses are equal.
H1 : The rates of change of the two diagnoses are not equal.

b + c < 25 ; use the Binomial Test

By the binomial probability with n = 6 , p = 0.5
p-value = $2P(X \le 2)$ = 2(0.3438)
p-value = 0.6876
at α = 0.05 , Fail to reject Ho
There is no significant difference between the diagnoses of the two
radiologists at α = 0.05 (p-value = 0.6876).

To find the agreement between the x-ray diagnoses of the two radiologists,
by the Kappa statistic ;

$$K = \frac{O - E}{1 - E}$$

$$O = \frac{6 + 11}{23} = \frac{17}{23} = 0.7391$$

$$E = \left(\frac{8}{23} \times \frac{10}{23}\right) + \left(\frac{15}{23} \times \frac{13}{23}\right) = 0.5198$$

$$K = \frac{O - E}{1 - E} = \frac{0.7391 - 0.5198}{1 - 0.5198} = 0.4567 \quad \text{; Fair Agreement}$$

The two radiologists have fair agreement.

Example 16 Housewives were interviewed about which hospitals they prefer
when someone in the family gets sick and needs to see a doctor.
(α = 0.05)

Hospital
Housewife A B C Ri
1 0 0 0 0
2 1 1 0 2
3 0 1 0 1
4 0 0 0 0
5 1 0 0 1
6 1 1 0 2
7 1 1 0 2
8 0 1 0 1
9 1 0 0 1
10 0 0 0 0
11 1 1 1 3
12 1 1 1 3
13 1 1 0 2
14 1 1 0 2
15 1 1 0 2
16 1 1 1 3
17 1 1 0 2
18 1 1 0 2
Cj 13 13 3 29

Preference : yes = 1 ; no = 0
$\sum C_j$ = 13 + 13 + 3 = 29 , $\sum C_j^2$ = 13² + 13² + 3² = 347

$\sum R_i$ = 29 , $\sum R_i^2$ = 0² + 2² + 1² + . . . + 2² + 2² = 63

Ho : There is no difference in the preference proportions between the 3
hospitals.
H1 : There is a difference in the preference proportions between the 3
hospitals.

$$\chi^2 = \frac{(k - 1)\left[k \sum C_j^2 - \left(\sum C_j\right)^2\right]}{k \sum R_i - \sum R_i^2}, \quad d.f. = k - 1$$

$$\chi^2 = \frac{(3 - 1)\left[3(347) - (29)^2\right]}{3(29) - 63} = 16.7, \quad d.f. = 2$$

p-value < 0.005 ; at α = 0.05 ; Reject Ho

The preference proportions differ significantly between the 3 hospitals
at α = 0.05 (p-value < 0.005).

Example 17 To compare the screening for hypothyroid by laboratory


test and three computer screening tests. (α = 0.05)

Computer
Case Lab A B C Ri
1 1 0 0 0 1
2 1 1 1 1 4
3 0 0 0 0 0
4 0 1 1 1 3
5 1 1 1 1 4
6 1 0 0 1 2
7 1 0 1 1 3
8 1 0 0 1 2
9 1 0 0 0 1
10 1 0 0 0 1
11 1 1 1 1 4
Cj 9 4 5 7 25

Hypothyroid test : Have = 1 ; Have not = 0

$\sum C_j$ = 25

$\sum C_j^2$ = 9² + 4² + 5² + 7² = 171

$\sum R_i$ = 25

$\sum R_i^2$ = 1² + 4² + 0² + . . . + 4² = 77

Ho : There is no difference between the 4 screening methods.
H1 : There is a difference between the 4 screening methods.

$$\chi^2 = \frac{(k - 1)\left[k \sum C_j^2 - \left(\sum C_j\right)^2\right]}{k \sum R_i - \sum R_i^2}, \quad d.f. = k - 1$$

$$\chi^2 = \frac{(4 - 1)\left[4(171) - (25)^2\right]}{4(25) - 77} = 7.6957, \quad d.f. = 3$$

p-value = 0.0537

at α = 0.05 ; Fail to reject Ho

There is no significant difference between the 4 screening methods at
α = 0.05 (p-value = 0.0537).

Example 18 To study the relationship between age at first birth and the
development of breast cancer (expected frequencies in parentheses).
(α = 0.05)

          Age at first birth
Status    < 20        20-24       25-29       30-34       ≥ 35       Total
Cancer    320         1206        1011        463         220        3220
          (416.6)     (1348.2)    (933.6)     (371.9)     (149.7)
Normal    1422        4432        2893        1092        406        10,245
          (1325.4)    (4289.8)    (2970.4)    (1183.1)    (476.3)
Total     1742        5638        3904        1555        626        13,465

Ho : There is no relationship between age at first birth and breast cancer.
H1 : There is a relationship between age at first birth and breast cancer.

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad df = (r - 1)(c - 1)$$

or

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{O_{ij}^2}{E_{ij}} - n, \quad df = (r - 1)(c - 1)$$

$$\chi^2 = \left(\frac{320^2}{416.6} + \frac{1206^2}{1348.2} + \dots + \frac{406^2}{476.3}\right) - 13{,}465, \quad df = (2-1)(5-1)$$

$\chi^2$ = 130.33 , df = 4
p-value < 0.001
At α = 0.05; Reject Ho

There is a significant relationship between age at first birth and the
development of breast cancer at α = 0.05 (p-value < 0.001).

Example 19 To test the association between mathematics score and
nutrition status for students in elementary school
(expected frequencies in parentheses).

Nutrition status   Mathematics score                                             Total
                   0-49        50-59         60-69         70-79       80+
First degree       12 (11.7)   43 (43.2)     23 (25.8)     16 (18.5)   16 (10.9)   110
Second degree      2 (0.7)     2 (2.7)       1 (1.6)       1 (1.2)     1 (0.7)     7
Normal             44 (45.6)   169 (168.1)   104 (100.6)   75 (72.3)   37 (42.4)   429
Total              58          214           128           92          54          546

Five of the 15 cells (33.3%) have Eij < 5, which exceeds the 20% limit, so
the first-degree and second-degree rows are combined into one
"Nonnormal" row:

Nutrition status   Mathematics score                                             Total
                   0-49        50-59         60-69         70-79       80+
Nonnormal          14 (12.4)   45 (45.9)     24 (27.4)     17 (19.7)   17 (11.6)   117
Normal             44 (45.6)   169 (168.1)   104 (100.6)   75 (72.3)   37 (42.4)   429
Total              58          214           128           92          54          546

Now no Eij is less than 5.

Ho : There is no association between nutrition status and mathematics
score.
H1 : There is an association between nutrition status and mathematics
score.

$$\chi^2 = \sum \sum \frac{O_{ij}^2}{E_{ij}} - n, \quad d.f. = (r - 1)(c - 1)$$

$$\chi^2 = \left(\frac{14^2}{12.4} + \frac{45^2}{45.9} + \dots + \frac{37^2}{42.4}\right) - 546 = 4.4942, \quad d.f. = 4$$

p-value = 0.3603

at α = 0.05; Fail to reject Ho

There is no significant association between nutrition status and
mathematics score at α = 0.05 (p-value = 0.3603).

Example 20 To find the association between smoking habit and lung
cancer (expected frequencies in parentheses). (α = 0.05)

             Lung cancer
Smoking      +            -             Total
Smoker       14 (9.43)    10 (14.57)    24
Nonsmoker    8 (12.57)    24 (19.43)    32
Total        22           34            56

Ho : There is no association between smoking habit and lung cancer.
H1 : There is an association between smoking habit and lung cancer.

Two by two table, Chi-square corrected for continuity

$$\chi^2 = \frac{n\left(|ad - bc| - n/2\right)^2}{R_1 R_2 C_1 C_2}, \quad df = 1$$

$$\chi^2 = \frac{56\left(|14(24) - 8(10)| - 56/2\right)^2}{24 \times 32 \times 22 \times 34}, \quad df = 1$$

$\chi^2$ = 5.0675
p-value = 0.024
α = 0.05; Reject Ho

There is a significant association between smoking habit and lung cancer
at α = 0.05 (p-value = 0.024).

Example 21 To find the association between smoking habit and lung
cancer. (α = 0.05)

            Lung cancer
Smoking     +     -     Total
Smoke       7     2     9
Nonsmoke    4     10    14
Total       11    12    23

Ho : There is no association between smoking habit and lung cancer.
H1 : There is an association.
Two Eij are less than 5, so use the Fisher exact test

$$p = \frac{R_1!\, R_2!\, C_1!\, C_2!}{n!\, a!\, b!\, c!\, d!}$$

Observed table:

7     2     9
4     10    14
11    12    23

$$\text{prob} = \frac{9!\, 14!\, 11!\, 12!}{23!\, 7!\, 2!\, 4!\, 10!} = 0.0266$$

More extreme tables with the same margins:

8     1     9
3     11    14
11    12    23

$$\text{prob} = \frac{9!\, 14!\, 11!\, 12!}{23!\, 8!\, 1!\, 3!\, 11!} = 0.0024$$

9     0     9
2     12    14
11    12    23

$$\text{prob} = \frac{9!\, 14!\, 11!\, 12!}{23!\, 9!\, 0!\, 2!\, 12!} = 0.0001$$

p-value = 0.0266 + 0.0024 + 0.0001 = 0.0291

at α = 0.05, Reject Ho
There is a significant association between smoking habit and lung cancer
at α = 0.05 (p-value = 0.0291).

All possible outcomes with the same margins (rows: Smoke, Nonsmoke, Total)

a = 0           a = 1           a = 2
0    9    9     1    8    9     2    7    9
11   3    14    10   4    14    9    5    14
11   12   23    11   12   23    11   12   23

a = 3           a = 4           a = 5
3    6    9     4    5    9     5    4    9
8    6    14    7    7    14    6    8    14
11   12   23    11   12   23    11   12   23

a = 6           a = 7           a = 8
6    3    9     7    2    9     8    1    9
5    9    14    4    10   14    3    11   14
11   12   23    11   12   23    11   12   23

a = 9
9    0    9
2    12   14
11   12   23

$$\text{prob.} = \frac{R_1!\, R_2!\, C_1!\, C_2!}{n!\, a!\, b!\, c!\, d!} = \frac{9!\, 14!\, 11!\, 12!}{23!\, a!\, b!\, c!\, d!}$$

a       Probability
0       .0003
1       .0067
2       .0533
3       .1866
4       .3198
5       .2798
6       .1244
7       .0266
8       .0024
9       .0001
Total   1.000

Example 22 To study the effect of radiation treatment on cancer
at α = 0.05.

            Radiation
Result      Have    Not have    Total
Good        1       7           8
Not good    5       2           7
Total       6       9           15

Ho : There is no association between radiation and result.
H1 : There is an association.

All Eij are less than 5, so use the Fisher Exact Test

$$\text{prob.} = \frac{R_1!\, R_2!\, C_1!\, C_2!}{n!\, a!\, b!\, c!\, d!}$$

Observed table (a = 1, b = 7, c = 5, d = 2):

$$\text{prob.} = \frac{8!\, 7!\, 6!\, 9!}{15!\, 1!\, 7!\, 5!\, 2!} = 0.0336$$

More extreme table with the same margins:

0     8     8
6     1     7
6     9     15

$$\text{prob.} = \frac{8!\, 7!\, 6!\, 9!}{15!\, 0!\, 8!\, 6!\, 1!} = 0.0014$$

p-value = 0.0336 + 0.0014 = 0.0350
at α = 0.05, Reject Ho
There is a significant relationship between radiation treatment and the
result at α = 0.05 (p-value = 0.0350).

Analysis for Means and Variances

For quantitative data that are normally distributed, the mean and the
standard deviation of the sample can be used as descriptive statistics.

To test normality: use the Kolmogorov-Smirnov Test.

When the data are normally distributed, we can test hypotheses about
the population parameters : Mean ( $\mu$ ) , Variance ( $\sigma^2$ )

Hypothesis testing for one sample;

(1) Hypothesis Ho : θ = θo
    H1 : θ ≠ θo (Two-sided test)
    or Ho : θ ≥ θo, θ ≤ θo
    H1 : θ < θo, θ > θo (One-sided test)
    θ is the mean ( $\mu$ ) or the variance ( $\sigma^2$ )
(2) Select the test statistic

(3) Compute the test statistic, find the p-value

(4) Set α

(5) Conclusion

If p-value > α ; Fail to reject Ho
If p-value < α ; Reject Ho

1. For one sample

1.1 Hypothesis testing for a mean against a constant mean ( $\mu_0$ )
Ho : μ = μo, μ ≥ μo, μ ≤ μo
H1 : μ ≠ μo, μ < μo, μ > μo

The test statistic ;

$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}, \quad \text{when } \sigma^2 \text{ is known}$$

or

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \quad df = n - 1, \quad \text{when } \sigma^2 \text{ is unknown}$$

1.2 Hypothesis testing for a variance against a constant variance ( $\sigma_0^2$ )

Ho : $\sigma^2 = \sigma_0^2$ , $\sigma^2 \ge \sigma_0^2$ , $\sigma^2 \le \sigma_0^2$
H1 : $\sigma^2 \ne \sigma_0^2$ , $\sigma^2 < \sigma_0^2$ , $\sigma^2 > \sigma_0^2$

The test statistic;

$$\chi^2 = \frac{(n - 1) S^2}{\sigma_0^2}, \quad df = n - 1$$

$\mu_0$ = constant mean
$\sigma_0^2$ = constant variance
$\bar{x}$ = sample mean
$S^2$ = sample variance
$\sigma$ = standard deviation of population
S = standard deviation of sample
n = sample size
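
The one-sample statistics for the case of unknown population variance can be sketched in Python with the standard library. The function names are hypothetical. The data are the 16 ages listed in EXAMPLE 3. Because the sketch works from the raw data rather than from rounded summary values, its results can differ slightly in the third significant figure from a hand calculation with rounded S.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(data, mu0):
    """Return (t, df) for testing the mean against mu0 when sigma is unknown."""
    n = len(data)
    return (mean(data) - mu0) / (stdev(data) / sqrt(n)), n - 1


def variance_chi_square(data, var0):
    """Return (chi-square, df) for testing the variance against var0."""
    n = len(data)
    return (n - 1) * stdev(data) ** 2 / var0, n - 1


ages = [62, 62, 68, 48, 51, 60, 51, 57, 57, 41, 62, 50, 53, 34, 62, 61]
t_stat, t_df = one_sample_t(ages, 50)              # mean age against 50 years
chi2_stat, chi2_df = variance_chi_square(ages, 36)  # variance against 6^2 = 36
```

`statistics.stdev` uses the n - 1 denominator, which matches the sample standard deviation S in the formulas above.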

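2. For two samples

2.1 Hypothesis testing for variances

Ho : $\sigma_1^2 = \sigma_2^2$ , $\sigma_1^2 \ge \sigma_2^2$ , $\sigma_1^2 \le \sigma_2^2$
H1 : $\sigma_1^2 \ne \sigma_2^2$ , $\sigma_1^2 < \sigma_2^2$ , $\sigma_1^2 > \sigma_2^2$

$$F = \frac{S_1^2}{S_2^2}, \quad S_1^2 > S_2^2, \quad df = n_1 - 1,\; n_2 - 1$$

The larger sample variance is placed in the numerator.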

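2.2 Hypothesis testing for means of two independent samples

Ho : $\mu_1 = \mu_2$ , $\mu_1 \ge \mu_2$ , $\mu_1 \le \mu_2$
H1 : $\mu_1 \ne \mu_2$ , $\mu_1 < \mu_2$ , $\mu_1 > \mu_2$

(1) Z-test ; $\sigma_1^2$ , $\sigma_2^2$ known

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

(2) t-test ; $\sigma_1^2$ , $\sigma_2^2$ unknown

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S_p}$$

Here $S_p$ denotes the standard error of the difference between the two
sample means.

when $\sigma_1^2 = \sigma_2^2$ ;

$$S_p^2 = \left[\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}\right]\left(\frac{1}{n_1} + \frac{1}{n_2}\right), \quad df = n_1 + n_2 - 2$$

when $\sigma_1^2 \ne \sigma_2^2$ ;

$$S_p^2 = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}$$

$$df = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{S_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{S_2^2}{n_2}\right)^2}$$

$\bar{x}_1, \bar{x}_2$ = sample means
$S_1^2, S_2^2$ = sample variances
n1, n2 = sample sizes
$\sigma_1^2, \sigma_2^2$ = population variances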
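
Both versions of the two-sample t-test can be sketched in Python from summary statistics. The function names are hypothetical. The equal-variance version follows the pooled formula above, and the unequal-variance version uses the approximate degrees of freedom formula above. The summary values are those of EXAMPLE 4 (male and female eosinophil levels).

```python
from math import sqrt


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t, df) assuming equal population variances."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, n1 + n2 - 2


def unequal_variance_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t, df) without assuming equal variances; df is generally not an integer."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (mean1 - mean2) / sqrt(v1 + v2), df


# Male: n = 13, mean 584, S = 225; Female: n = 16, mean 695, S = 185
t_equal, df_equal = pooled_t(584, 225, 13, 695, 185, 16)
t_unequal, df_unequal = unequal_variance_t(584, 225, 13, 695, 185, 16)
```

The F-test of 2.1 is the step the notes use to choose between the two versions.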

2.3 Hypothesis testing for two dependent samples

Ho : $\mu_d = 0$ , $\mu_d \ge 0$ , $\mu_d \le 0$
H1 : $\mu_d \ne 0$ , $\mu_d < 0$ , $\mu_d > 0$

Paired t-test;

$$t = \frac{\bar{d}}{S_d / \sqrt{n}}, \quad df = n - 1$$

$d_i$ = the difference for each pair
$\bar{d}$ = the mean of the differences

The variance of the differences $d_i$ ;

$$S_d^2 = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}$$

n = the number of pairs
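
A minimal Python sketch of the paired t-test. The function name is hypothetical, and the before/after values are invented purely for illustration; they do not come from the notes.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for paired samples, using differences d_i = second_i - first_i."""
    differences = [b - a for a, b in zip(first, second)]
    n = len(differences)
    return mean(differences) / (stdev(differences) / sqrt(n)), n - 1


# Hypothetical measurements on the same 4 subjects, before and after a treatment
before = [10, 12, 9, 11]
after = [12, 15, 10, 14]
t_paired, df_paired = paired_t(before, after)
```

The sign of t depends on the direction chosen for the difference, so the alternative hypothesis should be stated with the same direction.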

EXAMPLE 1 The serum amylase of 16 diabetic patients has a mean of
110 units/100 ml and a standard deviation of 35 units/100 ml. Is it
different from the standard normal value of 120 units/100 ml at α = 0.05 ?

Ho : μ = 120
H1 : μ ≠ 120

n = 16, $\bar{x}$ = 110, S = 35

t-test;

$$t = \frac{\bar{x} - \mu_0}{S / \sqrt{n}}, \quad d.f. = n - 1$$

$$t = \frac{110 - 120}{35 / \sqrt{16}} = -1.1429, \quad d.f. = 15$$

From the t table ; prob. one side = 0.1371

p-value = 2(0.1371) = 0.2742

Let α = 0.05

p-value > α

Fail to reject Ho : μ = 120

The serum amylase of diabetic patients is not significantly different from
120 units/100 ml at α = 0.05 (p-value = 0.2742).

EXAMPLE 2 The average time for bacteria screening by the standard
method is 50 seconds. For a new screening method, in a sample of 12
cases, the average and the standard deviation of screening time are 42
and 11.9 seconds respectively. Can we conclude that the new
method is faster than the standard method at α = 0.05 ?

Ho : μ ≥ 50
H1 : μ < 50

t-test;

$$t = \frac{\bar{x} - \mu_0}{S / \sqrt{n}}, \quad d.f. = n - 1$$

$$t = \frac{42 - 50}{11.9 / \sqrt{12}} = -2.33, \quad d.f. = 11$$

p-value = 0.0213

Let α = 0.05

p-value < α

Reject Ho : μ ≥ 50

The new method screens significantly faster than the standard method
at α = 0.05 (p-value = 0.0213).

EXAMPLE 3 One study examined a sample of 16 subjects with open-angle
glaucoma and unilateral hemifield defects. The ages (years) of the
subjects were:

62 62 68 48 51 60 51 57
57 41 62 50 53 34 62 61

1) Can we conclude that the mean age is greater than 50 years at
α = 0.05 ?
2) Can we conclude that the standard deviation is greater than 6 years
at α = 0.05 ?
n = 16 , $\bar{X}$ = 54.94 , S = 8.87 ; $S^2$ = 78.6769

Test for mean; Ho : μ ≤ 50
H1 : μ > 50

t-test;

$$t = \frac{\bar{x} - \mu_0}{S / \sqrt{n}}, \quad d.f. = n - 1$$

$$t = \frac{54.94 - 50}{8.87 / \sqrt{16}} = 2.228, \quad d.f. = 15$$

p-value = 0.0219

Let α = 0.05; p-value < α

Reject Ho : μ ≤ 50

The mean age is significantly greater than 50 years at α = 0.05
(p-value = 0.0219).

Test for variance; Ho : $\sigma^2 \le 36$
H1 : $\sigma^2 > 36$

$$\chi^2 = \frac{(n - 1) S^2}{\sigma_0^2}, \quad df = n - 1$$

$$\chi^2 = \frac{(16 - 1)(8.87)^2}{36}, \quad df = 16 - 1$$

$\chi^2$ = 32.782 ; df = 15

p-value = 0.005

at α = 0.05; Reject Ho : $\sigma^2 \le 36$

The standard deviation is significantly greater than 6 years at α = 0.05
(p-value = 0.005).

EXAMPLE 4 Are there any different of the level eiosinophil in blood


between male and female of ashma patients? In the study we are
sampling 2 groups .

Gender n x S
Male 13 584 225
Female 16 695 185

1. To compare two variances

Ho :  12   22
H1 :  12   22

s12 (225) 2
F = = = 1.4792 ; d.f. = 12, 15
s22 (185) 2

p-value > 0.10

Let  = 0.05 ; p-value > 

Fail to reject Ho :  12   22 ( Two variances are equally.)

2. To compare two independent means, when σ₁² = σ₂²

Ho : μ₁ = μ₂
H1 : μ₁ ≠ μ₂

$t = \frac{\bar{x}_1 - \bar{x}_2}{S_p}$ , d.f. = n1 + n2 - 2

where Sp is the standard error of the difference, based on the pooled variance:

$S_p^2 = \left(\frac{1}{n_1} + \frac{1}{n_2}\right)\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]$

$= \left(\frac{1}{13} + \frac{1}{16}\right)\left[\frac{12(225)^2 + 15(185)^2}{13 + 16 - 2}\right]$

= 5787.9941

Sp = 76.0789

$t = \frac{584 - 695}{76.0789} = -1.459$ , d.f. = 27

prob. one side = 0.0814 , p-value = 0.1628

Let α = 0.05 ; p-value > α

Fail to reject Ho : μ₁ = μ₂

There is no significant difference in blood eosinophil level
between males and females at α = 0.05. (p-value = 0.1628)
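
The two steps of Example 4 (variance ratio, then the equal-variance t-test) can be sketched in Python from the summary table. The function name `pooled_t` is illustrative, and the critical value 2.052 is assumed from a standard t table (two-sided α = 0.05, d.f. = 27).

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t statistic from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # written Sp in the text
    return (mean1 - mean2) / se, df, se

# Step 1: ratio of the two sample variances (male over female)
f_ratio = 225**2 / 185**2

# Step 2: pooled t-test, male versus female
t, df, se = pooled_t(584, 225, 13, 695, 185, 16)

t_crit = 2.052                 # assumed table value: t(0.025, 27)
reject_h0 = abs(t) > t_crit    # two-sided test

print(f"F = {f_ratio:.4f}")
print(f"Sp = {se:.4f}, t = {t:.3f}, d.f. = {df}, reject Ho: {reject_h0}")
```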

EXAMPLE 5 The purpose of a study was to investigate the nature of
lung destruction in cigarette smokers before the development of marked
emphysema. Three lung destructive index measurements were made on
the lungs of lifelong nonsmokers and smokers who died suddenly
outside the hospital of nonrespiratory causes. A larger score indicates
greater lung damage. For one of the indexes, the scores yielded by the
lungs of a sample of nine nonsmokers and a sample of 16 smokers are
shown in the table. We wish to know if we may conclude, on the basis of
these data, that smokers, in general, have greater lung damage as
measured by this destructive index than do nonsmokers. (α = 0.05)

Table Lung Destructive Index Scores.

Nonsmokers: 18.1, 6.0, 10.8, 11.0, 7.7, 17.9, 8.5, 13.0, 18.9
Smokers: 16.6, 13.9, 11.3, 26.5, 17.4, 15.3, 15.8, 12.3,
18.6, 12.0, 24.1, 16.5, 21.8, 16.3, 23.4, 18.8

Group         n     x̄        S
Nonsmokers    9     12.43    4.85
Smokers       16    17.54    4.48

1. To compare two variances

Ho : σ₁² = σ₂²
H1 : σ₁² ≠ σ₂²

$F = \frac{s_1^2}{s_2^2} = \frac{(4.85)^2}{(4.48)^2} = 1.17$ ; d.f. = 8, 15

p-value > 0.10

At α = 0.05; p-value > α
Fail to reject Ho : σ₁² = σ₂²
The two variances can be treated as equal.

2. To compare two independent means, when σ₁² = σ₂²

Ho : μS ≤ μNS
H1 : μS > μNS

$t = \frac{\bar{x}_S - \bar{x}_{NS}}{S_p}$ , d.f. = n1 + n2 - 2

$S_p^2 = \left(\frac{1}{n_1} + \frac{1}{n_2}\right)\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]$

$= \left(\frac{1}{9} + \frac{1}{16}\right)\left[\frac{8(4.85)^2 + 15(4.48)^2}{9 + 16 - 2}\right] = 3.6875$

Sp = 1.9203

$t = \frac{17.54 - 12.43}{1.9203} = 2.661$ , d.f. = 9 + 16 - 2 = 23

prob. one side = 0.0074 ; p-value = 0.0074

At α = 0.05; Reject Ho : μS ≤ μNS

The lung destructive index scores of smokers are significantly greater
than those of nonsmokers at α = 0.05 (p-value = 0.0074).
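
Because Example 5 gives the raw scores, the summary table and both test statistics can be recomputed from the data. The following is a minimal Python sketch using only the standard library; the critical value 1.714 is assumed from a standard t table (one-sided α = 0.05, d.f. = 23).

```python
import math
import statistics

nonsmokers = [18.1, 6.0, 10.8, 11.0, 7.7, 17.9, 8.5, 13.0, 18.9]
smokers = [16.6, 13.9, 11.3, 26.5, 17.4, 15.3, 15.8, 12.3,
           18.6, 12.0, 24.1, 16.5, 21.8, 16.3, 23.4, 18.8]

n_ns, n_s = len(nonsmokers), len(smokers)
var_ns = statistics.variance(nonsmokers)   # sample variance, divisor n - 1
var_s = statistics.variance(smokers)

# Step 1: variance ratio (nonsmokers over smokers, as in the text)
f_ratio = var_ns / var_s

# Step 2: equal-variance t statistic, smokers minus nonsmokers
df = n_ns + n_s - 2
pooled_var = ((n_ns - 1) * var_ns + (n_s - 1) * var_s) / df
se = math.sqrt(pooled_var * (1 / n_ns + 1 / n_s))   # written Sp in the text
t = (statistics.mean(smokers) - statistics.mean(nonsmokers)) / se

t_crit = 1.714   # assumed table value: t(0.05, 23), one-sided
print(f"F = {f_ratio:.2f}, Sp = {se:.4f}, t = {t:.3f}, d.f. = {df}")
print(f"reject Ho: {t > t_crit}")
```

Working from unrounded means and variances gives t of about 2.66, in line with the hand calculation.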

EXAMPLE 6 Cortisol level determinations were made on two samples
of women at childbirth.
Group 1 subjects underwent emergency cesarean section following
induced labor.
Group 2 subjects delivered by either cesarean section or the vaginal
route following spontaneous labor.
The sample sizes, mean cortisol levels, and standard deviations were as
follows:

Sample    n     x̄      s
1         10    435    65
2         12    645    80

We wish to know if we can conclude, on the basis of these data, that
the mean cortisol level of subjects who delivered by either
cesarean section or the vaginal route following spontaneous labor is
higher than that of subjects who underwent emergency cesarean section
following induced labor. (α = 0.05)
Hypothesis testing for two variances

Ho : σ₁² = σ₂²
H1 : σ₁² ≠ σ₂²

$F = \frac{s_2^2}{s_1^2} = \frac{80^2}{65^2} = 1.515$ , d.f. = 11, 9

p-value > 0.10

At α = 0.05 ; Fail to reject Ho : σ₁² = σ₂²

The two groups can be treated as having equal variances.



Hypothesis testing two means when σ₁² = σ₂²

Ho : μ₂ ≤ μ₁
H1 : μ₂ > μ₁

$t = \frac{\bar{x}_2 - \bar{x}_1}{S_p}$ , d.f. = n1 + n2 - 2

$S_p^2 = \left(\frac{1}{n_1} + \frac{1}{n_2}\right)\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]$

$S_p^2 = \left(\frac{1}{10} + \frac{1}{12}\right)\left[\frac{9(65)^2 + 11(80)^2}{10 + 12 - 2}\right] = 993.8958$

Sp = 31.526

$t = \frac{645 - 435}{31.526} = 6.661$ , d.f. = 20

p-value < 0.0005

At α = 0.05; Reject Ho : μ₂ ≤ μ₁

The mean cortisol level of subjects who delivered by either cesarean
section or the vaginal route following spontaneous labor is significantly
higher than that of subjects who underwent emergency cesarean section
following induced labor at α = 0.05. (p-value < 0.0005)

EXAMPLE 7 For 20 general patients, the average length of hospital
stay was 7 days with a standard deviation of 2 days. For 24 cancer
patients, the average stay was 36 days with a standard deviation of 10
days. Is the hospital stay of cancer patients longer than that of
general patients? (α = 0.05)

Hypothesis testing for two variances

Ho : σC² = σG²
H1 : σC² ≠ σG²

$F = \frac{s_C^2}{s_G^2} = \frac{10^2}{2^2} = 25$ , d.f. = 23, 19

p-value < 0.005

Reject Ho : σC² = σG²

The two variances are unequal.

Hypothesis testing two means when σ₁² ≠ σ₂²

Ho : μC ≤ μG
H1 : μC > μG

$S_p^2 = \frac{s_G^2}{n_G} + \frac{s_C^2}{n_C} = \frac{2^2}{20} + \frac{10^2}{24} = 4.3667$

Sp = 2.0897

$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 + 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 + 1}\left(\frac{s_2^2}{n_2}\right)^2} - 2$

$df = \frac{(4.3667)^2}{\frac{1}{21}\left(\frac{4}{20}\right)^2 + \frac{1}{25}\left(\frac{100}{24}\right)^2} - 2$

= 25.38 ≈ 25

$t = \frac{\bar{x}_C - \bar{x}_G}{S_p}$

$t = \frac{36 - 7}{2.0897} = 13.8776$ , df = 25

p-value < 0.005

Let α = 0.05, p-value < α

Reject Ho : μC ≤ μG

The average hospital stay of cancer patients is significantly longer
than that of general patients at α = 0.05. (p-value < 0.005)
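
The unequal-variance calculation in Example 7 can be sketched in Python as follows. The function name `unequal_variance_t` is illustrative. It returns the degrees of freedom from the formula used in this example (divisors n + 1, then minus 2) alongside the more common Welch-Satterthwaite form (divisors n - 1), which is included only for comparison and is not part of the original notes.

```python
import math

def unequal_variance_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic and two d.f. approximations for unequal variances."""
    a, b = sd1**2 / n1, sd2**2 / n2
    se = math.sqrt(a + b)                      # written Sp in the text
    t = (mean1 - mean2) / se
    # d.f. formula used in the text: (n + 1) divisors, minus 2
    df_text = (a + b)**2 / (a**2 / (n1 + 1) + b**2 / (n2 + 1)) - 2
    # Welch-Satterthwaite form: (n - 1) divisors, no subtraction
    df_welch = (a + b)**2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t, se, df_text, df_welch

# Cancer patients first, so that t is positive under H1: mu_C > mu_G
t, se, df_text, df_welch = unequal_variance_t(36, 10, 24, 7, 2, 20)

print(f"Sp = {se:.4f}, t = {t:.3f}")
print(f"d.f. (text formula) = {df_text:.2f}, d.f. (Welch) = {df_welch:.2f}")
```

Both formulas round to 25 degrees of freedom here, so the conclusion does not depend on which one is used.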

EXAMPLE 8 To evaluate a nutrition program for reducing cholesterol,
an experimental study measured cholesterol twice (before and after
the program).

Cholesterol
Case     Before   After    di      di²
1        201      200      -1      1
2        231      236      5       25
3        221      216      -5      25
4        260      233      -27     729
5        228      224      -4      16
6        237      216      -21     441
7        326      296      -30     900
8        235      195      -40     1600
9        240      207      -33     1089
10       267      247      -20     400
11       284      210      -74     5476
12       201      209      8       64
Total                      -242    10766

$\bar{d} = \frac{\sum d_i}{n} = \frac{-242}{12} = -20.17$

$s_d^2 = \frac{1}{n - 1}\left[\sum d_i^2 - \frac{(\sum d_i)^2}{n}\right] = \frac{1}{11}\left[10766 - \frac{(-242)^2}{12}\right]$

$s_d^2 = 535.06$ , $s_d = 23.13$

Ho : μd ≥ 0
H1 : μd < 0

$t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{-20.17}{23.13/\sqrt{12}} = -3.02$ , df = 11

p-value = 0.006

α = 0.05

p-value < α

Reject Ho : μd ≥ 0

This nutrition program reduces cholesterol significantly
at α = 0.05. (p-value = 0.006)
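
The paired t-test of Example 8 reduces to a one-sample test on the differences di = after - before. This is a minimal Python sketch using only the standard library; the critical value 1.796 is assumed from a standard t table (one-sided α = 0.05, d.f. = 11).

```python
import math
import statistics

before = [201, 231, 221, 260, 228, 237, 326, 235, 240, 267, 284, 201]
after = [200, 236, 216, 233, 224, 216, 296, 195, 207, 247, 210, 209]

# Differences after - before, matching the di column of the table
d = [a - b for a, b in zip(after, before)]

n = len(d)
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)             # sample standard deviation of the di
t = d_bar / (s_d / math.sqrt(n))

t_crit = 1.796                        # assumed table value: t(0.05, 11)
reject_h0 = t < -t_crit               # left-tailed test, H1: mu_d < 0

print(f"sum d = {sum(d)}, sum d^2 = {sum(x * x for x in d)}")
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}, t = {t:.2f}, d.f. = {n - 1}")
print(f"reject Ho: {reject_h0}")
```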

EXAMPLE 9 An experimental study evaluated blood pressure before and
after oral contraceptive use. (α = 0.05)

Blood pressure
Case     Before   After    di     di²
1        115      128      13     169
2        112      115      3      9
3        107      106      -1     1
4        119      128      9      81
5        115      122      7      49
6        138      145      7      49
7        126      132      6      36
8        105      109      4      16
9        104      102      -2     4
10       115      117      2      4
Total                      48     418

$\bar{d} = \frac{\sum d_i}{n} = \frac{48}{10} = 4.8$

$s_d^2 = \frac{1}{n - 1}\left[\sum d_i^2 - \frac{(\sum d_i)^2}{n}\right] = \frac{1}{9}\left[418 - \frac{(48)^2}{10}\right] = 20.8444$

$s_d = 4.57$

Ho : μd ≤ 0
H1 : μd > 0

$t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{4.8}{4.57/\sqrt{10}} = 3.321$ , df = 9

p-value = 0.0042

At α = 0.05; Reject Ho : μd ≤ 0

Blood pressure after oral contraceptive use is significantly
increased at α = 0.05. (p-value = 0.0042)

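Estimate μd with a 95% confidence interval (t = 2.262 is the table value
for α/2 = 0.025 with df = 9);

$\mu_d = \bar{d} \pm t_{\alpha/2}\, s_d/\sqrt{n} = 4.8 \pm 2.262(4.57)/\sqrt{10}$

= 4.8 ± 3.269
= (1.531, 8.069)
1.531 ≤ μd ≤ 8.069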
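
The interval for μd can be recomputed from the before and after readings. This is a minimal Python sketch using only the standard library; the multiplier 2.262 is assumed from a standard t table, where it is the value for a two-sided 95% interval with 9 d.f.

```python
import math
import statistics

before = [115, 112, 107, 119, 115, 138, 126, 105, 104, 115]
after = [128, 115, 106, 128, 122, 145, 132, 109, 102, 117]

# Differences after - before, matching the di column of the table
d = [a - b for a, b in zip(after, before)]

n = len(d)
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)

t_crit = 2.262    # assumed table value: t(0.025, 9), two-sided 95%
half_width = t_crit * s_d / math.sqrt(n)
lower, upper = d_bar - half_width, d_bar + half_width

print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.2f}")
print(f"95% CI for mu_d: ({lower:.3f}, {upper:.3f})")
```

The limits differ from the hand calculation in the third decimal place because s_d is not rounded to 4.57 here. The interval excludes 0, which agrees with rejecting Ho.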

EXAMPLE 10 The following data come from a study that examines the
efficacy of saliva cotinine as an indicator for exposure to tobacco smoke.
In one part of the study, seven subjects, none of whom were heavy
smokers and all of whom abstained from smoking for at least one week
prior to the study, were each required to smoke a single cigarette.
Samples of saliva were taken from all individuals 2, 12, 24, and 48 hours
after smoking the cigarette. The cotinine levels at 12 hours and at 24
hours are shown below. Can we conclude that the cotinine level 24 hours
after smoking is lower than the cotinine level 12 hours after
smoking at α = 0.05?

Cotinine Levels (nmol/l)

Subject   After 12 Hours   After 24 Hours   di
1         73               24               -49
2         58               27               -31
3         67               49               -18
4         93               59               -34
5         33               0                -33
6         18               11               -7
7         147              43               -104

Ho : μd ≥ 0
H1 : μd < 0

n = 7, d̄ = -39.429; s_d = 31.395

$t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{-39.429}{31.395/\sqrt{7}} = -3.323$ , df = 6

p-value = 0.0084

At α = 0.05; Reject Ho

The cotinine level at 24 hours is significantly lower than at 12 hours
at α = 0.05. (p-value = 0.0084)
