Types of Data, Descriptive Statistics, and Statistical Tests For Nominal Data

The document discusses nonparametric statistics and statistical tests for nominal data. It defines key terms like parametric vs nonparametric statistics and dependent and independent variables. It then covers various nonparametric tests like the Mann-Whitney U test, Wilcoxon signed-rank test, Kruskal-Wallis test, chi-squared test, Fisher's exact test and McNemar's test. It provides examples of when to use each test and how to interpret the results. In particular, it demonstrates how to use the chi-squared test to analyze a contingency table examining the association between tryptophan supplements and eosinophilia-myalgia syndrome.

Uploaded by

Wong Wei Hong


Types of Data, Descriptive Statistics, and Statistical Tests for Nominal Data

Patrick F. Smith, Pharm.D.

University at Buffalo
Buffalo, New York

NONPARAMETRIC STATISTICS

I. DEFINITIONS
A. Parametric statistics
1. Variable of interest is a measured quantity.
2. Assumes that the data follow some distribution which can be described by specific parameters
a. Typically a normal distribution
3. Example: There are an infinite number of normal distributions, all of which can be uniquely
defined by a mean and standard deviation (SD).
B. Nonparametric statistics
1. Variable of interest is not a measured quantity. Mean and SD have little meaning.
2. Does not make any assumptions about the distribution of the data
3. "Distribution-free" statistics
C. Dependent variable
1. The variable of interest, the outcome of which is dependent on something else
D. Independent variable
1. The variable that is being tested for an effect on the dependent variable
E. Example
1. Does high-dose ciprofloxacin lead to seizures?
a. Seizures = dependent variable
b. Dose = independent variable

II. NONPARAMETRIC STATISTICS
A. Developed primarily to deal with categorical data (non-continuous data)
1. Example: disease vs no disease; dead vs alive
B. Nonparametric statistical tests may be used on continuous data sets.
1. Removes the requirement to assume a normal distribution
2. However, it also throws out some information, as continuous data contains information in the
way that variables are related.

Some Commonly Used Statistical Tests

Normal theory-based test                 Corresponding nonparametric test              Purpose of test
t test for independent samples           Mann-Whitney U test; Wilcoxon rank sum test   Compares two independent samples
Paired t test                            Wilcoxon matched pairs signed-rank test       Examines a set of differences
Pearson correlation coefficient          Spearman rank correlation coefficient         Assesses the linear association between two variables
One-way analysis of variance (F test)    Kruskal-Wallis analysis of variance by ranks  Compares three or more groups
Two-way analysis of variance             Friedman two-way analysis of variance         Compares groups classified by two different factors

III. NONPARAMETRIC PROS AND CONS

A. Nonparametric pros
1. Nonparametric tests make less stringent demands of the data.
a. For a parametric test to be valid, certain underlying assumptions must be met.
i. Example: for an unpaired t test, assume that the data are drawn from a normal distribution,
every observation is independent of the others, the SDs of the two populations are
equal, and the data are continuous.
b. Nonparametric tests do not require these assumptions.
i. can be used to evaluate data that are not continuous
ii. no assumption that the data follow a particular distribution
B. Nonparametric cons
1. If used on a continuous data set, nonparametric tests throw away information inherent in
continuous data.
2. Reduces power to detect a statistical difference
a. A more conservative approach
3. Example: For data from a normally distributed population, if the Wilcoxon signed-rank test
requires 1000 observations to demonstrate statistical significance, a t test will require
only 955.
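
The 1000 vs. 955 figure is consistent with the large-sample (asymptotic) relative efficiency of the Wilcoxon signed-rank test against the t test under normality, which is 3/pi. The short sketch below only reproduces that arithmetic; the variable names are illustrative, and the result is an approximation for large samples rather than an exact sample-size calculation.

```python
import math

# Asymptotic relative efficiency (ARE) of the Wilcoxon signed-rank test
# relative to the t test when the data really are normal: 3 / pi.
are = 3 / math.pi

# Observations the t test needs to match a Wilcoxon test run on 1000
# observations, under that large-sample approximation.
n_wilcoxon = 1000
n_t_test = round(are * n_wilcoxon)

print(f"ARE = {are:.3f}; t test needs about {n_t_test} observations")
```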
IV. CONTINGENCY TABLES

A. Contingency tables are used to examine the relationship between subjects' scores on two qualitative
or categorical variables.
B. One variable determines the row categories; the other variable defines the column categories.
C. Example: In studying the association between smoking and disease, the row categories in the
figure below denote the categories of smoking status while the columns denote the presence or
absence of disease.

A. Counts

                Disease
Smoke        Yes      No
Yes           13      37
No             6     144

B. Row percentages

                Disease
Smoke        Yes      No     Total
Yes          26%     74%     100%
No            4%     96%     100%
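
As a minimal sketch of how table B follows from table A, the snippet below converts each row of counts to percentages of that row's total. The dictionary layout and the function name `row_percentages` are illustrative choices, not part of the handout.

```python
# Counts from table A: rows are smoking status, columns are disease status.
counts = {
    "Smoke: yes": {"Disease: yes": 13, "Disease: no": 37},
    "Smoke: no": {"Disease: yes": 6, "Disease: no": 144},
}


def row_percentages(table):
    """Convert each row of counts to percentages of that row's total."""
    result = {}
    for row_label, row in table.items():
        row_total = sum(row.values())
        result[row_label] = {col: 100 * n / row_total for col, n in row.items()}
    return result


percents = row_percentages(counts)
for row_label, row in percents.items():
    print(row_label, {col: f"{p:.0f}%" for col, p in row.items()})
```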

V. CHI-SQUARED TEST
A. Commonly used procedure, uses contingency tables
B. Used to evaluate unpaired samples (unrelated groups)
C. Often used to evaluate proportions
D. Example: Is there a difference in the proportion of viral infections in patients administered a
vaccine? (12/100 vs. 2/100)
E. Assumes nominal data (no ordering between variable groups)
F. Limited when the number of subjects in any "cell" is low (rule of thumb, <5)
G. General logic
1. Given two groups (vaccine vs control), the EXPECTED infection rate if the vaccine has no
effect would be equal in the two groups. This is the null hypothesis. The chi-squared test
compares the EXPECTED frequency of a particular event to the OBSERVED frequency in the
population of interest.
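H. Formulas

$$\chi^2 = \sum \frac{(O - E)^2}{E}, \quad \text{with } df = (r - 1)(c - 1)$$

Expected frequencies (E) for each cell:

$$E_{ij} = \frac{T_i \times T_j}{N}$$

where T_i is the total for row i, T_j is the total for column j, and N is the grand total.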
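
The formulas above can be sketched in a few lines. The example applies them to the vaccine question (12/100 vs. 2/100 infections); which group had which rate is not stated in the outline, so the row order below is an assumption, and `chi_squared` is an illustrative helper rather than a library function.

```python
def chi_squared(observed):
    """Pearson chi-squared statistic and df for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count E_ij
            statistic += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Rows: control, vaccine (assumed order). Columns: infected, not infected.
vaccine_table = [[12, 88], [2, 98]]
stat, df = chi_squared(vaccine_table)
print(f"chi-squared = {stat:.2f}, df = {df}")
```

With 1 degree of freedom the 0.05 critical value is 3.84, so a statistic of about 7.68 would be called significant at that level.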

[Figure: Chi-square distribution]

I. Chi-squared, by strict definition, is not a true nonparametric test. It assumes a
distribution that can be described by a single parameter, degrees of freedom.
J. Chi-squared example problems (refer to Example Problems handout)

VI. FISHER'S EXACT TEST

A. Alternative to chi-squared for 2 x 2 contingency tables
1. Improves accuracy when expected frequencies are small (<5) or sample size is small (n=20)
2. Calculates exact probabilities

       a          b       (a + b)
       c          d       (c + d)
    (a + c)    (b + d)       N

$$p(\text{outcome}) = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{N!\,a!\,b!\,c!\,d!}$$
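
A minimal sketch of the calculation follows. It evaluates the probability of a single table with binomial coefficients (algebraically the same as the factorial formula) and then, as one common two-sided convention, sums the probabilities of all tables with the same margins that are no more likely than the observed one. The counts 3, 1, 1, 3 are hypothetical and the function names are illustrative.

```python
from math import comb


def table_probability(a, b, c, d):
    """Exact probability of one 2 x 2 table with all margins fixed."""
    # Algebraically equal to (a+b)!(c+d)!(a+c)!(b+d)! / (N! a! b! c! d!).
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)


def fisher_two_sided(a, b, c, d):
    """Sum the probabilities of all tables no more likely than the observed one."""
    p_observed = table_probability(a, b, c, d)
    row1, row2, col1 = a + b, c + d, a + c
    p_value = 0.0
    # x runs over every value the top-left cell can take given the margins.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(x, row1 - x, col1 - x, row2 - (col1 - x))
        if p <= p_observed * (1 + 1e-9):  # small tolerance for float ties
            p_value += p
    return p_value


# Hypothetical counts, chosen only to illustrate the arithmetic.
p_table = table_probability(3, 1, 1, 3)
p_two_sided = fisher_two_sided(3, 1, 1, 3)
print(f"p(table) = {p_table:.4f}, two-sided p = {p_two_sided:.4f}")
```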

VII. MCNEMAR'S TEST OF SYMMETRY

A. Chi-squared test requires samples to be independent of each other.

B. McNemar's test is used when samples are related (similar to paired t test).
C. There are often times when measures are repeated.

D. Example. Does drug X cause insomnia?

1. Patients may be questioned about insomnia before and after starting the drug.
2. The researcher asks the question, "Do more patients have insomnia since starting the drug?"
E. Refer to Example Problems handout
VIII. KRUSKAL-WALLIS TEST

A. Compares three or more independent samples

B. Values of a variable are transformed to ranks.
1. Tests that there is no shift in the center of the groups (that is, the centers do not differ)
C. If there are only two groups, the procedure reduces to the Mann-Whitney test, the analogue of the
unpaired t test.
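
The rank transformation can be sketched as follows. The outline does not give the test statistic, so the standard H statistic (without a correction for ties) is assumed here, and the three groups of measurements are hypothetical.

```python
def midranks(values):
    """Rank values from 1 to N; tied values share the average of their ranks."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (standard form, no tie correction)."""
    pooled = [v for g in groups for v in g]
    ranks = midranks(pooled)  # ranks are assigned across all groups pooled
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical measurements for three independent groups.
groups = [[1.2, 2.4, 3.1], [4.0, 5.5, 6.3], [7.7, 8.1, 9.9]]
h = kruskal_wallis_h(groups)
print(f"H = {h:.1f}, df = {len(groups) - 1}")
```

For reasonably large groups, H is usually compared with a chi-squared distribution with (number of groups - 1) degrees of freedom; with groups this small that approximation is rough.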

IX. WILCOXON SIGNED-RANK TEST

A. Nonparametric analogue of the paired t test
B. Compares the rank values of variables pair-by-pair
1. The sum of the ranks associated with positive and negative differences is computed.
2. The test statistic is the lesser of the two sums of ranks.
C. Refer to Example Problems handout


X. SPEARMAN RANK CORRELATION COEFFICIENT

A. Nonparametric analogue of linear regression and the correlation coefficient (r)

$$r_s = 1 - \frac{6 \sum d^2}{n^3 - n}$$

d = difference of ranks at each point
B. Example

Height   Rank   Weight   Rank      d
  31      1       7.7      2      -1
  32      2       8.3      3      -1
  33      3       7.6      1       2
  34      4       9.1      4       0
  35      5.5     9.6      5       0.5
  35      5.5     9.9      6      -0.5

$$r_s = 1 - \frac{6\left[(-1)^2 + (-1)^2 + 2^2 + 0^2 + 0.5^2 + (-0.5)^2\right]}{6^3 - 6} = 0.81$$

For statistical significance, can look up critical values from table or obtain from software
package.
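
The height and weight example can be reproduced with a short sketch that ranks each variable (tied values share the average rank, as with the two heights of 35) and applies the formula for r_s directly. The helper name `midranks` is illustrative.

```python
def midranks(values):
    """Rank values from 1 to N; tied values share the average of their ranks."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]


heights = [31, 32, 33, 34, 35, 35]
weights = [7.7, 8.3, 7.6, 9.1, 9.6, 9.9]

rank_h = midranks(heights)
rank_w = midranks(weights)
d = [rh - rw for rh, rw in zip(rank_h, rank_w)]  # rank differences per subject

n = len(d)
r_s = 1 - 6 * sum(x ** 2 for x in d) / (n ** 3 - n)
print(f"d = {d}, r_s = {r_s:.2f}")
```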

Example Problem 1: Association between tryptophan dietary supplements and eosinophilia-myalgia
syndrome (EMS). A number of subjects from a particular area are evaluated; 80
patients with EMS were identified, along with 200 matched controls. Is there a statistically
significant association between tryptophan use and EMS?

Unrelated groups, categorical (yes/no) data - chi-squared is appropriate

Observed results:

Tryptophan use      EMS     No EMS     Total
Yes                  42        34         76
No                   38       166        204
Total                80       200        280

(42 of 76 patients taking tryptophan had EMS, compared to 38 of 204 not taking tryptophan)
Expected values if no association exists (null hypothesis):

Tryptophan use      EMS     No EMS     Total
Yes                21.7      54.3         76
No                 58.3     145.7        204
Total                80       200        280

The rate of EMS in the overall population, assuming no effect, would be 80/280 (28.6%), so the
expected EMS counts are 0.286 x 76 = 21.7 and 0.286 x 204 = 58.3. The No EMS cells can then be
calculated by subtracting from the row total (ex: 76 - 21.7 = 54.3).
$$E_{11} = \frac{76 \times 80}{280}, \quad E_{21} = \frac{204 \times 80}{280}, \quad E_{12} = \frac{76 \times 200}{280}, \quad E_{22} = \frac{204 \times 200}{280}$$
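
As a quick check on the expected values, the sketch below applies E = (row total x column total) / N to the observed EMS table. The variable names are illustrative.

```python
# Observed counts. Rows: tryptophan use yes, no. Columns: EMS, no EMS.
observed = [[42, 34], [38, 166]]

row_totals = [sum(row) for row in observed]        # 76, 204
col_totals = [sum(col) for col in zip(*observed)]  # 80, 200
n = sum(row_totals)                                # 280

# Expected count in each cell if tryptophan use and EMS are unrelated.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 1) for e in row])
```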

To evaluate significance, one needs a mean and a measure of dispersion (ex.: standard deviation,
standard error, variance, etc.). The chi-squared test is based on a Poisson distribution (where
mean = variance); therefore, the chi-squared test assumes that the variance is equal to the expected
mean value.

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Therefore, in this example:

$$\chi^2 = \frac{(42 - 21.7)^2}{21.7} + \frac{(34 - 54.3)^2}{54.3} + \frac{(38 - 58.3)^2}{58.3} + \frac{(166 - 145.7)^2}{145.7} = 36.4$$

Look up the result in a chi-squared table (a 2 x 2 contingency table has 1 degree of
freedom). To be significant at the 0.05 level, chi-squared must be > 3.84. Since 36.4 > 3.84, the
result is highly significant.
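
The same arithmetic can be sketched in code, using unrounded expected counts and the tabulated 0.05 critical value for 1 degree of freedom (3.8415). The variable names are illustrative.

```python
# Observed counts. Rows: tryptophan use yes, no. Columns: EMS, no EMS.
observed = [[42, 34], [38, 166]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (O - E)^2 / E over all four cells.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi2 += (o - e) ** 2 / e

df = (len(row_totals) - 1) * (len(col_totals) - 1)
critical_05 = 3.8415  # tabulated critical value for df = 1, alpha = 0.05
significant = chi2 > critical_05
print(f"chi-squared = {chi2:.1f}, df = {df}, significant at 0.05: {significant}")
```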

Critical Values for the Chi-Squared Distribution

                    Significance Level
df      0.10       0.05       0.025      0.01       0.005
 1     2.7055     3.8415     5.0239     6.6349     7.8794
 2     4.6052     5.9915     7.3778     9.2104    10.5965
 3     6.2514     7.8147     9.3484    11.3449    12.8381
 4     7.7794     9.4877    11.1433    13.2767    14.8602
 5     9.2363    11.0705    12.8325    15.0863    16.7496
 6    10.6446    12.5916    14.4494    16.8119    18.5475
 7    12.017     14.0671    16.0128    18.4753    20.2777
 8    13.3616    15.5073    17.5345    20.0902    21.9549
 9    14.6837    16.919     19.0228    21.666     23.5893
10    15.9872    18.307     20.4832    23.2093    25.1881
11    17.275     19.6752    21.92      24.725     26.7569
12    18.5493    21.0261    23.3367    26.217     28.2997
13    19.8119    22.362     24.7356    27.6882    29.8193
14    21.0641    23.6848    26.1189    29.1412    31.3194
15    22.3071    24.9958    27.4884    30.578     32.8015
16    23.5418    26.2962    28.8453    31.9999    34.2671
17    24.769     27.5871    30.191     33.4087    35.7184
18    25.9894    28.8693    31.5264    34.8052    37.1564
19    27.2036    30.1435    32.8523    36.1908    38.5821
20    28.412     31.4104    34.1696    37.5663    39.9969
21    29.6151    32.6706    35.4789    38.9322    41.4009
22    30.8133    33.9245    36.7807    40.2894    42.7957
23    32.0069    35.1725    38.0756    41.6383    44.1814
24    33.1962    36.415     39.3641    42.9798    45.5584
25    34.3816    37.6525    40.6465    44.314     46.928
26    35.5632    38.8851    41.9231    45.6416    48.2898
27    36.7412    40.1133    43.1945    46.9628    49.645
28    37.9159    41.3372    44.4608    48.2782    50.9936
29    39.0875    42.5569    45.7223    49.5878    52.3355
30    40.256     43.773     46.9792    50.8922    53.6719
31    41.4217    44.9853    48.2319    52.1914    55.0025
32    42.5847    46.1942    49.4804    53.4857    56.328
33    43.7452    47.3999    50.7251    54.7754    57.6483
34    44.9032    48.6024    51.966     56.0609    58.9637
35    46.0588    49.8018    53.2033    57.342     60.2746
36    47.2122    50.9985    54.4373    58.6192    61.5811
37    48.3634    52.1923    55.668     59.8926    62.8832
38    49.5126    53.3835    56.8955    61.162     64.1812
39    50.6598    54.5722    58.1201    62.4281    65.4753
40    51.805     55.7585    59.3417    63.6908    66.766
41    52.9485    56.9424    60.5606    64.95      68.0526
42    54.0902    58.124     61.7767    66.2063    69.336
43    55.2302    59.3035    62.9903    67.4593    70.6157
44    56.3685    60.4809    64.2014    68.7096    71.8923
45    57.5053    61.6562    65.4101    69.9569    73.166
46    58.6405    62.8296    66.6165    71.2015    74.4367
47    59.7743    64.0011    67.8206    72.4432    75.7039
48    60.9066    65.1708    69.0226    73.6826    76.9689
49    62.0375    66.3387    70.2224    74.9194    78.2306
50    63.1671    67.5048    71.4202    76.1538    79.4898

Example Problem 2:
A sociological study evaluated the characteristics of marriage by religion; 256 people were
surveyed for religion and marital status. The results were as follows:

             Protestant   Catholic   Jewish   None   Other   Total
Never            29          16         8      20      0       73
Married          75          21        11      19      1      127
Divorced         21           6         3      13      0       43
Separated         8           3         1       0      1       13
Total           133          46        23      52      2      256

Is there a relationship between marital status and religion?

SYSTAT chi-squared output

WARNING: More than one-fifth of fitted cells are sparse (frequency < 5).
Significance tests computed on this table are suspect.

Test statistic             Value       df       Prob
Pearson chi-squared       22.718     12.000    0.030

What happened??
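
One way to see what the warning refers to is to compute the expected (fitted) count for every cell of the full table and count how many fall below 5. The sketch below does that; the column order (Protestant, Catholic, Jewish, None, Other) is an assumption about the original layout and does not affect the count, and the variable names are illustrative.

```python
# Rows: Never, Married, Divorced, Separated.
# Columns (assumed order): Protestant, Catholic, Jewish, None, Other.
observed = [
    [29, 16, 8, 20, 0],
    [75, 21, 11, 19, 1],
    [21, 6, 3, 13, 0],
    [8, 3, 1, 0, 1],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count per cell under no association, flattened into one list.
expected = [r * c / n for r in row_totals for c in col_totals]

sparse = sum(e < 5 for e in expected)
fraction_sparse = sparse / len(expected)
print(f"{sparse} of {len(expected)} expected counts are below 5 "
      f"({fraction_sparse:.0%})")
```

Every cell in the "Other" column and most cells in the "Separated" row have expected counts below 5, which is why the next step drops those two categories.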

Omitting sparse cells: Leave out 'other' and 'separated':

             Protestant   Catholic   Jewish   None   Total
Never            29          16         8      20      73
Married          75          21        11      19     126
Divorced         21           6         3      13      43
Total           125          43        22      52     242

Test statistic             Value       df       Prob
Pearson chi-squared       10.368      6.000    0.110

There is no statistically significant difference between the groups (p=0.11)

Example Problem 3: McNemar Test of Symmetry

In November of 1993, the U.S. Congress approved the North American Free Trade Agreement
(NAFTA). Let's say that two months before the approval and before the televised debate
between Vice President Al Gore and businessman Ross Perot, political pollsters queried a sample
of 350 people, asking "Are you for, unsure, or against NAFTA?" Immediately after the debate,
the pollsters contacted the same people and asked the question a second time. Here are the
results:

Frequencies: BEFORE$ (rows) by AFTER$ (columns)

              for     unsure    against    Total
for            51       22         28       101
unsure         46       18         27        91
against        52       49         57       158
Total         149       89        112       350

Percents of total count: BEFORE$ (rows) by AFTER$ (columns)

              for      unsure    against     Total       N
for         14.571     6.286      8.000     28.857     101
unsure      13.143     5.143      7.714     26.000      91
against     14.857    14.000     16.286     45.143     158
Total       42.571    25.429     32.000    100.000
N             149        89        112                 350

Test statistic                    Value       df       Prob
Pearson chi-squared              11.473     4.000     0.022
McNemar Symmetry chi-squared     22.039     3.000     0.000

The McNemar test of symmetry focuses on the counts in the off-diagonal cells (those along the
diagonal are not used in the computations). We are investigating the direction of change in
opinion. First, how many respondents became more negative about NAFTA?
Among those who initially responded For, 22 (6.29%) are now Unsure and 28 (8%) are now
Against. Among those who were Unsure before the debate, 27 (7.71%) answered Against
afterwards. The three cells in the upper right contain counts for those who became more
unfavorable and comprise 22% (6.29 + 8.00 + 7.71) of the sample. The three cells in the lower
left contain counts for people who became more positive about NAFTA (46, 52, and 49), or 42%
of the sample.
The null hypothesis for the McNemar test is that the changes in opinion in the two directions are
equal. The chi-squared statistic for this test is 22.039 with 3 df and p<0.0005. You reject the null hypothesis.
The pro-NAFTA shift in opinion is significantly greater than the anti-NAFTA shift.
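
The symmetry statistic can be reproduced from the off-diagonal counts. The handout does not state the formula, so the sketch below assumes the usual extension of McNemar's test to a k x k table (often called Bowker's test of symmetry): for each pair of mirror-image cells, add (n_ij - n_ji)^2 / (n_ij + n_ji). The function name is illustrative.

```python
def symmetry_chi_squared(table):
    """Chi-squared test of symmetry for a square table of paired responses."""
    k = len(table)
    statistic, df = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            pair_total = table[i][j] + table[j][i]
            if pair_total > 0:  # a pair of empty cells contributes nothing
                statistic += (table[i][j] - table[j][i]) ** 2 / pair_total
                df += 1
    return statistic, df


# Rows: opinion before (for, unsure, against). Columns: opinion after.
nafta = [
    [51, 22, 28],
    [46, 18, 27],
    [52, 49, 57],
]

stat, df = symmetry_chi_squared(nafta)
print(f"symmetry chi-squared = {stat:.3f}, df = {df}")
```

The diagonal cells (51, 18, 57) never enter the sum, which matches the description above. For a 2 x 2 table the same function reduces to the ordinary McNemar statistic without continuity correction.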


Example Problem 4: Wilcoxon Signed-Rank Test

Evaluate the effect of a diuretic in healthy volunteers:

            Daily UOP    Daily UOP                    Rank of       Signed rank
Subject      No drug      + Drug      Difference     difference    of difference
   1          1600         1490         -110             5             -5
   2          1850         1300         -550             6             -6
   3          1300         1400         +100             4             +4
   4          1500         1410          -90             3             -3
   5          1400         1350          -50             2             -2
   6          1010         1000          -10             1             -1

W = sum of signed ranks = -13

If the drug has no effect, the ranks associated with a positive change should be similar to the
ranks associated with a negative change; hence, the sum (W) should be close to 0.
How large must W be to call this a statistically significant difference? Refer to the Critical Values
table:
 N     Critical Value      P
 5          15           .062
 6          21           .032
            19           .062
 7          28           .016
            24           .046
 8          32           .024
            28           .054
 9          39           .020
            33           .054
10          45           .02
            39           .048
11          52           .018
            44           .054
12          58           .02
            50           .052
13          65           .022
            57           .048
14          73           .02
            63           .05
15          80           .022
            70           .048

*Due to the discrete nature of the possible values of W, p values at traditional breakpoints
(ex.: p=0.05) are usually not possible.
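
The signed-rank calculation and the table lookup for this example can be sketched as follows. The critical values for N = 6 are copied from the table above; the code assumes no zero differences (none occur here), and the helper name `midranks` is illustrative.

```python
from math import copysign


def midranks(values):
    """Rank values from 1 to N; tied values share the average of their ranks."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]


no_drug = [1600, 1850, 1300, 1500, 1400, 1010]
with_drug = [1490, 1300, 1400, 1410, 1350, 1000]

diffs = [after - before for before, after in zip(no_drug, with_drug)]
ranks = midranks([abs(x) for x in diffs])  # rank by size, ignoring sign
signed_ranks = [copysign(r, x) for r, x in zip(ranks, diffs)]
w = sum(signed_ranks)

# Rows of the critical values table for N = 6: (critical value, p).
critical_n6 = [(21, 0.032), (19, 0.062)]
reached = [(c, p) for c, p in critical_n6 if abs(w) >= c]

print(f"W = {w:.0f}; tabulated levels reached: {reached}")
```

With N = 6, |W| = 13 is smaller than both tabulated critical values (19 and 21), so on this table the change in urine output would not be called statistically significant.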
