Nciph ERIC2

This document provides an overview of common statistical tests and applications in epidemiological literature. It discusses: 1) Types of data - nominal, ordinal, and continuous data. Nominal data have no order, ordinal data have a limited order, and continuous data have an infinite number of evenly spaced values. 2) Descriptions of data distribution - frequencies or probabilities of values in a population. Discrete data are often shown in bar graphs, continuous data assume a symmetric bell curve. 3) Hypothesis testing - involves testing a null hypothesis for a population parameter. The null hypothesis is assumed true until evidence suggests otherwise. A failure to reject the null means results were not significant but does not prove the null is true


ERIC NOTEBOOK SERIES

Second Edition

Common Statistical Tests and Applications in Epidemiological Literature

Second Edition Authors:
Lorraine K. Alexander, DrPH
Brettania Lopes, MPH
Kristen Ricchetti-Masterson, MSPH
Karin B. Yeatts, PhD, MS

Any individual in the medical field will, at some point, encounter instances when epidemiological methods and statistics will be valuable tools in addressing research questions of interest. Examples of such questions might include:

- Will treatment with a new anti-hypertensive drug significantly lower mean systolic blood pressure?
- Is a visit with a social worker, in addition to regular medical visits, associated with greater satisfaction of care for cancer patients as compared to those who only have regular medical visits?

There are a number of steps in evaluating data before actually addressing the above questions. These steps include description of your data as well as determining what the appropriate tests are for your data.

Description of data

The type of data one has determines the statistical procedures that are utilized. Data are typically described in a number of ways: by type, distribution, location and variation. There are three different types of data: nominal, ordinal, and continuous data. Nominal data do not have an established order or rank and contain a finite number of values. Gender and race are examples of nominal data. Ordinal data have a limited number of values between which no other possible values exist. Number of children and stage of disease are good examples of ordinal data. It should be noted that ordinal data do not have to have evenly spaced values, as occurs with continuous data; however, there is an implied underlying order. Since both ordinal and nominal data have a finite number of possible values, they are also referred to as discrete data. The last type of data is continuous data, which are characterized by having an infinite number of evenly spaced values. Blood pressure and age fall into this category. It should be noted for data collection and analysis that continuous, ordinal, or nominal values can be grouped. Grouped data are often referred to as categorical. Possible categories might include: low, medium, high, or those representing a numerical range.
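
As a rough illustration of grouping, the Python sketch below bins continuous systolic blood pressure readings into ordered categories. The function name `group_sbp`, the cut-points (110 and 130 mm Hg), and the readings are hypothetical values chosen only for illustration; they are not taken from this notebook or from any clinical guideline.

```python
def group_sbp(sbp_mm_hg, low_cut=110, high_cut=130):
    """Map a continuous SBP reading to an ordered category.

    The default cut-points are hypothetical, chosen only to show grouping.
    """
    if sbp_mm_hg < low_cut:
        return "low"
    if sbp_mm_hg < high_cut:
        return "medium"
    return "high"


readings = [100, 118, 129, 140]  # hypothetical continuous readings (mm Hg)
groups = [group_sbp(r) for r in readings]
print(groups)  # ['low', 'medium', 'medium', 'high']
```

Once grouped, the readings no longer carry their exact values, which is why grouped data are analyzed with methods for categorical data.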

ERIC at the UNC CH Department of Epidemiology Medical Center


A second characteristic of data description, distribution, refers to the frequencies or probabilities with which values occur within our population. Discrete data are often represented graphically with bar graphs like the one below (Figure 1).

Figure 1. Bar graph

Continuous data are commonly assumed to have a symmetric, bell-shaped curve as shown below (Figure 2). This is known as a Gaussian distribution, the most commonly assumed distribution in statistical analysis.

Figure 2. Gaussian distribution

Hypothesis testing

Hypothesis testing, also known as statistical inference or significance testing, involves testing a specified hypothesized condition for a population's parameter. This condition is best described as the null hypothesis. For example, in a clinical trial of a new anti-hypertensive drug, the null hypothesis would state that there is no difference in effect when comparing the new drug to the current standard treatment. Contrary to the null is the alternative hypothesis, which generally defines the possible values for a parameter of interest. For the previous example, the alternative hypothesis is that there is a difference in the mean blood pressure of the standard treatment and new drug groups following therapy. The alternative hypothesis might also be described as your "best guess" as to what the values are.

However, in statistical analysis, the null hypothesis is the main interest, and is the one actually being tested. In statistical testing, we assume that the null hypothesis is correct and determine how likely we are to have obtained the sample (or values) we actually obtained in our study under the condition of the null. If we determine that the probability of obtaining the sample we observed is sufficiently small, then we can reject the null hypothesis. Since we are able to reject the null hypothesis, we have evidence that the alternative hypothesis may be true.

On the other hand, if the probability of obtaining our study results is not small, we fail to reject the assumption that the null hypothesis is true. It should be noted that we are not concluding that the null is true. This is a small, but important distinction. A test that fails to reject the null hypothesis should be considered inconclusive. An example will help to illustrate this point.

In a sealed bag, we have 100 blue marbles and 20 red marbles. (This bag is essentially representing the entire population.) One individual formulates the null hypothesis that all the marbles are blue, and the alternative that not all the marbles are blue. To test this hypothesis, 10 marbles are sampled from the bag. All ten marbles selected are indeed blue. Thus the individual has failed to reject the null that all the marbles in the bag are blue. However, because not all of the marbles were sampled, you cannot conclude that all the marbles in the bag are blue. (We happen to know this is not true, but it is impossible to know in the real world with populations too large to fully evaluate.) If another individual selects 10 marbles from the bag and finds that 8 are blue and 2 are red, we can reject the null hypothesis that all the marbles are blue, since we have selected at least one red marble.
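
To see why an all-blue sample is inconclusive, it may help to work out how often it would occur by chance. The sketch below is only an illustration, assuming the 10 marbles are drawn at random without replacement from the bag of 100 blue and 20 red marbles; the variable names are ours, not the notebook's.

```python
from math import comb

blue, red, drawn = 100, 20, 10
total = blue + red

# Hypergeometric probability that every one of the drawn marbles is blue:
# ways to choose 10 of the 100 blue marbles / ways to choose any 10 of 120
p_all_blue = comb(blue, drawn) / comb(total, drawn)
p_at_least_one_red = 1 - p_all_blue

print(f"P(all {drawn} blue)     = {p_all_blue:.3f}")
print(f"P(at least one red) = {p_at_least_one_red:.3f}")
```

Under these assumptions an all-blue sample turns up in roughly 15% of draws even though one marble in six is red, so failing to find a red marble says little about whether the null hypothesis is actually true.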

Error in statistical testing

Earlier, we indicated that we can reject the null hypothesis if the probability of obtaining a sample like the one observed in our study is sufficiently small. You may ask, "What is sufficiently small?" How small is determined by how willing we are to reject the null hypothesis when it accurately reflects the population from which the sample is drawn. This type of error is called a Type I error. This error is also commonly called alpha (α). Alpha is the probability of rejecting the null hypothesis when the null is true. This probability is selected by the researcher and is typically set at 0.05. It is important to remember that this is an arbitrary cut-point and should be taken into consideration when making conclusions about the results of the study.

There is a second type of error that can be made during statistical testing. It is known as Type II error, which is the probability of not rejecting the null when the alternative hypothesis is indeed true, or in other words, failing to reject the null when the null hypothesis is false. Type II error is commonly known as beta (β). Beta relates to another important parameter in statistical testing, which is power. Power is equal to (1 - β) and is essentially the ability to avoid making a Type II error. Like α, power is also defined by the researcher, and is typically set at 0.80. Below is a schematic of the relationships between α, β and power.

                                Truth
Decision                Null True       Null False
Reject Null             α               power (1 - β)
Fail to reject Null     no error        β

Student's t test

This test is most commonly used to test the difference between the means of the dependent variables of two groups. For example, this test would be appropriate if one wanted to evaluate whether or not a new anti-hypertensive drug reduces mean systolic blood pressure.

Example

To evaluate if drug Z reduces mean systolic blood pressure, a randomized clinical trial will be performed where 12 individuals receive drug Z and 8 receive a placebo. The null hypothesis to be tested is that there is no difference in the mean systolic blood pressure of the experimental and placebo groups. The alternative hypothesis is that there is a difference between the means of the two groups. The type I error for your trial will be 5%.

Results

Below are the group assignments and resulting systolic blood pressure (SBP):

Patient    Assignment    Systolic BP (mm Hg)
1          Drug Z        100
3          Drug Z        110
5          Drug Z        122
7          Drug Z        109
9          Drug Z        108
11         Drug Z        111
13         Drug Z        118
15         Drug Z        105
17         Drug Z        115
18         Drug Z        119
19         Drug Z        106
20         Drug Z        109
2          Placebo       129
4          Placebo       125
6          Placebo       136
8          Placebo       129
10         Placebo       135
12         Placebo       134
14         Placebo       140
16         Placebo       128

$$\text{mean}_{drug} = \frac{100 + 110 + \dots + 109}{12} = 111 \text{ mm Hg}$$

$$\text{mean}_{placebo} = \frac{129 + 125 + \dots + 128}{8} = 132 \text{ mm Hg}$$

$$\text{mean}_{drug} - \text{mean}_{placebo} = 111 - 132 = -21 \text{ mm Hg}$$

Now that we have determined the difference between means, we need to determine the standard error for that difference, which is calculated using the pooled estimate of the variance ($\sigma^2$).

The formula for the variance of the drug Z group is:

$$\sigma^2_{drug} = \frac{\sum (SBP_{drug} - \text{mean}_{drug})^2}{n_{drug} - 1}$$

$$\sigma^2_{drug} = \frac{(100-111)^2 + (110-111)^2 + \dots + (109-111)^2}{12 - 1} = 40.9$$

The variance for the placebo group is calculated in the same manner, substituting the values for the placebo group.

$$\sigma^2_{placebo} = 25.1$$

Next, we would need to calculate a pooled estimate of the variance using the following equation:

$$\sigma^2_{p} = \frac{(n_{drug} - 1)\,\sigma^2_{drug} + (n_{placebo} - 1)\,\sigma^2_{placebo}}{(n_{drug} - 1) + (n_{placebo} - 1)}$$

$$\sigma^2_{p} = \frac{(11)(40.9) + (7)(25.1)}{11 + 7} = \frac{626}{18} = 34.8$$

The pooled estimate of the variance can then be utilized to calculate the standard error for the difference in means:

$$SE^2(\text{mean}_{drug} - \text{mean}_{placebo}) = \frac{\sigma^2_{p}}{n_{drug}} + \frac{\sigma^2_{p}}{n_{placebo}}$$

$$SE^2 = \frac{34.8}{12} + \frac{34.8}{8} = 7.25$$

$$SE = 2.69$$

Now we are finally ready to test for significant differences in the mean blood pressure of our two groups. (Here $\text{mean}^{*}$ indicates the hypothesized values under the null; generally this quantity equals 0 when there is no difference expected between the drug and placebo groups.)

$$t = \frac{(\text{mean}_{drug} - \text{mean}_{placebo}) - (\text{mean}^{*}_{drug} - \text{mean}^{*}_{placebo})}{SE(\text{mean}_{drug} - \text{mean}_{placebo})}$$

$$t = \frac{-21 - 0}{2.69} = -7.8, \qquad |t| = 7.8$$

We now compare our calculated value to a table of critical values for the Student's t distribution (found in most basic statistics books). The table also requires that we know the degrees of freedom and the value of α we have selected. Degrees of freedom (df) refers to the amount of information that a sample has in estimating the variance. It is generally the sample size minus one. The df for our calculation is 12 + 8 - 2 = 18 (the sample size of each group minus 1, summed over the two groups). With a two-tailed α of 0.05, our value |-7.8| is greater than the critical value from the table (2.101). Thus, we can reject the null hypothesis that there is no difference between mean blood pressure levels, and accept, by elimination, our alternative hypothesis.

Chi-square analysis

What happens if we don't have continuous data, and are faced with categorical data instead? We could turn to chi-square analysis to evaluate if there are significant associations between a given exposure and outcome (the row and column variables in a contingency table). 2 X 2 contingency tables are one of the most common ways to present categorical data, and we can see this in analyzing data that were collected to address the question presented earlier in this notebook:

Is a visit with a social worker, in addition to regular medical visits, associated with greater satisfaction of care for cancer patients as compared to those who only have regular medical visits?

Below is a generic 2 X 2 table representing the data. It is important to note the set-up of the table, as cell a generally represents the group of interest (diseased and exposed) and cell d represents the referent group (no disease and unexposed).

                        Column value (often disease or health outcome)
Row value               1          0          Total
(often exposure)
1                       a          b          a+b
0                       c          d          c+d
Total                   a+c        b+d        n
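
The cell labels and marginal totals of the generic table can be written down as a small data structure. This is a minimal sketch for illustration only: the class name `TwoByTwo` and the counts passed to it are hypothetical, and the layout assumes exposure on the rows and outcome on the columns, as in the table above.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    a: int  # exposed, outcome present (group of interest)
    b: int  # exposed, outcome absent
    c: int  # unexposed, outcome present
    d: int  # unexposed, outcome absent (referent group)

    @property
    def row_totals(self):
        # (a+b, c+d): totals for exposed and unexposed
        return (self.a + self.b, self.c + self.d)

    @property
    def col_totals(self):
        # (a+c, b+d): totals for outcome present and outcome absent
        return (self.a + self.c, self.b + self.d)

    @property
    def n(self):
        return self.a + self.b + self.c + self.d


table = TwoByTwo(a=10, b=20, c=30, d=40)  # hypothetical counts
print(table.row_totals, table.col_totals, table.n)  # (30, 70) (40, 60) 100
```

Keeping the marginal totals next to the cell counts is convenient because every later step of the analysis is built from them.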

Here we have the contingency table with data from our trial:

                        Greater Satisfaction?
Social Worker Visit?    Yes        No         Total
Yes                     64         46         110
No                      36         54         90
Total                   100        100        200

In chi-square analysis we are testing the null hypothesis that there is no association between a social worker visit and a greater satisfaction with care.

Generally, in evaluating this type of data, it is important for each of the individual cells to have large values (i.e. greater than 5 or 10 each). If these conditions are not met, an alternative analysis called Fisher's exact test is conducted. This will not be discussed in this notebook.

To calculate the chi-square statistic ($\chi^2$):

$$\chi^2 = \sum_i \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i}$$

with i indexing the cells of the 2 X 2 table and Observed and Expected referring to the frequency in each cell. Below is the calculation for the frequencies that are expected in each cell.

                Column value
Row value       1                  2                  Total
1               (a+b)(a+c)/n       (a+b)(b+d)/n       a+b
2               (c+d)(a+c)/n       (c+d)(b+d)/n       c+d
Total           a+c                b+d                n

Thus, we now have a table that has both the actual and expected (in parentheses) values:

                        Greater Satisfaction?
Social Worker Visit?    Yes        No         Total
Yes                     64 (55)    46 (55)    110
No                      36 (45)    54 (45)    90
Total                   100        100        200

With this information, we can now calculate the $\chi^2$ statistic:

$$\chi^2 = \frac{(64-55)^2}{55} + \frac{(46-55)^2}{55} + \frac{(36-45)^2}{45} + \frac{(54-45)^2}{45} = 6.545$$

The chi-square statistic for these data has 1 degree of freedom and, with an α of 0.05, it is compared to the critical values in a standard chi-square table. Note that the degrees of freedom would increase as the number of rows and columns of our table increases (for instance a 3 X 4 table). Since our calculated value ($\chi^2$ = 6.545) is greater than the critical value (3.841), we can once again reject the null hypothesis that there is no association between the exposure and the outcome of interest, and conclude that in this case seeing a social worker is significantly associated with a greater satisfaction with care.

Important notes

It is important to remember that the statistical tests and examples presented here are only an elementary presentation of the large scope of situations that can be addressed by these tests. The intention of this notebook is to provide a basic understanding of the underlying principles of these statistical tests rather than implying that what has been presented is appropriate for every situation.

Further information about these statistical tests and other applications can be found in the following references:

Statistical First Aid: Interpretation of Health Research Data by Robert P. Hirsch and Richard K. Riegelman. Blackwell Scientific Publications, Cambridge, MA, 1992.

Categorical Data Analysis Using the SAS System by ME Stokes, CS Davis, and GG Koch. SAS Institute Inc., Cary, NC, 2001.
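
For readers who want to check the arithmetic, the two worked examples in this notebook (the pooled-variance t test on the drug Z data and the chi-square test on the social worker data) can be reproduced with a short Python sketch. It uses only the standard library; the function names `pooled_t` and `chi_square` are ours, and the code is a plain transcription of the hand calculations rather than a substitute for a statistics package.

```python
from math import sqrt
from statistics import mean, variance

# Systolic blood pressure values from the drug Z trial
drug = [100, 110, 122, 109, 108, 111, 118, 105, 115, 119, 106, 109]
placebo = [129, 125, 136, 129, 135, 134, 140, 128]


def pooled_t(x, y):
    """Two-sample t statistic using the pooled estimate of the variance."""
    nx, ny = len(x), len(y)
    # variance() is the sample variance (divides by n - 1)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    se = sqrt(pooled_var / nx + pooled_var / ny)
    return (mean(x) - mean(y)) / se


def chi_square(observed):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat


t = pooled_t(drug, placebo)
chi2 = chi_square([[64, 46], [36, 54]])  # rows: social worker visit yes / no

print(f"t = {t:.2f} on {len(drug) + len(placebo) - 2} df")
print(f"chi-square = {chi2:.3f} on 1 df")
```

Both statistics are then compared with the critical values quoted in the text (2.101 for t with 18 df, 3.841 for chi-square with 1 df), exactly as in the hand calculation.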

Acknowledgement

The authors of the Second Edition of the ERIC Notebook would like to acknowledge the authors of the ERIC Notebook, First Edition: Michel Ibrahim, MD, PhD, Lorraine Alexander, DrPH, Carl Shy, MD, DrPH and Sherry Farr, GRA, Department of Epidemiology at the University of North Carolina at Chapel Hill. The First Edition of the ERIC Notebook was produced by the Educational Arm of the Epidemiologic Research and Information Center at Durham, NC. The funding for the ERIC Notebook First Edition was provided by the Department of Veterans Affairs (DVA), Veterans Health Administration (VHA), Cooperative Studies Program (CSP) to promote the strategic growth of the epidemiologic capacity of the DVA.

Practice Questions

Answers are at the end of this notebook.

Researchers are conducting a study of the association between working in a noisy job environment and hearing loss. The researchers' null hypothesis is that there is no difference in hearing loss between people who work in a noisy job environment compared with people who work in a quiet job environment. The researchers' alternative hypothesis is that there is a difference in hearing loss between people who work in a noisy job environment compared with people who work in a quiet job environment. The researchers decided to set their alpha level at 0.05. The researchers' analysis results show a p-value of 0.0003 (please note that for the purposes of this question you are being provided with just the p-value from the study when in reality a study analysis is much more complex).

1) True or false: The alpha level of 0.05 is an arbitrary value.

2) True or false: Based on the results, the researchers can conclude their null hypothesis is true.

3) True or false: Based on these results, the researchers should reject the assumption that their null hypothesis is true.

4) True or false: An alpha level of 0.05 means there is a 0.05 percent chance that the researchers will incorrectly reject the null hypothesis.



Answers to Practice Questions

1) True or false: The alpha level of 0.05 is an arbitrary value.

Answer: True

This statement is true. The level of alpha is often set at 0.05; however, this choice is arbitrary and researchers may choose a different value.

2) True or false: Based on the results, the researchers can conclude their null hypothesis is true.

Answer: False

This statement is false. Researchers should never conclude that their null hypothesis is true. It is possible to conclude that we should fail to reject the assumption that the null hypothesis is true, but this is not the same as concluding that the null hypothesis is actually true.

3) True or false: Based on these results, the researchers should reject the assumption that their null hypothesis is true.

Answer: True

This statement is true. The p-value was 0.0003, which is less than the alpha of 0.05. The researchers should reject their null hypothesis that there is no difference in hearing loss diagnosis between people who work in a noisy job environment compared with people who work in a quiet job environment.

4) True or false: An alpha level of 0.05 means there is a 0.05 percent chance that the researchers will incorrectly reject the null hypothesis.

Answer: False

This statement is false. An alpha level of 0.05 means there is a 5% chance (not a 0.05% chance) that the researchers will incorrectly reject the null hypothesis, that is, reject it when it is in fact true.
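
The meaning of alpha can also be seen by simulation. The sketch below is only an illustration under stated assumptions: both groups are drawn from the same hypothetical normal population (mean 120, standard deviation 10, values we made up), so the null hypothesis is true by construction, and the pooled-variance t statistic is compared with the critical value 2.101 used in this notebook for 18 degrees of freedom. The group sizes of 12 and 8 mirror the drug Z example.

```python
import random
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t statistic using the pooled estimate of the variance."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var / nx + pooled_var / ny)


random.seed(0)   # fixed seed so the run is repeatable
T_CRIT = 2.101   # two-tailed critical value, alpha = 0.05, 18 df
n_sims = 4000
rejections = 0

for _ in range(n_sims):
    # Both samples come from the same population, so any rejection is a Type I error
    group_a = [random.gauss(120, 10) for _ in range(12)]
    group_b = [random.gauss(120, 10) for _ in range(8)]
    if abs(pooled_t(group_a, group_b)) > T_CRIT:
        rejections += 1

type_i_rate = rejections / n_sims
print(f"Proportion of true nulls rejected: {type_i_rate:.3f}")
```

Over many simulated trials the proportion of rejections should settle close to 0.05, which is what an alpha of 0.05 promises: about 1 true null hypothesis in 20 is rejected by chance alone.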
