Chapter One and 2 Biostatistics and Epidemiology
In the following section, we will consider selected epidemiologic concepts.
Chapter One: Principles and Methods of Epidemiology
“Medicine to produce health has to examine disease; and music to
create harmony, must investigate discord” (Plutarch A.D. 46–120).
A rate is a proportion involving a numerator and a denominator. Both the numerator and the denominator must be clearly defined so that you know to which group (denominator) your rate refers. The numerator is contained in (is a subset of) the denominator. This is in contrast to a ratio, where the numerator refers to a different group from the denominator.

Mortality rates pertain to the number of deaths occurring in a particular population subgroup and often provide one of the first indications of a health problem. The following definitions are necessary before we continue our discussion:

$$\text{Crude annual mortality rate (death rate)} = \frac{\text{total number of deaths during a year in the population at risk}}{\text{population at risk (usually taken as the population at midyear)}}$$

$$\text{Cause-specific annual mortality rate} = \frac{\text{number of deaths occurring due to a particular cause during the year in the population at risk}}{\text{population at risk (all those alive at midyear)}}$$

$$\text{Age-specific annual mortality rate} = \frac{\text{number of deaths occurring in the given age group during the year in the population at risk}}{\text{population at risk (those in that age group alive at midyear)}}$$

A reason for taking the population at midyear as the denominator is that a population may grow or shrink during the year in question, and the midyear population is an estimate of the average number during the year. One can, however, speak of death rates over a five-year period rather than a one-year period, and one can define the population at risk as all those alive at the beginning of the period.

1.3 Age-Adjusted Rates

When comparing death rates between two populations, the age composition of the populations must be taken into account. Since older people have a higher number of deaths per 1,000 people, if a population is heavily weighted by older people, its crude mortality rate would be higher than that of a younger population, and a comparison between the two groups might just reflect the age discrepancy rather than an intrinsic difference in mortality experience. One way to deal with this problem is to compare age-specific death rates, that is, death rates specific to a particular age group. Another way, useful when an overall summary figure is required, is to use age-adjusted rates. These are rates adjusted to what they would be if the two populations being compared had the same age distribution as some arbitrarily selected standard population.

For example, the table below shows the crude and age-adjusted mortality rates for the United States at five time periods (references 15, 7). The adjustment is made to the age distribution of the population in 1940 as well as the age distribution of the population in 2000. Thus, we see that in 1991 the age-adjusted rate was 5.1/1000 when adjusted to the 1940 standard, but the crude mortality rate was 8.6/1000. This means that if in 1991 the age distribution of the population had been the same as it was in 1940, then the death rate would have been only 5.1/1000 people. The crude and age-adjusted rates for 1940 are the same because the 1940 population serves as the "standard" population whose age distribution is used as the basis for adjustment.
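To make the rate definitions and direct age adjustment concrete, here is a minimal Python sketch. The two populations and the standard population are hypothetical numbers chosen for illustration (they are not the U.S. data referred to above), and the helper names are our own. The two populations are built to share identical age-specific rates, so any difference in their crude rates comes purely from age composition.

```python
from dataclasses import dataclass


@dataclass
class AgeGroup:
    label: str
    deaths: int      # deaths during the year in this age group
    population: int  # population at risk in this age group (alive at midyear)


def crude_rate(groups, per=1000):
    """Crude annual mortality rate: all deaths / total midyear population."""
    deaths = sum(g.deaths for g in groups)
    population = sum(g.population for g in groups)
    return per * deaths / population


def age_specific_rates(groups, per=1000):
    """Age-specific annual mortality rate for each age group."""
    return {g.label: per * g.deaths / g.population for g in groups}


def age_adjusted_rate(groups, standard_population, per=1000):
    """Direct age adjustment: weight each age-specific rate by the share of
    the standard population that falls in that age group."""
    total = sum(standard_population.values())
    rates = age_specific_rates(groups, per)
    return sum(rates[label] * standard_population[label] / total for label in rates)


# Hypothetical populations with identical age-specific rates (1, 5 and 40
# per 1,000) but different age compositions.
younger = [AgeGroup("<45", 70, 70_000), AgeGroup("45-64", 100, 20_000), AgeGroup("65+", 400, 10_000)]
older = [AgeGroup("<45", 40, 40_000), AgeGroup("45-64", 150, 30_000), AgeGroup("65+", 1_200, 30_000)]
# Hypothetical standard population (counts by age group).
standard = {"<45": 60_000, "45-64": 25_000, "65+": 15_000}

for name, pop in (("younger", younger), ("older", older)):
    print(f"{name}: crude = {crude_rate(pop):.2f}/1000, "
          f"age-adjusted = {age_adjusted_rate(pop, standard):.2f}/1000")
```

The crude rates differ (5.70 versus 13.90 per 1,000), while both age-adjusted rates come out at 7.85 per 1,000, which is the age discrepancy the adjustment is meant to remove. Adjusting a population to its own age distribution simply returns its crude rate, which is why the crude and age-adjusted 1940 rates coincide when 1940 is the standard.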
When adjusted to the year 2000 standard, the 1991 age-adjusted rate was 9.3/1000. If in 1991 the age distribution had been the same as in 2000, then the death rate would have been 9.3/1000 people. So age-adjusted rates depend on the standard population being used for the adjustment.

1.4 Incidence and Prevalence Rates

The prevalence rate and the incidence rate are two measures of morbidity (illness).

$$\text{Prevalence rate of a disease} = \frac{\text{number of persons with a disease}}{\text{total number of persons in population at risk at a particular point in time}}$$

$$\text{Incidence rate} = \frac{\text{number of new cases of a disease per unit of time}}{\text{total number at risk at the beginning of this time period}}$$

For example, studies have found that the ten-year incidence of a major coronary event (such as heart attack) among white men, ages 30–59, with diastolic blood pressure of 105 mm Hg or above at the time they were first seen, was 183 per 1,000. This means that among
1,000 white men, ages 30–59, who had diastolic blood pressure of 105 mm Hg or above at the beginning of the ten-year period of observation, 183 had a major coronary event (heart attack or sudden death) during the next ten years. Among white men with diastolic blood pressure of <75 mm Hg, the ten-year incidence of a coronary event was 55/1,000. Thus comparison of these two incidence rates, 183/1,000 for those with high blood pressure versus 55/1,000 for those with low blood pressure, may lead to the inference that elevated blood pressure is a risk factor for coronary disease.

Often one may hear the word "incidence" used when what is really meant is prevalence. You should beware of such incorrect usage. For example, you might hear or even read in a medical journal that the incidence of diabetes in 1973 was 42.6 per 1,000 individuals, ages 45–64, when what is really meant is that the prevalence was 42.6/1,000. The thing to remember is that prevalence refers to the existence of a disease at a specific period in time, whereas incidence refers to new cases of a disease developing within a specified period of time.

Note that mortality rate is an incidence rate, whereas morbidity may be expressed as an incidence or prevalence rate. In a chronic disease the prevalence rate is greater than the incidence rate because prevalence includes both new cases and existing cases that may have first occurred a long time ago, but whose afflicted patients continued to live with the condition. For a disease that is either rapidly fatal or quickly cured, incidence and prevalence may be similar. Prevalence can be established by doing a survey or a screening of a target population and counting the cases of disease existing at the time of the survey. This is a cross-sectional study. Incidence figures are harder to obtain than prevalence figures since to ascertain incidence one must identify a group of people free of the disease in question (known as a cohort), observe them over a period of time, and determine how many develop the disease over that time period. The implementation of such a process is difficult and costly.

1.4 Measures of Relative Risk: Inferences from Prospective Studies

In epidemiologic studies we are often interested in knowing how much more likely an individual is to develop a disease if he or she is exposed to a particular factor than an individual who is not so exposed. A simple measure of such likelihood is called relative risk (RR). It is the ratio of two incidence rates: the rate of development of the disease for people with the exposure factor, divided by the rate of development of the disease for people without the exposure factor.

Example: Calculation of Relative Risk from Prospective Studies

Relative risk can be determined by constructing a 2 × 2 table as follows:

RISK FACTOR (determined at        DISEASE (developing in the specified period)
beginning of study period):       Yes          No            Total
High blood pressure
  Present                         a = 90       b = 403       a + b = 493
  Absent                          c = 70       d = 1201      c + d = 1271

$$RR = \frac{\text{incidence of disease among those with high BP}}{\text{incidence of disease among those with normal BP}} = \frac{a/(a+b)}{c/(c+d)} = \frac{90/493}{70/1271} = 3.31$$
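The relative-risk arithmetic for the blood-pressure table can be reproduced in a few lines of Python. This is a minimal sketch: the function names `incidence` and `relative_risk` are illustrative rather than taken from any library, and the cell labels follow the a, b, c, d layout of the prospective 2 × 2 table (exposed with/without disease in the first row, unexposed in the second).

```python
def incidence(cases: int, population_at_risk: int) -> float:
    """Proportion of the group at risk that develops the disease."""
    return cases / population_at_risk


def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """RR = [a/(a+b)] / [c/(c+d)] for a prospective 2 x 2 table in which
    a, b = exposed with / without disease and c, d = unexposed with / without."""
    return incidence(a, a + b) / incidence(c, c + d)


# Cell counts from the high-blood-pressure table in the text.
a, b, c, d = 90, 403, 70, 1201
rr = relative_risk(a, b, c, d)

print(f"Incidence with high BP:   {incidence(a, a + b):.4f}")
print(f"Incidence with normal BP: {incidence(c, c + d):.4f}")
print(f"Relative risk:            {rr:.2f}")
```

Because the rows are defined by exposure, each row total is a genuine population at risk, which is what makes this direct calculation of incidence, and hence of RR, legitimate for a prospective design.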
The incidence of the disease among individuals with high blood pressure is 3.31 times that among individuals with normal blood pressure.

A closely related measure, the hazard ratio, can be estimated from Cox proportional hazards regression models, which allow for adjustment for other variables.

1.5 Odds Ratio: Estimate of Relative Risk from Case-Control Studies

A case-control study is one in which the investigator seeks to establish an association between the presence of a characteristic (a risk factor) and the occurrence of a disease by starting out with a sample of diseased persons and a control group of non-diseased persons and by noting the presence or absence of the characteristic in each of these two groups. It can be illustrated by constructing a 2 × 2 table as follows:

                    DISEASE
RISK FACTOR         Present                        Absent
Present             a                              b
Absent              c                              d
Total               a + c                          b + d
                    (number of persons with        (number of persons without
                    disease)                       disease)

The objective is to determine if the proportion of persons with the disease who have the factor is greater than the proportion of persons without the disease who have the factor. In other words, it is desired to know whether a/(a + c) is greater than b/(b + d).

Case-control studies are often referred to as retrospective studies because the investigator must gather data on the independent variables retrospectively. The dependent variable, the presence of disease, is obtained at the time of sampling, in contrast to prospective studies, where the independent variables are measured at the time of sampling and the dependent variable is measured at some future time (i.e., when the disease develops). The real distinction between case-control or retrospective studies and prospective studies has to do with how individuals are selected for the study: those with and without the disease in case-control/retrospective studies, and those with and without the factor of interest in prospective studies.

Since in prospective studies we sample the people with the characteristic of interest and the people without the characteristic, we can obtain the relative risk directly by calculating the incidence rates of disease in these two groups. In case-control studies, however, we sample patients with and without the disease, and note the presence or absence of the characteristic of interest in these two groups; we do not have a direct measure of incidence of disease. Nevertheless, by making certain assumptions, we can approximate what the relative risk would be if we could measure incidence rates through a prospective study. These approximations hold best for diseases of low incidence.

To estimate relative risk from case-control studies, note that

$$\frac{a/(a+b)}{c/(c+d)} = \frac{a(c+d)}{c(a+b)}$$

Now assume that since the disease is not all that common, c is negligible in relation to d (in other words, among people without the risk factor there aren't all that many who will get the disease, relative to the number of people who will not get it). Assume also that, in the population, a is negligible relative to b, since even among people with
the risk factor not all that many will get the disease in comparison to the number who won't get it. Then the terms in the parentheses become d in the numerator and b in the denominator, so that

$$\frac{a(c+d)}{c(a+b)} \quad \text{reduces to} \quad OR = \frac{ad}{bc}$$

This is known as the odds ratio (OR) and is a good estimate of relative risk when the disease is rare.

An example of how the odds ratio is calculated is shown below. In a case-control study of lung cancer the table below was obtained. Note that we are not sampling smokers and nonsmokers here. Rather, we are sampling those with and without the disease. So although in the population at large a is small relative to b, in this sample it is not so.

                                     Lung cancer     Without lung
                                     cases           cancer (controls)
Smokers of 15–24 cigarettes a day    a = 475         b = 431
Nonsmokers                           c = 7           d = 61

The odds ratio, as an estimate of the relative risk of developing lung cancer for people who smoke 15–24 cigarettes a day compared with nonsmokers, is

$$OR = \frac{475 \times 61}{431 \times 7} = 9.60 = \text{estimate of relative risk}$$

This means that smokers of 15–24 cigarettes daily are estimated to be 9.6 times more likely to get lung cancer than are nonsmokers.

One more thing about the odds ratio: it is the ratio of the odds of lung cancer for those who smoke 15–24 cigarettes a day, relative to the odds of lung cancer for those who don't smoke. In the example above, we get

for smokers: odds of lung cancer are 475/431
for nonsmokers: odds of lung cancer are 7/61

$$\text{ratio of odds} = \frac{475/431}{7/61} = 9.60$$

So the point is, the odds ratio is the odds ratio, whether the disease is rare or not. It is always the ratio of the odds of disease for those with the exposure versus the odds of disease for those without the exposure. But when the disease is rare, it is also a good estimate of the relative risk. Odds ratios can be calculated from logistic regression (which allows for adjustment for other variables).

There are many different types of bias that might lead you to either underestimate or overestimate the size of a relative risk or odds ratio, and it is important to try to anticipate potential sources of bias and avoid them. The following illustration shows the impact of one kind of possible bias: ascertainment or response bias.
Assume that you have the following situation. Of 100 people exposed
to a risk factor, 20% develop the disease, and of 100 people
unexposed, 16% develop the disease, yielding a relative risk of 1.25, as
shown in the illustration.
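The odds ratio and relative risk calculations in this section can be checked with a short Python sketch. The functions `relative_risk` and `odds_ratio` are our own illustrative helpers; the lung-cancer counts and the 20%/16% example come from the text, while the "rare" version of the 20%/16% table (20 and 16 cases per 1,000) is a hypothetical variant added only to show how the rare-disease assumption affects the approximation.

```python
def relative_risk(a, b, c, d):
    """RR = [a/(a+b)] / [c/(c+d)]; needs incidence, i.e. a prospective design."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """OR = ad/bc = (a/b) / (c/d), the odds of disease with vs. without exposure."""
    return (a * d) / (b * c)


# Lung-cancer case-control table from the text. Only the OR is meaningful here,
# because people were sampled by disease status, not by exposure.
lung_cancer_or = odds_ratio(a=475, b=431, c=7, d=61)
print(f"Lung cancer OR: {lung_cancer_or:.2f}")

# Example from the text: 20/100 exposed vs 16/100 unexposed develop the disease.
common_rr = relative_risk(20, 80, 16, 84)
common_or = odds_ratio(20, 80, 16, 84)

# Hypothetical rare-disease counterpart (not from the text): 20/1,000 vs 16/1,000.
rare_rr = relative_risk(20, 980, 16, 984)
rare_or = odds_ratio(20, 980, 16, 984)

print(f"20% vs 16%: RR = {common_rr:.4f}, OR = {common_or:.4f}")
print(f"2% vs 1.6%: RR = {rare_rr:.4f}, OR = {rare_or:.4f}")
```

Both tables have a relative risk of 1.25, but the odds ratio is about 1.31 when 16% to 20% of people develop the disease and only about 1.255 when 1.6% to 2% do, which is the sense in which the OR is a good estimate of relative risk only when the disease is rare.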
disease may be smoking. Since some coffee drinkers are also smokers, if a study found a relationship between coffee drinking (the independent variable) and development of heart disease (the dependent variable), it could really mean that it is the smoking that causes heart disease, rather than the coffee. In this example, smoking is the confounding variable.

If both the confounding variable and the independent variable of interest are closely associated with the dependent variable, then the observed relationship between the independent and dependent variables may really be a reflection of the true relationship between the confounding variable and the dependent variable. An intuitive way to look at this is to imagine that if a confounding variable were perfectly associated with an independent variable, it could be substituted for it. It is important to account or adjust for confounding variables in the design and statistical analysis of studies in order to avoid wrong inferences.

1.8 Matching

Whenever we enrolled a coffee drinker into the study, we would determine if that person was a smoker. If the patient was a smoker, the next patient enrolled who was not a coffee drinker (i.e., a member of the comparison group) would also have to be a smoker. For each coffee-drinking nonsmoker, a non-coffee-drinking nonsmoker would be enrolled. In this way we would have the same number of smokers in the two groups. This is known as one-to-one matching.
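The enrollment rule for one-to-one matching can be sketched as a simple pairing procedure. This is a simplified illustration under assumptions of our own: the whole list of candidates is known in advance (rather than arriving one by one), each candidate is described only by coffee drinking and smoking status, and each coffee drinker is paired with the earliest unused non-coffee drinker of the same smoking status. The `Candidate` type, the `one_to_one_match` function, and the candidate list are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    person_id: int
    coffee_drinker: bool  # exposure of interest
    smoker: bool          # matching (confounding) variable


def one_to_one_match(candidates):
    """Pair each coffee drinker with the earliest unused non-coffee drinker who
    has the same smoking status. Coffee drinkers with no available match, and
    leftover non-coffee drinkers, are not enrolled."""
    controls = {True: deque(), False: deque()}  # non-coffee drinkers, keyed by smoker
    for person in candidates:
        if not person.coffee_drinker:
            controls[person.smoker].append(person)

    pairs = []
    for person in candidates:
        if person.coffee_drinker and controls[person.smoker]:
            pairs.append((person, controls[person.smoker].popleft()))
    return pairs


# Hypothetical candidates in order of arrival.
candidates = [
    Candidate(1, coffee_drinker=True, smoker=True),
    Candidate(2, coffee_drinker=False, smoker=False),
    Candidate(3, coffee_drinker=False, smoker=True),
    Candidate(4, coffee_drinker=True, smoker=False),
    Candidate(5, coffee_drinker=True, smoker=True),
    Candidate(6, coffee_drinker=False, smoker=True),
    Candidate(7, coffee_drinker=False, smoker=False),
]

pairs = one_to_one_match(candidates)
for case, control in pairs:
    print(f"coffee drinker {case.person_id} <-> comparison {control.person_id} (smoker={case.smoker})")

# Matching guarantees the same number of smokers in both groups.
print(sum(c.smoker for c, _ in pairs), sum(k.smoker for _, k in pairs))
```

Because every pair shares its smoking status, smoking is distributed identically in the two groups and cannot by itself produce a difference between coffee drinkers and non-coffee drinkers.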
Why Do We Test the Null Hypothesis?

Suppose we believe that drug A is better than drug B in preventing death from a heart attack. Why don't we test that belief directly and see which drug is better, rather than testing the hypothesis that drug A is equal to drug B? The reason is that there is an infinite number of ways in which drug A can be better than drug B, so we would have to test an infinite number of hypotheses. If drug A causes 10% fewer deaths than drug B, it is better. So first we would have to see if drug A causes 10% fewer deaths. If it doesn't cause 10% fewer deaths, but causes 9% fewer deaths, it is also better. Then we would have to test whether our observations are consistent with a 9% difference in mortality between the two drugs. Then we would have to test whether there is an 8% difference, and so on.

On the other hand, when we test the null hypothesis of no difference, we only have to test one value (a 0% difference), and we ask whether our observations are consistent with the hypothesis that there is no difference in mortality between the two drugs. If the observations are consistent with a null difference, then we cannot state that one drug is better than the other. If it is unlikely that they are consistent with a null difference, then we can reject that hypothesis and conclude there is a difference.

The important point is that we can never be certain that we are right in either accepting or rejecting a hypothesis. In fact, we run the risk of making one of two kinds of errors. We can reject the null or test hypothesis incorrectly; that is, we can conclude that the drug does reduce mortality when in fact it has no effect. This is known as a type I error. Or we can fail to reject the null or test hypothesis incorrectly; that is, we can conclude that the drug does not have an effect when in fact it does reduce mortality. This is known as a type II error. Each of these errors carries with it certain consequences. In some cases a type I error may be more serious; in other cases a type II error may be more serious.

We cannot eliminate the risk of making one of these kinds of errors, but we can lower the probabilities that we will make them. The probability of making a type I error is known as the significance level of a statistical test. When you read in the literature that a result was significant at the .05 level, it means that in this experiment the results are such that the probability of making a type I error is less than or equal to .05. In most experiments and surveys, people are very concerned about having a low probability of making a type I error and often ignore the type II error. This may be a mistake, since in some cases a type II error is more serious than a type I error. In designing a study with a given number of observations, if you aim to lower the type I error probability you automatically raise the type II error probability. To lower the probabilities of both the type I and type II error in a study, it is necessary to increase the number of observations.
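The trade-off between the two error probabilities can be seen numerically. The sketch below assumes a hypothetical trial (15% versus 10% mortality), a two-sided two-sample test of proportions, and the usual normal approximation with an unpooled standard error, ignoring the negligible chance of rejecting in the wrong direction; these choices and the function name `type_ii_error` are assumptions for illustration, not a method from the text. It uses only `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def type_ii_error(p1, p2, n_per_group, alpha):
    """Approximate probability of a type II error (beta) for a two-sided
    two-sample test of proportions with significance level alpha."""
    se = ((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group) ** 0.5
    z_crit = Z.inv_cdf(1 - alpha / 2)          # critical value for the type I error rate
    power = Z.cdf(abs(p1 - p2) / se - z_crit)  # probability of (correctly) rejecting
    return 1 - power


# Hypothetical scenario: 15% mortality on drug B vs 10% on drug A.
for n in (200, 800):
    for alpha in (0.05, 0.01):
        beta = type_ii_error(0.15, 0.10, n, alpha)
        print(f"n per group = {n:4d}, alpha = {alpha:.2f}, beta = {beta:.2f}")
```

With the number of observations held fixed, tightening alpha from .05 to .01 raises beta; quadrupling the number of observations per group lowers beta at either alpha.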
Informed Consent

One of the most important elements in the protection of human subjects is the principle of informed consent. The study subject must freely consent to be part of the study after being fully informed of the potential risks and benefits.

There are certain elements that must be in a written consent form. The purpose of the research must be stated; a 24-hour contact person must be listed; and there must be a description of the study procedures: what is expected of the participant, the duration of the study, and how much of the participant's time it will take. The potential risks and discomforts, potential benefits, and inconvenience to the participants must all be clearly stated. There must be a statement that participation is voluntary, that the participant has the right to withdraw at any time, and that this will not prejudice the care of the participant. If the research may result in need for further care or diagnostic procedures, the participant must be told to what extent he or she is responsible for further care and what the study will pay for. If there is any compensation to the participants, either for expenses incurred in participating or for time spent, they must be informed of the amount. (The amount should not be excessive, as that may appear coercive.) A statement assuring confidentiality and describing how it will be maintained must be included.

Most important, the participant must understand what he or she is agreeing to, and the consent form must be phrased in language that is understandable and, if appropriate, translated into the participant's native language. All this must be approved by the medical institution's IRB (Institutional Review Board), which is generally a committee of experts and lay people who review and must approve all research protocols before the research is started, and who monitor adverse events as the research progresses. Different IRBs have different specific requirements that are usually posted on their websites. Informed consent is an ongoing process. It is not just signing a form at the beginning of a study. The researcher has an obligation to keep the participant informed of relevant new research that may affect his or her decision to continue participating.

When is it ethical to begin a clinical trial of a new treatment? When there is equipoise. Equipoise means that there is about as much evidence that the treatment may provide benefit as there is that it will not provide benefit.

Research Integrity

For research conclusions to be valid, data collection procedures must be rigorously and uniformly administered. No data may be altered without documentation. If there is a clerical error, the change and the reason for it must be documented. Enrollment must follow strict and pre-planned standards. Sometimes (fortunately, rarely) there is great pressure to enroll subjects in a given time frame, or the researcher (in violation of the principle of equipoise) really believes the treatment can help his or her patients, and so "bends" the enrollment rules. This may invalidate the research and so is unethical.