Topic 4a
Social Medicine
HOURS: 2
PLACE: Department of Public Health
1. Relevance of the topic:
Epidemiology is the study of the distribution and determinants of disease
frequency in human populations and the application of this study to control health
problems. The term study includes both surveillance, whose purpose is to monitor
aspects of disease occurrence and spread that are pertinent to effective control, and
epidemiologic research, whose goal is to harvest valid and precise information
about the causes, preventions, and treatments for disease. The term disease refers
to a broad array of health-related states and events including diseases, injuries,
disabilities, and death.
Epidemiologic research encompasses several types of study designs, including
experimental studies and observational studies such as cohort and case–control
studies. Each type of epidemiologic study design simply represents a different way
of harvesting information. The selection of one design over another depends on the
particular research question, concerns about validity and efficiency, and practical
and ethical considerations. For example, experimental studies, also known as trials,
investigate the role of some factor or agent in the prevention or treatment of a
disease. In this type of study, the investigator assigns individuals to two or more
groups that either receive or do not receive the preventive or therapeutic agent.
Because experimental studies closely resemble controlled laboratory
investigations, they are thought to produce the most scientifically rigorous data of
all the designs.
Cross-sectional studies
Cross-sectional studies measure the prevalence of disease and thus are often called
prevalence studies. In a cross-sectional study the measurements of exposure and
effect are made at the same time. It is not easy to assess the reasons for
associations shown in cross-sectional studies. The key question to be asked is
whether the exposure precedes or follows the effect. If the exposure data are
known to represent exposure before any effect occurred, the data from a cross-
sectional study can be treated like data generated from a cohort study.
Cross-sectional studies are relatively easy and inexpensive to conduct and are
useful for investigating exposures that are fixed characteristics of individuals, such
as ethnicity or blood group. In sudden outbreaks of disease, a cross-sectional study
to measure several exposures can be the most convenient first step in investigating
the cause.
Data from cross-sectional studies are helpful in assessing the health care needs of
populations. Data from repeated cross-sectional surveys using independent random
samples with standardized definitions and survey methods provide useful
indications of trends. Each survey should have a clear purpose. Valid surveys need
well-designed questionnaires, an appropriate sample of sufficient size, and a good
response rate.
Many countries conduct regular cross-sectional surveys on representative samples
of their populations focusing on personal and demographic characteristics,
illnesses and health-related habits. Frequency of disease and risk factors can then
be examined in relation to age, sex and ethnicity. Cross-sectional studies of risk
factors for chronic diseases have been done in a wide range of countries.
Case-control studies
Method:
• Select persons with a given disease (CASES) and persons without a
given disease (CONTROLS)
• Various exposures are assessed as possible risk factors for the given
disease
• Good for evaluating potential risk factors for a disease
Schematic representation
Selection of Cases
Source:
• Hospitals
• Clinic patients
• Disease registries
• Tumor registries
Type:
Incident (newly diagnosed) cases
• Disadvantage - wait for new cases
Representativeness of cases
• Should cases be representative of all persons with the disease? NO
• For example, you may choose to study prostate cancer among men >55
years rather than all age groups – of course, your study results will be
applicable only to men > 55 years of age
Exposure
An important aspect of case-control studies is the determination of the start and
duration of exposure for cases and controls. In the case-control design, the
exposure status of the cases is usually determined after the development of the
disease (retrospective data) and usually by direct questioning of the affected person
or a relative or friend. The informant’s answers may be influenced by knowledge
about the hypothesis under investigation or the disease experience itself.
An example of the use of a case-control study design is shown in Table 3.2.
Researchers in Papua New Guinea compared the history of meat consumption in
people who had enteritis necroticans, with people who did not have the disease.
Proportionately more people who had the disease (50 of 61 cases) reported prior
meat consumption than those who were not affected (16 of 57).
Exposure is sometimes determined by biochemical measurements (e.g. lead in
blood or cadmium in urine), which may not accurately reflect the relevant past
exposure.
For example, lead in blood at age 6 years is not a good indicator of exposure at age
1 to 2 years, which is the age of greatest sensitivity to lead. This problem can be
avoided if exposure can be estimated from an established recording system (e.g.
stored results of routine blood testing or employment records) or if the case-control
study is carried out prospectively so that exposure data are collected before the
disease develops.
Thalidomide
A classic example of a case-control study was the discovery of the relationship
between thalidomide and limb defects in babies born in the Federal Republic of
Germany in 1959 and 1960. The study, done in 1961, compared affected children
with normal children. Of 46 mothers whose babies had malformations, 41 had been
given thalidomide between the fourth and ninth weeks of pregnancy, whereas none
of the 300 control mothers, whose children were normal, had taken the drug during
pregnancy [10]. Accurate timing of the drug intake was crucial for determining
relevant exposure.
Odds ratio
The association of an exposure and a disease (relative risk) in a case-control study
is measured by calculating the odds ratio (OR), which is the ratio of the odds of
exposure among the cases to the odds of exposure among the controls. For the data
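in Table 3.2 (50 exposed and 11 unexposed cases; 16 exposed and 41 unexposed
controls), the odds ratio is given by:
OR = (50 × 41) / (11 × 16) = 2050 / 176 ≈ 11.6
This indicates that the odds of recent meat consumption were 11.6 times as high
among the cases as among the controls.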
The odds ratio is very similar to the risk ratio, particularly if a disease is rare. For
the odds ratio to be a good approximation, the cases and controls must be
representative of the general population with respect to exposure.
However, because the incidence of disease is unknown, the absolute risk cannot be
calculated. An odds ratio should be accompanied by the confidence interval
around the point estimate.
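The odds ratio for the enteritis necroticans data, together with a confidence interval, can be sketched in Python as follows. The cell counts come from the figures quoted above (50 of 61 cases and 16 of 57 controls exposed). The function name and the use of the Woolf (log-based) method for the 95% interval are assumptions made for illustration and are not taken from the source.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate confidence interval for a 2x2 table.

    a: exposed cases      b: unexposed cases
    c: exposed controls   d: unexposed controls
    The interval uses the Woolf (log) method, an assumption of this sketch.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR) under the Woolf approximation
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# 50 of 61 cases and 16 of 57 controls reported prior meat consumption
or_value, ci_low, ci_high = odds_ratio_with_ci(a=50, b=61 - 50, c=16, d=57 - 16)
print(f"OR = {or_value:.1f}, 95% CI {ci_low:.1f} to {ci_high:.1f}")
```

The point estimate is about 11.6, and the interval is wide because the unexposed-case and exposed-control cells are small. The function would fail for a table with a zero cell, such as the thalidomide controls, where a different approach is needed.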
Case-Control Studies
Advantages
• Well suited for rare diseases
• Sampling from source population
• Well suited for studying diseases with long latency periods (such as cancer)
• Time period shorter than a follow-up study
• Less expensive than a follow-up study
• Can be prospective or retrospective
Disadvantages:
• Difficult to establish a temporal relationship between exposure and disease
(i.e., whether exposure preceded disease)
Cohort studies
Cohort studies, also called follow-up or incidence studies, begin with a group of
people who are free of disease, and who are classified into subgroups according to
exposure to a potential cause of disease or outcome (Figure 3.6). Variables of
interest are specified and measured and the whole cohort is followed up to see how
the subsequent development of new cases of the disease (or other outcome) differs
between the groups with and without exposure. Because the data on exposure and
disease refer to different points in time, cohort studies are longitudinal, like
case-control studies.
Cohort studies have been called prospective studies, but this terminology is
confusing and should be avoided. As mentioned previously, the term “prospective”
refers to the timing of data collection and not to the relationship between exposure
and effect. Thus there can be both prospective and retrospective cohort studies.
Cohort studies provide the best information about the causation of disease and the
most direct measurement of the risk of developing disease. Although conceptually
simple, cohort studies are major undertakings and may require long periods of
follow-up since disease may occur a long time after exposure. For example, the
induction period for leukaemia or thyroid cancer caused by radiation
(i.e. the time required for the specific cause to produce an outcome) is many years
and it is necessary to follow up study participants for a long time. Many exposures
investigated are long-term in nature and accurate information about them requires
data collection over long periods. However, in the case of tobacco use, many
people have relatively stable habits and information about past and current
exposure can be collected at the time the cohort is defined.
In situations with sudden acute exposures, the cause-effect relationship for acute
effects may be obvious, but cohort studies are also used to investigate late or
chronic effects.
As cohort studies start with exposed and unexposed people, the difficulty of
measuring or finding existing data on individual exposures largely determines the
feasibility of doing one of these studies. If the disease is rare in the exposed group
as well as the unexposed group there may also be problems in obtaining a large
enough study group.
Late effects of poisoning: Bhopal
An example of measuring effects over a long time period is the catastrophic
poisoning of residents around a pesticide factory in Bhopal, India, in 1984 [12]. An
intermediate chemical in the production process, methyl isocyanate, leaked from a
tank and the fumes drifted into surrounding residential areas, exposing half a
million people to the gas. 20 000 people died as a result of this exposure. In
addition, 120 000 people still suffer health effects caused by the accident and
subsequent pollution. The acute effects were easily studied with a cross-sectional
design. More subtle chronic effects and those developing only after a long latency
period are still being studied using cohort study designs.
The expense of a cohort study can be reduced by using routine sources of information
about mortality or morbidity, such as disease registers or national registers of
deaths, as part of the follow-up. One example is the Nurses' Health Study. Since
cohort studies take healthy people as their starting point, it is possible to examine
a range of outcomes (in contrast to what can be achieved in case-control studies).
For example, the Framingham study – a cohort study that began in 1948 – has
investigated the risk factors for a wide range of diseases, including cardiovascular
and respiratory diseases and musculoskeletal disorders.
Similar large-scale cohort studies have been started in China. Baseline
demographic characteristics, medical histories, and major cardiovascular risk
factors including blood pressure and body weight were obtained from a
representative sample of 169 871 men and women 40 years of age and older in
1990. Researchers plan to follow this cohort on a regular basis.
A special type of cohort study is the study of identical twins, where the
confounding factor of genetic variation – between people exposed and not exposed
to a particular factor – can be eliminated. Such studies have provided strong
evidence for a variety of cause-effect relationships for chronic diseases. The
Swedish twin registry is a good example of the type of data source that can be used
to answer many epidemiological questions.
Types of Follow-up Studies
Example: the Framingham study. About 5000 subjects aged 30 to 62 years who were
free of cardiovascular (CV) disease were followed up for CV disease from 1948 by
examination every two years and by daily surveillance of hospitalizations at the
only hospital in Framingham.
Several risk factors were studied (high blood pressure, diabetes, physical activity,
smoking, alcohol).
Prospective Follow-up
• Measures
RR = Risk Ratio = risk of disease in the exposed / risk of disease in the unexposed
Interpretation
The RR expresses how many times as high the risk of disease is in the exposed
as in the unexposed.
[Figure: person-years of follow-up. Legend: lost to follow-up; colon cancer cases;
death due to myocardial infarction. 2 colon cancer cases diagnosed over 25
person-years: 2 cases/25 person-years, or 8 cases/100 person-years.]
Relative Risk (Incidence Rate Ratio) Computed Using Person-years of Follow-up
As the Denominator
Example:
A gaseous ingredient that is included in the production of a specific type of basket
is suspected of causing chronic respiratory disease in basket workers. You identify
26,811 basket workers who were exposed to this ingredient (ingredient Z) and
34,317 basket workers who did not make this type of basket and were not exposed
to the ingredient. A physical examination at start of employment confirmed that
none of these workers had chronic respiratory disease before they started working.
The period of follow-up was 1991 to 1995. The person-years at risk were
49,600.4 for the exposed workers and 62,800 for the unexposed workers. There were
4,412 newly diagnosed cases of chronic respiratory disease among the exposed
workers and 4,232 among the unexposed workers.
Incidence rate in the exposed = 4,412 / 49,600.4 person-years = 8.9 per 100 person-years
Incidence rate in the unexposed = 4,232 / 62,800 person-years = 6.7 per 100 person-years
RR = 8.9 / 6.7 = 1.33
Basket workers exposed to ingredient Z are 1.33 times as likely to develop
chronic respiratory disease as basket workers not exposed to ingredient Z.
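The person-years calculation can be sketched in Python. The function names are illustrative only. The exposed person-years are taken here as 49,600.4, which is an assumption: it is the value consistent with the stated result of 1.33 and with 26,811 workers followed for at most five years.

```python
def incidence_rate(cases, person_years, per=100):
    """Incidence rate expressed per `per` person-years of follow-up."""
    return cases / person_years * per


def rate_ratio(cases_exposed, py_exposed, cases_unexposed, py_unexposed):
    """Incidence rate ratio: rate in the exposed divided by rate in the unexposed."""
    return (incidence_rate(cases_exposed, py_exposed)
            / incidence_rate(cases_unexposed, py_unexposed))


# Colon cancer illustration: 2 cases over 25 person-years
colon_rate = incidence_rate(2, 25)

# Basket worker example (exposed person-years of 49,600.4 is an assumption)
rate_exposed = incidence_rate(4412, 49600.4)
rate_unexposed = incidence_rate(4232, 62800)
irr = rate_ratio(4412, 49600.4, 4232, 62800)

print(f"Colon cancer: {colon_rate:.0f} cases per 100 person-years")
print(f"Exposed: {rate_exposed:.1f}, unexposed: {rate_unexposed:.1f} per 100 person-years")
print(f"Rate ratio: {irr:.2f}")
```

Computed from the unrounded rates, the ratio is about 1.32. Rounding the two rates to 8.9 and 6.7 per 100 person-years before dividing gives 1.33.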
Potential Biases
• Ascertainment bias (ascertainment of outcome or exposure different in
study groups)
• Information bias (non-comparable information on study subjects)
Standardized mortality ratio (SMR) = (observed deaths / expected deaths) × 100
SMR = 100: the study group’s mortality rate is similar to that of the general population
SMR > 100: the study group’s mortality rate is higher than that of the general population
SMR < 100: the study group’s mortality rate is lower than that of the general population
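A standardized mortality ratio can be computed as in the sketch below. Expected deaths are obtained by applying general-population death rates to the study group's person-years in each age band. All numbers, age bands and function names here are hypothetical and serve only to show the arithmetic.

```python
def expected_deaths(person_years_by_age, reference_rates_by_age):
    """Deaths expected if the study group had the general population's rates."""
    return sum(person_years_by_age[age] * reference_rates_by_age[age]
               for age in person_years_by_age)


def smr(observed, expected):
    """Standardized mortality ratio, expressed so that 100 means 'as expected'."""
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed / expected * 100


# Hypothetical study group: person-years of follow-up by age band
person_years = {"40-49": 1000, "50-59": 2000, "60-69": 1000}
# Hypothetical general-population death rates per person-year
reference_rates = {"40-49": 0.002, "50-59": 0.005, "60-69": 0.012}

expected = expected_deaths(person_years, reference_rates)  # 2 + 10 + 12 deaths
observed = 30  # hypothetical count of deaths in the study group
study_smr = smr(observed, expected)
print(f"Expected deaths: {expected:.0f}, SMR: {study_smr:.0f}")
```

With these made-up figures the SMR is above 100, so the study group's mortality would be read as higher than that of the general population.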
Follow-up Study
• Advantages
Useful when exposure is rare
Bias is minimal
• Disadvantages
Confounding is a problem
Experimental epidemiology
Intervention or experimentation involves attempting to change a variable in one or
more groups of people. This could mean the elimination of a dietary factor thought
to cause allergy, or testing a new treatment on a selected group of patients. The
effects of an intervention are measured by comparing the outcome in the
experimental group with that in a control group. Since the interventions are strictly
determined by the study protocol, ethical considerations are of paramount
importance in the design of these studies. For example, no patient should be denied
appropriate treatment as a result of participation in an experiment, and the
treatment being tested must be acceptable in the light of current knowledge.
Informed consent from study participants is required in almost all circumstances.
An interventional study is usually designed as a randomized controlled trial, a field
trial, or a community trial.
Randomized controlled trials
A randomized controlled trial is an epidemiological experiment designed to study
the effects of a particular intervention, usually a treatment for a specific disease
(clinical trial). Subjects in the study population are randomly allocated to
intervention and control groups, and the results are assessed by comparing
outcomes. To ensure that the groups being compared are equivalent, patients are
allocated to them randomly, i.e. by chance. If the initial selection and
randomization are done properly, the control and treatment groups will be
comparable at the start of the investigation; any differences between groups are
chance occurrences unaffected by the conscious or unconscious biases of the
investigators.
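Random allocation can be illustrated with a short Python sketch. It shows simple randomization into two groups of equal size. The function name, the participant identifiers and the fixed seed are assumptions for the example; real trials typically use more elaborate schemes such as block or stratified randomization with concealed allocation.

```python
import random


def randomize(participant_ids, seed=None):
    """Shuffle participants by chance alone and split them into two groups."""
    rng = random.Random(seed)  # a seed makes the allocation reproducible for audit
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}


# Twenty hypothetical participants numbered 1 to 20
groups = randomize(range(1, 21), seed=1)
print("Intervention:", sorted(groups["intervention"]))
print("Control:     ", sorted(groups["control"]))
```

Because the order is decided by the random number generator, the investigator has no influence over who ends up in which group.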
Field trials
Field trials, in contrast to clinical trials, involve people who are healthy but
presumed to be at risk; data collection takes place “in the field,” usually among
non-institutionalized people in the general population (Figure 3.8). Since the
subjects are disease-free and the purpose is to prevent diseases that may occur with
relatively low frequency, field trials are often logistically complicated and
expensive endeavours.
One of the largest field trials was that testing the Salk vaccine for the prevention of
poliomyelitis, which involved over one million children.
Field trials can be used to evaluate interventions aimed at reducing exposure
without necessarily measuring the occurrence of health effects. For instance,
different protective methods for pesticide exposure have been tested in this way
and measurement of blood lead levels in children has shown the protection
provided by elimination of lead paint in the home environment. Such intervention
studies can be done on a smaller scale, and at lower cost, as they do not involve
lengthy follow-up or measurement of disease outcomes.
Community trials
In this form of experiment, the treatment groups are communities rather than
individuals. This is particularly appropriate for diseases that are influenced by
social conditions, and for which prevention efforts target group behaviour.
Cardiovascular disease is a good example of a condition appropriate for
community trials although unanticipated methodological issues can arise in large
community intervention trials (Box 3.6).
Random error
Random error is when a value of the sample measurement diverges – due to chance
alone – from that of the true population value. Random error causes inaccurate
measures of association. There are three major sources of random error:
- individual biological variation;
- sampling error; and
- measurement error.
Random error can never be completely eliminated since we can study only a
sample of the population. Sampling error is usually caused by the fact that a small
sample is not representative of all the population’s variables. The best way to
reduce sampling error is to increase the size of the study. Individual variation
always occurs and no measurement is perfectly accurate. Measurement error can
be reduced by stringent protocols, and by making individual measurements as
precise as possible. Investigators need to understand the measurement methods
being used in the study, and the errors that these methods can cause. Ideally,
laboratories should be able to document the accuracy and precision of their
measurements by systematic quality control procedures.
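The effect of study size on sampling error can be shown with the standard error of a proportion, sqrt(p(1 - p)/n). The sketch below is illustrative only: the prevalence of 20% and the sample sizes are hypothetical, and the formula assumes simple random sampling.

```python
import math


def standard_error_of_proportion(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


# Hypothetical true prevalence of 20%, estimated from samples of increasing size
for n in (100, 400, 1600):
    se = standard_error_of_proportion(0.20, n)
    print(f"n = {n:5d}: standard error = {se:.3f}")
```

Each fourfold increase in sample size halves the standard error, which is why large gains in precision require disproportionately larger studies.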
Sample size
The sample size must be large enough for the study to have sufficient statistical
power to detect the differences deemed important. Sample size calculations can be
done with standard formulae. The following information is needed before the
calculation can be done:
- required level of statistical significance for detecting a difference (alpha)
- acceptable error, or chance of missing a real effect (beta)
- magnitude of the effect under investigation
- amount of disease in the population
- relative sizes of the groups being compared.
In reality, sample size is often determined by logistical and financial considerations,
and a compromise always has to be made between sample size and costs. A
practical guide to determining sample size in health studies has been published by
WHO. The precision of a study can also be improved by ensuring that the groups
are of appropriate relative size. This is often an issue of concern in case-control
studies when a decision is required on the number of controls to be chosen for each
case. It is not possible to be definitive about the ideal ratio of controls to cases,
since this depends on the relative costs of accumulating cases and controls. If cases
are scarce and controls plentiful, it is appropriate to increase the ratio of controls to
cases. For example, in the case-control study of the effects of thalidomide, 46
affected children were compared with 300 normal children. In general, however,
there may be little point in having more than four controls for each case. It is
important to ensure that there is sufficient similarity between cases and controls
when the data are to be analysed by, for example, age group or social class; if most
cases and only a few controls were in the older age groups, the study would not be
able to account for the confounding factor of age.
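One commonly used formula for comparing two proportions can be sketched in Python to show how the inputs listed above combine. This is an assumption of the example rather than the method of the WHO guide: it uses the simple normal-approximation formula, a two-sided test and groups of equal size, and the outcome proportions of 10% and 20% are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Participants needed per group to detect a difference between p1 and p2.

    alpha: two-sided significance level
    power: 1 - beta, the chance of detecting a real effect of this size
    Assumes two groups of equal size and the normal approximation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)


# Hypothetical outcome frequencies: 10% in one group, 20% in the other
n_per_group = sample_size_two_proportions(0.10, 0.20)
print(f"About {n_per_group} participants per group")
```

Halving the difference to be detected (for example 10% versus 15%) more than triples the required number per group, which illustrates why the magnitude of the effect dominates the calculation.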
Systematic error
Systematic error (or bias) occurs in epidemiology when results differ in a
systematic manner from the true values. A study with a small systematic error is
said to have a high accuracy. Accuracy is not affected by sample size. The possible
sources of systematic error in epidemiology are many and varied; over 30 specific
types of bias have been identified. The principal biases are:
- selection bias
- measurement (or classification) bias.
Selection bias
Selection bias occurs when there is a systematic difference between the
characteristics of the people selected for a study and the characteristics of those
who are not. An obvious source of selection bias occurs when participants select
themselves for a study, either because they are unwell or because they are
particularly worried about an exposure. It is well known, for example, that people
who respond to an invitation to participate in a study on the effects of smoking
differ in their smoking habits from non-responders; the latter are usually heavier
smokers. In studies of children’s health, where parental cooperation is required,
selection bias may also occur. In a cohort study of newborn children [23], successful
12-month follow-up varied according to income level of the parents. If individuals
entering or remaining in a study have different characteristics from those who are
not selected initially, or who drop out before completion, the result is a biased
estimate of the association between exposure and outcome.
An important selection bias is introduced when the disease or factor under
investigation itself makes people unavailable for study. For example, in a factory
where workers are exposed to formaldehyde, those who suffer most from eye
irritation are most likely to leave their jobs. The remaining workers are less
affected and a prevalence study of the association between formaldehyde exposure
and eye irritation that is done only in the workplace may be very misleading.
In such occupational epidemiology studies this important selection bias is called
the healthy worker effect. Workers have to be healthy enough to perform their
duties; the severely ill and disabled are usually excluded from employment.
Similarly, if a study is based on examinations done in a health centre and there is
no follow-up of participants who do not turn up, biased results may be produced:
unwell patients may be in bed either at home or in hospital. All epidemiological
study designs need to account for selection bias.
Measurement bias
Measurement bias occurs when the individual measurements or classifications of
disease or exposure are inaccurate – that is, they do not measure correctly what
they are supposed to measure. There are many sources of measurement bias and
their effects are of varying importance. For instance, biochemical or physiological
measurements are never completely accurate and different laboratories often
produce different results on the same specimen. If specimens from the exposed and
control groups are analysed randomly by different laboratories, there is less chance
for systematic measurement bias than in the situation where all specimens from the
exposed group are analysed in one laboratory and all those from the control group
are analysed in another.
A form of measurement bias of particular importance in retrospective case-control
studies is known as recall bias. This occurs when there is a differential recall of
information by cases and controls; for instance, cases may be more likely to recall
past exposure, especially if it is widely known to be associated with the disease
under study – for example, lack of exercise and heart disease. Recall bias can
either exaggerate the degree of effect associated with the exposure – as with people
affected by heart disease being more likely to admit to a past lack of exercise – or
underestimate it – if cases are more likely than controls to deny past exposure.
If measurement bias occurs equally in the groups being compared, it almost always
results in an underestimate of the true strength of the relationship. Such non-
differential bias may account for apparent discrepancies in the results of different
epidemiological studies.
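The attenuation caused by non-differential misclassification can be illustrated with expected cell counts. In the sketch below, exposure is misclassified with the same sensitivity and specificity in cases and controls. The counts, the sensitivity of 0.8 and the specificity of 0.9 are all hypothetical, and the function names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """a, b: exposed and unexposed cases; c, d: exposed and unexposed controls."""
    return (a * d) / (b * c)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts when exposure is measured imperfectly."""
    observed_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    observed_unexposed = (exposed + unexposed) - observed_exposed
    return observed_exposed, observed_unexposed


# Hypothetical true exposure status
true_or = odds_ratio(60, 40, 30, 70)

# The same measurement error applies to cases and controls (non-differential)
case_exp, case_unexp = misclassify(60, 40, sensitivity=0.8, specificity=0.9)
ctrl_exp, ctrl_unexp = misclassify(30, 70, sensitivity=0.8, specificity=0.9)
observed_or = odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp)

print(f"True OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

The observed odds ratio lies between 1 and the true value, so the association appears weaker than it really is.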
If the investigator, laboratory technician or the participant knows the exposure
status, this knowledge can influence measurements and cause observer bias. To
avoid this bias, measurements can be made in a blind or double-blind fashion. A
blind study means that the investigators do not know how participants are
classified. A double-blind study means that neither the investigators, nor the
participants, know how the latter are classified.
Confounding
Confounding is another major issue in epidemiological studies. In a study of the
association between exposure to a cause (or risk factor) and the occurrence of
disease, confounding can occur when another exposure exists in the study
population and is associated both with the disease and the exposure being studied.
A problem arises if this extraneous factor – itself a determinant or risk factor for
the health outcome – is unequally distributed between the exposure subgroups.
Confounding occurs when the effects of two exposures (risk factors) have not been
separated and the analysis concludes that the effect is due to one variable rather
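than the other. To be a confounding factor, a variable must meet two conditions:
it must be associated with the exposure being studied, and it must itself be a
determinant or risk factor for the disease (Figure 3.10).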
Randomization
In experimental studies, randomization is the ideal method for ensuring that
potential confounding variables are equally distributed among the groups being
compared. The sample sizes have to be sufficiently large to avoid random
maldistribution of such variables. Randomization avoids the association between
potentially confounding variables and the exposure that is being considered.
Restriction
One way to control confounding is to limit the study to people who have particular
characteristics. For example, in a study on the effects of coffee on coronary heart
disease, participation in the study could be restricted to nonsmokers, thus removing
any potential effect of confounding by cigarette smoking.
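The coffee and smoking example can be made concrete with invented numbers. In the sketch below, smoking raises the risk of coronary heart disease and smokers drink more coffee, but coffee has no effect within either smoking group. All counts are hypothetical and chosen only to show how a crude association can disappear after restriction or stratification.

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


# Hypothetical data: (cases of heart disease, number of people)
strata = {
    "smokers":    {"coffee": (80, 800), "no_coffee": (20, 200)},
    "nonsmokers": {"coffee": (4, 200),  "no_coffee": (16, 800)},
}

# Crude analysis: ignore smoking and pool both strata
coffee_cases = sum(s["coffee"][0] for s in strata.values())
coffee_n = sum(s["coffee"][1] for s in strata.values())
no_coffee_cases = sum(s["no_coffee"][0] for s in strata.values())
no_coffee_n = sum(s["no_coffee"][1] for s in strata.values())
crude_rr = risk_ratio(coffee_cases, coffee_n, no_coffee_cases, no_coffee_n)

# Restricted analysis: look at one smoking group at a time
stratum_rr = {
    name: risk_ratio(*s["coffee"], *s["no_coffee"]) for name, s in strata.items()
}

print(f"Crude RR for coffee: {crude_rr:.2f}")
for name, rr in stratum_rr.items():
    print(f"RR among {name}: {rr:.2f}")
```

The crude risk ratio suggests that coffee more than doubles the risk, yet the risk ratio is 1.0 among nonsmokers and among smokers alike, so the crude association is produced entirely by the unequal distribution of smoking.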
Matching
Matching is used to control confounding by selecting study participants so as to
ensure that potential confounding variables are evenly distributed in the two
groups being compared. For example, in a case-control study of exercise and
coronary heart disease, each patient with heart disease can be matched with a
control of the same age group and sex to ensure that confounding by age and sex
does not occur. Matching has been used extensively in case-control studies but it
can lead to problems in the selection of controls if the matching criteria are too
strict or too numerous; this is called overmatching.
Matching can be expensive and time-consuming, but is particularly useful if the
danger exists of there being no overlap between cases and controls, such as in a
situation where the cases are likely to be older than the controls.
Validity
Validity is an expression of the degree to which a test is capable of measuring what
it is intended to measure. A study is valid if its results correspond to the truth; there
should be no systematic error and the random error should be as small as possible.
Figure 3.11 indicates the relationship between the true value and measured values
for low and high validity and reliability. With low reliability but high validity the
measured values are spread out, but the mean of the measured values is close to the
true value. On the other hand, a high reliability (or repeatability) of the
measurements does not ensure validity since they may all be far from the true
value. There are two types of validity: internal and external.
Internal validity
Internal validity is the degree to which the results of an observation are correct for
the particular group of people being studied. For example, measurements of blood
haemoglobin must distinguish accurately participants with anaemia as defined in
the study. Analysis of the blood in a different laboratory may produce different
results because of systematic error, but the evaluation of associations with
anaemia, as measured by one laboratory, may still be internally valid.
For a study to be of any use it must be internally valid, although even a study that
is perfectly valid internally may be of no consequence because the results cannot
be compared with other studies. Internal validity can be threatened by all sources
of systematic error but can be improved by good design and attention to detail.
External validity
External validity or generalizability is the extent to which the results of a study
apply to people not in it (or, for example, to laboratories not involved in it).
Internal validity is necessary for, but does not guarantee, external validity, and is
easier to achieve. External validity requires external quality control of the
measurements and judgements about the degree to which the results of a study can
be extrapolated. This does not require that the study sample be representative of a
reference population. For example, evidence that the effect of lowering blood
cholesterol in men is also relevant to women requires a judgment about the
external validity of studies in men. External validity is assisted by study designs
that examine clearly-stated hypotheses in well defined populations. The external
validity of a study is supported if similar results are found in studies in other
populations.
Ethical issues
Ethical issues are those involving actions and policies that are right or wrong, fair
or unfair, just or unjust. Ethical dilemmas arise frequently in the practice of
epidemiology and ethical principles govern the conduct of epidemiology, as they
do all human activities. Research and monitoring are essential to ensure that public
health interventions do not have serious unintended and harmful consequences as
occurred in Bangladesh following the installation of wells (Box 3.8).
Informed consent
Free and voluntary informed consent must be obtained from participants in
epidemiological studies and they must retain the right to withdraw at any time.
However, it may prove impracticable for informed consent to be given for access
to routine medical records. In such cases, as is the norm in any other research
study, epidemiologists must respect personal privacy and confidentiality at all
times. They have an obligation to tell communities what they are doing and why,
and to transmit the results of studies, and their significance, to the communities
involved. All proposals for epidemiological studies should be submitted to
properly constituted institutional ethics committees before the research work
begins.
Confidentiality
Epidemiologists also have an obligation to preserve confidentiality of information
they obtain through their studies. This also extends to the right of a person to
withhold information from others. As information in medical records, case
registers, and other data files and databases is generally confidential,
epidemiologists are required to obtain permission before being given access to
these data.
Scientific integrity
All scientists have the potential to behave in an unethical manner, perhaps in part
because of the pressure to succeed. Epidemiologists are not immune to unethical
behaviour. Examples include research results apparently influenced by conflict of
interests and the publication of fabricated data. Minimization of unethical
behaviour requires vigilance on the part of ethical review committees and close
attention to peer review of publications. The training and mentoring of
epidemiologists must include serious and repeated discussion of these issues.