Topic 4a

Uploaded by

Anya Lux
Copyright
© © All Rights Reserved
Department of Public Health

Social Medicine

Practical lesson №6 for 4th year students of general medicine department

THEME: Study Designs. Analytical Studies

HOURS: 2
PLACE: Department of Public Health
1. Relevance of the topic:
Epidemiology is the study of the distribution and determinants of disease
frequency in human populations and the application of this study to control health
problems. The term study includes both surveillance, whose purpose is to monitor
aspects of disease occurrence and spread that are pertinent to effective control, and
epidemiologic research, whose goal is to harvest valid and precise information
about the causes, preventions, and treatments for disease. The term disease refers
to a broad array of health-related states and events including diseases, injuries,
disabilities, and death.
Epidemiologic research encompasses several types of study designs, including
experimental studies and observational studies such as cohort and case–control
studies. Each type of epidemiologic study design simply represents a different way
of harvesting information. The selection of one design over another depends on the
particular research question, concerns about validity and efficiency, and practical
and ethical considerations. For example, experimental studies, also known as trials,
investigate the role of some factor or agent in the prevention or treatment of a
disease. In this type of study, the investigator assigns individuals to two or more
groups that either receive or do not receive the preventive or therapeutic agent.
Because experimental studies closely resemble controlled laboratory
investigations, they are thought to produce the most scientifically rigorous data of
all the designs.

2. The aim of study.


1) to know:
- types of study design;
- advantages and disadvantages of observational studies
2) to be able to:
- Distinguish between experimental and observational studies.
- Describe the key characteristics of experimental, cohort, case–control, cross-sectional, and ecologic studies regarding subject selection, data collection, and analysis.
- Identify the design of a particular study.
- Discuss the factors that determine when a particular design is indicated.

3. Materials for independent work

Cross-sectional studies
Cross-sectional studies measure the prevalence of disease and thus are often called
prevalence studies. In a cross-sectional study the measurements of exposure and
effect are made at the same time. It is not easy to assess the reasons for
associations shown in cross-sectional studies. The key question to be asked is
whether the exposure precedes or follows the effect. If the exposure data are
known to represent exposure before any effect occurred, the data from a cross-
sectional study can be treated like data generated from a cohort study.
Cross-sectional studies are relatively easy and inexpensive to conduct and are
useful for investigating exposures that are fixed characteristics of individuals, such
as ethnicity or blood group. In sudden outbreaks of disease, a cross-sectional study
to measure several exposures can be the most convenient first step in investigating
the cause.
Data from cross-sectional studies are helpful in assessing the health care needs of
populations. Data from repeated cross-sectional surveys using independent random
samples with standardized definitions and survey methods provide useful
indications of trends. Each survey should have a clear purpose. Valid surveys need
well-designed questionnaires, an appropriate sample of sufficient size, and a good
response rate.
Many countries conduct regular cross-sectional surveys on representative samples
of their populations focusing on personal and demographic characteristics,
illnesses and health-related habits. Frequency of disease and risk factors can then
be examined in relation to age, sex and ethnicity. Cross-sectional studies of risk
factors for chronic diseases have been done in a wide range of countries.

WHO Global InfoBase: an online tool


The WHO Global InfoBase (http://infobase.who.int) is a data warehouse that
collects, stores and displays information on chronic diseases and their risk factors
(overweight/ obesity, blood pressure, cholesterol, alcohol, tobacco use,
fruit/vegetable intake, physical inactivity, diabetes) for 186 countries. The
InfoBase was initiated in 2002 to improve the access of health professionals and
researchers to country-reported chronic disease risk factor data. It has the
advantage of providing traceable sources and full survey methodology. The
following options are available online:
- compare countries using WHO estimates for certain risk factors
- make country profiles showing the most recent and most nationally representative
data
- use a survey search tool for all country data on particular risk factors
Case-control studies
Case-control studies provide a relatively simple way to investigate causes of
diseases, especially rare diseases. They include people with a disease (or other
outcome variable) of interest and a suitable control (comparison or reference)
group of people unaffected by the disease or outcome variable. The study
compares the occurrence of the possible cause in cases and in controls. The
investigators collect data on disease occurrence at one point in time and exposures
at a previous point in time. Case-control studies are longitudinal, in contrast to
cross-sectional studies (Figure 3.5). Case-control studies have been called
retrospective studies since the investigator is looking backward from the disease to
a possible cause. This can be confusing because the terms retrospective and
prospective are also used to describe the timing of data collection in relation to the
current date. In this sense a case-control study may be either retrospective, when
all the data deal with the past, or prospective, in which data collection continues
with the passage of time.

Method:
• Select persons with a given disease (CASES) and persons without a
given disease (CONTROLS)
• Various exposures are assessed as possible risk factors for the given
disease
• Good for evaluating potential risk factors for a disease

Schematic representation

Selection of Cases
Source:
• Hospitals
• Clinic patients
• Disease registries
• Tumor registries

Type:
Incident (newly diagnosed) cases
• Disadvantage - wait for new cases

Prevalent (existing disease) cases
• Advantage - larger number of cases available
• Disadvantage - risk factors identified may be related to survival of the
disease rather than the development of the disease

Representativeness of cases
• Should cases be representative of all persons with the disease? NO
• For example, you may choose to study prostate cancer among men >55
years rather than all age groups – of course, your study results will be
applicable only to men > 55 years of age

Selection of cases and controls


A case-control study begins with the selection of cases; these cases should
represent all the cases in a specified population group. Cases are selected on the
basis of disease, not exposure. Controls are people without the disease. A critical
and challenging aspect of population-based case control studies is finding a cost-
effective way to identify and enroll control subjects.9 The most difficult task is to
select controls so as to sample the exposure prevalence in the population that
generated the cases.
Furthermore, the choice of controls and cases must not be influenced by exposure
status, which should be determined in the same manner for both. It is not necessary
for cases and controls to be all-inclusive; in fact they can be restricted to any
specified subgroup, such as elderly people, males or females.
The controls should represent people who would have been designated study cases
if they had developed the disease. Ideally, case-control studies use new (incident)
cases to avoid the difficulty of separating factors related to causation and survival
(or recovery), although studies have often been conducted using prevalence data
(for example, case-control studies of congenital malformations). Case control
studies can estimate relative risk of disease, but they cannot determine the
absolute incidence of disease.
Controls should be similar to cases except for disease under study.

Exposure
An important aspect of case-control studies is the determination of the start and
duration of exposure for cases and controls. In the case-control design, the
exposure status of the cases is usually determined after the development of the
disease (retrospective data) and usually by direct questioning of the affected person
or a relative or friend. The informant’s answers may be influenced by knowledge
about the hypothesis under investigation or the disease experience itself.
An example of the use of a case-control study design is shown in Table 3.2.
Researchers in Papua New Guinea compared the history of meat consumption in
people who had enteritis necroticans, with people who did not have the disease.
Proportionately more people who had the disease (50 of 61 cases) reported prior
meat consumption than those who were not affected (16 of 57).
Exposure is sometimes determined by biochemical measurements (e.g. lead in
blood or cadmium in urine), which may not accurately reflect the relevant past
exposure.
For example, lead in blood at age 6 years is not a good indicator of exposure at age
1 to 2 years, which is the age of greatest sensitivity to lead. This problem can be
avoided if exposure can be estimated from an established recording system (e.g.
stored results of routine blood testing or employment records) or if the case-control
study is carried out prospectively so that exposure data are collected before the
disease develops.
Thalidomide
A classic example of a case-control study was the discovery of the relationship
between thalidomide and limb defects in babies born in the Federal Republic of
Germany in 1959 and 1960. The study, done in 1961, compared affected children
with normal children. Of 46 mothers whose babies had malformations, 41 had been
given thalidomide between the fourth and ninth weeks of pregnancy, whereas none
of the 300 control mothers, whose children were normal, had taken the drug during
pregnancy.10 Accurate timing of the drug intake was crucial for determining
relevant exposure.

Odds ratio
The association of an exposure and a disease (relative risk) in a case-control study
is measured by calculating the odds ratio (OR), which is the ratio of the odds of
exposure among the cases to the odds of exposure among the controls. For the data
in Table 3.2, the odds ratio is given by:

OR = (50 × 41) ÷ (11 × 16) = 2050 ÷ 176 = 11.6

This indicates that the cases were 11.6 times more likely than the controls to have
recently eaten meat.
The odds ratio is very similar to the risk ratio, particularly if a disease is rare. For
the odds ratio to be a good approximation, the cases and controls must be
representative of the general population with respect to exposure.
However, because the incidence of disease is unknown, the absolute risk cannot be
calculated. An odds ratio should be accompanied by the confidence interval
observed around the point estimate.
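
The calculation above, together with a confidence interval, can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the cited study: the function name is hypothetical, and the interval uses the common log-based (Woolf) approximation, which is an assumption on our part since the text does not say how the interval should be obtained. Cell labels follow the 2 × 2 convention used in these notes (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls).

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with an approximate 95% CI (Woolf log method).

    a: exposed cases,   b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    All four cells must be non-zero for this approximation.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR) under the Woolf approximation
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts quoted in the text for Table 3.2:
# 50 of 61 cases and 16 of 57 controls reported prior meat consumption.
or_meat, ci_low, ci_high = odds_ratio_with_ci(a=50, b=16, c=61 - 50, d=57 - 16)
print(f"OR = {or_meat:.1f}, 95% CI {ci_low:.1f} to {ci_high:.1f}")
```

With only 118 subjects the interval is wide, which is the practical reason for reporting it alongside the point estimate.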

Calculation of the OR in Case-control Studies


Analysis
- Simple analysis (2 × 2 table), where a = exposed cases, b = exposed controls,
c = unexposed cases and d = unexposed controls:

Crude odds ratio = (a/c) ÷ (b/d) = ad/bc

- Multivariable analysis – logistic regression

Case-Control Studies
Advantages
• Well suited for rare diseases
• Sampling from source population
• Well suited for studying diseases with long latency periods (such as cancer)
• Time period shorter than a follow-up study
• Less expensive than a follow-up study
• Can be prospective or retrospective
Disadvantages:
• Difficult to establish a temporal relationship between exposure and disease
(i.e., whether exposure preceded disease)
• Prone to many sources of bias
• Difficult to find an appropriate control group

Cohort studies
Cohort studies, also called follow-up or incidence studies, begin with a group of
people who are free of disease, and who are classified into subgroups according to
exposure to a potential cause of disease or outcome (Figure 3.6). Variables of
interest are specified and measured and the whole cohort is followed up to see how
the subsequent development of new cases of the disease (or other outcome) differs
between the groups with and without exposure. Because the data on exposure and
disease refer to different points in time, cohort studies are longitudinal, like
case-control studies.

Cohort studies have been called prospective studies, but this terminology is
confusing and should be avoided. As mentioned previously, the term “prospective”
refers to the timing of data collection and not to the relationship between exposure
and effect. Thus there can be both prospective and retrospective cohort studies.
Cohort studies provide the best information about the causation of disease and the
most direct measurement of the risk of developing disease. Although conceptually
simple, cohort studies are major undertakings and may require long periods of
follow-up since disease may occur a long time after exposure. For example, the
induction period for leukaemia or thyroid cancer caused by radiation
(i.e. the time required for the specific cause to produce an outcome) is many years
and it is necessary to follow up study participants for a long time. Many exposures
investigated are long-term in nature and accurate information about them requires
data collection over long periods. However, in the case of tobacco use, many
people have relatively stable habits and information about past and current
exposure can be collected at the time the cohort is defined.
In situations with sudden acute exposures, the cause-effect relationship for acute
effects may be obvious, but cohort studies are also used to investigate late or
chronic effects.
As cohort studies start with exposed and unexposed people, the difficulty of
measuring or finding existing data on individual exposures largely determines the
feasibility of doing one of these studies. If the disease is rare in the exposed group
as well as the unexposed group there may also be problems in obtaining a large
enough study group.
Late effects of poisoning: Bhopal
An example of measuring effects over a long time period is the catastrophic
poisoning of residents around a pesticide factory in Bhopal, India, in 1984.12 An
intermediate chemical in the production process, methyl isocyanate, leaked from a
tank and the fumes drifted into surrounding residential areas, exposing half a
million people to the gas. 20 000 people died as a result of this exposure. In
addition, 120 000 people still suffer health effects caused by the accident and
subsequent pollution. The acute effects were easily studied with a cross-sectional
design. More subtle chronic effects and those developing only after a long latency
period are still being studied using cohort study designs.

The expense of a cohort study can be reduced by using routine sources of information
about mortality or morbidity, such as disease registers or national registers of
deaths as part of the follow-up. One example is the Nurses’ Health Study. Since
cohort studies take healthy people as their starting-point, it is possible to examine
a range of outcomes (in contrast to what can be achieved in case-control studies).
For example, the Framingham study – a cohort study that began in 1948 – has
investigated the risk factors for a wide range of diseases, including cardiovascular
and respiratory diseases and musculoskeletal disorders.
Similar large-scale cohort studies have been started in China. Baseline
demographic characteristics, medical histories, and major cardiovascular risk
factors including blood pressure and body weight were obtained from a
representative sample of 169 871 men and women 40 years of age and older in
1990. Researchers plan to follow this cohort on a regular basis.
A special type of cohort study is the study of identical twins, where the
confounding factor of genetic variation – between people exposed and not exposed
to a particular factor – can be eliminated. Such studies have provided strong
evidence for a variety of cause-effect relationships for chronic diseases. The
Swedish twin registry is a good example of the type of data source that can be used
to answer many epidemiological questions.
Types of Follow-up Studies

Example of a Prospective Follow-up Study


• Framingham study of cardiovascular disease, study start year (1948)

About 5000 subjects aged 30-62 who were free of CV disease were examined
every two years for CV disease from 1948 onwards, supplemented by daily
surveillance of the only hospital in Framingham.
Several risk factors were studied (high BP, diabetes, physical activity, smoking,
alcohol).
Types of Follow-up Studies

Example of a Retrospective Follow-up Study (Study Start Year 2001)


• A historical cohort of 14,000 workers employed during 1951-1999 in a
herbicide manufacturing factory was identified in 2000. Follow-up was
conducted for the years 1951-1999, during which time 1,320 males and 400
females died. Mortality comparisons were made on cause-specific basis by
comparing the observed numbers of deaths among cohort members to the
expected numbers of deaths based on the US general population during the
years of follow-up.

Nurses’ Health Study


Although cost is a major factor in large cohort studies, methods have been
developed to make them less expensive to run. In 1976, 121 700 married female
nurses aged 30–55 years completed the initial Nurses’ Health Survey
questionnaire. Every two years, self-administered questionnaires were sent to these
nurses, who supplied information on their health behaviours and reproductive and
medical histories. The initial cohort was enrolled with the objective of evaluating
the health effects of oral contraceptive use. Investigators tested their methods on
small subgroups of the larger cohort, and obtained information on disease
outcomes from routine data sources. In addition to studying the relationship
between oral contraceptive use and the risk of ovarian and breast cancer, they were
also able to evaluate other diseases in this cohort – such as heart disease and
stroke, and the relationship between smoking and the risk of stroke, as shown in
Table 2.3. Although stroke is a relatively common cause of death, it is a rare
occurrence in younger women, and so a large cohort is necessary.

Historical cohort studies


Costs can occasionally be reduced by using a historical cohort (identified on the
basis of records of previous exposure). This type of investigation is called a
historical cohort study, because all the exposure and effect (disease) data have been
collected before the actual study begins. For example, records of military personnel
exposure to radioactive fall-out at nuclear bomb testing sites have been used to
examine the possible causal role of fall-out in the development of cancer over the
past 30 years. This sort of design is relatively common for studies of cancer related
to occupational exposures.

Nested case-control studies


The nested case-control design makes cohort studies less expensive. The cases and
controls are both chosen from a defined cohort, for which some information on
exposures and risk factors is already available (Figure 3.7). Additional information
on new cases and controls, particularly selected for the study, is collected and
analysed.
This design is particularly useful when measurement of exposure is expensive. An
example of a nested case control study is shown in Box 3.5.

Box 3.5. Nested case-control study of gastric cancer


To determine if infection with Helicobacter pylori was associated with gastric
cancer, investigators used a cohort of 128 992 people that had been established in
the mid-1960s. By 1991, 186 people in the original cohort had developed gastric
cancer. The investigators then did a nested case-control study by selecting the 186
people with gastric cancer as cases and another 186 cancer-free individuals from
the same cohort as controls. H. pylori infection status was determined
retrospectively from serum samples that had been stored since the 1960s. 84% of
people with gastric cancer – and only 61% of the controls – had been infected
previously with H. pylori, suggesting a positive association between H. pylori
infection and gastric cancer risk.
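
The strength of the association in Box 3.5 can be approximated from the two reported percentages. The sketch below computes a crude (unmatched) odds ratio; treating the data as unmatched is an assumption made purely for illustration, the helper name is hypothetical, and the result is not a figure reported in the source.

```python
def odds(proportion):
    """Convert a proportion exposed into odds of exposure."""
    return proportion / (1 - proportion)


# Proportions previously infected with H. pylori, as quoted in Box 3.5
p_cases = 0.84      # people with gastric cancer
p_controls = 0.61   # cancer-free controls from the same cohort

crude_or = odds(p_cases) / odds(p_controls)
print(f"Crude odds ratio: {crude_or:.2f}")
```

Because the percentages are rounded, the result should be read as an order-of-magnitude check (an odds ratio of roughly 3) rather than a precise estimate.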
Summary of epidemiological studies
Table 3.3 summarizes the applications of different observational studies and Table
3.4 outlines the advantages, disadvantages and the likelihood of errors.

Prospective Follow-up
• Measures

 Cumulative incidence (risk)

 Incidence rate (also known as incidence density)

Cumulative incidence in a follow-up study:

The cumulative incidence of the disease in the exposed = a / (a+b)
The cumulative incidence of the disease in the unexposed = c / (c+d)
(a and b are the exposed with and without disease; c and d are the unexposed with and without disease)

To measure the association between the exposure and the disease, compare the two as a ratio:

RR = CIe ÷ CIē = (cumulative incidence in the exposed) ÷ (cumulative incidence in the unexposed) = [a / (a+b)] ÷ [c / (c+d)]

RR = Risk Ratio

Interpretation
The RR is the number of times greater the risk of disease in the exposed,
compared to the unexposed.

Risk Ratio = (90/1000) ÷ (20/2000) = 9.0


Smokers are 9 times more likely to get lung cancer than non-smokers.
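
The risk ratio above can be reproduced with a short Python sketch. The function names are illustrative only; the counts are the ones used in the worked example (90 cases among 1000 exposed, 20 cases among 2000 unexposed).

```python
def cumulative_incidence(new_cases, people_at_risk):
    """Proportion of an initially disease-free group that develops disease."""
    return new_cases / people_at_risk


def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of cumulative incidence in the exposed to that in the unexposed."""
    ci_exposed = cumulative_incidence(exposed_cases, exposed_total)
    ci_unexposed = cumulative_incidence(unexposed_cases, unexposed_total)
    return ci_exposed / ci_unexposed


rr = risk_ratio(exposed_cases=90, exposed_total=1000,
                unexposed_cases=20, unexposed_total=2000)
print(f"Risk ratio = {rr:.1f}")
```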
Calculation of person-years for a hypothetical incidence rate for colon cancer, 1991–2000

[Figure: follow-up timeline for each subject; symbols mark loss to follow-up, colon cancer cases, and death due to myocardial infarction]

2 colon cancer cases diagnosed
Incidence rate = 2 cases / 25 person-years, or 8 cases / 100 person-years
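
The person-years calculation can be sketched as follows. The individual follow-up times below are hypothetical values invented for illustration (the per-subject data from the original figure are not available here); they were chosen only so that the totals match the 2 cases and 25 person-years stated above.

```python
# Hypothetical follow-up records: (years at risk, reason follow-up ended).
# These five records are illustrative assumptions, not data from the source.
follow_up = [
    (3, "colon cancer"),
    (6, "lost to follow-up"),
    (4, "death due to myocardial infarction"),
    (5, "colon cancer"),
    (7, "end of study"),
]

# Each subject contributes time at risk only until follow-up ends.
total_person_years = sum(years for years, _ in follow_up)
colon_cancer_cases = sum(1 for _, reason in follow_up if reason == "colon cancer")

rate_per_100_py = colon_cancer_cases / total_person_years * 100
print(f"{colon_cancer_cases} cases / {total_person_years} person-years "
      f"= {rate_per_100_py:.0f} cases per 100 person-years")
```

Subjects who are lost to follow-up or die of another cause stop contributing person-years at that point, which is why the denominator is time at risk rather than a head count.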
Relative Risk (incidence rate ratio) Computed Using Person-years of Follow-up As the Denominator

               Exposed   Unexposed   Total
Cases          a         b           a + b
Person-years   PYe       PYē         PYe + PYē

Incidence rate in exposed = a / PYe
Incidence rate in unexposed = b / PYē
Incidence rate ratio (IRR) = (a / PYe) ÷ (b / PYē)

Example:
A gaseous ingredient that is included in the production of a specific type of basket
is suspected of causing chronic respiratory disease in basket workers. You identify
26,811 basket workers who were exposed to this ingredient (ingredient Z) and
34,317 basket workers who did not make this type of basket and were not exposed
to the ingredient. A physical examination at start of employment confirmed that
none of these workers had chronic respiratory disease before they started working.
Time period of follow-up was from 1991-1995. The person-years at risk were
496004 years for the exposed workers and 628000 years for the unexposed
workers. There were 4412 newly diagnosed cases of chronic respiratory disease among
the exposed workers and 4232 among unexposed workers.

Table. Chronic Respiratory Disease Among Basket Workers

                      Exposed   Unexposed
Respiratory disease   4412      4232
Person-years          496004    628000

IRR = (4412 / 496004) ÷ (4232 / 628000) = 0.0089 ÷ 0.0067 = 1.33


Basket workers exposed to ingredient Z are 1.33 times more likely to get
chronic respiratory disease than basket workers unexposed to ingredient Z.
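
The incidence rate ratio for the basket-worker example can be checked with a minimal Python sketch. Function names are illustrative; the inputs are the case counts and person-years given in the table (496 004 and 628 000 person-years). Note that carrying full precision through the calculation gives about 1.32; the value 1.33 arises when the two rates are first rounded to 0.0089 and 0.0067.

```python
def incidence_rate(new_cases, person_years):
    """New cases per person-year at risk."""
    return new_cases / person_years


def incidence_rate_ratio(cases_exposed, py_exposed, cases_unexposed, py_unexposed):
    """Incidence rate in the exposed divided by that in the unexposed."""
    return (incidence_rate(cases_exposed, py_exposed)
            / incidence_rate(cases_unexposed, py_unexposed))


irr = incidence_rate_ratio(cases_exposed=4412, py_exposed=496_004,
                           cases_unexposed=4232, py_unexposed=628_000)
print(f"Rate in exposed:   {incidence_rate(4412, 496_004):.4f} per person-year")
print(f"Rate in unexposed: {incidence_rate(4232, 628_000):.4f} per person-year")
print(f"IRR = {irr:.2f}")
```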
Potential Biases
• Ascertainment bias (ascertainment of outcome or exposure different in
study groups)
• Information bias (non-comparable information on study subjects)

• Non response and losses to follow-up

Retrospective Follow-up Study


Measure of association
• Standardized Mortality Ratio (SMR) – when outcome is mortality

SMR = (observed number of deaths ÷ expected number of deaths) × 100, expressed as a percentage

• Standardized Incidence Ratio (SIR) – when outcome is morbidity

SIR = (observed number of cases ÷ expected number of cases) × 100, expressed as a percentage

Numerator: number of deaths (cases) in the study group
Denominator: number of deaths (cases) that would have occurred in the study
group if the study group had experienced the same mortality or disease incidence as the
general population

SMR = 100 The study group’s mortality rate is similar to that of the general population

SMR > 100 The study group’s mortality rate is higher than that of the general population

SMR < 100 The study group’s mortality rate is lower than that of the general
population

Example of Results of Retrospective Follow-up Study


Men

Cause of death        Obs    Exp      SMR (95% CI)
All causes            1339   1361.8   98 (91-99)
All cancer            251    220.1    114 (98-120)
Lung cancer           151    139.5    108 (99-115)
Circulatory disease   900    950.3    95 (89-97)
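
The SMR column in the table can be recomputed from the observed and expected deaths. The sketch below is illustrative (the function and variable names are not from the source) and reproduces only the point estimates; it does not attempt the confidence intervals, since the notes do not state how they were derived.

```python
def standardized_mortality_ratio(observed, expected):
    """SMR expressed as a percentage of the expected number of deaths."""
    return observed / expected * 100


# Observed and expected deaths among men, as listed in the table above
deaths = {
    "All causes": (1339, 1361.8),
    "All cancer": (251, 220.1),
    "Lung cancer": (151, 139.5),
    "Circulatory disease": (900, 950.3),
}

smr_by_cause = {
    cause: standardized_mortality_ratio(obs, exp)
    for cause, (obs, exp) in deaths.items()
}

for cause, smr in smr_by_cause.items():
    print(f"{cause:<20} SMR = {smr:.0f}")
```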

Follow-up Study
• Advantages
- Useful when exposure is rare
- Entire source population studied
- Can examine multiple effects of a single exposure
- Can establish whether exposure occurs before disease
- Bias is minimal
- Direct measurement of incidence of disease in the exposed and unexposed populations
- Can be prospective or retrospective

• Disadvantages
- Not good for rare disease
- Prospective studies are expensive and time consuming
- Retrospective studies require adequate records
- Confounding is a problem
- Validity affected by loss to follow-up and/or nonparticipation

Justification of a Follow-up Study


• A priori information that there is an association between a certain exposure and
disease
• Interval between exposure and disease is short (e.g., pregnancy outcomes)

Experimental epidemiology
Intervention or experimentation involves attempting to change a variable in one or
more groups of people. This could mean the elimination of a dietary factor thought
to cause allergy, or testing a new treatment on a selected group of patients. The
effects of an intervention are measured by comparing the outcome in the
experimental group with that in a control group. Since the interventions are strictly
determined by the study protocol, ethical considerations are of paramount
importance in the design of these studies. For example, no patient should be denied
appropriate treatment as a result of participation in an experiment, and the
treatment being tested must be acceptable in the light of current knowledge.
Informed consent from study participants is required in almost all circumstances.
An interventional study is usually designed as a randomized controlled trial, a field
trial, or a community trial.
Randomized controlled trials
A randomized controlled trial is an epidemiological experiment designed to study
the effects of a particular intervention, usually a treatment for a specific disease
(clinical trial). Subjects in the study population are randomly allocated to
intervention and control groups, and the results are assessed by comparing
outcomes. To ensure that the groups being compared are equivalent, patients are
allocated to them randomly, i.e. by chance. If the initial selection and
randomization is done properly, the control and treatment groups will be
comparable at the start of the investigation; any differences between groups are
chance occurrences unaffected by the conscious or unconscious biases of the
investigators.
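
Random allocation as described above can be sketched in Python. This is a simple 1:1 randomization using a shuffled list; it is one common approach, assumed here for illustration, and real trials often use more elaborate schemes (such as blocked or stratified allocation) that these notes do not cover. The participant IDs and seed are hypothetical.

```python
import random


def randomize(participants, seed=None):
    """Randomly split participants into intervention and control groups (1:1)."""
    rng = random.Random(seed)      # a fixed seed makes the allocation reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical participant IDs 1..20
intervention, control = randomize(range(1, 21), seed=42)
print("Intervention:", sorted(intervention))
print("Control:     ", sorted(control))
```

Because group membership is decided by the random number generator rather than by the investigator, neither conscious nor unconscious preference can influence who receives the intervention.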

Field trials
Field trials, in contrast to clinical trials, involve people who are healthy but
presumed to be at risk; data collection takes place “in the field,” usually among
non-institutionalized people in the general population (Figure 3.8). Since the
subjects are disease-free and the purpose is to prevent diseases that may occur with
relatively low frequency, field trials are often logistically complicated and
expensive endeavours.
One of the largest field trials was that testing the Salk vaccine for the prevention of
poliomyelitis, which involved over one million children.
Field trials can be used to evaluate interventions aimed at reducing exposure
without necessarily measuring the occurrence of health effects. For instance,
different protective methods for pesticide exposure have been tested in this way
and measurement of blood lead levels in children has shown the protection
provided by elimination of lead paint in the home environment. Such intervention
studies can be done on a smaller scale, and at lower cost, as they do not involve
lengthy follow-up or measurement of disease outcomes.
Community trials
In this form of experiment, the treatment groups are communities rather than
individuals. This is particularly appropriate for diseases that are influenced by
social conditions, and for which prevention efforts target group behaviour.
Cardiovascular disease is a good example of a condition appropriate for
community trials although unanticipated methodological issues can arise in large
community intervention trials (Box 3.6).

Box 3.6. Stanford Five-City Community Intervention Trial


The Stanford Five-City Project started in 1978 as one of several community
intervention studies designed to lower population risk of cardiovascular disease.
Researchers believed that the community approach was the best way to address the
large compounded risk of mild elevations of multiple risk factors and the
interrelation of several health behaviours. Although some components of the
intervention proved effective when evaluated individually (for example, efficiency
of the mass media and other community-wide programs), large, favourable
changes in risk factors also occurred in the control sites. Part of the problem was
related to design limitations. Internal validity was compromised by the fact that
only a few intervention units could be studied in sufficient detail. Researchers also
noted the need to improve educational interventions and expand the environmental
and health policy components of health promotion.

Limitations of community trials


A limitation of such studies is that only a small number of communities can be
included and random allocation of communities is usually not practicable; other
methods are required to ensure that any differences found at the end of the study
can be attributed to the intervention rather than to inherent differences between
communities. Furthermore, it is difficult to isolate the communities where
intervention is taking place from general social changes that may be occurring.
Design limitations, especially in the face of unexpectedly large, favourable risk
factor changes in control sites, are difficult to overcome. As a result, definitive
conclusions about the overall effectiveness of the community-wide efforts are not
always possible.
Figure 3.9 shows a community trial of a tuberculosis outreach programme in rural
Ethiopia. 32 communities – with a combined population of 350 000 people – were
randomly allocated to intervention and control groups. The study showed that
community outreach improved the speed of case-finding (more cases identified in
the first 3 months) although the treatment outcome remained the same at 12
months.

Potential errors in epidemiological studies


Epidemiological investigations aim to provide accurate measures of disease
occurrence (or other outcomes). However, there are many possibilities for errors in
measurement. Epidemiologists devote much attention to minimizing errors and
assessing the impact of errors that cannot be eliminated. Sources of error can be
random or systematic.

Random error
Random error is when a value of the sample measurement diverges – due to chance
alone – from that of the true population value. Random error causes inaccurate
measures of association. There are three major sources of random error:
- individual biological variation;
- sampling error; and
- measurement error.
Random error can never be completely eliminated since we can study only a
sample of the population. Sampling error is usually caused by the fact that a small
sample is not representative of all the population’s variables. The best way to
reduce sampling error is to increase the size of the study. Individual variation
always occurs and no measurement is perfectly accurate. Measurement error can
be reduced by stringent protocols, and by making individual measurements as
precise as possible. Investigators need to understand the measurement methods
being used in the study, and the errors that these methods can cause. Ideally,
laboratories should be able to document the accuracy and precision of their
measurements by systematic quality control procedures.
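
The effect of study size on sampling error can be illustrated with a short calculation. The sketch below assumes simple random sampling and a hypothetical true prevalence of 10%; the function names are illustrative only. It uses the usual standard error of a proportion, sqrt(p(1 - p)/n), and a normal approximation for the confidence interval.

```python
import math


def standard_error_of_proportion(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)


def approx_95_ci(p, n):
    """Approximate 95% confidence interval (normal approximation)."""
    half_width = 1.96 * standard_error_of_proportion(p, n)
    return (max(0.0, p - half_width), min(1.0, p + half_width))


if __name__ == "__main__":
    prevalence = 0.10  # hypothetical true prevalence, chosen for illustration
    for n in (100, 400, 1600):
        se = standard_error_of_proportion(prevalence, n)
        low, high = approx_95_ci(prevalence, n)
        print(f"n={n:5d}  SE={se:.4f}  95% CI=({low:.3f}, {high:.3f})")
```

Under these assumptions, quadrupling the sample size halves the standard error, so precision improves more and more slowly as the study grows.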

Sample size
The sample size must be large enough for the study to have sufficient statistical
power to detect the differences deemed important. Sample size calculations can be
done with standard formulae. The following information is needed before the
calculation can be done:
- required level of statistical significance for detecting a difference
- acceptable error, or chance of missing a real effect
- magnitude of the effect under investigation
- amount of disease in the population
- relative sizes of the groups being compared.
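
As a rough illustration of how the items listed above enter a calculation, the sketch below uses one common textbook formula for comparing two proportions with a two-sided test and groups of equal size. This is an assumption made for the example, not necessarily the formula in the WHO guide mentioned in this section, and the function name and input values are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate participants needed per group to compare two proportions.

    p1, p2 : expected proportions with the outcome in the two groups
    alpha  : two-sided significance level
    power  : 1 minus the accepted chance of missing a real effect
    Assumes two groups of equal size and a normal approximation.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical example: 10% of the exposed and 5% of the unexposed
    # are expected to develop the disease.
    print(sample_size_two_proportions(0.10, 0.05))
    # A smaller expected difference requires a larger study.
    print(sample_size_two_proportions(0.10, 0.08))
```

The smaller the effect to be detected, or the higher the required power, the larger the sample must be.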
In reality, sample size is often determined by logistic and financial considerations,
and a compromise always has to be made between sample size and costs. A
practical guide to determining sample size in health studies has been published by
WHO. The precision of a study can also be improved by ensuring that the groups
are of appropriate relative size. This is often an issue of concern in case-control
studies when a decision is required on the number of controls to be chosen for each
case. It is not possible to be definitive about the ideal ratio of controls to cases,
since this depends on the relative costs of accumulating cases and controls. If cases
are scarce and controls plentiful, it is appropriate to increase the ratio of controls to
cases. For example, in the case-control study of the effects of thalidomide, 46
affected children were compared with 300 normal children. In general, however,
there may be little point in having more than four controls for each case. It is
important to ensure that there is sufficient similarity between cases and controls
when the data are to be analysed by, for example, age group or social class; if most
cases and only a few controls were in the older age groups, the study would not be
able to account for the confounding factor of age.
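
The remark about four controls per case can be illustrated with a commonly cited approximation: with a fixed number of cases, the statistical efficiency of using r controls per case, relative to an unlimited number of controls, is roughly r / (r + 1). The approximation holds best when the effect of the exposure is small; it is used here as an assumption for illustration and is not a result taken from this text.

```python
def relative_efficiency(controls_per_case):
    """Approximate efficiency of r controls per case relative to unlimited controls."""
    if controls_per_case <= 0:
        raise ValueError("controls_per_case must be positive")
    return controls_per_case / (controls_per_case + 1)


if __name__ == "__main__":
    previous = 0.0
    for r in range(1, 9):
        efficiency = relative_efficiency(r)
        gain = efficiency - previous
        print(f"{r} control(s) per case: efficiency {efficiency:.3f} (gain {gain:.3f})")
        previous = efficiency
```

Each additional control adds less than the one before it, which is why the cost of recruiting further controls is seldom justified beyond a ratio of about four to one.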

Systematic error
Systematic error (or bias) occurs in epidemiology when results differ in a
systematic manner from the true values. A study with a small systematic error is
said to have a high accuracy. Accuracy is not affected by sample size. The possible
sources of systematic error in epidemiology are many and varied; over 30 specific
types of bias have been identified. The principal biases are:
- selection bias
- measurement (or classification) bias.

Selection bias
Selection bias occurs when there is a systematic difference between the
characteristics of the people selected for a study and the characteristics of those
who are not. An obvious source of selection bias occurs when participants select
themselves for a study, either because they are unwell or because they are
particularly worried about an exposure. It is well known, for example, that people
who respond to an invitation to participate in a study on the effects of smoking
differ in their smoking habits from non-responders; the latter are usually heavier
smokers. In studies of children’s health, where parental cooperation is required,
selection bias may also occur. In a cohort study of newborn children, successful
12-month follow-up varied according to income level of the parents. If individuals
entering or remaining in a study have different characteristics from those who are
not selected initially, or who drop out before completion, the result is a biased
estimate of the association between exposure and outcome.
An important selection bias is introduced when the disease or factor under
investigation itself makes people unavailable for study. For example, in a factory
where workers are exposed to formaldehyde, those who suffer most from eye
irritation are most likely to leave their jobs. The remaining workers are less
affected and a prevalence study of the association between formaldehyde exposure
and eye irritation that is done only in the workplace may be very misleading.
In such occupational epidemiology studies this important selection bias is called
the healthy worker effect. Workers have to be healthy enough to perform their
duties; the severely ill and disabled are usually excluded from employment.
Similarly, if a study is based on examinations done in a health centre and there is
no follow-up of participants who do not turn up, biased results may be produced:
unwell patients may be in bed either at home or in hospital. All epidemiological
study designs need to account for selection bias.

Measurement bias
Measurement bias occurs when the individual measurements or classifications of
disease or exposure are inaccurate – that is, they do not measure correctly what
they are supposed to measure. There are many sources of measurement bias and
their effects are of varying importance. For instance, biochemical or physiological
measurements are never completely accurate and different laboratories often
produce different results on the same specimen. If specimens from the exposed and
control groups are analysed randomly by different laboratories, there is less chance
for systematic measurement bias than in the situation where all specimens from the
exposed group are analysed in one laboratory and all those from the control group
are analysed in another.
A form of measurement bias of particular importance in retrospective case-control
studies is known as recall bias. This occurs when there is a differential recall of
information by cases and controls; for instance, cases may be more likely to recall
past exposure, especially if it is widely known to be associated with the disease
under study – for example, lack of exercise and heart disease. Recall bias can
either exaggerate the degree of effect associated with the exposure – as with people
affected by heart disease being more likely to admit to a past lack of exercise – or
underestimate it – if cases are more likely than controls to deny past exposure.
If measurement bias occurs equally in the groups being compared, it almost always
results in an underestimate of the true strength of the relationship. Such non-
differential bias may account for apparent discrepancies in the results of different
epidemiological studies.
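
The underestimation produced by non-differential measurement error can be shown with expected counts. The sketch below assumes a hypothetical case-control study and an exposure measurement with the same sensitivity and specificity in cases and controls; all numbers and function names are invented for illustration.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from the four cells of a case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts when exposure is measured imperfectly."""
    observed_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    observed_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return observed_exposed, observed_unexposed


if __name__ == "__main__":
    # Hypothetical true exposure distribution.
    cases = (60, 40)      # exposed, unexposed
    controls = (30, 70)   # exposed, unexposed

    true_or = odds_ratio(cases[0], cases[1], controls[0], controls[1])

    # The same measurement quality is applied to both groups (non-differential).
    obs_cases = misclassify(*cases, sensitivity=0.8, specificity=0.9)
    obs_controls = misclassify(*controls, sensitivity=0.8, specificity=0.9)
    observed_or = odds_ratio(obs_cases[0], obs_cases[1], obs_controls[0], obs_controls[1])

    print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

In this example the observed odds ratio lies between 1 and the true value: the association is still detected, but its strength is understated.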
If the investigator, laboratory technician or the participant knows the exposure
status, this knowledge can influence measurements and cause observer bias. To
avoid this bias, measurements can be made in a blind or double-blind fashion. A
blind study means that the investigators do not know how participants are
classified. A double-blind study means that neither the investigators, nor the
participants, know how the latter are classified.

Confounding
Confounding is another major issue in epidemiological studies. In a study of the
association between exposure to a cause (or risk factor) and the occurrence of
disease, confounding can occur when another exposure exists in the study
population and is associated both with the disease and the exposure being studied.
A problem arises if this extraneous factor – itself a determinant or risk factor for
the health outcome – is unequally distributed between the exposure subgroups.
Confounding occurs when the effects of two exposures (risk factors) have not been
separated and the analysis concludes that the effect is due to one variable rather
than the other. For a factor to be a confounder, two conditions must be met (Figure
3.10).

Confounding arises because non-random distribution of risk factors in the source
population also occurs in the study population thus providing misleading estimates
of effect (see Box 3.7).

Box 3.7. Confounding: difficult to control


The word “confounding” comes from the Latin confundere, meaning to mix
together. Confounding can have a very important influence, and may even change
the apparent direction of an association. A variable that appears to be protective
may, after control of confounding, be found to be harmful. The most common
concern about confounding is that it may create the appearance of a cause-effect
relationship that does not actually exist.
For a variable to be a confounder, it must, in its own right, be a determinant of the
occurrence of disease (i.e. a risk factor) and associated with the exposure under
investigation. Thus, in a study of radon exposure and lung cancer, smoking is not a
confounder if the smoking habits are identical in the radon-exposed and control
groups.
In this sense, confounding might appear to be a bias, but in fact it does not result from
systematic error in research design. Age and social class are often confounders in
epidemiological studies. An association between high blood pressure and coronary
heart disease may in truth represent concomitant changes in the two variables that
occur with increasing age; the potential confounding effect of age has to be
considered, and when this is done it is seen that high blood pressure indeed
increases the risk of coronary heart disease.
In the example in Figure 3.10, confounding may be the explanation for the
relationship demonstrated between coffee drinking and the risk of coronary heart
disease, since it is known that coffee consumption is associated with tobacco use:
people who drink coffee are more likely to smoke than people who do not drink
coffee.
It is also well known that cigarette smoking is a cause of coronary heart disease.
It is thus possible that the relationship between coffee drinking and coronary heart
disease merely reflects the known causal association of tobacco use and heart
disease. In this situation, smoking confounds the apparent relationship between
coffee consumption and coronary heart disease because smoking is correlated with
coffee drinking and is a risk factor even for those who do not drink coffee.

The control of confounding


Several methods are available to control confounding, either through study design
or during the analysis of results.
The methods commonly used to control confounding in the design of an
epidemiological study are:
- randomization
- restriction
- matching.
At the analysis stage, confounding can be controlled by:
- stratification
- statistical modelling.

Randomization
In experimental studies, randomization is the ideal method for ensuring that
potential confounding variables are equally distributed among the groups being
compared. The sample sizes have to be sufficiently large to avoid random
maldistribution of such variables. Randomization avoids the association between
potentially confounding variables and the exposure that is being considered.
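
The need for a sufficiently large sample can be illustrated by simulation. The sketch below is a hypothetical example: it randomly splits participants, 30% of whom are smokers, into two equal groups many times and reports the average difference in the proportion of smokers between the groups. The function name, the smoker fraction and the fixed seed are assumptions made for illustration.

```python
import random


def mean_imbalance(n_participants, smoker_fraction, n_trials, seed=0):
    """Average absolute difference in smoker proportion between two random arms."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    n_smokers = round(n_participants * smoker_fraction)
    participants = [1] * n_smokers + [0] * (n_participants - n_smokers)
    half = n_participants // 2
    total = 0.0
    for _ in range(n_trials):
        rng.shuffle(participants)          # random allocation
        arm_a = participants[:half]
        arm_b = participants[half:]
        total += abs(sum(arm_a) / len(arm_a) - sum(arm_b) / len(arm_b))
    return total / n_trials


if __name__ == "__main__":
    for n in (20, 200, 2000):
        imbalance = mean_imbalance(n, smoker_fraction=0.3, n_trials=200)
        print(f"{n:5d} participants: mean imbalance in smoking = {imbalance:.3f}")
```

With only 20 participants, chance alone can leave the groups noticeably different in smoking habits; with 2000 the groups are very similar on average.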

Restriction
One way to control confounding is to limit the study to people who have particular
characteristics. For example, in a study on the effects of coffee on coronary heart
disease, participation in the study could be restricted to nonsmokers, thus removing
any potential effect of confounding by cigarette smoking.

Matching
Matching is used to control confounding by selecting study participants so as to
ensure that potential confounding variables are evenly distributed in the two
groups being compared. For example, in a case-control study of exercise and
coronary heart disease, each patient with heart disease can be matched with a
control of the same age group and sex to ensure that confounding by age and sex
does not occur. Matching has been used extensively in case-control studies but it
can lead to problems in the selection of controls if the matching criteria are too
strict or too numerous; this is called overmatching.
Matching can be expensive and time-consuming, but is particularly useful if the
danger exists of there being no overlap between cases and controls, such as in a
situation where the cases are likely to be older than the controls.

Stratification and statistical modelling


In large studies it is usually preferable to control for confounding in the analytical
phase rather than in the design phase. Confounding can then be controlled by
stratification, which involves the measurement of the strength of associations in
well defined and homogeneous categories (strata) of the confounding variable. If
age is a confounder, the association may be measured in, say, 10-year age groups;
if sex or ethnicity is a confounder, the association is measured separately in men
and women or in the different ethnic groups. Methods are available for
summarizing the overall association by producing a weighted average of the
estimates calculated in each separate stratum.
Although stratification is conceptually simple and relatively easy to carry out, it is
often limited by the size of the study and it cannot help to control many factors
simultaneously, as is often necessary. In this situation, multivariate statistical
modelling is required to estimate the strength of the associations while controlling
for several confounding variables simultaneously; a range of statistical techniques
is available for these analyses.
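
Stratification can be illustrated with the coffee, smoking and coronary heart disease example. The counts below are invented so that coffee has no effect within either smoking stratum, while smokers both drink more coffee and have more disease. The Mantel-Haenszel estimator is used here as one well-known way of forming the weighted average across strata; the text does not name a specific method, so this choice is an assumption.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: exposed with disease      b: exposed without disease
    c: unexposed with disease    d: unexposed without disease
    """
    return (a * d) / (b * c)


def crude_table(strata):
    """Collapse the strata into a single 2x2 table (ignores the confounder)."""
    return tuple(sum(cells) for cells in zip(*strata))


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts (exposure = coffee drinking, outcome = heart disease).
    smokers = (80, 320, 20, 80)
    non_smokers = (5, 95, 20, 380)
    strata = [smokers, non_smokers]

    print(f"crude OR       = {odds_ratio(*crude_table(strata)):.2f}")
    print(f"OR in smokers  = {odds_ratio(*smokers):.2f}")
    print(f"OR in others   = {odds_ratio(*non_smokers):.2f}")
    print(f"Mantel-Haenszel = {mantel_haenszel_or(strata):.2f}")
```

In this constructed example the crude odds ratio suggests that coffee more than doubles the odds of disease, whereas the stratum-specific and summary estimates show no association once smoking is taken into account.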

Validity
Validity is an expression of the degree to which a test is capable of measuring what
it is intended to measure. A study is valid if its results correspond to the truth; there
should be no systematic error and the random error should be as small as possible.
Figure 3.11 indicates the relationship between the true value and measured values
for low and high validity and reliability. With low reliability but high validity the
measured values are spread out, but the mean of the measured values is close to the
true value. On the other hand, a high reliability (or repeatability) of the
measurements does not ensure validity since they may all be far from the true
value. There are two types of validity: internal and external.
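
The distinction between validity and reliability can be made concrete with two invented sets of repeated measurements of the same quantity. In the sketch below, the distance of the mean from the true value stands for (lack of) validity and the spread of the measurements stands for (lack of) reliability; the values and function name are hypothetical.

```python
from statistics import mean, stdev


def summarize(measurements, true_value):
    """Return the bias (mean minus true value) and spread of repeated measurements."""
    return {
        "bias": mean(measurements) - true_value,
        "spread": stdev(measurements),
    }


if __name__ == "__main__":
    true_value = 120  # hypothetical true value

    # Scattered widely, but centred on the truth: high validity, low reliability.
    scattered = [100, 140, 110, 130, 120]
    # Tightly clustered, but far from the truth: high reliability, low validity.
    clustered = [139, 140, 141, 140, 140]

    print("scattered:", summarize(scattered, true_value))
    print("clustered:", summarize(clustered, true_value))
```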

Internal validity
Internal validity is the degree to which the results of an observation are correct for
the particular group of people being studied. For example, measurements of blood
haemoglobin must distinguish accurately participants with anaemia as defined in
the study. Analysis of the blood in a different laboratory may produce different
results because of systematic error, but the evaluation of associations with
anaemia, as measured by one laboratory, may still be internally valid.
For a study to be of any use it must be internally valid, although even a study that
is perfectly valid internally may be of no consequence because the results cannot
be compared with other studies. Internal validity can be threatened by all sources
of systematic error but can be improved by good design and attention to detail.

External validity
External validity or generalizability is the extent to which the results of a study
apply to people not in it (or, for example, to laboratories not involved in it).
Internal validity is necessary for, but does not guarantee, external validity, and is
easier to achieve. External validity requires external quality control of the
measurements and judgements about the degree to which the results of a study can
be extrapolated. This does not require that the study sample be representative of a
reference population. For example, evidence that the effect of lowering blood
cholesterol in men is also relevant to women requires a judgment about the
external validity of studies in men. External validity is assisted by study designs
that examine clearly-stated hypotheses in well defined populations. The external
validity of a study is supported if similar results are found in studies in other
populations.

Ethical issues
Ethical issues are those involving actions and policies that are right or wrong, fair
or unfair, just or unjust. Ethical dilemmas arise frequently in the practice of
epidemiology and ethical principles govern the conduct of epidemiology, as they
do all human activities. Research and monitoring are essential to ensure that public
health interventions do not have serious unintended and harmful consequences as
occurred in Bangladesh following the installation of wells (Box 3.8).

Box 3.8. Unintended consequences: arsenic in tube wells in Bangladesh


The installation of tube wells to improve rural standards of water and hygiene in
Bangladesh over the past few decades has been an important element in the
control of cholera and other waterborne enteric diseases. Although 95% of the
population now relies on groundwater from these wells, no testing for microbial
counts, heavy metals or toxic chemicals was carried out in the initial stages. It was
only in 1985 when a local physician in West Bengal, India, began noticing patients
with clinical signs of arsenic intoxication (skin pigmentation and increased rates of
a variety of cancers), that the tube wells were checked.
Currently about 30 million people, one quarter of Bangladesh’s population, are
drinking water with significantly high levels of arsenic. All of the possible
interventions to lower arsenic intake from water (treating water at the pump, in-
home treatment of water, community level water treatment, sealing wells with high
arsenic content, and sinking deeper wells below the water table with high arsenic
content) are either costly or require continuing maintenance and monitoring.
All epidemiological studies must be reviewed and approved by ethical review
committees. The ethical principles that apply to epidemiologic practice and
research include:
- informed consent
- confidentiality
- respect for human rights
- scientific integrity.

Informed consent
Free and voluntary informed consent must be obtained from participants in
epidemiological studies and they must retain the right to withdraw at any time.
However, it may prove impracticable for informed consent to be given for access
to routine medical records. In such cases, as is the norm in any other research
study, epidemiologists must respect personal privacy and confidentiality at all
times. They have an obligation to tell communities what they are doing and why,
and to transmit the results of studies, and their significance, to the communities
involved. All proposals for epidemiological studies should be submitted to
properly constituted institutional ethics committees before the research work
begins.

Confidentiality
Epidemiologists also have an obligation to preserve confidentiality of information
they obtain through their studies. This also extends to the right of a person to
withhold information from others. As information in medical records, case
registers, and other data files and databases are generally confidential,
epidemiologists are required to obtain permission before being given access to
these data.

Respect for individual rights


Tension can often arise in epidemiological studies between the interests of the
group and the interests of the individual. An example is provided by efforts to limit
the public health impact of HIV/AIDS. Cuba successfully contained the spread of
HIV/AIDS by testing individuals at risk and segregating infected people from the
general population. Others argue that individual human rights are key to preventing
infection because spread of disease was facilitated by their denial; for example,
women in many affected countries cannot refuse demands for unprotected sex. In
addition, much of the behaviour that places individuals at risk of HIV/AIDS
happens in private, beyond the reach of the state. Public health efforts to modify
the behaviour of vulnerable people are unlikely to be successful without assurances
that their interests will be protected.

Scientific integrity
All scientists have the potential to behave in an unethical manner, perhaps in part
because of the pressure to succeed. Epidemiologists are not immune to unethical
behaviour. Examples include research results apparently influenced by conflict of
interests and the publication of fabricated data. Minimization of unethical
behaviour requires vigilance on the part of ethical review committees and close
attention to peer review of publications. The training and mentoring of
epidemiologists must include serious and repeated discussion of these issues.

3.3. Recommended literature:


1. A. Haupt, T. Kane. Population handbook. 5th edition. 2004. Eleventh printing.
2. T. H. Tulchinsky, E. A. Varavikova. The new public health. 2nd edition, 2009.
Elsevier Inc., 262 p.
3. R. B. Wallace. Public health and preventive medicine. The McGraw-Hill
Companies, Inc. 15th edition, 2008. 1367 p.
4. Encyclopedia of population, edited by P. Demeny, G. McNicoll, 2003, 1040 p.
5. G. Mooney. Economics, medicine and health care. 3rd edition. 2003, 147 p.
6. Seán Boyle. United Kingdom (England): Health system review. Health Systems
in Transition, 2011; 13(1):1–486.
7. S. Folland, A. C. Goodman, M. Stano. The economics of health and health care.
4th edition, 2003.
8. R. Bonita, R. Beaglehole, T. Kjellström. Basic epidemiology, 2nd edition. World
Health Organization, 2006. 212 p.
9. N. Black, R. Gruen. Understanding health services. Open University Press, 2005.
243 p.
10. V. Lekhan, V. Rudiy, E. Nolte. Health care systems in transition: Ukraine.
Copenhagen, WHO Regional Office for Europe on behalf of the European
Observatory on Health Systems and Policies, 2004. 128 p.
11. Health systems, health, wealth and societal well-being. Assessing the case for
investing in health systems. Edited by J. Figueras, M. McKee. Open University
Press. 1st edition, 2012. 302 p.
12. Public Health for the 21st Century. New perspectives on policy, participation
and practice, 2nd edition. Editors: J. Orme, J. Powell, P. Taylor, M. Grey. Open
University Press, 2003, 401 p.
13. World Health Statistics, 2010. World Health Organization, 177 p.
14. R. Beaglehole. Global Public Health: A new era. Oxford University Press,
2003, 284 p.
15. B. McPake, L. Kumaranayake, Ch. Normand. Health Economics: An
International Perspective. Taylor & Francis e-Library, 2005, 260 p.

3.4. The estimated card for individual students’ work.


#   Main tasks                                                      Instructions
1.  Distinguish between experimental and observational studies.     To know
2.  Describe the key characteristics of experimental, cohort,       To know
    case–control, cross-sectional, and ecologic studies regarding
    subject selection, data collection, and analysis.
3.  Identify the design of a particular study.                      To know
4.  Discuss the factors that determine when a particular design     To know
    is indicated.

3.5. Tasks for self-control


1. Analytical studies.
2. Definition of the cohort.
3. Case-control studies – design, method, selection of cases,
representativeness of cases.
4. Case-control studies – selection of controls, sources of controls,
matching, types of matching.
5. Case-control studies – measurement of exposure, odds ratio, odds of
event, odds of exposure, interpretation.
6. Case-control studies – advantages, disadvantages.
7. Follow-up study – types of follow-up, examples.
8. Prospective follow-up studies, examples.
9. Retrospective follow-up studies, examples.
10. Relative risk, its calculation.
11. Outline the design of a case-control study and a cohort study to
examine the association of a high-fat diet with bowel cancer.
12. What is random error and how can it be reduced?
13. What are the main types of systematic error in epidemiological studies
and how can their effects be reduced?
14. Describe in which studies the relative risk (RR) and the odds ratio
(OR) are used. Outline the reasons why they would be used in a
particular study and not in another.
15. In the case of a rare disease, the OR and RR can have very similar
values. Explain the reasons behind this similarity.
16. A cross-sectional study of Down syndrome has found an association
with birth order. What could be a cause of confounding and how
would you avoid it?
