Session 7 Summary
* Chi-square tests
* Logistic regression
* Odds ratios.
What is a prospective cohort study?
A prospective cohort study is a type of observational study that follows a
group of individuals over a specific period of time to investigate the
association between certain exposures or risk factors and the development
of outcomes or diseases.
In a prospective cohort study, participants are identified based on their
exposure status and then followed forward in time to observe the
occurrence of the outcome of interest.
Prospective cohort studies are valuable for studying the natural progression
of diseases, identifying risk factors, establishing temporal relationships, and
evaluating the impact of exposures on health outcomes. They provide
important evidence for understanding causality and informing preventive
and therapeutic interventions.

Prospective cohort study ~ Key characteristics
Study Design: In a prospective cohort study, participants are identified based
on their exposure status, such as being exposed [...] specific characteristic.
They are then followed [...] the outcome or disease being investigated.
Exposures and Risk Factors: The study collects data on the exposure or risk
factors of interest. These factors may include lifestyle choices, behaviors,
genetic factors, environmental exposures, or any other variables that are
believed to have a potential association with the outcome being studied.
Follow-up Period: Participants in a prospective cohort study are followed
forward in time to observe the occurrence of the outcome. The duration of
follow-up can vary depending on the research question and the nature of the
outcome being studied. It may range from months to years or even decades.
Data Collection: Data on both exposures and outcomes are collected during the
study. Participants are typically assessed at regular intervals to collect
information on their [...], potential confounding factors, and the occurrence
of the outcome.
Outcome Assessment: The primary outcome or disease of interest is identified
at the [...] study and is defined and measured in a standardized manner. The
outcome is ascertained by following participants over time and can [...]
assessments, medical records, laboratory tests, or other reliable [...]

Study Population:
Total number of participants enrolled: 2,000
Number of participants exposed to Exposure A: 800
Number of participants not exposed to Exposure A: 1,200
Exposure and Outcome:
Exposure A: The presence or absence of a specific risk factor (e.g.,
consumption of a certain food)
Outcome B: Development of a particular disease (e.g., diabetes)
Follow-up Period:
5 years
Outcome Assessment:
Number of participants who developed Outcome B (disease) among those
exposed to Exposure A: 50
Number of participants who developed Outcome B (disease) among those
not exposed to Exposure A: 30
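The counts in the cohort example are enough to compute the usual measures of association. The following is a minimal Python sketch, not part of the original slides; the variable names are illustrative, and the number needed to harm is only meaningful if the association is assumed to be causal.

```python
# Counts from the cohort example: 2,000 participants followed for 5 years.
exposed_total, exposed_cases = 800, 50
unexposed_total, unexposed_cases = 1200, 30

# Risk (cumulative incidence) of Outcome B in each group.
risk_exposed = exposed_cases / exposed_total
risk_unexposed = unexposed_cases / unexposed_total

# Relative and absolute comparisons of the two risks.
risk_ratio = risk_exposed / risk_unexposed
risk_difference = risk_exposed - risk_unexposed

# Odds of disease in each group, and their ratio.
odds_exposed = exposed_cases / (exposed_total - exposed_cases)
odds_unexposed = unexposed_cases / (unexposed_total - unexposed_cases)
odds_ratio = odds_exposed / odds_unexposed

# Number needed to harm (assumes the exposure causes the excess risk).
number_needed_to_harm = 1 / risk_difference

print(f"Risk exposed:    {risk_exposed:.4f}")    # 0.0625
print(f"Risk unexposed:  {risk_unexposed:.4f}")  # 0.0250
print(f"Risk ratio:      {risk_ratio:.2f}")      # 2.50
print(f"Risk difference: {risk_difference:.4f}") # 0.0375
print(f"Odds ratio:      {odds_ratio:.2f}")      # 2.60
print(f"NNH:             {number_needed_to_harm:.1f}")  # 26.7
```

Because the outcome is uncommon in both groups, the odds ratio (2.60) is close to the risk ratio (2.50).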
NNT interpretation:
Cases: The cases are individuals who have developed the outcome or disease
being investigated. They are identified either through [...] disease
registries, or other sources. Cases should be [...] ensure consistency and
reliability in the study.
Controls: The controls are individuals who do not have the outcome or [...]
They should be selected from the same population from which the cases arise
and should be representative of the population [...] characteristics and
potential risk factors.
Exposure Assessment: In a case-control study, the exposure history of both
cases and controls is determined retrospectively. Participants are asked [...]
exposures or risk factors, such as behaviors, environmental [...] demographic
information. This information is collected [...] questionnaires, or medical
records review.

Study Population:
Number of cases (individuals with Outcome B): 100
Number of controls (individuals without Outcome B): 200
Exposure and Outcome:
Exposure A: The presence or absence of a specific risk factor (e.g., exposure
to a certain chemical)
Outcome B: Development of a particular disease (e.g., lung cancer)
Exposure Assessment:
Number of cases exposed to Exposure A: 40
Number of cases not exposed to Exposure A: 60
Number of controls exposed to Exposure A: 60
Number of controls not exposed to Exposure A: 140
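As a rough illustration, the exposure counts in the case-control example can be turned into odds and an odds ratio as below. This is a sketch rather than the original author's calculation. The 95% confidence interval uses the standard Wald (Woolf) formula on the log scale, which is an added assumption. These counts give an odds ratio of about 1.56, so they are not the counts behind the values quoted under Interpretations.

```python
import math

# Exposure counts from the case-control example.
cases_exposed, cases_unexposed = 40, 60
controls_exposed, controls_unexposed = 60, 140

# Odds of exposure within each group.
odds_cases = cases_exposed / cases_unexposed           # about 0.667
odds_controls = controls_exposed / controls_unexposed  # about 0.429

# Odds ratio: odds of exposure among cases / odds of exposure among controls.
odds_ratio = odds_cases / odds_controls                # about 1.556

# Wald 95% confidence interval on the log-odds-ratio scale (assumed method).
se_log_or = math.sqrt(
    1 / cases_exposed + 1 / cases_unexposed
    + 1 / controls_exposed + 1 / controls_unexposed
)
z = 1.96
ci_low = math.exp(math.log(odds_ratio) - z * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + z * se_log_or)

print(f"Odds per case:    {odds_cases:.3f}")
print(f"Odds per control: {odds_controls:.3f}")
print(f"Odds ratio:       {odds_ratio:.3f}")
print(f"95% CI:           ({ci_low:.3f}, {ci_high:.3f})")
```

With these counts the interval includes 1, so the data alone would not rule out an odds ratio of 1.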
Interpretations:
Odds per case = 5
The odds per case represents the odds of exposure among cases. It means that
individuals who have diabetes are five times as likely to have been exposed to
exposure A as not to have been exposed.
Odds per control = 0.92
The odds per control represents the odds of exposure among controls. It means
that individuals without diabetes are 0.92 times as likely to have been exposed
to exposure A as not to have been exposed.
OR = (Odds of Exposure A among cases) / (Odds of Exposure A among controls)
The odds ratio quantifies the association between exposure and outcome by
comparing the odds of exposure among cases to the odds of exposure among
controls.
Risk ratio = 5
Odds ratio = 5.4444444
Decision rule:
If the chi-square statistic is larger than the chi-square critical value,
we reject Ho.
Otherwise we do not reject Ho.
Chi-square critical value (3.841)
The degrees of freedom for a 2x2 table is (2-1) x (2-1) = 1.
Using the chi-square table at the 0.05 significance level with 1 degree of
freedom, the critical value is 3.841.
Chi-square test conclusion
Since the chi-square statistic (43.947) is greater than the chi-square
critical value (3.841), we reject Ho in favor of Ha, meaning there is a
relationship (association) between treatments and outcomes.
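The decision rule can be sketched in a few lines of Python. This example is an illustration, not taken from the slides: it assumes a 0.05 significance level and relies on the fact that, for 1 degree of freedom, a chi-square variable is the square of a standard normal variable, so only the standard library is needed.

```python
import math
from statistics import NormalDist

alpha = 0.05       # assumed significance level
chi2_stat = 43.947 # test statistic reported in the conclusion

# With 1 degree of freedom, the chi-square critical value equals the square
# of the two-sided standard normal critical value.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
chi2_crit = z_crit ** 2

# Decision rule: reject Ho when the statistic exceeds the critical value.
reject_h0 = chi2_stat > chi2_crit

# Upper-tail p-value for 1 degree of freedom: P(Z^2 > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2_stat / 2))

print(f"Critical value: {chi2_crit:.6f}")  # 3.841459
print(f"Reject Ho:      {reject_h0}")      # True
print(f"p-value:        {p_value:.3e}")
```

The shortcut through the normal distribution only holds for 1 degree of freedom. Larger tables need a general chi-square distribution function.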
Fisher's test
Fisher's exact test is a statistical test that can be used as an
alternative to the chi-square test in certain situations. It is
typically employed when the sample size is small (typically with
cells having expected frequencies less than 5) or when the
assumptions of the chi-square test are not met.
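To show what "exact" means here, the sketch below computes a two-sided Fisher's exact p-value for a 2x2 table directly from the hypergeometric distribution. The function name and the small example table are hypothetical, chosen only for illustration. The two-sided rule used (sum the probabilities of all tables no more likely than the observed one) is one common convention.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of every table no more likely than the observed
    # one (small tolerance guards against floating-point ties).
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )

# Hypothetical small table: every expected frequency is 2, below the usual 5.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"Two-sided p-value: {p_value:.4f}")  # 0.4857
```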
Approximation accuracy: Fisher's exact test involves exact
calculations, which can be computationally intensive for larger
sample sizes. In such cases, an asymptotic chi-square test
approximation is often sufficient and less computationally
burdensome.
Simulation-based methods: Monte Carlo simulation methods,
such as the permutation test or bootstrap resampling, can be
used to assess the significance of associations between variables
in contingency tables.
Other exact tests: Other exact tests, such as Barnard's test
(an unconditional exact test) or the mid-p exact test, provide
alternative approaches for assessing associations in contingency tables.
Observed Frequencies
                Column variable
Row variable    C1           C2           Total
R1              80           25           105
R2              30           70           100
Total           110          95           205

Expected Frequencies
                Column variable
Row variable    C1           C2           Total
R1              56.34146     48.65854     105
R2              53.65854     46.34146     100
Total           110          95           205

Calculations
fo-fe
Row variable    C1           C2
R1              23.65854     -23.65854
R2              -23.65854    23.65854

(fo-fe)^2/fe
Row variable    C1           C2
R1              9.934537     11.50315
R2              10.43126     12.07831
Results
Critical Value 3.841459
Chi-Square Test Statistic 43.94725
p-Value 3.373E-11
Reject the null hypothesis
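The results above can be reproduced from the observed table alone. The following is a minimal sketch in plain Python, not the spreadsheet used for the slides. The p-value line uses a closed form that is valid only for 1 degree of freedom.

```python
import math

# Observed frequencies: rows R1, R2; columns C1, C2.
observed = [[80, 25],
            [30, 70]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over all cells of (fo - fe)^2 / fe.
chi2_stat = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail p-value; this closed form holds only when df == 1.
p_value = math.erfc(math.sqrt(chi2_stat / 2))

print(f"Expected R1/C1: {expected[0][0]:.5f}")  # 56.34146
print(f"Chi-square:     {chi2_stat:.5f}")       # 43.94725
print(f"df:             {df}")                  # 1
print(f"p-value:        {p_value:.3e}")         # 3.373e-11
```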
The chi-square test of association, also known as the chi-square test for
independence, is a statistical test used to determine if there is a significant
association or relationship between two categorical variables.
It assesses whether the observed frequencies in a contingency table are
significantly different from what would be expected if the variables were
independent.
The test calculates a chi-square statistic by comparing the observed
frequencies to the expected frequencies under the assumption of
independence. A significant result indicates that there is evidence of an
association between the variables, suggesting that they are not
independent.

The chi-square test for proportions is a statistical test [...]
percentages between two groups or conditions [...]
proportions is statistically significant.
This test is typically used when the outcome [...]
categories (e.g., success/failure, yes/no) and [...]
categories differ significantly between the groups.
The test calculates a chi-square statistic by comparing [...]
proportions assuming the null hypothesis of no [...]
is evidence of a statistically significant difference [...]
Assumptions (chi-square test of association)
Independence: The observations in the contingency table should be
independent of each other. In other words, the data should be collected in
such a way that each observation is not influenced by or related to any
other observation in the table.
Random Sampling: The data should be obtained through random sampling
from the population of interest. This assumption ensures that the sample is
representative of the population and reduces the risk of bias.
Expected Frequencies: The expected frequencies in each cell of the
contingency table should be reasonably large. Generally, it is recommended
that no more than 20% of the cells have expected frequencies below 5.
Extremely low expected frequencies can affect the validity of the chi-square
test and may require alternative tests, such as Fisher's exact test.
Sample Size: The sample size should be adequate to provide enough
statistical power to detect any meaningful associations. Larger sample sizes
tend to produce more reliable results.

Assumptions (chi-square test for proportions)
Independence: The observations within each [...]
assumption ensures that the proportions are not [...]
Random Sampling: The data should be obtained through random sampling from
the populations of interest. Random sampling helps to ensure that the samples
are representative of their respective populations and reduces the risk of
bias.
Large Sample Sizes: For approximate inference (using methods such as the
normal approximation or Z-test), both sample sizes should be large enough so
that the sampling distribution of the proportion can be approximated by a
normal distribution. A general rule of thumb is that the number of successes
and failures in each group should be at least 10.
Independent Samples: The two groups being compared should be independent of
each other. This assumption ensures that the proportions in one group are not
influenced by or related to the proportions in the other group.
Level of Measurement: The variable being analyzed should be categorical with
two categories (e.g., success/failure, yes/no) or binary. The test of
difference in proportions is not appropriate for continuous or ordinal data.
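The two numeric rules of thumb among these assumptions (no more than 20% of cells with expected frequencies below 5, and at least 10 successes and 10 failures per group) can be checked mechanically. The helper names below are hypothetical and the thresholds are exposed as parameters. Treat this as a sketch of the checks, not a substitute for judgment about the study design.

```python
def expected_counts(observed):
    """Expected cell frequencies under independence for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_expected_ok(observed, min_expected=5, max_share_below=0.20):
    """True if at most max_share_below of cells have expected < min_expected."""
    cells = [fe for row in expected_counts(observed) for fe in row]
    share_below = sum(fe < min_expected for fe in cells) / len(cells)
    return share_below <= max_share_below

def two_proportion_normal_ok(successes, totals, min_count=10):
    """True if every group has at least min_count successes and failures."""
    return all(
        s >= min_count and (n - s) >= min_count
        for s, n in zip(successes, totals)
    )

# Table from the chi-square example: all expected frequencies exceed 46.
print(chi_square_expected_ok([[80, 25], [30, 70]]))     # True
# Hypothetical small table: every expected frequency is 2.
print(chi_square_expected_ok([[3, 1], [1, 3]]))         # False
# Rows of the chi-square example treated as two groups (C1 as "success").
print(two_proportion_normal_ok([80, 30], [105, 100]))   # True
```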
Question: Is there a significant difference in the effectiveness of two drug
treatments for a disease?
Table Data:
In this example, we have collected data on the treatment outcomes (Improved or
Not Improved) for two drug treatments (Treatment A or Treatment B). The table
presents the observed frequencies.

                Treatment A  Treatment B  Total
Improved        50           30           80
Not Improved    30           40           70
Total           80           70           150

To compare the proportions of treatment outcomes between the two drug
treatments, we can use the chi-square test for proportions. The test will
determine if there is a significant difference in the effectiveness of the two
treatments based on the observed proportions of outcomes.
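For the treatment table, the comparison of proportions can be sketched as a pooled two-proportion z-test, whose squared statistic equals the chi-square statistic (without continuity correction) for the same 2x2 table. This calculation is an illustration added here and the slides do not report its result. The 0.05 level mentioned afterwards is an assumption.

```python
import math

# Observed counts from the treatment table.
improved_a, total_a = 50, 80
improved_b, total_b = 30, 70

# Sample proportions improved under each treatment.
p_a = improved_a / total_a   # 0.625
p_b = improved_b / total_b   # about 0.429

# Pooled proportion under the null hypothesis of no difference.
pooled = (improved_a + improved_b) / (total_a + total_b)

# Standard error of the difference under the null, then the z statistic.
se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
z = (p_a - p_b) / se

# For a 2x2 table, z squared is the chi-square statistic with 1 df.
chi2_stat = z ** 2

# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"p_A = {p_a:.3f}, p_B = {p_b:.3f}")
print(f"z = {z:.3f}, chi-square = {chi2_stat:.3f}")  # z = 2.406, chi-square = 5.788
print(f"p-value = {p_value:.4f}")
```

Since 5.788 exceeds the 1-degree-of-freedom critical value of 3.841, the difference in improvement proportions would be judged significant at the 0.05 level.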
How to decide which to use?
The choice between the chi-square test for independence and the test for
proportions depends on the nature of your research question and the type of
data you have.
Y hat = ao + boX + e
Y hat = Bo + B1X + e
where:
[...] of the outcome,
Coefficient for X (0.63): The coefficient of 0.63 represents the estimated
change in the log-odds (logit) of the outcome for a one-unit increase in the
treatment variable (X). In this case, since the treatment variable is binary
(0 for treatment A and 1 for treatment B), we can interpret it as the
difference in log-odds between treatment B and treatment A. Thus, for patients
in treatment B compared to treatment A, there is an estimated increase of 0.63
in the log-odds of the outcome (death).
bo + b1 = -1.034074
bo = -1.661706
0.6222964
1.8632017
1.8731706
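The listed values of bo and bo + b1 are enough to recover the treatment coefficient, its odds ratio, and the fitted probabilities. The sketch below assumes the model is a logistic regression on the log-odds scale, logit(p) = bo + b1*X, with X = 0 for treatment A and X = 1 for treatment B, as the interpretation text describes. The variable names are illustrative.

```python
import math

# Fitted log-odds taken from the values listed above.
b0 = -1.661706           # log-odds of the outcome under treatment A (X = 0)
b0_plus_b1 = -1.034074   # log-odds of the outcome under treatment B (X = 1)

# The slope is the difference in log-odds between treatment B and A.
b1 = b0_plus_b1 - b0     # 0.627632, which rounds to the 0.63 quoted above

# Exponentiating the slope gives the odds ratio for B versus A.
odds_ratio = math.exp(b1)

def inverse_logit(log_odds):
    """Convert log-odds to a probability."""
    return 1 / (1 + math.exp(-log_odds))

# Fitted probabilities of the outcome under each treatment.
p_treatment_a = inverse_logit(b0)
p_treatment_b = inverse_logit(b0_plus_b1)

print(f"b1 = {b1:.6f}")                  # 0.627632
print(f"Odds ratio = {odds_ratio:.4f}")  # 1.8732
print(f"P(outcome | A) = {p_treatment_a:.4f}")  # 0.1595
print(f"P(outcome | B) = {p_treatment_b:.4f}")  # 0.2623
```

The computed odds ratio agrees with the last value listed above (1.8731706).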
Confounding Variable
* Confounding variables are external factors that are associated
with both the exposure and the outcome.
* They can distort or mask the true relationship between the
exposure and the outcome, leading to confounding bias.
* Confounding variables are typically considered at the analysis
stage to adjust for their effects and isolate the true association
between the exposure and the outcome.
* Controlling for confounding variables is important to obtain
unbiased estimates of the exposure-outcome relationship.

Intermediate Variable
* Intermediate variables, also known as mediators or
intervening variables, are variables that lie on the causal
pathway between the exposure and the outcome.
* They are affected by the exposure and, in turn, influence or
mediate the relationship between the exposure and the
outcome.
* Intermediate variables provide insight into the mechanisms
or pathways through which the exposure affects the outcome.
* Analyzing intermediate variables may help understand the
underlying processes and identify potential targets for
intervention.

In summary
In summary, confounding variables are external factors that can
distort the relationship between the exposure and outcome,
while intermediate variables are variables that lie on the causal
pathway and mediate the relationship between the exposure
and outcome.
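A small numeric illustration of confounding bias: in the sketch below the exposure has no association with the outcome within either level of a confounder, yet the crude (pooled) odds ratio suggests a strong one. All counts are hypothetical and invented for this example. The Mantel-Haenszel estimator is used as one standard way to adjust at the analysis stage; the slides do not name a specific method.

```python
# Hypothetical counts per stratum of the confounder, invented for illustration:
# (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases)
strata = {
    "confounder absent": (10, 90, 40, 360),
    "confounder present": (120, 180, 40, 60),
}

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table with cells a, b (exposed) and c, d (unexposed)."""
    return (a * d) / (b * c)

# Stratum-specific odds ratios: the association within each confounder level.
stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}

# Crude odds ratio: pool the strata, ignoring the confounder.
pooled_cells = [sum(values) for values in zip(*strata.values())]
crude_or = odds_ratio(*pooled_cells)

# Mantel-Haenszel adjusted odds ratio: weighted combination across strata.
numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
mh_or = numerator / denominator

print(stratum_ors)                         # both stratum odds ratios are 1.0
print(f"Crude OR:    {crude_or:.3f}")      # 2.528
print(f"Adjusted OR: {mh_or:.3f}")         # 1.000
```

Here the confounder is more common among the exposed and raises the risk of the outcome, which is what produces the misleading crude estimate.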
= 287.29%
Odds Ratio (OR) greater than 1: If the odds ratio is greater than 1, it
suggests that the odds of the outcome are higher among individuals
exposed to a particular factor compared to those unexposed. In other
words, the exposure is associated with an increased likelihood or risk of
the outcome.
Odds Ratio (OR) less than 1: If the odds ratio is less than 1, it indicates
that the odds of the outcome are lower among individuals exposed to a
particular factor compared to those unexposed. In this case, the exposure
is associated with a decreased likelihood or protective effect against the
outcome.

Log        Exp
1.03648    2.8192757
1.67152    5.3202484

The odds ratio is different from 1 since the CI does not include 1.
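The Log and Exp columns can be reproduced by exponentiating the confidence limits of the log odds ratio. The sketch below also recovers a point estimate under the assumption that the interval is a symmetric Wald interval on the log scale, so that the estimate sits at the midpoint. That assumption is not stated in the slides, though the resulting percent change matches the 287.29% figure shown above.

```python
import math

# Confidence limits on the log-odds-ratio scale (the "Log" column).
log_ci_low, log_ci_high = 1.03648, 1.67152

# Exponentiate to get the limits on the odds-ratio scale (the "Exp" column).
or_ci_low = math.exp(log_ci_low)
or_ci_high = math.exp(log_ci_high)

# The odds ratio differs significantly from 1 if the interval excludes 1.
excludes_one = not (or_ci_low <= 1 <= or_ci_high)

# Assumption: symmetric Wald interval, so the point estimate is the midpoint
# of the limits on the log scale.
log_or = (log_ci_low + log_ci_high) / 2
odds_ratio = math.exp(log_or)

# Percent change in the odds associated with the exposure.
percent_change = (odds_ratio - 1) * 100

print(f"OR CI: ({or_ci_low:.4f}, {or_ci_high:.4f})")  # (2.8193, 5.3202)
print(f"CI excludes 1: {excludes_one}")               # True
print(f"Percent change in odds: {percent_change:.2f}%")  # 287.29%
```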