Biostatistics and Epidemiology 2022
Statistics
The science of collecting, monitoring, analyzing, summarizing, and interpreting data.
Biostatistics
Statistics applied to biological (life) problems, including Public Health, Medicine,
Ecology, and Environmental Science.
Populations and Samples
Population – a group of individuals that we would like to know something about.
Studying an entire population is usually too expensive and time-consuming, and thus impractical.
Population parameters are often denoted with Greek letters (μ, σ, ρ).
Sample – a subset of a population; by observing the sample we can learn
something about the population. Sample statistics are often denoted with lowercase Latin letters (m, SD).
Incidence and Prevalence
Prevalence = Incidence x Duration
Incidence Rate
Incidence rate = New cases / Population exposed to the risk
Example: respiratory infections among exposed children / total exposed children
Relative Risk
RR = Incidence Rate in Exposed / Incidence Rate in Non-Exposed
Example: RR = 60 / 24 = 2.5 (equivalently, 300 / 120 = 2.5), reported as statistically significant
RR = 1: no association (a confidence interval that includes 1 is not statistically significant)
RR > 1: positive association, Risk Factor
RR < 1: negative association, Protective Factor
The risk of disease is 2.5 times higher in the exposed population.
Attributable Risk
AR = Incidence Rate in Exposed - Incidence Rate in Non-Exposed
Example: AR = 60 - 24 = 36
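The two formulas above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are made up for this example, and the inputs are the incidence rates 60 and 24 from the worked example (the notes do not state the denominator, so the rates are used as given).

```python
def relative_risk(rate_exposed, rate_unexposed):
    """Incidence rate in the exposed divided by the rate in the non-exposed."""
    if rate_unexposed == 0:
        raise ValueError("incidence rate in the non-exposed group must be non-zero")
    return rate_exposed / rate_unexposed


def attributable_risk(rate_exposed, rate_unexposed):
    """Incidence rate in the exposed minus the rate in the non-exposed."""
    return rate_exposed - rate_unexposed


def interpret_rr(rr):
    """Map a relative risk to the interpretation used in these notes."""
    if rr > 1:
        return "positive association (risk factor)"
    if rr < 1:
        return "negative association (protective factor)"
    return "no association"


rr = relative_risk(60, 24)
ar = attributable_risk(60, 24)
print(f"RR = {rr}, AR = {ar}, {interpret_rr(rr)}")
```

The interpretation helper only looks at the point estimate; judging statistical significance would also need a confidence interval, which is not computed here.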
Question No 1
A new blood test has been devised and used in 50 people to look for diabetes.
The results are as follows:
True Positive 4
False Positive 16
False Negative 6
True Negative 24
What are the sensitivity and specificity of the test?
What are the positive and negative predictive values?
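One way to work through Question 1 is to compute the four measures directly from the 2 × 2 counts. The sketch below uses the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), PPV = TP / (TP + FP), NPV = TN / (TN + FN)); the function name `diagnostic_metrics` is just an illustrative choice.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from 2 x 2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased people who test positive
        "specificity": tn / (tn + fp),  # healthy people who test negative
        "ppv": tp / (tp + fp),          # positive tests that are truly diseased
        "npv": tn / (tn + fn),          # negative tests that are truly healthy
    }


# Counts from Question 1: TP = 4, FP = 16, FN = 6, TN = 24 (50 people in total)
q1 = diagnostic_metrics(tp=4, fp=16, fn=6, tn=24)
for name, value in q1.items():
    print(f"{name}: {value:.0%}")
```

With these counts the test comes out at 40% sensitivity, 60% specificity, 20% PPV and 80% NPV.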
Conclusions
For dyspneic adult emergency patients, a directed history, physical exam,
chest X-ray, and ECG should be performed.
If the suspicion of heart failure remains, obtaining a serum BNP level may be
helpful, especially for excluding heart failure.
A low serum BNP (<100 pg/mL) proved to be the most useful test:
negative likelihood ratio (LR-) = 0.11 (95% CI, 0.07-0.16)
Probability
Chance of something happening, or risk. A statistical way of quantifying
uncertainty.
0 - Never
1 - Certain
• Values always lie between 0 and 1
• Chance of heads: 50% or 0.5
Risk = Number of events / Number of people at risk
Example: two out of 100 patients in a clinical trial develop a side effect of a drug.
Calculate the risk of the side effect.
Risk = 2/100 = 0.02 or 2%
Mean
The average value: the sum of the values divided by their number.
Example:
Monthly income of 5 employees is:
1, 3, 4, 2, 5 dollars
Calculate their mean:
Arithmetic mean = sum of values / n
= (1 + 3 + 4 + 2 + 5) / 5
Mean = 15 / 5 = 3
Median
The value that divides the data into two equal sets after arrangement in descending
or ascending order.
The middle value.
To calculate the median you need to:
1. Arrange the values in ascending or descending order.
2. Determine the location of the median: (n+1)/2
Odd number of values: the location is direct
Even number of values: the median is the midpoint between the two middle values
3. Determine the value of the median
Example
Number of children in nine families (n = 9):
6, 4, 5, 0, 1, 3, 5, 2, 2
Calculate the median
▪ Arrange in ascending order: 0, 1, 2, 2, 3, 4, 5, 5, 6
▪ Pick the middle value: location = (9+1)/2 = 5, so the median is the 5th value
Median = 3
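The three steps for finding the median can be written out as a short Python sketch. The function name `median_by_location` is an illustrative choice; it follows the (n+1)/2 location rule, taking the midpoint of the two middle values when n is even.

```python
def median_by_location(values):
    """Median via the (n + 1) / 2 location rule."""
    ordered = sorted(values)            # step 1: arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        location = (n + 1) // 2         # step 2: 1-based position of the middle value
        return ordered[location - 1]    # step 3: read the value at that position
    # Even n: (n + 1) / 2 falls between two positions, so take their midpoint
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


children = [6, 4, 5, 0, 1, 3, 5, 2, 2]
print(median_by_location(children))
```

For the nine families in the example, the location is the 5th ordered value, which is 3.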
Mode
The most frequent value; the measure least affected by skew in the distribution.
Example
Number of children in nine families
n = 9
3, 5, 2, 4, 3, 0, 1, 3, 2
Arranged in order: 0, 1, 2, 2, 3, 3, 3, 4, 5
Mode = 3 (it occurs three times)
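Counting how often each value occurs gives the mode directly. The sketch below uses `collections.Counter` from the Python standard library; returning a list of all tied values is a design choice made here (a data set can have more than one mode), not something the notes specify.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


children = [3, 5, 2, 4, 3, 0, 1, 3, 2]
print(modes(children))
```

For the nine families above, the value 3 appears three times, more often than any other value.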
Standard Deviation
It is a measure of spread around the arithmetic mean.
It extends on either side of the mean value, minus or plus.
Narrow SD: the data are squeezed close to the mean
Wide SD: the data are spread out and away from the mean
Example
US average men’s height = 178 cm, SD = 8 cm
Most men (68%): height = 170 to 186 cm (mean ± 1 SD = ± 8 cm)
Nearly all men (95%): height = 162 to 194 cm (mean ± 2 SD = ± 16 cm)
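The height example can be reproduced with a small helper that builds the interval mean ± k × SD. This is a sketch that assumes heights are roughly normally distributed, so that the usual 68% / 95% / 99.7% coverage figures apply; the function name `sd_interval` is illustrative.

```python
def sd_interval(mean, sd, k):
    """Return (low, high) for the interval mean +/- k standard deviations."""
    return (mean - k * sd, mean + k * sd)


# Approximate coverage of a normal distribution within k SD of the mean
COVERAGE = {1: "68%", 2: "95%", 3: "99.7%"}

for k, share in COVERAGE.items():
    low, high = sd_interval(178, 8, k)
    print(f"about {share} of men: {low} to {high} cm (mean +/- {k} SD)")
```

With a mean of 178 cm and an SD of 8 cm, one SD spans 170 to 186 cm and two SD span 162 to 194 cm.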
Measures of Dispersion:
Range
• Used with ordinal, interval & ratio
• Difference between largest & smallest value
• Entirely dependent on the most extreme scores
• Outliers may exaggerate the range
Example (n = 11): 8 3 6 5 4 11 2 9 4 10 11
Range: 11 - 2 = 9
Example: 8 3 6 4 11 2 9 4 10 4 11
Range: 11-2 = 9
Example: 8 3 6 4 11 2 9 4 10 4 19
Range: 19 -2 = 17
The last two data sets are identical except for one point.
Outliers/extreme scores have a large effect on the range.
Measures of Dispersion:
Interquartile Range (mid-spread)
• used for ordinal, interval & ratio data
• comprises the middle 50% of the data
• difference between the 75th and 25th percentile
• less influenced by extreme scores than the range
• disregards half of the data (lower quarter and upper quarter)
• IQR for 1-year-old baby weight: 11.5 - 9.5 = 2.0 kg
• Median (50th percentile) weight for a 1-year-old baby = 10.5 kg
Measures of Dispersion:
Interquartile Range
Example: 42 43 45 47 48 49 51 53 53 54
(N=10)
step 1: calculate Q1 (median of lower half)
step 2: calculate Q3 (median of upper half)
step 3: IQR = Q3-Q1
Q1= 45 (42 43 45 47 48)
Q3= 53 (49 51 53 53 54)
IQR= 8 (53-45)
Measures of Dispersion:
Comparing Range & Interquartile Range
Example: 42 43 45 47 48 49 51 53 53 54
• Interquartile range: 53 - 45 = 8
• Range: 54 - 42 = 12
Example: 42 43 45 47 48 49 51 53 53 64
• Interquartile range: 53 - 45 = 8
• Range: 64 - 42 = 22
• Interquartile range NOT affected by extreme score
• Range is affected by extreme score
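The comparison can be checked with a short Python sketch that uses the same "median of the lower half, median of the upper half" method for Q1 and Q3. Other quartile conventions exist and can give slightly different values; the function names here are illustrative.

```python
def median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def iqr_median_of_halves(values):
    """IQR = Q3 - Q1, with Q1 and Q3 taken as medians of the lower and upper halves."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]     # the middle value is left out when n is odd
    upper_half = ordered[-half:]
    return median(upper_half) - median(lower_half)


original = [42, 43, 45, 47, 48, 49, 51, 53, 53, 54]
with_outlier = [42, 43, 45, 47, 48, 49, 51, 53, 53, 64]

for data in (original, with_outlier):
    print(f"range = {value_range(data)}, IQR = {iqr_median_of_halves(data)}")
```

Changing the largest value from 54 to 64 moves the range from 12 to 22, while the IQR stays at 8.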
Measures of Dispersion:
Standard Deviation
• take the square root of the variance
• returns value to the original unit of measurement
• Easier to interpret
• Average deviation from the mean
• How much scores vary on average
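As a rough illustration of "take the square root of the variance", the sketch below computes both for the monthly incomes used in the Mean example (1, 3, 4, 2, 5 dollars). It assumes the sample formula with an n - 1 denominator; the notes do not say whether the sample or population formula is intended, and the function names are illustrative.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1 (assumed sample formula)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Square root of the variance, back in the original unit of measurement."""
    return math.sqrt(sample_variance(values))


incomes = [1, 3, 4, 2, 5]
print(f"variance = {sample_variance(incomes)}, SD = {sample_sd(incomes):.2f} dollars")
```

The squared deviations from the mean of 3 are 4, 0, 1, 1 and 4, which sum to 10, giving a variance of 10 / 4 = 2.5 and an SD of about 1.58 dollars.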
Question No 2
You are director of occupational health for a corporation that has many
employees aged over 45 who smoke one or more packs of cigarettes daily
and are at increased risk for lung cancer.
What strategy for the early detection of lung cancer in asymptomatic
individuals would you recommend?
a) No strategy has been shown to be effective in reducing mortality
b) Chest x-ray and sputum cytology every 6 months for high-risk employees
c) Annual chest x-ray and sputum cytology for high-risk employees
d) Annual chest x-ray and sputum cytology for all employees
e) Annual chest x-ray for all employees
Question No 3
A double-blind trial is planned to compare the utility of glyburide and
metformin in the treatment of diabetes mellitus. The main reasons for
randomizing patients is:
a) So that the number of subjects in each group will be identical
b) So that the two patient groups will have similar prognostic features
c) So that the statistician will not analyze the data in a biased fashion
d) So that the investigator does not know in advance which therapy each patient
will receive
e) To prevent the clinician knowing which drug the patient is taking
Question No 4
A type of gynecological cancer has the same incidence rate in white women and
African American women in the US, but the prevalence rate of this type of cancer
is lower in African American than in white women. The most likely explanation
for this difference in prevalence rates is that when compared to white women,
African American women are more likely to:
A. Recover from this type of cancer
B. Have natural immunity to this type of cancer
C. Have increased access to treatment for this type of cancer
D. Be resistant to this type of cancer
E. Live longer with this type of cancer
Question No 5
A research study is done to determine if a new drug (Drug A) will prevent stroke in
men aged 55–65 years who have hypertension. A total of 4,000 hypertensive men in this age
group are randomly assigned to Drug A (n = 2,000) or placebo (n
= 2,000). Over 10 years, there were 400 strokes in the placebo group and 200
strokes in the Drug A group.
Based on these data, how many men would have to be treated with Drug A to
prevent one case of stroke?
a) 10
b) 20
c) 40
d) 80
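Question 5 can be worked through with the usual definitions: absolute risk reduction (ARR) = control event rate - treated event rate, and number needed to treat (NNT) = 1 / ARR. The sketch below applies them to the stroke counts in the question; the function name and parameter names are illustrative.

```python
def number_needed_to_treat(events_control, n_control, events_treated, n_treated):
    """NNT = 1 / ARR, where ARR = control event rate - treated event rate."""
    control_rate = events_control / n_control
    treated_rate = events_treated / n_treated
    arr = control_rate - treated_rate
    if arr <= 0:
        raise ValueError("treatment shows no absolute risk reduction")
    return 1 / arr


# Question 5: 400 strokes among 2,000 on placebo, 200 strokes among 2,000 on Drug A
nnt = number_needed_to_treat(400, 2000, 200, 2000)
print(f"NNT = {nnt:.0f}")
```

Here the stroke risk falls from 20% to 10%, so ARR = 0.10 and about 10 men need to be treated to prevent one stroke (option a).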
Question No 6
A cohort study is conducted to evaluate the relationship between calcium
supplements and the occurrence of hip fractures in post-menopausal women. The
study examines the hip fracture rate in 100 women taking calcium supplements and
100 women taking placebo over 3 years. Five women have hip fractures in the calcium
group and 10 women have fractures in the placebo group.
What is the risk of hip fracture in the group treated by calcium supplements?
a) 1%
b) 5%
c) 10%
d) 20%
e) 50 %
Answer: b) 5%
Risk of Fracture in Calcium group (Experimental) EER = 5 / 100 = 0.05 = 5%
Risk of Fracture in Placebo group (Control) CER = 10 / 100 = 0.10 = 10%
Relative Risk (Calcium vs Placebo) RR = EER / CER = 0.05 / 0.10 = 0.5
Relative Risk Reduction (by Calcium) RRR = (CER - EER) / CER = 0.05 / 0.10 = 0.5 = 50%
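The event rates for Question 6 can be recomputed from the raw counts. In the sketch below, EER and CER are the experimental and control event rates, RR = EER / CER, and RRR = (CER - EER) / CER, which is the same as 1 - RR; the variable and function names are illustrative.

```python
def event_rate(events, total):
    """Proportion of a group that experienced the event."""
    return events / total


eer = event_rate(5, 100)    # calcium group: 5 fractures in 100 women
cer = event_rate(10, 100)   # placebo group: 10 fractures in 100 women

relative_risk = eer / cer                        # risk in treated relative to control
relative_risk_reduction = (cer - eer) / cer      # equivalently 1 - relative_risk

print(f"EER = {eer:.0%}, CER = {cer:.0%}")
print(f"RR = {relative_risk:.2f}, RRR = {relative_risk_reduction:.0%}")
```

In this data set RR and RRR both happen to equal 0.5, which is why dividing 0.05 by 0.10 gives the right RRR; with other numbers the two quantities differ.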
Null Hypothesis
Type I (α) error:
The probability of saying that there is a difference in treatment effects between
groups while in fact there is none (a falsely rejected null hypothesis)
p-value < 5% or p < 0.05
The p-value is the probability that a difference in treatment effects at least as large as
the one observed in a study could have arisen by chance alone (that is, if the null hypothesis were true).
Conventionally, differences associated with p < 0.05 are considered statistically significant.
Type II (β) error:
The probability of saying that there is no difference in treatment effects between
groups while in fact there is one (a falsely accepted null hypothesis)
Power > 80 %
The probability that a study will find a statistically significant difference when one
is truly there.
It relates directly to the number of subjects. Power = 1 − β (1 − type II error).
Question No 7
A randomized trial comparing the efficacy of two drugs showed a difference
between the two with a p value of < 0.05. In reality, however, the two drugs do not differ.
This is an example of:
a. Type I error (α error)
b. Type II error (β error)
c. 1- α
d. 1- β
e. A statistically significant trial
Type I (α) error:
The probability of saying that there is a difference in treatment effects between
groups while in fact there is none (a falsely rejected Null Hypothesis)
Question No 8
To study the psychiatric morbidity of Schizophrenia, all patients with the disorder
being treated in the city, as identified by medical records, were interviewed by a
research team. A control group matched for age, sex and socio-economic status
was also interviewed. This is an example of which one of the following?
Question No 10
In a classroom of 25 students (15 males and 10 females), 5 males develop
hepatitis A over a 2-week period. During the next 6 weeks, an additional 3 males
and 2 females develop the infection.
Calculate the secondary attack rate of hepatitis A in this class.
a) 5%
b) 10%
c) 20%
d) 25%
e) 50%
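A sketch of the calculation for Question 10, using the standard definition of the secondary attack rate: new cases among contacts divided by the number of susceptible contacts (everyone except the primary cases). It assumes all remaining classmates were susceptible, which the question does not state explicitly; the function name is illustrative.

```python
def secondary_attack_rate(group_size, primary_cases, secondary_cases):
    """Secondary cases divided by the contacts still at risk after the primary cases."""
    susceptible_contacts = group_size - primary_cases
    if susceptible_contacts <= 0:
        raise ValueError("no susceptible contacts remain")
    return secondary_cases / susceptible_contacts


# Question 10: 25 students, 5 primary cases, then 3 + 2 = 5 further cases
rate = secondary_attack_rate(group_size=25, primary_cases=5, secondary_cases=3 + 2)
print(f"secondary attack rate = {rate:.0%}")
```

With 5 secondary cases among the 20 students who were not primary cases, the rate is 25% (option d).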
Question No 11
Several studies have shown that 85% of lung cancer cases are due to cigarette smoking.
This measure is an example of:
a. Incidence Rate
b. Attributable Risk
c. Relative Risk
d. Prevalence
e. Mortality Ratio
Question No. 12
40,000 students take MCC-QE1 each year.
The mean score is 222 with a Standard Deviation of 16.
If the passing score is 226, how many students scored above 254?
(A) 10,000
(B) 6,400
(C) 1,000
(D) 600
Answer: (C) About 1,000 students scored above 254 on MCC-QE1
• Total students = 40,000
• 254 is 2 SD above the mean (222 + 16 + 16); about 95% of scores lie within 2 SD of the mean
• The remaining 5% is split between the two tails, so about 2.5% scored above 254
• 2.5 percent of 40,000 is 1,000 students (2.5 / 100 x 40,000)
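The 2.5% figure comes from the rounded "95% within 2 SD" rule. As a cross-check, the sketch below compares it with the exact upper-tail area of a normal distribution using `statistics.NormalDist` from the Python standard library (Python 3.8 or later). It assumes the scores are normally distributed, as the question implies.

```python
from statistics import NormalDist

TOTAL_STUDENTS = 40_000
scores = NormalDist(mu=222, sigma=16)   # mean and SD from the question

# Rule of thumb: 95% within 2 SD, so 2.5% in each tail
rule_of_thumb_share = 0.025

# Exact share of a normal distribution above 254 (2 SD above the mean)
exact_share = 1 - scores.cdf(254)

print(f"rule of thumb: {rule_of_thumb_share * TOTAL_STUDENTS:.0f} students")
print(f"exact normal tail: {exact_share:.4f} of students, "
      f"about {exact_share * TOTAL_STUDENTS:.0f}")
```

The exact tail beyond 2 SD is closer to 2.3% (roughly 910 students), so 1,000 is the closest of the listed options and the one the rounded rule gives.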
STANDARD NORMAL DISTRIBUTION
Question No. 13
During a hospitalization, a patient’s serum Na+ value follows a normal distribution
with a mean of 140 and an SD of 2.5. During his stay, what percentage of his Na+
values will be greater than 145?
Answer: 2.5%.
145 is 2 SD above the mean (140 + 2.5 + 2.5). About 95% of values lie within 2 SD of the mean,
so 5% of his values will be >2 SDs away; 2.5% will be >145 and 2.5% will
be <135
Mean serum Na+ = 140
❑ ± 1 SD contains 68% of values (137.5 to 142.5)
❑ ± 2 SD contains 95% of values (135 to 145)
❑ ± 3 SD contains 99.7% of values (132.5 to 147.5)
Question No. 14
In a cohort study of coronary artery disease (CAD) in smokers versus non-smokers,
the following results were obtained:
Calculate the Relative Risk of CAD in smokers. What does it mean?
Relative Risk
= Incidence rate in smokers (80%) / Incidence rate in non-smokers (50%) = 1.6
It means that smokers have 1.6 times the risk of developing coronary artery
disease compared with non-smokers.
Question No. 15
200 hypertensive males are randomly allocated to treatment A (100 patients) and
control (100 patients). After 3 months of treatment, 70 of the treated group and
50 of the placebo group showed a drop in their blood pressure.
Calculate (taking no drop in blood pressure as the adverse outcome):
• The incidence among treated = 30/100 = 30%
• The incidence among controls = 50/100 = 50%
• The Relative Risk = 30/50 = 0.6
• Absolute Risk Reduction (ARR) = 50% - 30% = 20% or 0.2
• Number Needed to Treat (NNT) = 1/ARR = 1/0.2 = 5
It means that you need to treat 5 patients to have ONE additional favorable outcome
(corrected blood pressure). As the RR is 0.6 (less than 1), the treatment is protective.
In one study, the ARR of statin therapy is calculated at 4%. What is the NNT?
Number Needed to Treat (NNT) = 1/ARR = 1/0.04 = 25
It means that 25 patients would need to be treated with statins to prevent one MI.
Question No. 16
A new test for tuberculosis finds that 195 out of 200 people with tuberculosis test
positive. A total of 35 people out of 150 without the disease also tested positive.
Before calculation, decide whether this test is sensitive or specific.
Draw a 2 × 2 table and calculate the sensitivity, specificity, PPV, and NPV.
                 TB present    TB absent
Test positive    TP = 195      FP = 35
Test negative    FN = 5        TN = 115
Question No. 17
A case-control study is designed to study risk factors for developing Buerger’s
disease. Ten people are selected with Buerger’s disease, and 10 subjects are
selected as controls. Nine patients who developed Buerger’s disease were heavy
smokers, and two people without Buerger’s disease smoked.
Construct a 2 × 2 table and calculate the odds ratio.
                 Buerger’s disease    Controls
Smokers          a = 9                b = 2
Non-smokers      c = 1                d = 8
Odds ratio = (a × d) / (b × c) = (9 × 8) / (2 × 1) = 36
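For Question 17, the counts can be read straight from the question text: 9 of the 10 cases smoked (so 1 did not) and 2 of the 10 controls smoked (so 8 did not). A minimal sketch of the odds ratio as the cross-product of the 2 × 2 table follows; the function and parameter names are illustrative.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Cross-product odds ratio (a * d) / (b * c) for a case-control 2 x 2 table."""
    if exposed_controls == 0 or unexposed_cases == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


# Question 17: smokers among cases = 9, smokers among controls = 2,
# non-smokers among cases = 1, non-smokers among controls = 8
buerger_or = odds_ratio(9, 2, 1, 8)
print(f"odds ratio = {buerger_or:.0f}")
```

The odds of heavy smoking are 36 times higher among people with Buerger’s disease than among controls, although with only 20 subjects the estimate is very imprecise.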
The Odds Ratio for Osteoporosis
The OR increased by 8% (p=0.001) with each year of life.
The prevalence increased with age from 24.9% at 60-64 years to 37.4% at 70-75
years.
In non-smokers the Odds Ratio for osteoporosis was 0.424, which was statistically
significant (p<0.05).
BMI <18.5 increased the OR: 1.86 (95% CI, 0.35-9.8); the interval includes 1, so this is not statistically significant
BMI >30-35 decreased the OR: 0.19 (95% CI, 0.13-0.28)
Age 60-75 years: risk of fractures increased with increasing age and observed
height loss (p<0.001).
Hormone Therapy (HT) decreased the prevalence of osteoporosis by 25% in
comparison with non-users.
Question No. 18
Investigators conduct a randomized control trial to study the benefits of a new
asthma medication. Of 200 people on the medication, only 10 had asthma attacks.
A total of 30 people out of 200 in the control group developed an asthma attack.
Construct a 2 × 2 table and calculate the incidence in the exposed and unexposed,
                 Asthma attack    No attack
Medication       a = 10           b = 190
Control          c = 30           d = 170
Hierarchy of Evidence
CRP levels showed an inverse correlation to P/F ratio on admission, demonstrating
an association of these markers with the severity of Acute Hypoxemic Respiratory
Failure.
A scatterplot displays the strength, direction, and form of the relationship between
two quantitative variables. A correlation coefficient measures the strength of that
relationship.
Meta-Analysis
A Systematic Review that Combines Statistical Data from Similar Quantitative
Studies to find a common resultant effect.
The power of the meta-analysis comes from its ability to statistically digest dozens
of different studies and emerge with a final assessment.
Forest plot: five Odds Ratios used in a Meta-Analysis, with the summary measure (centre line of
the diamond), its associated Confidence Interval (lateral tips of the diamond),
and a solid vertical line of No Effect.
https://www.facebook.com/imgstudygroup