Session 2 Workshop
Stefan Scholtes
Unit 1
Descriptive statistics: How well do we understand the data we have? (No assumptions needed to interpret results.)
Sample: data that you have (history), e.g. customers you have data on
“Population” of interest: data you’d love to have (future), e.g. customers you’d like to have data on
The population of interest is defined by an
“exam question” that informs a decision
Decision on 28 Sep 2021:
Over the past two weeks, COVID bed occupancy in the East of England has
been declining. Hospitals are asking whether it is safe to release reserved
COVID ward capacity for planned surgery over the coming 2-3 weeks.
Exam question?
How will the COVID bed occupancy
change between 28/9/21 and 15/10/21?
Population of interest?
Daily COVID hospital bed occupancy
in the region from 15/8/21 – 15/10/21
A “population of interest” doesn’t refer to actual
“people” but to a spreadsheet you’d like to have,
so that its summary statistics would answer your
exam question.
When are confidence
intervals useful?
Think weather app: If the weather app says that there is a 60% chance
of rain today, then this tells you that on 60% of all days that have a
“60% chance of rain” label, it actually rains, and on 40% it doesn’t. In the same way, about 95% of all intervals labelled “95% CI” contain the true population value.
$$\text{95\% CI} = \text{sample statistic} \pm 2 \cdot SE$$
Proportion: $SE = \sqrt{\dfrac{p(1-p)}{N}}$, where $p$ is the sample proportion and $N$ is the sample size.
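The weather-app reading of a confidence interval can be checked by simulation. The sketch below assumes a hypothetical population in which a true proportion `TRUE_P = 0.3` of records are 1’s, repeatedly draws samples of size `N = 500`, builds the 95% CI p ± 2·SE for each sample, and counts how often the interval contains the true value. All names and numbers here are illustrative assumptions, not workshop data.

```python
import math
import random


def ci_95_proportion(p_hat: float, n: int) -> tuple[float, float]:
    """95% CI for a proportion: p_hat +/- 2 * sqrt(p_hat * (1 - p_hat) / n)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - 2 * se, p_hat + 2 * se


def coverage(true_p: float, n: int, reps: int, seed: int = 1) -> float:
    """Share of simulated 95% CIs that contain true_p (hypothetical population)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Draw one sample of n records, each a 1 with probability true_p
        ones = sum(rng.random() < true_p for _ in range(n))
        low, high = ci_95_proportion(ones / n, n)
        if low <= true_p <= high:
            hits += 1
    return hits / reps


TRUE_P, N, REPS = 0.3, 500, 2000  # illustrative assumptions
share = coverage(TRUE_P, N, REPS)
print(f"{share:.1%} of the {REPS} intervals contain the true value {TRUE_P}")
```

The printed share should land close to 95%, mirroring how a well-calibrated “60% chance of rain” label is right on about 60% of the days it is issued.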
Three ways of presenting a Confidence Interval
1. As a range:
95% CI = 2.98 mins to 3.74 mins
$$\text{stdev.s(Data)} = \sqrt{\frac{N}{N-1}} \cdot \text{stdev.p(Data)}$$
$$\text{SE of average} = \frac{\text{stdev.p(Data)}}{\sqrt{N}}$$
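As a quick numerical check of the two formulas, the sketch below uses Python’s `statistics.stdev` (divides by N − 1, like Excel’s STDEV.S) and `statistics.pstdev` (divides by N, like STDEV.P) on a small made-up sample. The data values are hypothetical and only serve to illustrate the relationship and the SE of the average.

```python
import math
import statistics

# Hypothetical sample of call durations in minutes (illustrative only)
data = [2.5, 3.1, 4.0, 3.6, 2.9, 3.3]
n = len(data)

stdev_s = statistics.stdev(data)   # like Excel STDEV.S: divides by N - 1
stdev_p = statistics.pstdev(data)  # like Excel STDEV.P: divides by N

# stdev.s = sqrt(N / (N - 1)) * stdev.p
ratio_check = math.sqrt(n / (n - 1)) * stdev_p

# SE of the average, using stdev.p as in the formula above
se_average = stdev_p / math.sqrt(n)

print(f"mean={statistics.mean(data):.3f} stdev.s={stdev_s:.4f} "
      f"sqrt(N/(N-1))*stdev.p={ratio_check:.4f} SE={se_average:.4f}")
```

For large N the factor sqrt(N/(N − 1)) is close to 1, so the choice between stdev.s and stdev.p barely changes the SE.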
Exercise 1: How many patients die in
hospitals as a consequence of negligence?
Calculate a 95% confidence interval for the number of
avoidable lethal adverse events per 1,000 admitted patients
Data
• 2.6 M hospital admissions in NY State in 1984
• Sample: Medical records of 30,195 hospital admissions in NY State hospitals in 1984,
randomly drawn
Summary statistics
• 3.7% of sampled patients experienced at least one “adverse event”
• 13.6% of these adverse events led to death
• 51.3% of the lethal adverse events were classified by the research team as “avoidable”
Key questions:
1. What is the population of interest? (the data we’d love to have)
2. What’s the population statistic of interest?
3. What’s the value of the corresponding sample statistic?
4. What’s the formula we need to calculate the 95% CI?
5. How should we communicate the result?
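One possible way to work through questions 2 to 4 is sketched below. It assumes the three summary percentages can be chained by multiplication to give the share of sampled admissions with an avoidable lethal adverse event, and that the proportion formula SE = sqrt(p(1 − p)/N) applies to that compound proportion. The constant names are hypothetical; the numbers are those quoted in the exercise.

```python
import math

# Figures quoted in the exercise
N_SAMPLE = 30_195                  # randomly sampled admissions
P_ADVERSE = 0.037                  # share with at least one adverse event
P_LETHAL_GIVEN_AE = 0.136          # share of adverse events that led to death
P_AVOIDABLE_GIVEN_LETHAL = 0.513   # share of lethal events judged avoidable
ADMISSIONS_NY_1984 = 2_600_000

# Sample statistic: share of admissions with an avoidable lethal adverse event
# (assumption: the three percentages can simply be chained by multiplication)
p = P_ADVERSE * P_LETHAL_GIVEN_AE * P_AVOIDABLE_GIVEN_LETHAL

# 95% CI for a proportion: p +/- 2 * sqrt(p(1 - p) / N)
se = math.sqrt(p * (1 - p) / N_SAMPLE)
low, high = p - 2 * se, p + 2 * se

print(f"Per 1,000 admissions: {1000 * low:.2f} to {1000 * high:.2f} "
      f"(point estimate {1000 * p:.2f})")
print(f"Scaled to 2.6 M admissions: {ADMISSIONS_NY_1984 * low:,.0f} "
      f"to {ADMISSIONS_NY_1984 * high:,.0f}")
```

Under these assumptions the interval comes out at roughly 2.0 to 3.2 avoidable lethal adverse events per 1,000 admissions.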
Communication
Why?
“30 years ago, researchers at Harvard showed that
hospitals had more than double the number of
avoidable adverse events resulting in patient
fatalities compared to road accidents. Has the
situation improved?”
Question: What proportion of
the UK population are in
favor of writing off student
loans for nurses?
https://yougov.co.uk/topics/politics/survey-results/daily/2023/09/28/b92d8/2
Excel
Formulas
VLE example: Call Centre Staffing
Contract with new client: Need to deliver 1,000 calls per day
Contract with callers: A caller is expected to deliver 360 call-minutes per working
day
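A minimal staffing sketch, assuming the requirement is simply total daily call-minutes divided by the 360 call-minutes one caller delivers. The average call length is not given here, so, purely as a hypothetical illustration, both ends of the 2.98 to 3.74 min interval from the range example are plugged in; the slides do not state that this interval belongs to this example. The function name `callers_needed` is hypothetical.

```python
import math

CALLS_PER_DAY = 1_000        # contract with the new client
MINUTES_PER_CALLER = 360     # call-minutes one caller delivers per working day


def callers_needed(avg_call_minutes: float) -> int:
    """Callers required to cover the daily call-minutes, rounded up."""
    return math.ceil(CALLS_PER_DAY * avg_call_minutes / MINUTES_PER_CALLER)


# Hypothetical: plug in both ends of a 95% CI for the average call length
for avg in (2.98, 3.74):
    print(f"average call {avg:.2f} min -> {callers_needed(avg)} callers")
```

Evaluating both ends of the interval shows how uncertainty in the average call length carries through to the staffing decision.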
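$$EM = 2 \cdot \frac{\text{stdev.p(data)}}{\sqrt{N}}$$
where EM is the error margin (the half-width of the 95% CI).
Rearranging gives $$N = \left(\frac{2 \cdot \text{stdev.p(data)}}{EM}\right)^2$$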
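The rearranged formula can be turned into a small sample-size calculator. This sketch assumes stdev.p(data) is known in advance (for example from a pilot sample); the inputs below and the name `sample_size_for_average` are hypothetical.

```python
import math


def sample_size_for_average(stdev_p: float, em: float) -> int:
    """Smallest N with 2 * stdev_p / sqrt(N) <= em, i.e. N = (2 * stdev_p / em) ** 2."""
    return math.ceil((2 * stdev_p / em) ** 2)


# Hypothetical inputs: pilot stdev of 2 minutes, target error margin of +/-0.5 minutes
print(sample_size_for_average(2.0, 0.5))
```

Because N grows with the square of 1/EM, halving the error margin quadruples the required sample size.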
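You need a large sample if the proportion of 1’s is very small or very large:
→ $p = 0.02$, $SE = \sqrt{\dfrac{p(1-p)}{N}} = 0.014$ (with $N = 100$)
→ 95% CI = [−0.008, 0.048]: the error margin (0.028) exceeds the estimate itself, and the lower limit falls below zero, which is impossible for a proportion.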
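Rearranging EM = 2·sqrt(p(1 − p)/N) gives N = 4p(1 − p)/EM². The sketch below applies it with a hypothetical precision target, an error margin equal to half the proportion itself, to show why small proportions need large samples. The function name and the target are assumptions for illustration.

```python
import math


def sample_size_for_proportion(p: float, em: float) -> int:
    """Smallest N with 2 * sqrt(p(1-p)/N) <= em, i.e. N = 4 p (1 - p) / em**2."""
    n = 4 * p * (1 - p) / em ** 2
    # Round away floating-point noise before rounding up to a whole sample
    return math.ceil(round(n, 9))


# Hypothetical target: error margin equal to half the proportion itself
for p in (0.5, 0.2, 0.02):
    print(f"p={p:.2f}: N >= {sample_size_for_proportion(p, p / 2)}")
```

With this relative target the requirement simplifies to N = 16(1 − p)/p: 16 for p = 0.5 but 784 for p = 0.02.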
[Figure: precise vs. imprecise estimates of an unknown value we are interested in estimating. We increase the sample to improve precision.]
One of the biggest mistakes in naïve statistics
WRONG!