Count Data
To load the shark attack data, we download the file and set file_path
to the path of sharks.csv.
Are there any trends over time for the number of reported
attacks?
To fit a Poisson regression model, we use glm as before, but this time
with family = poisson.
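As a minimal sketch of such a fit, the example below uses simulated data rather than sharks.csv. The data frame sim, its columns Year and Count, and the model name m_sim are all hypothetical stand-ins chosen for illustration.

```r
# Simulated stand-in for the attack data (hypothetical columns Year and Count)
set.seed(1)
sim <- data.frame(Year = 1960:2019)
sim$Count <- rpois(nrow(sim), lambda = 5)  # constant rate, i.e. no trend

# Poisson regression; family = poisson uses the log link by default
m_sim <- glm(Count ~ Year, data = sim, family = poisson)
summary(m_sim)

# Estimated intercept and slope, both on the log scale
coef(m_sim)
```

Because the simulated counts have a constant rate, the estimated coefficient for Year should be close to 0.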
Interpretation
A 1-unit increase in X is associated with an average change of
approximately 100×β1% in Y, provided that β1 is close to 0. The exact
change is (e^β1 − 1)×100%.
Explanation
We want to know how Y changes when we increase X by 1 unit, i.e. when
X becomes (X + 1).
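Let’s call Y_old the expected value of Y at X, and Y_new the expected
value of Y at (X + 1). With the log link used in Poisson regression:
log(Y_old) = β0 + β1×X
log(Y_new) = β0 + β1×(X + 1)
Subtracting the first equation from the second gives:
log(Y_new/Y_old) = β1
So, Y_new/Y_old = e^β1
Now let’s transform Y_new/Y_old to a percent change in Y, by
subtracting 1 and multiplying by 100.
So, when X increases by 1 unit, the percent change in Y will be
(e^β1 − 1)×100.
When β1 is close to 0 (roughly |β1| < 0.1): (e^β1 − 1)×100 ≈ 100×β1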
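To get a feel for how good the approximation is, the following sketch compares the exact percent change with 100×β1 for a few illustrative values of β1. The values are arbitrary and not taken from the shark attack model.

```r
# A few illustrative slope values (not estimates from any fitted model)
beta1 <- c(-0.10, -0.01, 0.01, 0.05, 0.10, 0.50)

# Exact percent change in Y for a 1-unit increase in X
exact <- (exp(beta1) - 1) * 100

# Approximation, reasonable only when beta1 is close to 0
linear_approx <- 100 * beta1

data.frame(beta1,
           exact = round(exact, 2),
           linear_approx,
           error = round(exact - linear_approx, 2))
```

For β1 = 0.1 the exact change is about 10.5% against an approximation of 10%. For β1 = 0.5 it is about 64.9% against 50%, so the shortcut should not be used for large coefficients.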
The fitted model seems to confirm our view that there is no trend over
time in the number of
attacks.
For model diagnostics, we can use a binned residual plot and a plot of
Cook’s distance to find influential points:
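Both diagnostics can be sketched in base R as below, without any plotting package. The simulated data, the names sim and m_sim, the choice of 6 bins and the 4/n cutoff are all assumptions made for illustration, not part of the original analysis.

```r
# Simulated stand-in data and Poisson model (hypothetical names)
set.seed(1)
sim <- data.frame(Year = 1960:2019)
sim$Count <- rpois(nrow(sim), lambda = 5)
m_sim <- glm(Count ~ Year, data = sim, family = poisson)

# Binned residuals: sort observations by fitted value, split them into
# equally sized bins, and average the residuals within each bin
res <- residuals(m_sim, type = "response")
fit <- fitted(m_sim)
n_bins <- 6
bins <- cut(rank(fit, ties.method = "first"), breaks = n_bins, labels = FALSE)
binned <- data.frame(
  mean_fitted = as.numeric(tapply(fit, bins, mean)),
  mean_resid  = as.numeric(tapply(res, bins, mean))
)
binned

# Cook's distance for each observation
cooks <- cooks.distance(m_sim)

# Flag observations above 4/n (a common rule of thumb, not a strict test)
influential <- which(cooks > 4 / nrow(sim))
influential
```

Plotting binned$mean_resid against binned$mean_fitted gives the binned residual plot, where the points should scatter around 0 without a clear pattern. A call such as plot(cooks, type = "h") shows Cook's distance for each observation.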
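library(ggplot2)
# Histogram with one bar per integer value
# (geom_histogram requires the variable in aes() to be numeric)
ggplot(attacks, aes(Type)) +
  geom_histogram(binwidth = 1, colour = "black")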
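# Install AER once, then load it
install.packages("AER")
library(AER)
# Test for overdispersion in the fitted Poisson model m;
# trafo = 1 uses the alternative Var(Y) = (1 + alpha) * mean
dispersiontest(m, trafo = 1)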
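As a rough complement to the formal test, the dispersion can also be estimated by hand from the Pearson residuals. The sketch below uses simulated, deliberately overdispersed counts. The names sim and m_sim are hypothetical.

```r
# Simulated overdispersed counts (hypothetical stand-in data):
# negative binomial with mean 5 and variance 5 + 5^2 / 2 = 17.5
set.seed(1)
sim <- data.frame(Year = 1960:2019)
sim$Count <- rnbinom(nrow(sim), mu = 5, size = 2)

# Poisson model, which assumes that the variance equals the mean
m_sim <- glm(Count ~ Year, data = sim, family = poisson)

# Pearson dispersion statistic: sum of squared Pearson residuals
# divided by the residual degrees of freedom
dispersion <- sum(residuals(m_sim, type = "pearson")^2) / df.residual(m_sim)
dispersion
```

Values close to 1 are consistent with the Poisson assumption, while values well above 1 suggest overdispersion. This is the same quantity that glm reports as the dispersion parameter when family = quasipoisson is used.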
There are several alternative models that can be considered in
the case of overdispersion.
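Two common choices are quasi-Poisson and negative binomial regression. The sketch below fits both to simulated overdispersed counts. The data frame sim and the model names m_pois_sim, m_qp_sim and m_nb_sim are hypothetical, and glm.nb comes from the MASS package, which is assumed to be installed.

```r
library(MASS)  # provides glm.nb

# Simulated overdispersed counts (hypothetical stand-in data)
set.seed(1)
sim <- data.frame(Year = 1960:2019)
sim$Count <- rnbinom(nrow(sim), mu = 5, size = 2)

# Plain Poisson model, for reference
m_pois_sim <- glm(Count ~ Year, data = sim, family = poisson)

# Quasi-Poisson: same coefficients as the Poisson model, but the
# standard errors are scaled by an estimated dispersion parameter
m_qp_sim <- glm(Count ~ Year, data = sim, family = quasipoisson)
summary(m_qp_sim)$dispersion

# Negative binomial: estimates an extra parameter theta, giving
# variance = mean + mean^2 / theta
m_nb_sim <- glm.nb(Count ~ Year, data = sim)
m_nb_sim$theta
```

The quasi-Poisson model changes only the uncertainty estimates, whereas the negative binomial model is a full likelihood-based model, so criteria such as AIC remain available for it.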
summary(m_nb)
For the shark attack data, the predictions from the two models are
virtually identical, meaning that both are equally applicable in this case:
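A comparison of this kind can be sketched as follows, again on simulated data instead of sharks.csv. The names sim, m_pois_sim, m_nb_sim and pred are hypothetical, and how close the predictions are will depend on the data.

```r
library(MASS)  # provides glm.nb

# Simulated stand-in data (hypothetical columns Year and Count)
set.seed(1)
sim <- data.frame(Year = 1960:2019)
sim$Count <- rnbinom(nrow(sim), mu = 5, size = 2)

# Fit both models with the same formula
m_pois_sim <- glm(Count ~ Year, data = sim, family = poisson)
m_nb_sim <- glm.nb(Count ~ Year, data = sim)

# Predicted mean counts on the response scale
pred <- data.frame(
  Year    = sim$Year,
  poisson = predict(m_pois_sim, type = "response"),
  negbin  = predict(m_nb_sim, type = "response")
)
head(pred)

# Largest absolute difference between the two sets of predictions
max(abs(pred$poisson - pred$negbin))
```

Even when the predicted means agree closely, the two models can give different standard errors and p-values, so similar predictions do not by themselves settle which model to report.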