Zero-Inflated Model
In statistics, a zero-inflated model is a statistical model based on a zero-inflated probability distribution, i.e. a distribution that
allows for frequent zero-valued observations.
For statistical analysis, the distribution of the counts is often represented using a Poisson distribution or a negative binomial
distribution. Hilbe [3] notes that "Poisson regression is traditionally conceived of as the basic count model upon which a
variety of other count models are based." In a Poisson model, "… the random variable is the count response and parameter λ
(lambda) is the mean. Often, λ is also called the rate or intensity parameter… In statistical literature, λ is also expressed as
μ (mu) when referring to Poisson and traditional negative binomial models."
In some data, the number of zeros is greater than would be expected using a Poisson distribution or a negative binomial
distribution. Data with such an excess of zero counts are described as zero-inflated.[4]
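A simple way to see the excess-zero pattern is to compare the observed share of zeros with exp(−m), the probability of a zero under a Poisson distribution whose mean equals the sample mean m. The Python sketch below does this for a small, made-up sample. The function name `zero_excess` is hypothetical, and the comparison is an informal diagnostic, not a formal statistical test.

```python
import math


def zero_excess(counts):
    """Return (observed share of zeros, share predicted by a Poisson fit).

    The Poisson prediction uses lambda = sample mean, so P(Y = 0) = exp(-mean).
    """
    n = len(counts)
    mean = sum(counts) / n
    observed = sum(1 for c in counts if c == 0) / n
    expected = math.exp(-mean)
    return observed, expected


# Hypothetical counts, invented for illustration only.
counts = [0] * 12 + [1, 2, 2, 3, 3, 4, 4, 5]
observed0, expected0 = zero_excess(counts)
print(f"observed zeros: {observed0:.2f}, Poisson-expected zeros: {expected0:.2f}")
```

For these made-up counts the sketch reports 0.60 observed zeros against about 0.30 expected, roughly twice as many zeros as a Poisson model with the same mean allows.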
Example histograms of zero-inflated Poisson distributions with mean of 5 or 10 and proportion of zero inflation of 0.2 or
0.5 can be produced with the R program ZeroInflPoiDistPlots.R from Bilder and Loughin.[1]
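The two-stage mechanism behind such histograms can also be simulated directly. The Python sketch below assumes that "mean" refers to the Poisson parameter λ of the count component (our reading, not confirmed by the source). It first emits a structural zero with probability π and otherwise draws from Poisson(λ) using Knuth's multiplication method. It then compares the simulated share of zeros with π + (1 − π)e^(−λ). All function names are hypothetical, and a text histogram stands in for the R plots.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the moderate means used here; slow for very large lam.
    """
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_zip(lam, pi, size, rng):
    """Two-stage ZIP draw: a structural zero with probability pi, else Poisson(lam)."""
    return [0 if rng.random() < pi else sample_poisson(lam, rng) for _ in range(size)]


def text_histogram(draws, max_count=20, width=50):
    """Print one row per count value 0..max_count; bar length is proportional to frequency."""
    freq = [draws.count(k) for k in range(max_count + 1)]
    top = max(freq) or 1
    for k, f in enumerate(freq):
        print(f"{k:>2} {'#' * round(width * f / top)}")


rng = random.Random(2024)  # fixed seed for a reproducible run
for lam in (5, 10):
    for pi in (0.2, 0.5):
        draws = sample_zip(lam, pi, 10_000, rng)
        share0 = draws.count(0) / len(draws)
        theory0 = pi + (1 - pi) * math.exp(-lam)
        print(f"lambda={lam:>2}, pi={pi}: zeros {share0:.3f} (theory {theory0:.3f})")

text_histogram(sample_zip(5, 0.5, 10_000, rng))
```

Sampling the binary stage first and the count stage second mirrors the mixture interpretation of zero-inflated data. Any standard Poisson sampler could replace the Knuth loop.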
Examples of zero-inflated count data
Fish counts.[1] "… suppose we recorded the number of fish caught on various lakes in 4-hour fishing trips to
Minnesota. Some lakes in Minnesota are too shallow for fish to survive the winter, so fishing in those lakes
will yield no catch. On the other hand, even on a lake where fish are plentiful, we may or may not catch any
fish due to conditions or our own competence. Thus, the number of fish caught will be zero if the lake does
not support fish, and will be zero, one or more if it does."
Number of wisdom teeth extracted.[5] The number of wisdom teeth that a person has had extracted can
range from 0 to 4. Some individuals, about one-third of the population, do not have any wisdom teeth. For
these individuals, the number of wisdom teeth extracted will always be zero. For other individuals, the
number extracted will be between 0 and 4, where a 0 indicates that the subject has not yet had, and may never
have, any of their 4 wisdom teeth extracted.
Publications by PhD candidates.[6] Long examined the number of publications by 915 doctoral candidates in
biochemistry in the last three years of their PhD studies. The proportion of candidates with zero publications
exceeded that predicted by a Poisson model. "Long [6] argued that the PhD candidates might fall into
two distinct groups: 'publishers' (perhaps striving for an academic career) and 'non-publishers' (seeking
other career paths). One reasonable form of explanation is that the observed zero counts reflect a mixture of
the two latent classes – those who simply have not yet published and those who will likely never publish."[7]
As the examples above show, zero-inflated data can arise as a mixture of two distributions. The first distribution generates
zeros. The second distribution, which may be a Poisson distribution, a negative binomial distribution or another count
distribution, generates counts, some of which may be zeros.[7]
In the statistical literature, different authors may use different names to distinguish zeros from the two distributions. Some
authors describe zeros generated by the first (binary) distribution as "structural" and zeros generated by the second (count)
distribution as "random".[7] Other authors use the terminology "immune" and "susceptible" for the binary and count zeros,
respectively.[1]
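Assuming the count part is Poisson(λ) and the binary part produces a zero with probability π, Bayes' rule gives the probability that an observed zero is structural ("immune") as π / (π + (1 − π)e^(−λ)). The remainder are random ("susceptible") zeros. The Python sketch below computes this split; `zero_split` and the parameter values are illustrative assumptions.

```python
import math


def zero_split(lam, pi):
    """Split P(Y = 0) of a zero-inflated Poisson into its two sources.

    Returns (share of zeros that are structural, share that are random).
    """
    structural = pi                        # zeros from the binary ("immune") part
    random_ = (1 - pi) * math.exp(-lam)    # Poisson ("susceptible") zeros
    total = structural + random_           # overall P(Y = 0)
    return structural / total, random_ / total


s, r = zero_split(lam=2.0, pi=0.3)
print(f"share of zeros that are structural: {s:.3f}, random: {r:.3f}")
```

With λ = 2 and π = 0.3, roughly three out of four observed zeros come from the binary component, even though only 30% of individuals are in it.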
Zero-inflated Poisson
One well-known zero-inflated model is Diane Lambert's zero-inflated Poisson model,
which concerns a random event containing excess zero-count data in unit time.[8] For
example, the number of insurance claims within a population for a certain type of risk
would be zero-inflated by those people who have not taken out insurance against the
risk and thus are unable to claim. The zero-inflated Poisson (ZIP) model mixes two
zero-generating processes. The first process generates zeros. The second process is
governed by a Poisson distribution that generates counts, some of which may be zero.
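The mixture distribution is described as follows:

$$\Pr(Y_i = 0) = \pi + (1 - \pi)\, e^{-\lambda_i}$$

$$\Pr(Y_i = y_i) = (1 - \pi)\, \frac{\lambda_i^{y_i} e^{-\lambda_i}}{y_i!}, \qquad y_i = 1, 2, 3, \ldots$$

where the outcome variable $y_i$ has any non-negative integer value, $\lambda_i$ is the expected Poisson count for the $i$th individual, and $\pi$ is the probability of extra zeros.

(Figure: histogram of a zero-inflated Poisson distribution.)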
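As a numerical check of the mixture formula, the Python sketch below evaluates the probabilities for a single λ shared by all individuals. This is an assumption; in a regression setting λ_i varies with covariates. It confirms that the probabilities sum to one and compares the moments with the closed forms: mean (1 − π)λ and variance λ(1 − π)(1 + πλ). Because the variance-to-mean ratio is 1 + πλ, the distribution is overdispersed whenever π > 0. The values λ = 3 and π = 0.25 are arbitrary.

```python
import math


def zip_pmf(y, lam, pi):
    """Zero-inflated Poisson probability Pr(Y = y) for a single lambda."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    return pi + (1 - pi) * poisson if y == 0 else (1 - pi) * poisson


lam, pi = 3.0, 0.25
support = range(0, 60)  # the tail beyond y = 59 is negligible for lambda = 3
probs = [zip_pmf(y, lam, pi) for y in support]
mean = sum(y * p for y, p in zip(support, probs))
var = sum(y * y * p for y, p in zip(support, probs)) - mean ** 2

print(f"total probability {sum(probs):.6f}")
print(f"mean {mean:.4f} vs (1-pi)*lam = {(1 - pi) * lam:.4f}")
print(f"variance {var:.4f} vs lam*(1-pi)*(1+pi*lam) = {lam * (1 - pi) * (1 + pi * lam):.4f}")
```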
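For a sample of n observations sharing a common λ and π, the maximum likelihood estimator[10] of λ can be found by solving the equation

$$m\left(1 - e^{-\hat{\lambda}}\right) = \hat{\lambda}\left(1 - \frac{n_0}{n}\right),$$

where $m$ is the sample mean and $n_0/n$ is the observed proportion of zeros. The corresponding estimate of the zero-inflation probability is $\hat{\pi} = 1 - m/\hat{\lambda}$.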
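The estimating equation can be solved in closed form with the principal branch W₀ of the Lambert W function, λ̂ = W₀(−s e^(−s)) + s, where s = m/(1 − n₀/n) is the mean of the positive counts. The Python sketch below instead uses plain bisection so that it needs only the standard library. It assumes an i.i.d. sample with no covariates; `zip_mle` and the sample are hypothetical. A negative π̂ signals fewer zeros than a Poisson distribution would predict (zero deflation) rather than zero inflation.

```python
import math


def zip_mle(counts, rel_tol=1e-12):
    """ML estimates (lambda_hat, pi_hat) for an i.i.d. zero-inflated Poisson sample.

    Solves m * (1 - exp(-lam)) = lam * (1 - n0/n) for lam by bisection,
    then sets pi_hat = 1 - m / lam_hat.
    """
    if max(counts) <= 1:
        raise ValueError("need at least one count of 2 or more for a finite positive root")
    n = len(counts)
    m = sum(counts) / n
    p0 = sum(1 for c in counts if c == 0) / n   # observed share of zeros
    s = m / (1 - p0)                            # mean of the positive counts, > 1 here
    # The equation is equivalent to g(lam) = s with g(lam) = lam / (1 - exp(-lam)).
    # g increases from 1 (as lam -> 0) to infinity and g(lam) > lam, so the root lies in (0, s).
    lo, hi = 1e-12, s
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < s:         # -expm1(-x) = 1 - exp(-x), accurate for small x
            lo = mid
        else:
            hi = mid
    lam_hat = 0.5 * (lo + hi)
    return lam_hat, 1 - m / lam_hat             # pi_hat < 0 signals zero deflation


# Hypothetical sample: 100 counts, half of them zero.
counts = [0] * 50 + [1] * 10 + [2] * 15 + [3] * 15 + [4] * 10
lam_hat, pi_hat = zip_mle(counts)
print(f"lambda_hat = {lam_hat:.4f}, pi_hat = {pi_hat:.4f}")
```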
then the discrete data obey a discrete pseudo compound Poisson distribution.[16]
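We say that the discrete random variable $N$ satisfying the probability generating function characterization

$$P_N(z) = \sum_{n=0}^{\infty} \Pr(N = n)\, z^n = \exp\left(\sum_{k=1}^{\infty} \alpha_k \lambda \left(z^k - 1\right)\right), \qquad |z| \le 1,$$

with $\alpha_k \in \mathbb{R}$, $\sum_{k=1}^{\infty} \alpha_k = 1$, $\sum_{k=1}^{\infty} |\alpha_k| < \infty$ and $\lambda > 0$, has a discrete pseudo compound Poisson distribution. When all the $\alpha_k$ are non-negative, it is the discrete compound Poisson distribution (non-Poisson case) with the overdispersion property.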
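A probability generating function of the form exp(Σ_k α_k λ (z^k − 1)) can be explored numerically. Writing log P_N(z) = log Pr(N = 0) + Σ_k c_k z^k and matching it with that form gives c_k = α_k λ and λ = −log Pr(N = 0). Differentiating P_N(z) = exp(log P_N(z)) yields the recursion k p_k = Σ_{j=1..k} j c_j p_{k−j}, which the Python sketch below solves for c_k from a probability mass function p_k. This is an illustration, not the method of the cited paper [16]. The function names are hypothetical, λ is called `rate` in the code to avoid clashing with the ZIP parameter, and the truncated weights only approximate the α_k when the series for log P_N(z) converges at z = 1. The example uses a zero-inflated Poisson with π = 0.5 and λ = 1, chosen arbitrarily.

```python
import math


def pseudo_cp_weights(pmf):
    """Return (alphas, rate) with log P(z) ~ sum_k alphas[k-1] * rate * (z**k - 1).

    pmf[k] = Pr(N = k) for k = 0..K. The power-series coefficients c_k of log P(z)
    satisfy k*p_k = sum_{j=1..k} j*c_j*p_{k-j}, which follows from P'(z) = P(z) * (log P)'(z).
    Then c_k = alpha_k * rate with rate = -log p_0. The result is only meaningful
    if the series for log P(z) converges at z = 1.
    """
    p0 = pmf[0]
    kmax = len(pmf) - 1
    c = [0.0] * (kmax + 1)                     # c[0] is unused
    for k in range(1, kmax + 1):
        acc = sum(j * c[j] * pmf[k - j] for j in range(1, k))
        c[k] = (k * pmf[k] - acc) / (k * p0)
    rate = -math.log(p0)                       # log P(1) = 0 forces sum_k c_k = -log p_0
    return [ck / rate for ck in c[1:]], rate


def zip_pmf_list(lam, pi, kmax):
    """Zero-inflated Poisson probabilities Pr(N = k) for k = 0..kmax."""
    p = [(1 - pi) * math.exp(-lam) * lam ** k / math.factorial(k) for k in range(kmax + 1)]
    p[0] += pi
    return p


alphas, rate = pseudo_cp_weights(zip_pmf_list(lam=1.0, pi=0.5, kmax=60))
print(f"rate = {rate:.4f}, sum of alphas = {sum(alphas):.10f}")
print("first alphas:", [round(a, 4) for a in alphas[:6]])
```

For a plain Poisson input the sketch returns α₁ = 1 and all other weights zero, as expected. A negative weight anywhere in the output would indicate a pseudo compound Poisson rather than a compound Poisson representation.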
See also
Poisson distribution
Zero-truncated Poisson distribution
Compound Poisson distribution
Sparse approximation
Hurdle model
Software
pscl (https://cran.r-project.org/web/packages/pscl/index.html) and brms (https://paul-buerkner.github.io/brms/)
R packages
References
1. Bilder, Christopher; Loughin, Thomas (2015), Analysis of Categorical Data with R (First ed.), CRC Press /
Chapman & Hall, ISBN 978-1439855676
2. Hilbe, Joseph M. (2014), Modeling Count Data (First ed.), Cambridge University Press, ISBN 978-1107611252
3. Hilbe, Joseph M. (2007), Negative Binomial Regression (Second ed.), Cambridge University Press,
ISBN 978-0521198158
4. Lachin, John M. (2011), Biostatistical Methods: The Assessment of Relative Risks (Second ed.), Wiley,
ISBN 978-0470508220
5. "Biostatistics II. 1.3 - Zero-inflated Models" (https://www.youtube.com/watch?v=14B5QUUmqts). YouTube.
Retrieved July 1, 2022.
6. Long, J. Scott (1997), Regression Models for Categorical and Limited Dependent Variables (First ed.), Sage
Publications, ISBN 978-0803973749
7. Friendly, Michael; Meyer, David (2016), Discrete Data Analysis with R (First ed.), CRC Press / Chapman &
Hall, ISBN 978-1498725835
8. Lambert, Diane (1992). "Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing".
Technometrics. 34 (1): 1–14. doi:10.2307/1269547 (https://doi.org/10.2307%2F1269547). JSTOR 1269547
(https://www.jstor.org/stable/1269547).
9. Beckett, Sadie; Jee, Joshua; Ncube, Thalepo; Washington, Quintel; Singh, Anshuman; Pal, Nabendu (2014).
"Zero-inflated Poisson (ZIP) distribution: parameter estimation and applications to model data from natural
calamities" (https://doi.org/10.2140%2Finvolve.2014.7.751). Involve. 7 (6): 751–767.
doi:10.2140/involve.2014.7.751 (https://doi.org/10.2140%2Finvolve.2014.7.751).
10. Johnson, Norman L.; Kotz, Samuel; Kemp, Adrienne W. (1992). Univariate Discrete Distributions (2nd ed.).
Wiley. pp. 312–314. ISBN 978-0-471-54897-3.
11. Dencks, Stefanie; Piepenbrock, Marion; Schmitz, Georg (2020). "Assessing Vessel Reconstruction in
Ultrasound Localization Microscopy by Maximum-Likelihood Estimation of a Zero-Inflated Poisson Model"
(https://doi.org/10.1109%2FTUFFC.2020.2980063). IEEE Transactions on Ultrasonics, Ferroelectrics, and
Frequency Control. doi:10.1109/TUFFC.2020.2980063 (https://doi.org/10.1109%2FTUFFC.2020.2980063).
12. Corless, R. M.; Gonnet, G. H.; Hare, D. E. G.; Jeffrey, D. J.; Knuth, D. E. (1996). "On the Lambert W
Function". Advances in Computational Mathematics. 5 (1): 329–359. arXiv:1809.07369
(https://arxiv.org/abs/1809.07369). doi:10.1007/BF02124750 (https://doi.org/10.1007%2FBF02124750).
13. Böhning, Dankmar; Dietz, Ekkehart; Schlattmann, Peter; Mendonca, Lisette; Kirchner, Ursula (1999). "The
zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology".
Journal of the Royal Statistical Society, Series A. 162 (2): 195–209. doi:10.1111/1467-985x.00130
(https://doi.org/10.1111%2F1467-985x.00130).
14. Greene, William H. (1994). "Some Accounting for Excess Zeros and Sample Selection in Poisson and
Negative Binomial Regression Models". Working Paper EC-94-10: Department of Economics, New York
University. SSRN 1293115 (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1293115).
15. Hall, Daniel B. (2000). "Zero-Inflated Poisson and Binomial Regression with Random Effects: A Case
Study". Biometrics. 56 (4): 1030–1039. doi:10.1111/j.0006-341X.2000.01030.x
(https://doi.org/10.1111%2Fj.0006-341X.2000.01030.x).
16. Zhang, Huiming; Liu, Yunxiao; Li, Bo (2014). "Notes on discrete compound Poisson model with applications
to risk theory". Insurance: Mathematics and Economics. 59: 325–336. doi:10.1016/j.insmatheco.2014.09.012
(https://doi.org/10.1016%2Fj.insmatheco.2014.09.012).
17. Zygmund, A. (2002). Trigonometric Series. Cambridge: Cambridge University Press. p. 245.