0% found this document useful (0 votes)
20 views

R Handbook - Regression For Count Data

This document discusses regression approaches for count data, including Poisson regression, negative binomial regression, and zero-inflated regression. It provides examples using monarch butterfly count data with different garden plots. Functions used include glm, glm.nb, Anova, nagelkerke, emmeans and pairs for modeling and analyzing count data regression in R.

Uploaded by

bongjae
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
20 views

R Handbook - Regression For Count Data

This document discusses regression approaches for count data, including Poisson regression, negative binomial regression, and zero-inflated regression. It provides examples using monarch butterfly count data with different garden plots. Functions used include glm, glm.nb, Anova, nagelkerke, emmeans and pairs for modeling and analyzing count data regression in R.

Uploaded by

bongjae
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 13

27/03/2024, 11:09 R Handbook: Regression for Count Data

Summary and Analysis of Extension Program


Evaluation in R
Salvatore S. Mangiafico

Regression for Count Data


Introduction
Count data
In general, common parametric tests like t-test and anova shouldn’t be used for count data. One reason
is technical in nature: that parametric analyses require continuous data. Count data is by its nature
discrete and is left-censored at zero. (That is, usually counts can’t be less than zero.)
A second reason is more practical in nature. Count data are often highly skewed, and often produce
skewed residuals if a parametric approach is attempted. In this case, the hypothesis tests will not be
accurate.
For further discussion, see the “Count data may not be appropriate for common parametric tests”
section in the Introduction to Parametric Tests chapter.
Regression approaches for count data
The most common regression approach for handling count data is probably Poisson regression.
However, Poisson regression makes assumptions about the distribution of the data that may not be
appropriate in all cases. Hermite regression is a more flexible approach, but at the time of writing
doesn’t have a complete set of support functions in R. Quasi-Poisson regression is also flexible with data
assumptions, but also but at the time of writing doesn’t have a complete set of support functions in R.
Negative binomial regression allows for overdispersion in data; and zero-inflated regression is useful
when there are a high proportion of zero counts in the data.
Cautionary note
Note that model assumptions and pitfalls of these regression techniques are not discussed in depth
here. The reader is urged to understand the assumptions of this kind of modeling before proceeding.
Generalized linear regression
Poisson, Hermite, and related regression approaches are a type of generalized linear model. This should
not be confused with general linear model, which is implemented with the lm function. Generalized
linear models are implemented with the glm function or other functions.
Generalized linear models are used when the dependent variable is count, binary, multinomial, etc. More
information on using the glm function can be found by using help(glm) and help(family). For examples of
logistic regression, see the chapter Models for Nominal Data; the chapter Beta Regression for Percent and
Proportion Data; or Mangiafico (2015) in the “References” section. For a table of common uses for family
and link function in generalized linear models, see the Wikipedia article in the “References” section for
this chapter.

Packages used in this chapter


The packages used in this chapter include:
• pysch
• hermite
• lattice
• plyr
• boot

https://rcompanion.org/handbook/J_01.html 1/13
27/03/2024, 11:09 R Handbook: Regression for Count Data
• DescTools
• ggplot2
• car
• multcompView
• emmeans
• MASS
• pscl
• rcompanion
• robust
The following commands will install these packages if they are not already installed:
if(!require(psych)){install.packages("psych")}
if(!require(hermite)){install.packages("hermite")}
if(!require(lattice)){install.packages("lattice")}
if(!require(plyr)){install.packages("plyr")}
if(!require(boot)){install.packages("boot")}
if(!require(DescTools)){install.packages("DescTools")}
if(!require(ggplot2)){install.packages("ggplot2")}
if(!require(car)){install.packages("car")}
if(!require(multcompView)){install.packages("multcompView")}
if(!require(emmeans)){install.packages("emmeans")}
if(!require(MASS)){install.packages("MASS")}
if(!require(pscl)){install.packages("pscl")}
if(!require(rcompanion)){install.packages("rcompanion")}
if(!require(robust)){install.packages("robust")}

Count data example


In this example, extension researchers have set up garden plots with different suites of plants, with each
suite identified as a level of the variable Garden below. In September, they counted the number of
monarch butterflies in each garden plot.
Input = ("
Garden Monarchs
A 0
A 4
A 2
A 2
A 0
A 6
A 0
A 0
B 5
B 9
B 7
B 5
B 7
B 5
B 9
B 5
C 10
C 14
C 12
C 12
C 10
C 16
C 10
C 10
")

Data = read.table(textConnection(Input),header=TRUE)

### Order factors by the order in data frame


### Otherwise, R will alphabetize them
Data$Garden = factor(Data$Garden,
levels=unique(Data$Garden))

https://rcompanion.org/handbook/J_01.html 2/13
27/03/2024, 11:09 R Handbook: Regression for Count Data
### Check the data frame

library(psych)
headTail(Data)

str(Data)
summary(Data)

### Remove unnecessary objects


rm(Input)

Histograms
library(lattice)
histogram(~ Monarchs | Garden,
data=Data,
layout=c(1,3) # columns and rows of individual plots
)

Poisson regression example


Poisson regression makes certain assumptions about the relationship between the mean and the
dispersion of the dependent variable. Because this assumption may not be met for all data sets, Poisson
regression may not be recommended for routine use. Particularly, classic Poisson regression should be
avoided if there is overdispersion in the data or if there are several zero counts in the dependent
variable.

An alternate approach for data with overdispersion is negative binomial regression.

An alternative approach for data with many zeros is zero-inflated Poisson regression.
For further discussion, see the “Count data may not be appropriate for common parametric tests”
section in the Introduction to Parametric Tests chapter.

Note that model assumptions and pitfalls of this approach are not discussed here. The reader is urged to
understand the assumptions of this kind of modeling before proceeding.
model.p = glm(Monarchs ~ Garden,
https://rcompanion.org/handbook/J_01.html 3/13
27/03/2024, 11:09 R Handbook: Regression for Count Data
data=Data,
family="poisson")
library(car)

Anova(model.p,
type="II",
test="LR")
Analysis of Deviance Table (Type II tests)

LR Chisq Df Pr(>Chisq)
Garden 66.463 2 3.697e-15 ***

library(rcompanion)

nagelkerke(model.p)
$Pseudo.R.squared.for.model.vs.null
Pseudo.R.squared
McFadden 0.387929
Cox and Snell (ML) 0.937293
Nagelkerke (Cragg and Uhler) 0.938037

$Likelihood.ratio.test
Df.diff LogLik.diff Chisq p.value
-2 -33.231 66.463 3.6967e-15

library(multcompView)

library(emmeans)
marginal = emmeans(model.p,
~ Garden)
pairs(marginal,
adjust="tukey")

cld(marginal,
alpha=0.05,
Letters=letters, ### Use lower-case letters for .group
adjust="tukey") ### Tukey adjustment for multiple comparisons

Garden emmean SE df asymp.LCL asymp.UCL .group


A 0.5596158 0.2672450 NA -0.07849527 1.197727 a
B 1.8718022 0.1386750 NA 1.54068251 2.202922 b
C 2.4638532 0.1031421 NA 2.21757688 2.710130 c

Results are given on the log (not the response) scale.


Confidence level used: 0.95
Conf-level adjustment: sidak method for 3 estimates
P value adjustment: tukey method for comparing a family of 3 estimates
Tests are performed on the log scale
significance level used: alpha = 0.05
### Note that estimates are on log scale

Negative binomial regression example


Negative binomial regression is similar in application to Poisson regression, but allows for
overdispersion in the dependent count variable.

This example will use the glm.nb function in the MASS package. The Anova function in the car package
will be used for an analysis of deviance, and the nagelkerke function will be used to determine a p-value
and pseudo R-squared value for the model. Post-hoc analysis can be conducted with the emmeans
package.

Note that model assumptions and pitfalls of this approach are not discussed here. The reader is urged to
understand the assumptions of this kind of modeling before proceeding.

https://rcompanion.org/handbook/J_01.html 4/13
27/03/2024, 11:09 R Handbook: Regression for Count Data

library(MASS)
model.nb = glm.nb(Monarchs ~ Garden,
data=Data,
control = glm.control(maxit=10000))

library(car)
Anova(model.nb,
type="II",
test="LR")
Analysis of Deviance Table (Type II tests)

LR Chisq Df Pr(>Chisq)
Garden 66.464 2 3.694e-15 ***

library(rcompanion)

nagelkerke(model.nb)
$Pseudo.R.squared.for.model.vs.null
Pseudo.R.squared
McFadden 0.255141
Cox and Snell (ML) 0.776007
Nagelkerke (Cragg and Uhler) 0.778217
$Likelihood.ratio.test
Df.diff LogLik.diff Chisq p.value
-2 -17.954 35.907 1.5952e-08

library(multcompView)

library(emmeans)
marginal = emmeans(model.nb,
~ Garden)

pairs(marginal,
adjust="tukey")

cld(marginal,
alpha = 0.05,
Letters = letters, ### Use lower-case letters for .group
type = "response", ### Report emmeans in orginal scale
adjust = "tukey") ### Tukey adjustment for multiple comparisons

Garden response SE df asymp.LCL asymp.UCL .group


A 1.75 0.4677072 NA 0.9244706 3.312707 a
B 6.50 0.9013878 NA 4.6677750 9.051422 b
C 11.75 1.2119200 NA 9.1850474 15.031223 c

Confidence level used: 0.95


Conf-level adjustment: sidak method for 3 estimates
Intervals are back-transformed from the log scale
P value adjustment: tukey method for comparing a family of 3 estimates
Tests are performed on the log scale
significance level used: alpha = 0.05

Zero-inflated regression example


Zero-inflated regression is similar in application to Poisson regression, but allows for an abundance of
zeros in the dependent count variable.
This example will use the zeroinfl function in the pscl package. The Anova function in the car package will
be used for an analysis of deviance, and the nagelkerke function will be used to determine a p-value and
pseudo R-squared value for the model. Post-hoc analysis can be conducted with the emmeans package.
library(pscl)

https://rcompanion.org/handbook/J_01.html 5/13
27/03/2024, 11:09 R Handbook: Regression for Count Data
model.zi = zeroinfl(Monarchs ~ Garden,
data = Data,
dist = "poisson")
### dist = "negbin" may be used

summary(model.zi)
Call:
zeroinfl(formula = Monarchs ~ Garden | Garden, data = Data, dist = "poisson")

Count model coefficients (poisson with log link):


Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.2182 0.2847 4.278 1.89e-05 ***
GardenB 0.6536 0.3167 2.064 0.039 *
GardenC 1.2457 0.3029 4.113 3.90e-05 ***

Zero-inflation model coefficients (binomial with logit link):


Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.046e-02 7.363e-01 -0.096 0.924
GardenB -2.057e+01 1.071e+04 -0.002 0.998
GardenC -2.057e+01 1.071e+04 -0.002 0.998
### Note that there are separate coefficients for the
### Poisson part of the analysis and for the zero-inflation part.

library(car)
Anova(model.zi,
type="II",
test="Chisq")
Analysis of Deviance Table (Type II tests)

Df Chisq Pr(>Chisq)
Garden 2 23.914 6.414e-06 ***

library(rcompanion)

nagelkerke(model.zi)
$Pseudo.R.squared.for.model.vs.null
Pseudo.R.squared
McFadden 0.284636
Cox and Snell (ML) 0.797356
Nagelkerke (Cragg and Uhler) 0.800291

$Likelihood.ratio.test
Df.diff LogLik.diff Chisq p.value
-4 -19.156 38.311 9.6649e-08

library(multcompView)

library(emmeans)
marginal = emmeans(model.zi,
~ Garden)

pairs(marginal,
adjust="tukey")

cld(marginal,
alpha=0.05,
Letters=letters, ### Use lower-case letters for .group
adjust="tukey") ### Tukey adjustment for multiple comparisons

Garden emmean SE df asymp.LCL asymp.UCL .group


A 1.75 0.7586301 NA -0.06140972 3.561410 a
B 6.50 0.9013877 NA 4.34772229 8.652278 b
C 11.75 1.2119199 NA 8.85625304 14.643747 c

Confidence level used: 0.95


Conf-level adjustment: sidak method for 3 estimates

https://rcompanion.org/handbook/J_01.html 6/13
27/03/2024, 11:09 R Handbook: Regression for Count Data
P value adjustment: tukey method for comparing a family of 3 estimates
significance level used: alpha = 0.05
### Note, emmeans are on the original measurement scale

Robust Poisson regression example


Robust Poisson regression is robust to outliers in the dependent variable.
This example uses the glmRob function in the robust package. The anova function can be used to conduct
an analysis of deviance. The p-value for the model can be found by comparing the model to a null model.
However, at the time of writing, I don’t know of any way to determine AIC or pseudo R-squared for the
model.
At the time of writing, the glmRob function can only use the Poisson and binomial families of models.

An alternate method is the glmrob function in the robustbase package.


library(robust)
model.rob = glmRob(Monarchs ~ Garden,
data = Data,
family = "poisson")
anova(model.rob, test="Chisq")

Df Deviance Resid. Df Resid. Dev Pr(>Chi)


NULL NA NA 23 430.19850 NA
Garden 2 400.9221 21 29.27641 3.567693e-63

model.rob.null = glmRob(Monarchs ~ 1,
data = Data,
family = "poisson")
anova(model.rob.null, model.rob, test="Chisq")

Terms Resid. Df Resid. Dev Test Df Deviance Pr(>Chi)


1 1 23 95.12606 NA NA NA
2 Garden 21 29.27641 2 65.84965 5.536815e-11

Quasi-Poisson regression
Quasi-Poisson regression is useful since it has a variable dispersion parameter, so that it can model over-
dispersed data. It may be better than negative binomial regression in some circumstances (Verhoef and
Boveng. 2007).

At the time of writing, Quasi-Poisson regression doesn’t have complete set of support functions in R.
Using the quasipoisson family option in the glm function, the results will have the same parameter
coefficients as with the poisson option, but the inference statistics are adjusted in the summary function.
The Anova function in the car package can be used for an analysis of deviance table, and the emmeans
package can be used for post-hoc comparisons. Since the model doesn’t produce a log-likelihood value, I
don’t know a way to produce a p-value for the mode, for a pseudo R-squared value for the model.
.
model.qp = glm(Monarchs ~ Garden,
data=Data,
family="quasipoisson")

library(car)
Anova(model.qp,
type="II",
test="LR")

Analysis of Deviance Table (Type II tests)

https://rcompanion.org/handbook/J_01.html 7/13
27/03/2024, 11:09 R Handbook: Regression for Count Data
Response: Monarchs
LR Chisq Df Pr(>Chisq)
Garden 52.286 2 4.429e-12 ***

library(multcompView)

library(emmeans)
marginal = emmeans(model.qp,
~ Garden)

pairs(marginal,
adjust="tukey")

cld(marginal,
alpha=0.05,
Letters=letters,
adjust="tukey")

Garden emmean SE df asymp.LCL asymp.UCL .group


A 0.5596158 0.3013057 NA -0.1598233 1.279055 a
B 1.8718022 0.1563493 NA 1.4984809 2.245123 b
C 2.4638532 0.1162877 NA 2.1861887 2.741518 c

Results are given on the log (not the response) scale.


Confidence level used: 0.95
Conf-level adjustment: sidak method for 3 estimates
P value adjustment: tukey method for comparing a family of 3 estimates
significance level used: alpha = 0.05

Hermite regression
The generalized Hermite distribution is a more general distribution that can handle overdispersion or
multimodality (Moriñ a and others, 2015). This makes generalized Hermite regression a powerful and
flexible tool for modeling count data. It is implemented with the hermite package.

Fitting models with the hermite package can be somewhat difficult. One issue is that model fitting may
fail without some parameters being specified. Often specifying an appropriate value for the m option will
help.

A further difficulty with this approach is that, at the time writing, the package isn’t supported by the
anova function to compare models, the Anova function to test effects, or other useful functions like
emmeans for factor effects.
The hermite package is used to conduct hermite regression. Here, the m=3 option is specified. Often the
default m=NULL can be used. In this case, if the m value is not specified, the function cannot complete the
model fitting, and errors are produced. Using m=2 often works. Here, m=3 was used because it
produced a model with a lower AIC than did the m=2 option.
library(hermite)

model = glm.hermite(Monarchs ~ Garden,


data = Data,
link = "log",
m=3)

summary(model)

Coefficients:
Estimate Std. Error z value p-value
(Intercept) 0.5081083 0.3251349 1.5627612 1.181088e-01
GardenB 1.3700567 0.3641379 3.7624662 1.682461e-04
GardenC 1.9596153 0.3476326 5.6370291 1.730089e-08
dispersion.index 1.0820807 0.2877977 0.1281707 3.601681e-01
order 3.0000000 NA NA NA
(Likelihood ratio test against Poisson is reported by *z value* for *dispersion.index*)

AIC: 112.7762

https://rcompanion.org/handbook/J_01.html 8/13
27/03/2024, 11:09 R Handbook: Regression for Count Data

Post-hoc analysis: Medians and confidence intervals


At the time of writing, the emmeans package does not support post-hoc analysis of regressions produced
with the hermite package.
One imperfect approach for post-hoc analysis would be to examine median counts for treatments and
the confidence intervals of these medians. We can conclude that groups with non-overlapping 95%
confidence intervals for their medians are significantly different.

However, this approach does not represent any information learned from the Hermite regression.
A second issue is that, because the dependent variable is not continuous, the distribution of the
bootstrapped confidence intervals is not likely to be continuous, and so is may not be reliable.

To get confidence intervals for the medians for each group, we will use the groupwiseMedian function.
Here I used the percentile method for confidence intervals.
library(rcompanion)

Sum = groupwiseMedian(Monarchs ~ Garden,


data=Data,
conf=0.95,
R=5000,
percentile=TRUE,
bca=FALSE,
digits=3)
Sum

Garden n Median Conf.level Percentile.lower Percentile.upper


1 A 8 1 0.95 0 4
2 B 8 6 0.95 5 8
3 C 8 11 0.95 10 14

### In this case, none of the confidence intervals overlap.

Plot of medians and confidence intervals


The data frame Sum created above will be passed to ggplot for plotting. At the end of the code, annotate
is used to add text to the plot to indicate which medians are significantly different from one another.
library(ggplot2)

ggplot(Sum, ### The data frame to use.


aes(x = Garden,
y = Median)) +
geom_errorbar(aes(ymin = Percentile.lower,
ymax = Percentile.upper),
width = 0.05,
size = 1) +
geom_point(shape = 15,
size = 5) +
theme_bw() +
theme(axis.title = element_text(face = "bold")) +
ylab("Median count of monarch butterflies") +

annotate("text",
x = 1:3,
y = c(5, 10, 15),
label = c("Group 3", "Group 2", "Group 1"))

https://rcompanion.org/handbook/J_01.html 9/13
27/03/2024, 11:09 R Handbook: Regression for Count Data

Optional code for chi-square goodness-of-fit test


An alternative approach to handling count data is to sum up the counts for treatments, and use a chi-
square test or related test. Here, a chi-square goodness-of-fit test is used to see if counts differ from
“expected” equal proportions.

Omnibus test
Tabla = xtabs(Monarchs ~ Garden,
data = Data)

Tabla

Garden
A B C
14 52 94

chisq.test(Tabla)
Chi-squared test for given probabilities

X-squared = 60.05, df = 2, p-value = 9.127e-14

Post-hoc chi-square tests


Garden.A = sum(Data$Monarchs[Data$Garden=="A"])
Garden.B = sum(Data$Monarchs[Data$Garden=="B"])
Garden.C = sum(Data$Monarchs[Data$Garden=="C"])
observed = c(Garden.A, Garden.B) # observed frequencies
expected = c(1/2, 1/2) # expected proportions

chisq.test(x = observed,
p = expected)

Chi-squared test for given probabilities

X-squared = 21.879, df = 1, p-value = 2.904e-06

https://rcompanion.org/handbook/J_01.html 10/13
27/03/2024, 11:09 R Handbook: Regression for Count Data

observed = c(Garden.B, Garden.C) # observed frequencies


expected = c(1/2, 1/2) # expected proportions

chisq.test(x = observed,
p = expected)
Chi-squared test for given probabilities

X-squared = 12.082, df = 1, p-value = 0.0005091

observed = c(Garden.A, Garden.C) # observed frequencies


expected = c(1/2, 1/2) # expected proportions

chisq.test(x = observed,
p = expected)
Chi-squared test for given probabilities

X-squared = 59.259, df = 1, p-value = 1.382e-14

Optional analysis: Vuong test to compare Poisson, negative binomial, and zero-
inflated models
The Vuong test, implemented by the pscl package, can test two non-nested models. It works with negbin,
zeroinfl, and some glm model objects which are fitted to the same data.
The null hypothesis is that there is no difference in models. The function produces three tests, a “Raw”
test, an AIC-corrected, and a BIC-corrected, any of which could be used.

It has been suggested that the Vuong test not be used to test for zero-inflation (Wilson, 2015).
Define models
model.p = glm(Monarchs ~ Garden,
data=Data,
family="poisson")

library(MASS)

model.nb = glm.nb(Monarchs ~ Garden,


data=Data,
control = glm.control(maxit=10000))

library(pscl)
model.zi = zeroinfl(Monarchs ~ Garden,
data = Data,
dist = "poisson")

Vuong test
library(pscl)

vuong(model.p,
model.nb,
digits = 4)
Vuong Non-Nested Hypothesis Test-Statistic:
(test-statistic is asymptotically distributed N(0,1) under the
null that the models are indistinguishible)
-------------------------------------------------------------
Vuong z-statistic H_A p-value
Raw 0.03324988 model1 > model2 0.48674
AIC-corrected 0.03324988 model1 > model2 0.48674
BIC-corrected 0.03324988 model1 > model2 0.48674
### Positive Vuong z-statistic suggests that model 1 is superior,
### but, in this case, the difference is not significant,

https://rcompanion.org/handbook/J_01.html 11/13
27/03/2024, 11:09 R Handbook: Regression for Count Data
### and the value of the statistic is probably too tiny to be meaningful.

vuong(model.p,
model.zi,
digits = 4)

Vuong Non-Nested Hypothesis Test-Statistic:


(test-statistic is asymptotically distributed N(0,1) under the
null that the models are indistinguishible)
-------------------------------------------------------------
Vuong z-statistic H_A p-value
Raw -1.4424725 model2 > model1 0.074585
AIC-corrected -0.4335210 model2 > model1 0.332318
BIC-corrected 0.1607786 model1 > model2 0.436134

### Negative Vuong z-statistic suggests that model 2 is superior.


### If the Raw statistic is used, p = 0.07 gives some evidence
### that zi model is superior.

vuong(model.nb,
model.zi,
digits = 4)

Vuong Non-Nested Hypothesis Test-Statistic:


(test-statistic is asymptotically distributed N(0,1) under the
null that the models are indistinguishible)
-------------------------------------------------------------
Vuong z-statistic H_A p-value
Raw -1.4424725 model2 > model1 0.074585
AIC-corrected -0.4335210 model2 > model1 0.332318
BIC-corrected 0.1607786 model1 > model2 0.436134
### Negative Vuong z-statistic suggests that model 2 is superior.
### If the Raw statistic is used, p = 0.07 gives some evidence
### that zi model is superior.

References
Moriñ a, D., M. Higueras, P. Puig, and M. Oliveira. 2015. Generalized Hermite Distribution
Modelling with the R Package hermite. The R Journal 7(2):263–274. journal.r-
project.org/archive/2015-2/morina-higueras-puig-etal.pdf.
help(package="hermite")

library(hermite); ?glm.hermite
library(MASS); ?glm.nb

library(pscl); ?zeroinfl
library(pscl); ?vuong

“Simple Logistic Regression” in Mangiafico, S.S. 2015. An R Companion for the Handbook of Biological
Statistics, version 1.09. rcompanion.org/rcompanion/e_06.html.
"Generalized linear model: Link function". No date. Wikipedia. Retrieved 31 Jan. 2016.
en.wikipedia.org/wiki/Generalized_linear_model#Link_function.

Verhoef, J.M. and P.L. Boveng. 2007. Quasi-Poisson vs. negative binomial regression: How should we
model overdispersed count data? Ecology 88(11) 2766–2772.
Wilson, P. 2015. The Misuse of the Vuong Test for Non-Nested Models to Test for Zero-Inflation.
Economic Letters 127: 51–53. cybermetrics.wlv.ac.uk/paperdata/misusevuong.pdf

References for count data

https://rcompanion.org/handbook/J_01.html 12/13
27/03/2024, 11:09 R Handbook: Regression for Count Data
Grace-Martin, K. No date. "Regression Models for Count Data". The Analysis Factor.
www.theanalysisfactor.com/regression-models-for-count-data/.
Grace-Martin, K. No date. " Zero-Inflated Poisson Models for Count Outcomes". The Analysis Factor.
www.theanalysisfactor.com/zero-inflated-poisson-models-for-count-outcomes/.

©2016 by Salvatore S. Mangiafico.


Rutgers Cooperative Extension, New Brunswick, NJ.
Non-commercial reproduction of this content, with attribution, is permitted.
For-profit reproduction without permission is prohibited.
If you use the code or information in this site in a published work, please cite it as a source. Also, if you are an instructor
and use this book in your course, please let me know. My contact information is on the About the Author of this Book page.

Citation
Mangiafico, S.S. 2016. Summary and Analysis of Extension Program Evaluation in R, version 1.20.05, revised 2023.
rcompanion.org/handbook/. (Pdf version: rcompanion.org/documents/RHandbookProgramEvaluation.pdf.)

https://rcompanion.org/handbook/J_01.html 13/13

You might also like