27/03/2024, 11:09 R Handbook: Regression for Count Data

Summary and Analysis of Extension Program Evaluation in R
Salvatore S. Mangiafico

Regression for Count Data

Introduction
Count data
In general, common parametric tests like t-tests and ANOVA shouldn’t be used for count data. One reason
is technical: parametric analyses assume continuous data, whereas count data are by nature discrete
and bounded below by zero. (That is, counts usually can’t be negative.)
A second reason is more practical. Count data are often highly skewed, and often produce skewed
residuals if a parametric approach is attempted. In this case, the hypothesis tests may not be accurate.
For further discussion, see the “Count data may not be appropriate for common parametric tests”
section in the Introduction to Parametric Tests chapter.
Regression approaches for count data
The most common regression approach for handling count data is probably Poisson regression.
However, Poisson regression makes assumptions about the distribution of the data that may not be
appropriate in all cases. Hermite regression is a more flexible approach, but at the time of writing
doesn’t have a complete set of support functions in R. Quasi-Poisson regression is also flexible with data
assumptions, but it too lacks a complete set of support functions in R at the time of writing.
Negative binomial regression allows for overdispersion in the data, and zero-inflated regression is useful
when there is a high proportion of zero counts in the data.
Cautionary note
Note that model assumptions and pitfalls of these regression techniques are not discussed in depth
here. The reader is urged to understand the assumptions of this kind of modeling before proceeding.
Generalized linear regression
Poisson, Hermite, and related regression approaches are types of generalized linear model. These should
not be confused with the general linear model, which is implemented with the lm function. Generalized
linear models are implemented with the glm function or other functions.
Generalized linear models are used when the dependent variable is a count, binary, multinomial, etc. More
information on using the glm function can be found by using help(glm) and help(family). For examples of
logistic regression, see the chapter Models for Nominal Data; the chapter Beta Regression for Percent and
Proportion Data; or Mangiafico (2015) in the “References” section. For a table of common uses for family
and link function in generalized linear models, see the Wikipedia article in the “References” section for
this chapter.
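As a minimal sketch of how the family and link arguments fit together, the code below fits a Poisson generalized linear model with an explicit log link to the monarch counts used later in this chapter (typed in directly here so the block stands alone) and back-transforms the coefficients with exp(). With a single factor as the only predictor, exp() of the intercept should be the mean count of the first level, and exp() of each other coefficient should be the ratio of that level’s mean to the first level’s mean.

```r
# Monarch counts from the example later in this chapter, entered directly
Data = data.frame(
  Garden   = factor(rep(c("A", "B", "C"), each = 8)),
  Monarchs = c( 0,  4,  2,  2,  0,  6,  0,  0,
                5,  9,  7,  5,  7,  5,  9,  5,
               10, 14, 12, 12, 10, 16, 10, 10)
)

# Poisson GLM, with the family and its link function given explicitly
model = glm(Monarchs ~ Garden,
            data   = Data,
            family = poisson(link = "log"))

# Coefficients are on the log scale; exp() gives the baseline mean
# (intercept) and multiplicative ratios for the other gardens
rate = exp(coef(model))
print(rate)

# Group means, for comparison
print(tapply(Data$Monarchs, Data$Garden, mean))
```

This is also why estimated marginal means from a log-link model are reported on the log scale unless they are explicitly back-transformed.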

Packages used in this chapter

The packages used in this chapter include:
• psych
• hermite
• lattice
• plyr
• boot

https://rcompanion.org/handbook/J_01.html 1/13
• DescTools
• ggplot2
• car
• multcompView
• emmeans
• MASS
• pscl
• rcompanion
• robust
The following commands will install these packages if they are not already installed:
if(!require(psych)){install.packages("psych")}
if(!require(hermite)){install.packages("hermite")}
if(!require(lattice)){install.packages("lattice")}
if(!require(plyr)){install.packages("plyr")}
if(!require(boot)){install.packages("boot")}
if(!require(DescTools)){install.packages("DescTools")}
if(!require(ggplot2)){install.packages("ggplot2")}
if(!require(car)){install.packages("car")}
if(!require(multcompView)){install.packages("multcompView")}
if(!require(emmeans)){install.packages("emmeans")}
if(!require(MASS)){install.packages("MASS")}
if(!require(pscl)){install.packages("pscl")}
if(!require(rcompanion)){install.packages("rcompanion")}
if(!require(robust)){install.packages("robust")}

Count data example

In this example, extension researchers have set up garden plots with different suites of plants, with each
suite identified as a level of the variable Garden below. In September, they counted the number of
monarch butterflies in each garden plot.
Input = ("
Garden Monarchs
A 0
A 4
A 2
A 2
A 0
A 6
A 0
A 0
B 5
B 9
B 7
B 5
B 7
B 5
B 9
B 5
C 10
C 14
C 12
C 12
C 10
C 16
C 10
C 10
")

Data = read.table(textConnection(Input),header=TRUE)

### Order factors by the order in data frame
### Otherwise, R will alphabetize them

Data$Garden = factor(Data$Garden,
                     levels=unique(Data$Garden))

### Check the data frame

library(psych)
headTail(Data)

str(Data)
summary(Data)

### Remove unnecessary objects

rm(Input)

Histograms
library(lattice)

histogram(~ Monarchs | Garden,
          data=Data,
          layout=c(1,3))   # columns and rows of individual plots

Poisson regression example

Poisson regression assumes that the variance of the dependent variable is equal to its mean. Because this
assumption may not be met for all data sets, Poisson regression may not be recommended for routine use.
In particular, classic Poisson regression should be avoided if there is overdispersion in the data or if
there are more zero counts in the dependent variable than a Poisson distribution would predict.

An alternative approach for data with overdispersion is negative binomial regression.

An alternative approach for data with many zeros is zero-inflated Poisson regression.
For further discussion, see the “Count data may not be appropriate for common parametric tests”
section in the Introduction to Parametric Tests chapter.

Note that model assumptions and pitfalls of this approach are not discussed here. The reader is urged to
understand the assumptions of this kind of modeling before proceeding.
model.p = glm(Monarchs ~ Garden,
              data=Data,
              family="poisson")
library(car)

Anova(model.p,
      type="II",
      test="LR")
Analysis of Deviance Table (Type II tests)

LR Chisq Df Pr(>Chisq)
Garden 66.463 2 3.697e-15 ***

library(rcompanion)

nagelkerke(model.p)
$Pseudo.R.squared.for.model.vs.null
Pseudo.R.squared
McFadden 0.387929
Cox and Snell (ML) 0.937293
Nagelkerke (Cragg and Uhler) 0.938037

$Likelihood.ratio.test
Df.diff LogLik.diff Chisq p.value
-2 -33.231 66.463 3.6967e-15
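The pseudo R-squared values and the likelihood ratio test above can be checked by hand from the log-likelihoods of the fitted model and of an intercept-only null model. The sketch below uses the standard definitions (McFadden; Cox and Snell; Nagelkerke’s rescaling of Cox and Snell). It illustrates the formulas and is not the rcompanion source code; the name model.null is introduced here only for the example.

```r
Data = data.frame(
  Garden   = factor(rep(c("A", "B", "C"), each = 8)),
  Monarchs = c( 0,  4,  2,  2,  0,  6,  0,  0,
                5,  9,  7,  5,  7,  5,  9,  5,
               10, 14, 12, 12, 10, 16, 10, 10)
)

model.p    = glm(Monarchs ~ Garden, data = Data, family = poisson)
model.null = glm(Monarchs ~ 1,      data = Data, family = poisson)   # intercept only

ll.model = as.numeric(logLik(model.p))
ll.null  = as.numeric(logLik(model.null))
n        = nobs(model.p)

# Pseudo R-squared definitions
mcfadden   = 1 - ll.model / ll.null
cox.snell  = 1 - exp(2 * (ll.null - ll.model) / n)
nagelkerke = cox.snell / (1 - exp(2 * ll.null / n))

# Likelihood ratio test of the model against the null model
lr.chisq = 2 * (ll.model - ll.null)
df.diff  = length(coef(model.p)) - length(coef(model.null))
p.value  = pchisq(lr.chisq, df = df.diff, lower.tail = FALSE)

print(c(McFadden = mcfadden, Cox.Snell = cox.snell,
        Nagelkerke = nagelkerke, Chisq = lr.chisq, p.value = p.value))
```

The results should agree with the nagelkerke output above, to rounding.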

library(multcompView)

library(emmeans)

marginal = emmeans(model.p,
                   ~ Garden)

pairs(marginal,
      adjust="tukey")

cld(marginal,
    alpha=0.05,
    Letters=letters,     ### Use lower-case letters for .group
    adjust="tukey")      ### Tukey adjustment for multiple comparisons

Garden emmean SE df asymp.LCL asymp.UCL .group

A 0.5596158 0.2672450 NA -0.07849527 1.197727 a
B 1.8718022 0.1386750 NA 1.54068251 2.202922 b
C 2.4638532 0.1031421 NA 2.21757688 2.710130 c

Results are given on the log (not the response) scale.

Confidence level used: 0.95
Conf-level adjustment: sidak method for 3 estimates
P value adjustment: tukey method for comparing a family of 3 estimates
Tests are performed on the log scale
significance level used: alpha = 0.05
### Note that estimates are on log scale
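Two quick checks of the Poisson assumptions discussed above can be made directly from a fitted model: the ratio of the Pearson chi-square statistic to the residual degrees of freedom (near 1 if the variance is close to the mean; well above 1 suggests overdispersion), and a comparison of the observed number of zeros with the number the fitted model expects. This is a rough screening sketch, not a formal test.

```r
Data = data.frame(
  Garden   = factor(rep(c("A", "B", "C"), each = 8)),
  Monarchs = c( 0,  4,  2,  2,  0,  6,  0,  0,
                5,  9,  7,  5,  7,  5,  9,  5,
               10, 14, 12, 12, 10, 16, 10, 10)
)

model.p = glm(Monarchs ~ Garden, data = Data, family = poisson)

# 1. Overdispersion: Pearson chi-square divided by residual degrees of freedom
pearson.chisq = sum(residuals(model.p, type = "pearson")^2)
dispersion    = pearson.chisq / df.residual(model.p)

# 2. Zeros: observed number versus the number expected under the fitted
#    Poisson model (the sum of P(Y = 0) over all observations)
observed.zeros = sum(Data$Monarchs == 0)
expected.zeros = sum(dpois(0, lambda = fitted(model.p)))

print(round(c(dispersion     = dispersion,
              observed.zeros = observed.zeros,
              expected.zeros = expected.zeros), 3))
```

For these data the dispersion ratio comes out at about 1.27, and the model expects about 1.4 zeros where 4 were observed, all of them in garden A.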

Negative binomial regression example

Negative binomial regression is similar in application to Poisson regression, but allows for
overdispersion in the dependent count variable.

This example will use the glm.nb function in the MASS package. The Anova function in the car package
will be used for an analysis of deviance, and the nagelkerke function will be used to determine a p-value
and pseudo R-squared value for the model. Post-hoc analysis can be conducted with the emmeans
package.

Note that model assumptions and pitfalls of this approach are not discussed here. The reader is urged to
understand the assumptions of this kind of modeling before proceeding.
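The glm.nb function estimates a shape parameter, theta, and uses the negative binomial variance mu + mu^2/theta, which is always larger than the Poisson variance mu and approaches it as theta grows. The sketch below checks that relationship numerically with dnbinom(), whose size argument plays the role of theta. The mean of 6.5, the theta values, and the helper name nb.variance are arbitrary choices for illustration.

```r
# Variance of a negative binomial distribution with mean mu and shape theta,
# computed by summing over a range of counts long enough that the
# remaining tail probability is negligible
nb.variance = function(mu, theta, max.count = 5000) {
  x = 0:max.count
  p = dnbinom(x, size = theta, mu = mu)
  m = sum(x * p)
  sum((x - m)^2 * p)
}

mu = 6.5
for (theta in c(1, 10, 1000)) {
  cat(sprintf("theta = %4g   numeric variance = %8.4f   mu + mu^2/theta = %8.4f\n",
              theta, nb.variance(mu, theta), mu + mu^2 / theta))
}
```

As theta increases, the variance shrinks toward mu, which is why a negative binomial fit with a very large theta behaves almost exactly like a Poisson fit.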


library(MASS)

model.nb = glm.nb(Monarchs ~ Garden,
                  data=Data,
                  control = glm.control(maxit=10000))

library(car)

Anova(model.nb,
      type="II",
      test="LR")
Analysis of Deviance Table (Type II tests)

LR Chisq Df Pr(>Chisq)
Garden 66.464 2 3.694e-15 ***

library(rcompanion)

nagelkerke(model.nb)
$Pseudo.R.squared.for.model.vs.null
Pseudo.R.squared
McFadden 0.255141
Cox and Snell (ML) 0.776007
Nagelkerke (Cragg and Uhler) 0.778217
$Likelihood.ratio.test
Df.diff LogLik.diff Chisq p.value
-2 -17.954 35.907 1.5952e-08

library(multcompView)

library(emmeans)

marginal = emmeans(model.nb,
                   ~ Garden)

pairs(marginal,
      adjust="tukey")

cld(marginal,
    alpha = 0.05,
    Letters = letters,     ### Use lower-case letters for .group
    type = "response",     ### Report emmeans in original scale
    adjust = "tukey")      ### Tukey adjustment for multiple comparisons

Garden response SE df asymp.LCL asymp.UCL .group

A 1.75 0.4677072 NA 0.9244706 3.312707 a
B 6.50 0.9013878 NA 4.6677750 9.051422 b
C 11.75 1.2119200 NA 9.1850474 15.031223 c

Confidence level used: 0.95

Conf-level adjustment: sidak method for 3 estimates
Intervals are back-transformed from the log scale
P value adjustment: tukey method for comparing a family of 3 estimates
Tests are performed on the log scale
significance level used: alpha = 0.05

Zero-inflated regression example

Zero-inflated regression is similar in application to Poisson regression, but allows for an abundance of
zeros in the dependent count variable.
This example will use the zeroinfl function in the pscl package. The Anova function in the car package will
be used for an analysis of deviance, and the nagelkerke function will be used to determine a p-value and
pseudo R-squared value for the model. Post-hoc analysis can be conducted with the emmeans package.
library(pscl)

model.zi = zeroinfl(Monarchs ~ Garden,
                    data = Data,
                    dist = "poisson")
### dist = "negbin" may be used

summary(model.zi)
Call:
zeroinfl(formula = Monarchs ~ Garden | Garden, data = Data, dist = "poisson")

Count model coefficients (poisson with log link):

Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.2182 0.2847 4.278 1.89e-05 ***
GardenB 0.6536 0.3167 2.064 0.039 *
GardenC 1.2457 0.3029 4.113 3.90e-05 ***

Zero-inflation model coefficients (binomial with logit link):

Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.046e-02 7.363e-01 -0.096 0.924
GardenB -2.057e+01 1.071e+04 -0.002 0.998
GardenC -2.057e+01 1.071e+04 -0.002 0.998
### Note that there are separate coefficients for the
### Poisson part of the analysis and for the zero-inflation part.
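The two parts of the zero-inflated model combine as a mixture: an observation is a structural zero with probability pi, and otherwise comes from a Poisson distribution with mean lambda. The sketch below plugs the garden A (reference level) coefficients from the summary output above into that mixture, using the logit link for pi and the log link for lambda as stated in the output. The names lambda, pi0, and dzip are introduced here for illustration only.

```r
# Coefficients for garden A (the reference level), copied from the
# summary(model.zi) output above
count.intercept = 1.2182     # log link: log of the Poisson mean
zero.intercept  = -0.07046   # logit link: logit of the structural-zero probability

lambda = exp(count.intercept)    # Poisson mean of the count part
pi0    = plogis(zero.intercept)  # probability of a structural zero

# Zero-inflated Poisson probability mass function (hypothetical helper):
#   P(Y = 0) = pi0 + (1 - pi0) * exp(-lambda)
#   P(Y = y) = (1 - pi0) * dpois(y, lambda), for y > 0
dzip = function(y, lambda, pi0) {
  ifelse(y == 0, pi0, 0) + (1 - pi0) * dpois(y, lambda)
}

zip.mean   = (1 - pi0) * lambda
zip.zero.p = dzip(0, lambda, pi0)

garden.A = c(0, 4, 2, 2, 0, 6, 0, 0)

print(round(c(lambda      = lambda,
              pi0         = pi0,
              zip.mean    = zip.mean,
              sample.mean = mean(garden.A),
              zip.P0      = zip.zero.p,
              sample.P0   = mean(garden.A == 0),
              poisson.P0  = dpois(0, mean(garden.A))), 3))
```

The mixture reproduces both the garden A mean of 1.75 and its proportion of zeros (4 of 8), whereas a plain Poisson distribution with mean 1.75 puts only about 0.17 of its probability on zero.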

library(car)

Anova(model.zi,
      type="II",
      test="Chisq")
Analysis of Deviance Table (Type II tests)

Df Chisq Pr(>Chisq)
Garden 2 23.914 6.414e-06 ***

library(rcompanion)

nagelkerke(model.zi)
$Pseudo.R.squared.for.model.vs.null
Pseudo.R.squared
McFadden 0.284636
Cox and Snell (ML) 0.797356
Nagelkerke (Cragg and Uhler) 0.800291

$Likelihood.ratio.test
Df.diff LogLik.diff Chisq p.value
-4 -19.156 38.311 9.6649e-08

library(multcompView)

library(emmeans)

marginal = emmeans(model.zi,
                   ~ Garden)

pairs(marginal,
      adjust="tukey")

cld(marginal,
    alpha=0.05,
    Letters=letters,     ### Use lower-case letters for .group
    adjust="tukey")      ### Tukey adjustment for multiple comparisons

Garden emmean SE df asymp.LCL asymp.UCL .group

A 1.75 0.7586301 NA -0.06140972 3.561410 a
B 6.50 0.9013877 NA 4.34772229 8.652278 b
C 11.75 1.2119199 NA 8.85625304 14.643747 c

Confidence level used: 0.95

Conf-level adjustment: sidak method for 3 estimates

P value adjustment: tukey method for comparing a family of 3 estimates
significance level used: alpha = 0.05
### Note, emmeans are on the original measurement scale

Robust Poisson regression example

Robust Poisson regression is less sensitive to outliers in the dependent variable than classic Poisson regression.
This example uses the glmRob function in the robust package. The anova function can be used to conduct
an analysis of deviance. The p-value for the model can be found by comparing the model to a null model.
However, at the time of writing, I don’t know of any way to determine AIC or pseudo R-squared for the
model.
At the time of writing, the glmRob function can only use the Poisson and binomial families of models.

An alternate method is the glmrob function in the robustbase package.

library(robust)

model.rob = glmRob(Monarchs ~ Garden,
                   data = Data,
                   family = "poisson")

anova(model.rob, test="Chisq")

Df Deviance Resid. Df Resid. Dev Pr(>Chi)

NULL NA NA 23 430.19850 NA
Garden 2 400.9221 21 29.27641 3.567693e-63

model.rob.null = glmRob(Monarchs ~ 1,
                        data = Data,
                        family = "poisson")

anova(model.rob.null, model.rob, test="Chisq")

Terms Resid. Df Resid. Dev Test Df Deviance Pr(>Chi)

1 1 23 95.12606 NA NA NA
2 Garden 21 29.27641 2 65.84965 5.536815e-11

Quasi-Poisson regression
Quasi-Poisson regression is useful since it estimates a dispersion parameter, so that it can model
overdispersed data. It may be better than negative binomial regression in some circumstances (Verhoef and
Boveng, 2007).

At the time of writing, quasi-Poisson regression doesn’t have a complete set of support functions in R.
Using the quasipoisson family option in the glm function, the results will have the same parameter
coefficients as with the poisson option, but the inference statistics are adjusted in the summary function.
The Anova function in the car package can be used for an analysis of deviance table, and the emmeans
package can be used for post-hoc comparisons. Since the model doesn’t produce a log-likelihood value, I
don’t know a way to produce a p-value for the model, or a pseudo R-squared value for the model.
model.qp = glm(Monarchs ~ Garden,
               data=Data,
               family="quasipoisson")

library(car)

Anova(model.qp,
      type="II",
      test="LR")

Analysis of Deviance Table (Type II tests)

Response: Monarchs
LR Chisq Df Pr(>Chisq)
Garden 52.286 2 4.429e-12 ***

library(multcompView)

library(emmeans)

marginal = emmeans(model.qp,
                   ~ Garden)

pairs(marginal,
      adjust="tukey")

cld(marginal,
    alpha=0.05,
    Letters=letters,
    adjust="tukey")

Garden emmean SE df asymp.LCL asymp.UCL .group

A 0.5596158 0.3013057 NA -0.1598233 1.279055 a
B 1.8718022 0.1563493 NA 1.4984809 2.245123 b
C 2.4638532 0.1162877 NA 2.1861887 2.741518 c

Results are given on the log (not the response) scale.

Confidence level used: 0.95
Conf-level adjustment: sidak method for 3 estimates
P value adjustment: tukey method for comparing a family of 3 estimates
significance level used: alpha = 0.05
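The relationship between the Poisson and quasi-Poisson fits described above can be verified directly: the coefficients are identical, the dispersion estimate is the Pearson chi-square statistic divided by the residual degrees of freedom, and the standard errors are the Poisson ones multiplied by the square root of that dispersion. The sketch below is a minimal check using only glm from base R.

```r
Data = data.frame(
  Garden   = factor(rep(c("A", "B", "C"), each = 8)),
  Monarchs = c( 0,  4,  2,  2,  0,  6,  0,  0,
                5,  9,  7,  5,  7,  5,  9,  5,
               10, 14, 12, 12, 10, 16, 10, 10)
)

model.p  = glm(Monarchs ~ Garden, data = Data, family = poisson)
model.qp = glm(Monarchs ~ Garden, data = Data, family = quasipoisson)

# 1. The point estimates are identical
print(all.equal(coef(model.p), coef(model.qp)))

# 2. The quasi-Poisson dispersion is the Pearson chi-square over residual df
phi         = summary(model.qp)$dispersion
phi.by.hand = sum(residuals(model.p, type = "pearson")^2) / df.residual(model.p)
print(c(phi = phi, phi.by.hand = phi.by.hand))

# 3. The standard errors are the Poisson ones scaled by sqrt(phi)
se.p  = sqrt(diag(vcov(model.p)))
se.qp = sqrt(diag(vcov(model.qp)))
print(round(cbind(se.p, se.qp, ratio = se.qp / se.p), 4))
```

For these data the dispersion is about 1.27, so the quasi-Poisson standard errors are about 13% larger than the Poisson ones, consistent with the SE columns of the Poisson and quasi-Poisson cld tables.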

Hermite regression
The generalized Hermite distribution is a more general distribution that can handle overdispersion or
multimodality (Moriña and others, 2015). This makes generalized Hermite regression a powerful and
flexible tool for modeling count data. It is implemented with the hermite package.

Fitting models with the hermite package can be somewhat difficult. One issue is that model fitting may
fail without some parameters being specified. Often specifying an appropriate value for the m option will
help.

A further difficulty with this approach is that, at the time of writing, the package isn’t supported by the
anova function to compare models, the Anova function to test effects, or other useful functions like
emmeans for factor effects.
The hermite package is used to conduct Hermite regression. Here, the m=3 option is specified. Often the
default m=NULL can be used. In this case, however, if the m value is not specified, the function cannot
complete the model fitting, and errors are produced. Using m=2 often works. Here, m=3 was used because it
produced a model with a lower AIC than did the m=2 option.
library(hermite)

model = glm.hermite(Monarchs ~ Garden,
                    data = Data,
                    link = "log",
                    m=3)

summary(model)

Coefficients:
Estimate Std. Error z value p-value
(Intercept) 0.5081083 0.3251349 1.5627612 1.181088e-01
GardenB 1.3700567 0.3641379 3.7624662 1.682461e-04
GardenC 1.9596153 0.3476326 5.6370291 1.730089e-08
dispersion.index 1.0820807 0.2877977 0.1281707 3.601681e-01
order 3.0000000 NA NA NA
(Likelihood ratio test against Poisson is reported by *z value* for *dispersion.index*)

AIC: 112.7762


Post-hoc analysis: Medians and confidence intervals

At the time of writing, the emmeans package does not support post-hoc analysis of regressions produced
with the hermite package.
One imperfect approach for post-hoc analysis would be to examine median counts for treatments and
the confidence intervals of these medians. We can conclude that groups with non-overlapping 95%
confidence intervals for their medians are significantly different.

However, this approach does not represent any information learned from the Hermite regression.
A second issue is that, because the dependent variable is not continuous, the distribution of the
bootstrapped medians is not likely to be continuous, and so the confidence intervals may not be reliable.

To get confidence intervals for the medians for each group, we will use the groupwiseMedian function.
Here I used the percentile method for confidence intervals.
library(rcompanion)

Sum = groupwiseMedian(Monarchs ~ Garden,
                      data=Data,
                      conf=0.95,
                      R=5000,
                      percentile=TRUE,
                      bca=FALSE,
                      digits=3)
Sum

Garden n Median Conf.level Percentile.lower Percentile.upper

1 A 8 1 0.95 0 4
2 B 8 6 0.95 5 8
3 C 8 11 0.95 10 14

### In this case, none of the confidence intervals overlap.
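The percentile bootstrap idea behind these intervals can be sketched in base R: resample each group with replacement, take the median of each resample, and use the 2.5th and 97.5th percentiles of those medians as the interval. This illustrates the method under my own choices (the helper name boot.median.ci, the seed, and R’s default quantile rule); it is not the groupwiseMedian implementation, so the endpoints may differ somewhat from the table above.

```r
Data = data.frame(
  Garden   = factor(rep(c("A", "B", "C"), each = 8)),
  Monarchs = c( 0,  4,  2,  2,  0,  6,  0,  0,
                5,  9,  7,  5,  7,  5,  9,  5,
               10, 14, 12, 12, 10, 16, 10, 10)
)

# Percentile bootstrap interval for a median (hypothetical helper):
# resample with replacement, record each resample's median, and take
# the lower and upper (1 - conf)/2 quantiles of those medians
boot.median.ci = function(x, R = 5000, conf = 0.95) {
  medians = replicate(R, median(sample(x, replace = TRUE)))
  alpha   = (1 - conf) / 2
  q       = quantile(medians, probs = c(alpha, 1 - alpha), names = FALSE)
  c(Median = median(x), Lower = q[1], Upper = q[2])
}

set.seed(1234)   # arbitrary seed, only so the result is reproducible

boot.table = t(sapply(split(Data$Monarchs, Data$Garden), boot.median.ci))
print(boot.table)
```

Because the counts are discrete, many resamples share the same median, which is the source of the reliability concern raised above.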

Plot of medians and confidence intervals

The data frame Sum created above will be passed to ggplot for plotting. At the end of the code, annotate
is used to add text to the plot to indicate which medians are significantly different from one another.
library(ggplot2)

ggplot(Sum,                ### The data frame to use.
       aes(x = Garden,
           y = Median)) +
   geom_errorbar(aes(ymin = Percentile.lower,
                     ymax = Percentile.upper),
                 width = 0.05,
                 linewidth = 1) +   ### linewidth replaces size for lines in ggplot2 3.4.0 and later
   geom_point(shape = 15,
              size = 5) +
   theme_bw() +
   theme(axis.title = element_text(face = "bold")) +
   ylab("Median count of monarch butterflies") +
   annotate("text",
            x = 1:3,
            y = c(5, 10, 15),
            label = c("Group 3", "Group 2", "Group 1"))


Optional code for chi-square goodness-of-fit test

An alternative approach to handling count data is to sum up the counts for treatments, and use a chi-
square test or related test. Here, a chi-square goodness-of-fit test is used to see if counts differ from
“expected” equal proportions.

Omnibus test
Tabla = xtabs(Monarchs ~ Garden,
              data = Data)

Tabla

Garden
A B C
14 52 94

chisq.test(Tabla)
Chi-squared test for given probabilities

X-squared = 60.05, df = 2, p-value = 9.127e-14
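The goodness-of-fit statistic compares each observed total with the total expected under equal proportions, summing (observed - expected)^2 / expected over the gardens. A minimal sketch of that arithmetic, using the totals from Tabla above:

```r
# Total monarchs per garden, from the Tabla output above
observed = c(A = 14, B = 52, C = 94)

# Expected totals under equal proportions
expected = rep(sum(observed) / length(observed), length(observed))

# Pearson chi-square statistic; df = number of categories - 1
x.squared = sum((observed - expected)^2 / expected)
p.value   = pchisq(x.squared, df = length(observed) - 1, lower.tail = FALSE)

print(c(X.squared = x.squared, p.value = p.value))

# The same statistic from chisq.test()
print(chisq.test(observed)$statistic)
```

With 160 monarchs in total, each garden is expected to have 160/3, or about 53.3, and the statistic is 60.05 on 2 degrees of freedom, matching the chisq.test output.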

Post-hoc chi-square tests

Garden.A = sum(Data$Monarchs[Data$Garden=="A"])
Garden.B = sum(Data$Monarchs[Data$Garden=="B"])
Garden.C = sum(Data$Monarchs[Data$Garden=="C"])

observed = c(Garden.A, Garden.B)   # observed frequencies
expected = c(1/2, 1/2)             # expected proportions

chisq.test(x = observed,
           p = expected)

Chi-squared test for given probabilities

X-squared = 21.879, df = 1, p-value = 2.904e-06


observed = c(Garden.B, Garden.C)   # observed frequencies
expected = c(1/2, 1/2)             # expected proportions

chisq.test(x = observed,
           p = expected)
Chi-squared test for given probabilities

X-squared = 12.082, df = 1, p-value = 0.0005091

observed = c(Garden.A, Garden.C)   # observed frequencies
expected = c(1/2, 1/2)             # expected proportions

chisq.test(x = observed,
           p = expected)
Chi-squared test for given probabilities

X-squared = 59.259, df = 1, p-value = 1.382e-14
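The three pairwise tests above can also be run in a loop, and because three comparisons are made, their p-values can be adjusted for multiple testing. The sketch below uses the Holm method via p.adjust; the adjustment is an addition for illustration and is not part of the analysis above, and the object names totals, garden.pairs, and result are hypothetical.

```r
# Total monarchs per garden, from the Tabla output above
totals = c(A = 14, B = 52, C = 94)

# Each column of garden.pairs is one pair of gardens to compare
garden.pairs = combn(names(totals), 2)

result = data.frame(
  Comparison = apply(garden.pairs, 2, paste, collapse = " - "),
  X.squared  = NA_real_,
  p.value    = NA_real_
)

# Goodness-of-fit test of each pair against equal proportions
for (i in seq_len(ncol(garden.pairs))) {
  test = chisq.test(x = totals[garden.pairs[, i]], p = c(1/2, 1/2))
  result$X.squared[i] = unname(test$statistic)
  result$p.value[i]   = test$p.value
}

# Holm adjustment across the three comparisons
result$p.adj = p.adjust(result$p.value, method = "holm")
print(result)
```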

Optional analysis: Vuong test to compare Poisson, negative binomial, and zero-inflated models
The Vuong test, implemented by the pscl package, can test two non-nested models. It works with negbin,
zeroinfl, and some glm model objects which are fitted to the same data.
The null hypothesis is that there is no difference between the two models. The function produces three tests,
a “Raw” test, an AIC-corrected test, and a BIC-corrected test, any of which could be used.

It has been suggested that the Vuong test not be used to test for zero-inflation (Wilson, 2015).
Define models
model.p = glm(Monarchs ~ Garden,
              data=Data,
              family="poisson")

library(MASS)

model.nb = glm.nb(Monarchs ~ Garden,
                  data=Data,
                  control = glm.control(maxit=10000))

library(pscl)

model.zi = zeroinfl(Monarchs ~ Garden,
                    data = Data,
                    dist = "poisson")

Vuong test
library(pscl)

vuong(model.p,
      model.nb,
      digits = 4)
Vuong Non-Nested Hypothesis Test-Statistic:
(test-statistic is asymptotically distributed N(0,1) under the
null that the models are indistinguishible)
-------------------------------------------------------------
Vuong z-statistic H_A p-value
Raw 0.03324988 model1 > model2 0.48674
AIC-corrected 0.03324988 model1 > model2 0.48674
BIC-corrected 0.03324988 model1 > model2 0.48674
### Positive Vuong z-statistic suggests that model 1 is superior,
### but, in this case, the difference is not significant,
### and the value of the statistic is probably too tiny to be meaningful.

vuong(model.p,
      model.zi,
      digits = 4)

Vuong Non-Nested Hypothesis Test-Statistic:

### Negative Vuong z-statistic suggests that model 2 is superior.

### If the Raw statistic is used, p = 0.07 gives some evidence
### that zi model is superior.

vuong(model.nb,
      model.zi,
      digits = 4)

Vuong Non-Nested Hypothesis Test-Statistic:

(test-statistic is asymptotically distributed N(0,1) under the
null that the models are indistinguishible)
-------------------------------------------------------------
Vuong z-statistic H_A p-value
Raw -1.4424725 model2 > model1 0.074585
AIC-corrected -0.4335210 model2 > model1 0.332318
BIC-corrected 0.1607786 model1 > model2 0.436134
### Negative Vuong z-statistic suggests that model 2 is superior.
### If the Raw statistic is used, p = 0.07 gives some evidence
### that zi model is superior.
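The Vuong statistic is built from the pointwise differences in log-likelihood between two models: with m_i = log f1(y_i) - log f2(y_i), the raw statistic is sqrt(n) * mean(m) / sd(m), and the AIC- and BIC-corrected versions subtract a penalty for the difference in the number of parameters before dividing. The sketch below computes these by hand. Model 1 is a Poisson model whose fitted means are the garden means; for these data the negative binomial fit is essentially the same (its LR chi-square of 66.464 is nearly identical to the Poisson 66.463). Model 2 uses the zero-inflated coefficients copied from the summary(model.zi) output. The parameter counts k1 = 3 and k2 = 6 are my assumption about what vuong counts; with them, the sketch agrees with the statistics printed above to about three decimal places.

```r
# Monarch counts and garden labels from the example in this chapter
y      = c( 0,  4,  2,  2,  0,  6,  0,  0,
            5,  9,  7,  5,  7,  5,  9,  5,
           10, 14, 12, 12, 10, 16, 10, 10)
garden = rep(c("A", "B", "C"), each = 8)

# Model 1: Poisson, whose fitted means are the garden means
mu.pois = ave(y, garden)
ll.1    = dpois(y, lambda = mu.pois, log = TRUE)

# Model 2: zero-inflated Poisson, coefficients copied from summary(model.zi)
count.eta = c(A = 1.2182, B = 1.2182 + 0.6536, C = 1.2182 + 1.2457)
zero.eta  = c(A = -0.07046, B = -0.07046 - 20.57, C = -0.07046 - 20.57)
lambda    = exp(count.eta[garden])     # Poisson mean of the count part
pi0       = plogis(zero.eta[garden])   # probability of a structural zero
ll.2      = log(ifelse(y == 0, pi0, 0) + (1 - pi0) * dpois(y, lambda))

# Pointwise log-likelihood differences
m  = ll.1 - ll.2
n  = length(m)
k1 = 3   # assumed number of coefficients in model 1
k2 = 6   # assumed number of coefficients in model 2 (count part + zero part)

# Raw, AIC-corrected, and BIC-corrected numerators
num = c(Raw           = sum(m),
        AIC.corrected = sum(m) - (k1 - k2),
        BIC.corrected = sum(m) - (k1 - k2) * log(n) / 2)

z = num / (sd(m) * sqrt(n))
p = pnorm(-abs(z))   # one-sided p-value

print(round(cbind(z, p), 4))
```

Only the garden A observations contribute meaningfully to m, since the zero-inflation probability for gardens B and C is essentially zero and the two models coincide there.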

References
Moriña, D., M. Higueras, P. Puig, and M. Oliveira. 2015. Generalized Hermite Distribution
Modelling with the R Package hermite. The R Journal 7(2):263–274.
journal.r-project.org/archive/2015-2/morina-higueras-puig-etal.pdf.
help(package="hermite")

library(hermite); ?glm.hermite
library(MASS); ?glm.nb

library(pscl); ?zeroinfl
library(pscl); ?vuong

“Simple Logistic Regression” in Mangiafico, S.S. 2015. An R Companion for the Handbook of Biological
Statistics, version 1.09. rcompanion.org/rcompanion/e_06.html.
"Generalized linear model: Link function". No date. Wikipedia. Retrieved 31 Jan. 2016.
en.wikipedia.org/wiki/Generalized_linear_model#Link_function.

Verhoef, J.M. and P.L. Boveng. 2007. Quasi-Poisson vs. negative binomial regression: How should we
model overdispersed count data? Ecology 88(11): 2766–2772.
Wilson, P. 2015. The Misuse of the Vuong Test for Non-Nested Models to Test for Zero-Inflation.
Economics Letters 127: 51–53. cybermetrics.wlv.ac.uk/paperdata/misusevuong.pdf

References for count data

Grace-Martin, K. No date. "Regression Models for Count Data". The Analysis Factor.
www.theanalysisfactor.com/regression-models-for-count-data/.
Grace-Martin, K. No date. "Zero-Inflated Poisson Models for Count Outcomes". The Analysis Factor.
www.theanalysisfactor.com/zero-inflated-poisson-models-for-count-outcomes/.

Rutgers Cooperative Extension, New Brunswick, NJ.
Non-commercial reproduction of this content, with attribution, is permitted.
For-profit reproduction without permission is prohibited.
If you use the code or information in this site in a published work, please cite it as a source. Also, if you are an instructor
and use this book in your course, please let me know. My contact information is on the About the Author of this Book page.

Citation
Mangiafico, S.S. 2016. Summary and Analysis of Extension Program Evaluation in R, version 1.20.05, revised 2023.
rcompanion.org/handbook/. (Pdf version: rcompanion.org/documents/RHandbookProgramEvaluation.pdf.)

285 pages
Statistics Week 5
No ratings yet
Statistics Week 5
7 pages

Regression for Count Data

Packages used in this chapter

Count data example

### Order factors by the order in data frame

### Remove unnecessary objects
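The two comments above come from the data-entry step. Below is a minimal sketch of that step. The per-plot counts are hypothetical because this excerpt does not show the data. Only the garden totals (14, 52 and 94, which reproduce the chi-square statistics reported later) are tied to the chapter. The individual values are an assumption.

```r
# Hypothetical per-plot monarch counts, eight plots per garden.
# Only the garden totals (14, 52, 94) are inferred from the chapter's output.
Data <- data.frame(
  Garden   = rep(c("A", "B", "C"), each = 8),
  Monarchs = c(0, 4, 2, 2, 0, 6, 0, 0,
               5, 9, 7, 5, 7, 5, 9, 5,
               10, 12, 11, 12, 11, 13, 12, 13)
)

### Order factors by the order in data frame
### (by default R sorts factor levels alphabetically)
Data$Garden <- factor(Data$Garden, levels = unique(Data$Garden))

str(Data)
tapply(Data$Monarchs, Data$Garden, sum)   # total monarchs per garden
```

Ordering by appearance changes nothing here because A, B and C are already alphabetical. It does matter for levels such as Low, Medium and High, which R would otherwise sort as High, Low, Medium.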

Poisson regression example

An alternate approach for data with overdispersion is negative binomial regression.
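A common way to check whether overdispersion is a concern is to divide the Pearson chi-square statistic of the fitted Poisson model by its residual degrees of freedom. The following is a minimal base-R sketch. The per-plot counts are hypothetical; only their garden totals are tied to the chapter's output.

```r
# Hypothetical per-plot counts (garden totals 14, 52, 94)
Data <- data.frame(
  Garden   = factor(rep(c("A", "B", "C"), each = 8)),
  Monarchs = c(0, 4, 2, 2, 0, 6, 0, 0,
               5, 9, 7, 5, 7, 5, 9, 5,
               10, 12, 11, 12, 11, 13, 12, 13)
)

# Poisson regression with the default log link
model <- glm(Monarchs ~ Garden, data = Data, family = poisson)

# Dispersion ratio: sum of squared Pearson residuals / residual df.
# A value near 1 is consistent with the Poisson assumption (variance = mean).
# Values well above 1 indicate overdispersion.
dispersion <- sum(residuals(model, type = "pearson")^2) / df.residual(model)
dispersion
```

For these hypothetical counts the ratio is about 1.16. When the ratio is well above 1 (the cutoff is a judgment call), `MASS::glm.nb(Monarchs ~ Garden, data = Data)` fits the negative binomial alternative with the same formula.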

Garden emmean SE df asymp.LCL asymp.UCL .group

Results are given on the log (not the response) scale.
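For a log-link model, you turn a log-scale estimate back into a count by exponentiating it. The sketch below does this with base R's `predict()`, using hypothetical per-plot counts (an assumption; only the garden totals are tied to the chapter). With Garden as the only predictor, the back-transformed estimates equal the observed garden means.

```r
# Hypothetical per-plot counts (garden totals 14, 52, 94)
Data <- data.frame(
  Garden   = factor(rep(c("A", "B", "C"), each = 8)),
  Monarchs = c(0, 4, 2, 2, 0, 6, 0, 0,
               5, 9, 7, 5, 7, 5, 9, 5,
               10, 12, 11, 12, 11, 13, 12, 13)
)

model <- glm(Monarchs ~ Garden, data = Data, family = poisson)

new.data <- data.frame(Garden = factor(c("A", "B", "C")))

# Estimates on the log (link) scale
log.scale <- predict(model, newdata = new.data, type = "link")

# exp() back-transforms them to mean counts per plot
response.scale <- exp(log.scale)

# predict() can also return the response scale directly
all.equal(response.scale, predict(model, newdata = new.data, type = "response"))

# With a single factor, these match the observed group means
tapply(Data$Monarchs, Data$Garden, mean)
```

With the emmeans package, adding `type = "response"` to the `emmeans()` call should report the estimates on this back-transformed scale instead of the log scale.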

Negative binomial regression example

Garden response SE df asymp.LCL asymp.UCL .group

Confidence level used: 0.95

Zero-inflated regression example

Count model coefficients (poisson with log link):

Zero-inflation model coefficients (binomial with logit link):

Garden emmean SE df asymp.LCL asymp.UCL .group

Confidence level used: 0.95

Robust Poisson regression example

An alternate method is the glmrob function in the robustbase package.

Df Deviance Resid. Df Resid. Dev Pr(>Chi)

Terms Resid. Df Resid. Dev Test Df Deviance Pr(>Chi)

Analysis of Deviance Table (Type II tests)

Garden emmean SE df asymp.LCL asymp.UCL .group

Results are given on the log (not the response) scale.

model = glm.hermite(Monarchs ~ Garden,

Post-hoc analysis: Medians and confidence intervals

Sum = groupwiseMedian(Monarchs ~ Garden,

Garden n Median Conf.level Percentile.lower Percentile.upper

### In this case, none of the confidence intervals overlap.
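`groupwiseMedian` in the rcompanion package computes these bootstrap intervals internally. The following is a hedged, dependency-free base-R sketch of the percentile bootstrap behind the Percentile.lower and Percentile.upper columns. It is not the package's implementation. The counts are hypothetical, and the seed and the 5000 resamples are arbitrary choices.

```r
# Hypothetical per-plot counts (garden totals 14, 52, 94)
Data <- data.frame(
  Garden   = factor(rep(c("A", "B", "C"), each = 8)),
  Monarchs = c(0, 4, 2, 2, 0, 6, 0, 0,
               5, 9, 7, 5, 7, 5, 9, 5,
               10, 12, 11, 12, 11, 13, 12, 13)
)

set.seed(1)  # arbitrary seed so the resampling is reproducible

# Percentile bootstrap confidence interval for the median of one group
boot.median.ci <- function(x, R = 5000, conf = 0.95) {
  boot.medians <- replicate(R, median(sample(x, replace = TRUE)))
  alpha <- (1 - conf) / 2
  c(n                = length(x),
    Median           = median(x),
    Percentile.lower = unname(quantile(boot.medians, alpha)),
    Percentile.upper = unname(quantile(boot.medians, 1 - alpha)))
}

# One row per garden, with columns named like the groupwiseMedian output
Sum <- t(sapply(split(Data$Monarchs, Data$Garden), boot.median.ci))
Sum
```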

Plot of medians and confidence intervals

ggplot(Sum, ### The data frame to use.

Optional code for chi-square goodness-of-fit test

X-squared = 60.05, df = 2, p-value = 9.127e-14
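This excerpt shows only the output of the goodness-of-fit test, so here is a base-R sketch that reproduces it. The garden totals are not printed here. However, A = 14, B = 52 and C = 94 reproduce this statistic and all three pairwise statistics that follow, so the totals are inferred rather than copied from the chapter. Under equal expected proportions, each garden's expected count is 160/3, and the statistic is the sum of (observed - expected)^2 / expected.

```r
# Garden totals inferred from the reported statistics (not shown in the excerpt)
observed <- c(Garden.A = 14, Garden.B = 52, Garden.C = 94)

# Null hypothesis: monarchs are equally likely to be counted in each garden
expected <- c(1/3, 1/3, 1/3)

result <- chisq.test(x = observed, p = expected)
result   # X-squared = 60.05, df = 2
```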

Post-hoc chi-square tests

Chi-squared test for given probabilities

X-squared = 21.879, df = 1, p-value = 2.904e-06

observed = c(Garden.B, Garden.C) # observed frequencies

X-squared = 12.082, df = 1, p-value = 0.0005091

observed = c(Garden.A, Garden.C) # observed frequencies

X-squared = 59.259, df = 1, p-value = 1.382e-14
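You can run the three pairwise tests above in one pass. Below is a base-R sketch using the same inferred totals. Each test compares two gardens against equal expected frequencies, which is the `chisq.test()` default for a vector. The Holm adjustment for three comparisons is an added assumption; this excerpt does not show which adjustment, if any, the chapter applies.

```r
# Garden totals inferred from the reported statistics (not shown in the excerpt)
observed <- c(A = 14, B = 52, C = 94)

# Each column of 'pairs' is one pair of gardens: A-B, A-C, B-C
pairs <- combn(names(observed), 2)

results <- t(apply(pairs, 2, function(pr) {
  test <- chisq.test(observed[pr])   # equal expected frequencies by default
  c(X.squared = unname(test$statistic), p.value = test$p.value)
}))
rownames(results) <- apply(pairs, 2, paste, collapse = " vs ")

# Holm adjustment for the three comparisons (an assumption, see above)
results <- cbind(results,
                 p.adjusted = p.adjust(results[, "p.value"], method = "holm"))
results
```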

model.nb = glm.nb(Monarchs ~ Garden,

Vuong Non-Nested Hypothesis Test-Statistic:

### Negative Vuong z-statistic suggests that model 2 is superior.

Vuong Non-Nested Hypothesis Test-Statistic:

References for count data

©2016 by Salvatore S. Mangiafico.
