The Gamma-Count Distribution in The Analysis of Experimental Underdispersed Data

The document analyzes an agronomic experiment on cotton production using count data models. It compares the Gamma-count distribution, which allows for underdispersion, to the standard Poisson regression model and quasi-Poisson model. The Gamma-count model provides a better fit to the experimental data which shows evidence of underdispersion, with variances smaller than the means.

Journal of Applied Statistics

ISSN: 0266-4763 (Print) 1360-0532 (Online) Journal homepage: www.tandfonline.com/journals/cjas20

The Gamma-count distribution in the analysis of experimental underdispersed data

Walmes Marques Zeviani, Paulo Justiniano Ribeiro Jr, Wagner Hugo Bonat,
Silvia Emiko Shimakura & Joel Augusto Muniz

To cite this article: Walmes Marques Zeviani, Paulo Justiniano Ribeiro Jr, Wagner Hugo Bonat,
Silvia Emiko Shimakura & Joel Augusto Muniz (2014) The Gamma-count distribution in the
analysis of experimental underdispersed data, Journal of Applied Statistics, 41:12, 2616-2626,
DOI: 10.1080/02664763.2014.922168

To link to this article: https://doi.org/10.1080/02664763.2014.922168

Published online: 10 Jun 2014.

Journal of Applied Statistics, 2014
Vol. 41, No. 12, 2616–2626, http://dx.doi.org/10.1080/02664763.2014.922168

The Gamma-count distribution in the analysis of experimental underdispersed data

Walmes Marques Zeviani^a,*, Paulo Justiniano Ribeiro Jr^a, Wagner Hugo Bonat^a,
Silvia Emiko Shimakura^a and Joel Augusto Muniz^b
^a Department of Statistics, UFPR, Centro Politecnico, Curitiba, Paraná, Brazil; ^b Department of Exact
Sciences, UFLA, Lavras, Minas Gerais, Brazil

(Received 18 June 2013; accepted 5 May 2014)

Event counts are response variables with non-negative integer values representing the number of times
that an event occurs within a fixed domain such as a time interval, a geographical area or a cell of a
contingency table. Analysis of counts by Gaussian regression models ignores the discreteness, asymmetry
and heteroscedasticity and is inefficient, providing unrealistic standard errors or possibly negative
predictions of the expected number of events. Poisson regression is the standard model for count data,
with underlying assumptions on the generating process which may be implausible in many applications.
Statisticians have long recognized the limitation of imposing equidispersion under the Poisson regression
model. A typical situation is when the conditional variance exceeds the conditional mean, in which case
models allowing for overdispersion are routinely used. Less reported is the case of underdispersion, with
fewer modeling alternatives and assessments available in the literature. One such alternative, the
Gamma-count model, is adopted here in the analysis of an agronomic experiment designed to investigate the
effect of levels of defoliation at different phenological stages upon the number of cotton bolls. Data set
and code for analysis are available as online supplements. Results show improvements over the Poisson
model and the semi-parametric quasi-Poisson model in capturing the observed variability in the data.
Estimating rather than assuming the underlying variance process leads to important insights into the process.

Keywords: Poisson regression; likelihood inference; Gamma-count; underdispersion; quasi-Poisson; cotton

1. Introduction
Regression models are deeply rooted in the analysis of agronomic experiments and least-squares
methods associated with the linear (Gaussian) model are widely adopted. On the other hand,
response variables in the form of counts are not uncommon. They may represent the number of
fruits produced by a tree, the number of units infected by a disease, the number of insects on a
particular plant structure, among others. Counts are random variables that assume non-negative
integer values, representing the number of times an event occurs within a fixed domain that can
be continuous, such as an interval of time or space, or discrete, such as the evaluation of an
individual or a census tract.

∗ Corresponding author. Email: [email protected]
© 2014 Taylor & Francis

Gaussian regression models for count data are not efficient, typically producing inconsistent
standard errors and even negative predictions for the expected number of events [5]. The Gaussian
linear model ignores the discreteness, heteroscedasticity, asymmetry and non-negativity that are
inherent features of count data. Impacts on the results are greater when the sample size is small
and the counts are low.
Poisson regression became the standard model for count data, in particular after the proposal
of the unifying class of generalized linear models [9] and the subsequent availability of com-
putational resources for model fitting. The Poisson distribution is an appealing option to model
count data given its domain on the non-negative integer numbers; moreover, it naturally allows
for asymmetry and heteroscedasticity that are intrinsic characteristics of this kind of data.
The assumption of variance equal to the mean (equidispersion) underlying Poisson regression
models imposes practical restrictions. Parameter estimates will be inefficient, with inconsistent
standard errors, and with larger error rates for hypothesis tests when the Poisson model is applied
to non-equidispersed data [12,13].
Overdispersion, with the variance greater than the mean, is largely reported in the literature and
may occur due to the absence of relevant covariates, heterogeneity of sampling units, sampling
levels, and excess of zeros [4]. A usual approach is to adopt a generalized linear mixed model
describing the extra variability by the inclusion of a non-observed latent random variable. An
interesting case is to assume a Poisson model with Gamma distributed random effects leading
to a negative binomial marginal distribution for the responses. El Shaarawi et al. [2] provide an
overview of this and other alternatives.
Less often reported are the cases of underdispersion, with variances smaller than the means.
Explanatory mechanisms are scarcer and, typically, heavily dependent on the context. A
possible general description can be derived by revisiting the key property of independent
exponentially distributed times between events underlying the Poisson model. If this assumption is
inadequate, the occurrence of an event affects the probability of another one, generating over- or
underdispersed counts. Other continuous probability distributions with positive domain can be assumed
for the times between events, such as Gamma [11,12], lognormal [3] and Weibull [8]. Alternative
approaches include weighting the Poisson distribution [10], the COM-Poisson distribution [6,7] and
heavy tail distributions [14].
Winkelmann [12] explores the connection between models for counts and models for durations
(lifetimes) relaxing the assumption of equidispersion at the cost of an extra parameter denoted by
α. The Gamma-count model is a convenient choice assuming Gamma distributed times between
events. The Poisson model becomes a particular case when the restriction α = 1 implies that the
duration distribution reduces to the exponential distribution. Varying values for the parameter α
induce a flexible probability distribution for the counts, which become underdispersed for α > 1
and overdispersed for 0 < α < 1.
We adopt the Gamma-count model for the analysis of a cotton production agronomic experiment
and compare the results against the ones obtained with Poisson and quasi-Poisson models, for
three reasons. First, the standard Poisson model is not excluded since it becomes a particular case.
Second, fitting the Gamma-count model allows for investigating whether the occurrences of bolls within
a plant are independent events, an arguable assumption under the simpler Poisson model. Third,
descriptive analyses of the data provided clear empirical evidence that the variance is a function
of the mean with a constant of proportionality below one. We also analyze the data by a
semi-parametric quasi-Poisson model as the benchmark for quantifying the observed variability
in the data.

The Gamma-count regression model is not the canonical choice amongst users of applied
statistics and not widely available in statistical software. For this reason, generic functions for
maximum-likelihood inference are available as online supplements. These cover the key aspects of
inference on the parameters of the Gamma-count model: construction of confidence intervals, either
asymptotic or based on profile likelihoods; hypothesis tests; model comparisons; and prediction with
corresponding confidence intervals. All of them are used throughout the data analysis.

2. Background
Poisson regression models for count data follow directly from the generalized linear model
structure. Alternatively, the Poisson model can be derived by assuming independent and
exponentially distributed times between events. The latter allows for the construction of alternatives
for under- or overdispersed data such as the Gamma-count model [12], as follows.
Elementary probability arguments establish that the distribution of a count variable can be
derived from the distribution of arrival times. Let $\tau_k > 0$, $k \in \mathbb{N}$, denote the
waiting time between the $(k-1)$th and the $k$th event. Then, the arrival time of the $n$th event is

$$\vartheta_n = \sum_{k=1}^{n} \tau_k, \quad n = 1, 2, \ldots \tag{1}$$

Let $N_T$ represent the total number of events within the interval $(0, T)$. $N_T$ is a count variable. It
follows from the definition of $N_T$ and $\vartheta_n$ that

$$\begin{aligned}
N_T < n &\iff \vartheta_n \geq T, \\
\Pr(N_T < n) &= \Pr(\vartheta_n \geq T) = 1 - F_n(T), \\
\Pr(N_T = n) &= F_n(T) - F_{n+1}(T),
\end{aligned} \tag{2}$$

where $F_n(T)$ is the cumulative distribution function of $\vartheta_n$. Equation (2) allows obtaining the
distribution of counts $N_T$ from knowledge of the distribution of arrival times $\vartheta_n$.
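It is assumed that the $\tau_k$ are independently and identically Gamma distributed, $G(\alpha, \beta)$,
with density

$$f(\tau; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \tau^{\alpha-1} \exp\{-\beta\tau\}, \quad \alpha, \beta \in \mathbb{R}^{+},$$

with $\tau > 0$, mean $E(\tau) = \alpha/\beta$ and variance $\mathrm{Var}(\tau) = \alpha/\beta^{2}$. By Equation (1), $\vartheta_n$ is the sum of
$n$ i.i.d. Gamma random variables and is therefore distributed as Gamma$(n\alpha, \beta)$. Let $G(n\alpha, \beta T)$ be its
cumulative distribution function evaluated at $T$. Using the substitution $u = \beta\tau$ in the second line,

$$\begin{aligned}
G(n\alpha, \beta T) = F_n(T) &= \int_{0}^{T} \frac{\beta^{n\alpha}}{\Gamma(n\alpha)} \tau^{n\alpha-1} \exp\{-\beta\tau\} \, d\tau \\
&= \frac{1}{\Gamma(n\alpha)} \int_{0}^{\beta T} \beta^{n\alpha} \left(\frac{u}{\beta}\right)^{n\alpha-1} \exp\{-u\} \, \frac{1}{\beta} \, du \\
&= \frac{1}{\Gamma(n\alpha)} \int_{0}^{\beta T} u^{n\alpha-1} \exp\{-u\} \, du.
\end{aligned} \tag{3}$$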
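The count distribution (2) for the number of events within the time interval $(0, T)$ is given by

$$\Pr(N_T = n) = G(\alpha n, \beta T) - G(\alpha(n+1), \beta T), \tag{4}$$

with expected value given by

$$\begin{aligned}
E(N_T) &= \sum_{i=1}^{\infty} i \Pr(N_T = i) \\
&= \sum_{i=1}^{\infty} i \, \bigl(G(\alpha i, \beta T) - G(\alpha(i+1), \beta T)\bigr) \\
&= \sum_{i=1}^{\infty} G(\alpha i, \beta T).
\end{aligned} \tag{5}$$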

For α = 1, f (τ ) reduces to the exponential density and Equation (4) simplifies to the Poisson
distribution.
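
As a short check of this statement (a worked step added here for illustration, relying on the standard closed form of the Gamma distribution function for integer shape), take $\alpha = 1$. For integer $n \geq 1$,

$$G(n, \beta T) = 1 - \sum_{k=0}^{n-1} \frac{(\beta T)^{k}}{k!} \exp\{-\beta T\},$$

so that Equation (4) gives

$$\Pr(N_T = n) = G(n, \beta T) - G(n+1, \beta T) = \frac{(\beta T)^{n}}{n!} \exp\{-\beta T\},$$

which is the Poisson probability function with mean $\beta T$. For $n = 0$ the same result follows from $F_0(T) = 1$, since then $\Pr(N_T = 0) = 1 - G(1, \beta T) = \exp\{-\beta T\}$.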
For the Gamma-count regression model, the parameters depend on a vector of individual
covariates, indicated by the subscript $i$. Assuming that the period at risk is the same for all
observations, $T$ can be set to unity, without loss of generality. This yields the regression

$$E(\tau_i \mid x_i) = \frac{\alpha}{\beta} = \exp\{-x_i^{\top}\gamma\}.$$

It is important to emphasize that the regression is for the waiting times $\tau_i$ and not for the counts
$N_i$, since $E(N_i \mid x_i) = (E(\tau_i \mid x_i))^{-1}$ does not hold unless $\alpha = 1$. For a given $\gamma$, $E(N_i \mid x_i)$ is evaluated
by Equation (5).
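
A minimal numerical sketch of Equations (4) and (5) is given below. It is written in Python and uses SciPy's regularized lower incomplete gamma function `scipy.special.gammainc`, which has the same form as G(·, ·) in Equation (3). The function names, the truncation point `n_max` and the parameter values are illustrative assumptions and are not taken from the paper's supplement.

```python
import numpy as np
from scipy.special import gammainc  # regularized lower incomplete gamma P(a, x)
from scipy.stats import poisson


def gamma_count_pmf(n, alpha, beta, T=1.0):
    """Pr(N_T = n) from Equation (4)."""
    n = np.asarray(n, dtype=float)
    # F_0(T) = 1 (the "zeroth" arrival is at time 0), so n = 0 is handled explicitly.
    shape = np.where(n > 0, alpha * n, 1.0)
    lower = np.where(n > 0, gammainc(shape, beta * T), 1.0)
    upper = gammainc(alpha * (n + 1.0), beta * T)
    return lower - upper


def gamma_count_mean(alpha, beta, T=1.0, n_max=500):
    """E(N_T) from Equation (5); the infinite sum is truncated at n_max (assumption)."""
    i = np.arange(1, n_max + 1)
    return float(np.sum(gammainc(alpha * i, beta * T)))


n = np.arange(0, 30)

# alpha = 1: exponential waiting times, so the counts should be Poisson(beta * T).
pmf_alpha_one = gamma_count_pmf(n, alpha=1.0, beta=4.0)
poisson_pmf = poisson.pmf(n, 4.0)

# alpha = 3 with a hypothetical value of the linear predictor x'gamma = 1.5.
alpha = 3.0
eta = 1.5
beta = alpha * np.exp(eta)           # so that E(tau | x) = alpha / beta = exp(-eta)
mean_count = gamma_count_mean(alpha, beta)
inverse_mean_wait = np.exp(eta)      # 1 / E(tau | x)

print(np.max(np.abs(pmf_alpha_one - poisson_pmf)))
print(mean_count, inverse_mean_wait)
```

With α = 1 the two probability vectors agree to numerical precision. For α = 3 the mean count differs from the inverse of the mean waiting time, which is why the expected count has to be evaluated through Equation (5).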
Figure 1 illustrates the relation between the distribution of times between events and counts,
showing density and hazard functions together with corresponding simulated values.
Gamma distributions with unit mean and different variances are shown in the first row. The
second row displays the corresponding increasing, constant and decreasing hazard functions related
to variances smaller than, equal to or larger than the mean. The middle plots correspond to the
exponential distribution and its constant hazard function. The middle panel shows simulated values
with time intervals drawn from each of the above-mentioned distributions. Vertical lines indicate
fixed-width intervals for which events are counted, and the counts within each interval are
displayed. The distribution of events is nearly regular and the counts have smaller variances in the
underdispersed case. For the overdispersed case, the events are clustered, with large variances for
counts. Differences are evident in the resulting histograms.
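
The mechanism described for Figure 1 can be imitated with a small simulation. The sketch below is an illustration only: it draws Gamma waiting times with unit mean, counts the events falling in consecutive unit-width intervals and reports the variance-to-mean ratio of the counts. The function name, seed, sample sizes and the three values of α are assumptions chosen for the example, not the settings used for the figure.

```python
import numpy as np


def dispersion_index(alpha, n_intervals=5000, seed=1):
    """Variance-to-mean ratio of counts in unit-width intervals (simulation sketch)."""
    rng = np.random.default_rng(seed)
    # Gamma waiting times with unit mean: shape alpha, scale 1 / alpha.
    waits = rng.gamma(shape=alpha, scale=1.0 / alpha, size=3 * n_intervals)
    arrivals = np.cumsum(waits)
    # Guard: the simulated events must span the whole observation window.
    if arrivals[-1] < n_intervals:
        raise RuntimeError("not enough simulated events; increase the sample size")
    arrivals = arrivals[arrivals < n_intervals]
    # Count the events falling in each interval [k, k + 1).
    counts = np.bincount(arrivals.astype(int), minlength=n_intervals)
    return counts.var(ddof=1) / counts.mean()


index_under = dispersion_index(alpha=5.0)  # waiting-time variance below the mean
index_equi = dispersion_index(alpha=1.0)   # exponential waiting times
index_over = dispersion_index(alpha=0.2)   # waiting-time variance above the mean
print(index_under, index_equi, index_over)
```

A ratio below one corresponds to the nearly regular events of the underdispersed case, a ratio near one to the exponential case, and a ratio above one to clustered events.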
For a sample of independent counts $y_i$, $i = 1, \ldots, n$, estimates $\hat{\alpha}$ and $\hat{\gamma}$ can be obtained by
maximizing the log-likelihood

$$\ell(\gamma, \alpha; y, x) = \sum_{i=1}^{n} \log\bigl( G(y_i \alpha, \alpha \exp(x_i^{\top}\gamma)) - G((y_i + 1)\alpha, \alpha \exp(x_i^{\top}\gamma)) \bigr), \tag{6}$$

where $\gamma$ is the vector of regression parameters describing the interval between the events, $\alpha$ is
the dispersion parameter, $x_i$ is a vector of covariates and $G()$ is given by Equation (3).
Parameter estimation requires numerical maximization of Equation (6). Confidence intervals
and hypothesis tests can be based either upon quadratic approximations of the likelihood
function (Wald-type intervals) or upon profile likelihoods.
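
The numerical maximization can be sketched as follows. This is not the authors' supplementary code: it is a minimal Python illustration that assumes the log-likelihood has the form of Equations (4) and (6), works on the log scale for α to keep it positive, and is run on simulated data with hypothetical true values. The optimizer (`scipy.optimize.minimize` with L-BFGS-B), the bounds and the starting values are choices made for the example.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammainc


def gamma_count_pmf(y, alpha, beta):
    """Pr(N = y) from Equation (4) with T = 1."""
    y = np.asarray(y, dtype=float)
    shape = np.where(y > 0, alpha * y, 1.0)
    lower = np.where(y > 0, gammainc(shape, beta), 1.0)  # F_0 = 1
    return lower - gammainc(alpha * (y + 1.0), beta)


def neg_loglik(theta, X, y):
    """Negative log-likelihood; theta = (gamma, log alpha)."""
    gamma, alpha = theta[:-1], np.exp(theta[-1])
    beta = alpha * np.exp(X @ gamma)
    p = gamma_count_pmf(y, alpha, beta)
    return -np.sum(np.log(np.maximum(p, 1e-300)))  # guard against log(0)


# Simulated data (not the cotton data); the true values below are hypothetical.
rng = np.random.default_rng(42)
n_obs, gamma_true, alpha_true = 300, np.array([2.0, -0.5]), 3.0
X = np.column_stack([np.ones(n_obs), rng.uniform(0.0, 1.0, n_obs)])
rate = alpha_true * np.exp(X @ gamma_true)
# 60 waiting times per unit are far more than needed to cover the interval (0, 1).
waits = rng.gamma(shape=alpha_true, scale=1.0 / rate[:, None], size=(n_obs, 60))
y = np.sum(np.cumsum(waits, axis=1) < 1.0, axis=1)  # events before T = 1

# Start from an intercept-only Poisson-like guess (alpha = 1) and bound the search.
theta0 = np.array([np.log(y.mean()), 0.0, 0.0])
bounds = [(-5.0, 5.0), (-5.0, 5.0), (-3.0, 4.0)]
fit = minimize(neg_loglik, theta0, args=(X, y), method="L-BFGS-B", bounds=bounds)
gamma_hat, alpha_hat = fit.x[:-1], np.exp(fit.x[-1])
print(gamma_hat, alpha_hat)
```

Wald-type intervals would additionally require the Hessian of `neg_loglik` at the optimum, and profile likelihoods would require repeating the optimization over a grid of fixed values of the parameter of interest.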
Figure 1. Comparison of different distributions of time between events. Top panel: Gamma densities and
hazard functions, middle panel: simulated events and corresponding interval counts for each distribution,
and bottom panel: counts histograms.

For a vector $x$ of covariate values, time between events is predicted by

$$\hat{\eta} = x^{\top}\hat{\gamma}.$$

The covariance matrix for the model parameters is

$$V = \begin{bmatrix} V_{\alpha\alpha} & V_{\alpha\gamma} \\ V_{\gamma\alpha} & V_{\gamma\gamma} \end{bmatrix},$$

and is estimated by the negative of the inverse Hessian matrix numerically obtained around the
maximized log-likelihood. The prediction standard error is given by

$$\mathrm{se}(\hat{\eta}) = \sqrt{x^{\top} V_{\gamma|\alpha} \, x},$$

where $V_{\gamma|\alpha} = V_{\gamma\gamma} - V_{\gamma\alpha} V_{\alpha\alpha}^{-1} V_{\alpha\gamma}$. For the particular case of the Gamma-count model considered
here, $\alpha$ is a scalar and found to be nearly orthogonal to $\gamma$, in which case $V_{\gamma|\alpha} \approx V_{\gamma\gamma}$. Confidence
intervals for the mean counts are obtained by computing Equation (5) after transforming the
limits of the confidence interval on the scale of the linear predictor by the inverse link function
$g^{-1}()$.
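
The prediction standard error described above can be sketched numerically as follows. The covariance matrix and the covariate vector are hypothetical values made up for the illustration (they are not estimates from the paper); only the block formula for the conditional covariance of γ given α follows the text.

```python
import numpy as np

# Hypothetical covariance matrix ordered as (alpha, gamma_0, gamma_1); illustrative values only.
V = np.array([
    [0.4700, 0.0010, -0.0008],
    [0.0010, 0.0008, -0.0006],
    [-0.0008, -0.0006, 0.0030],
])
x = np.array([1.0, 0.5])  # hypothetical covariate vector for the regression part

V_aa = V[:1, :1]
V_ag = V[:1, 1:]
V_ga = V[1:, :1]
V_gg = V[1:, 1:]

# Conditional covariance: V_gg - V_ga V_aa^{-1} V_ag
V_cond = V_gg - V_ga @ np.linalg.solve(V_aa, V_ag)

se_cond = float(np.sqrt(x @ V_cond @ x))  # standard error using the conditional covariance
se_marg = float(np.sqrt(x @ V_gg @ x))    # approximation that ignores the alpha block
print(se_cond, se_marg)
```

When the covariances between α and γ are close to zero, as in this made-up matrix, the two standard errors are nearly identical, which is the situation the text describes as near orthogonality.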

3. Data set and models


The data that motivated this paper come from a greenhouse experiment with cotton
plants (Gossypium hirsutum) conducted under a completely randomized design with five
replicates. The experiment aimed to assess the effects of five defoliation levels
(0%, 25%, 50%, 75% and 100%) on the observed number of bolls produced by plants at five
growth stages: vegetative, flower-bud, blossom, fig and cotton boll [1]. The experimental unit
was a pot with two plants. The number of cotton bolls was recorded at each culture cycle.
Figure 2 (left) shows the number of cotton bolls recorded for each combination of defoliation
level and growth stage. All the points in the scatter plot of sample variances against sample means
(right) are below the identity line, clearly suggesting the presence of underdispersion.
The analysis and assessment of the effects of the experimental factors are based on the
Gamma-count, Poisson and quasi-Poisson models, with the following structures for the log-link
function g():

Predictor 1: $g(\mu) = \gamma_0$;
Predictor 2: $g(\mu) = \gamma_0 + \gamma_1\,\mathrm{def}$ (first order effect of defoliation);
Predictor 3: $g(\mu) = \gamma_0 + \gamma_1\,\mathrm{def} + \gamma_2\,\mathrm{def}^2$ (second order effect of defoliation);
Predictor 4: $g(\mu) = \gamma_0 + \gamma_{1j}\,\mathrm{def} + \gamma_2\,\mathrm{def}^2$ (first order defoliation effect for each growth stage);
Predictor 5: $g(\mu) = \gamma_0 + \gamma_{1j}\,\mathrm{def} + \gamma_{2j}\,\mathrm{def}^2$ (second order defoliation effect for each growth stage).

The parameter μ is the expected value of N for the Poisson and quasi-Poisson models and
the expected value of the latent random variable τ equivalent to time between events for the
Gamma-count model.
The nested structure of the predictors allows relevant hypotheses to be tested by likelihood
ratios. Predictor 1 contains only the intercept and is fitted simply as a baseline to assess to which
extent the structured models improve the fit. Linear and quadratic effects of defoliation are added
by Predictor 2 and Predictor 3, respectively. Predictor 4 and Predictor 5 allow the linear and
quadratic effects of defoliation to vary between the growth stages, as indicated by the subscript $j$.
The parameter $\gamma_0$ is not allowed to vary between the growth stages since the effect of no defoliation
is the same for all growth stages.

Figure 2. (Left) Number of bolls produced for each artificial defoliation level and each growth stage. (Right)
Sample variance against the sample mean of the five replicates for each combination of defoliation level
and growth stage.

Values of the maximized log-likelihood and the Akaike criterion are recorded for the fully
parametric Poisson and Gamma-count models. The semi-parametric quasi-Poisson model is also
fitted to assess whether the parametric models produce comparable results. This model is less
restrictive concerning model assumptions, albeit without the inferential advantages of the fully
parametric counterparts.

4. Results
Table 1 summarizes the maximized log-likelihoods and likelihood ratio tests comparing the
sequence of predictors for the Poisson and Gamma-count models, as well as fitting results for
the quasi-Poisson. The Gamma-count model has a higher log-likelihood with the hypothesis of
equidispersion (α = 1) being rejected by likelihood ratio tests, even for the predictor without
covariates. Estimates of α̂ > 1 confirm that the number of cotton bolls is underdispersed with
increasing hazard functions indicating that the probability of the development of a new cotton
boll increases as time progresses. This result supports the hypothesis of a regular sharing of
plant resources in the distribution of the number of cotton bolls. The quasi-Poisson model also
indicates underdispersion (φ < 1) even for the null model.
Table 1. Model fit measures and comparisons between predictors and models.

Poisson
Predictor  np  ℓ         AIC      diff np  2(diff ℓ)  P(>χ²)
1           1  −279.933  561.866
2           2  −272.001  548.001  1        15.864     6.805E−05
3           3  −271.354  548.709  1         1.293     2.556E−01
4           7  −258.674  531.348  4        25.360     4.258E−05
5          11  −255.803  533.606  4         5.742     2.193E−01

Gamma-count
Predictor  np  ℓ         AIC      diff np  2(diff ℓ)  P(>χ²)     α̂      P(>χ²)^a
1           2  −272.396  548.792                                 1.764  1.034E−04
2           3  −257.350  520.701  1        30.092     4.121E−08  2.266  6.198E−08
3           4  −255.981  519.962  1         2.738     9.796E−02  2.317  2.940E−08
4           8  −220.145  456.291  4        71.671     1.007E−14  4.206  1.661E−18
5          12  −208.386  440.773  4        23.518     9.976E−05  5.112  2.071E−22

Quasi-Poisson
Predictor  np  deviance  diff np  diff dev  P(>F)      φ̂      P(>χ²)^a
1           1  75.514                                  0.567  3.660E−04
2           2  59.650    1        34.214    4.235E−08  0.464  5.134E−07
3           3  58.357    1         2.810    9.630E−02  0.460  3.661E−07
4           7  32.997    4        22.768    7.676E−14  0.278  9.154E−16
5          11  27.255    4         5.956    2.241E−04  0.241  3.566E−18

Notes: ^a Bilateral hypothesis test of dispersion parameter equal to 1.
np, number of parameters; ℓ, log-likelihood; diff np, difference in np; diff ℓ, difference in ℓ; diff dev,
difference in scaled deviance.

Table 2. Parameter estimates and estimate/standard error ratios for the three models.

                 Poisson              quasi-Poisson        Gamma-count
Parameter        Estimate  Est/SE     Estimate  Est/SE     Estimate  Est/SE
γ0                2.1896   34.5724^a   2.1896   70.4205^a   2.2342   79.7128^a
γ1 vegetative     0.4369    0.8473     0.4369    1.7260     0.4122    1.8080
γ2 vegetative    −0.8052   −1.3790    −0.8052   −2.8089^a  −0.7628   −2.9544^a
γ1 bud            0.2897    0.5706     0.2897    1.1622     0.2744    1.2224
γ2 bud           −0.4879   −0.8613    −0.4879   −1.7544    −0.4642   −1.8534
γ1 blossom       −1.2425   −2.0581^a  −1.2425   −4.1921^a  −1.1821   −4.4348^a
γ2 blossom        0.6728    0.9892     0.6728    2.0149^a   0.6453    2.1486^a
γ1 fig            0.3649    0.6449     0.3649    1.3135     0.3198    1.2797
γ2 fig           −1.3103   −1.9477    −1.3103   −3.9672^a  −1.1990   −4.0385^a
γ1 boll           0.0089    0.0178     0.0089    0.0362     0.0070    0.0315
γ2 boll          −0.0200   −0.0361    −0.0200   −0.0736    −0.0185   −0.0756
α                 –         –          –         –          5.1120    7.4228^a

Note: ^a Indicates |Est/SE| > 1.96.

Figure 3. Dispersion diagrams of observed values and curves of predicted values and confidence intervals
(95%) as functions of the defoliation level for each growth stage.

Unlike the others, the Poisson model does not show significant effects under Predictor 5.
This is attributed to the inadequate assumption of equidispersion, which makes the log-likelihood
among predictors less distinguishable. Descriptive levels (p-values) are substantially smaller
for the Gamma-count and quasi-Poisson models, compared with the Poisson model. In the presence
of underdispersion, the latter becomes conservative for hypothesis testing.
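
The likelihood ratio and AIC entries of Table 1 can be recomputed from the maximized log-likelihoods, as sketched below in Python. The log-likelihood values and parameter counts are copied from Table 1; the helper name `lr_test` is an assumption. The reported p-values for the test of α = 1 are numerically consistent with a likelihood ratio test between the Poisson and Gamma-count fits that share the same predictor, which is the reading used in the sketch.

```python
from scipy.stats import chi2

# Maximized log-likelihoods and numbers of parameters copied from Table 1.
loglik_poisson = {1: -279.933, 2: -272.001, 3: -271.354, 4: -258.674, 5: -255.803}
loglik_gcount = {1: -272.396, 2: -257.350, 3: -255.981, 4: -220.145, 5: -208.386}
np_gcount = {1: 2, 2: 3, 3: 4, 4: 8, 5: 12}


def lr_test(loglik_small, loglik_big, df):
    """Likelihood ratio statistic and its chi-square upper-tail probability."""
    stat = 2.0 * (loglik_big - loglik_small)
    return stat, chi2.sf(stat, df)


# Predictor 2 against Predictor 1 within the Gamma-count model (one extra parameter).
stat_21, p_21 = lr_test(loglik_gcount[1], loglik_gcount[2], df=1)

# Equidispersion (alpha = 1): Poisson against Gamma-count with Predictor 5.
stat_alpha, p_alpha = lr_test(loglik_poisson[5], loglik_gcount[5], df=1)

# AIC = 2 * np - 2 * loglik for the Gamma-count model with Predictor 5.
aic_gc5 = 2 * np_gcount[5] - 2.0 * loglik_gcount[5]
print(stat_21, p_21, stat_alpha, p_alpha, aic_gc5)
```

Small differences in the last digit relative to Table 1 are expected because the tabulated log-likelihoods are rounded to three decimals.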
The Gamma-count and the quasi-Poisson models indicate that both linear and quadratic
effects of levels of defoliation vary between growth stages. Results in Table 2 and Figure 3 show,
for all models, no significant effects of defoliation during the flower-bud and cotton boll stages.
The ratios between the estimates and the corresponding standard errors for these stages are, in
absolute value, smaller than the reference value of 1.96 for a significance level of 5%. The Poisson
model only detects the effect of defoliation for the blossom stage, while the Gamma-count
and quasi-Poisson models indicate a significant effect of defoliation for the vegetative, blossom
and fig stages.
Parameter estimates for the blossom stage have opposite signs when compared with the
other stages. A negative and significant linear term indicates a rapid decay in the number of
cotton bolls at the beginning of defoliation. The positive quadratic term indicates a concave-up
response, as seen in Figure 3 for the blossom stage. Therefore, the impact of defoliation is
greater for the blossom stage and there is a tolerance up to approximately 40% of defoliation
for the vegetative stage and 24% for the fig stage. Parameter estimates between models are not
directly comparable since they are related to the number of events in the Poisson model and to
the distribution of the time between events for the Gamma-count model.

Figure 4. Estimated probabilities from Poisson and Gamma-count models for a level zero of defoliation.

Prediction curves for each stage are shown in Figure 3 and are indistinguishable between
the three models. The confidence bands are similar between Gamma-count and quasi-Poisson
models and clearly wider for the Poisson model.
Overall, the Gamma-count and the quasi-Poisson models produced very similar inferential
results: point and interval estimates, hypothesis tests, model comparisons and prediction bands.
The semi-parametric quasi-Poisson model is expected to have a better fit to a particular data
set, as there is no explicit formulation of a probability model and functional relation between
mean and variance. Such flexibility comes with drawbacks. There are no likelihood measures for
comparing models and submodels, nor an estimated probability distribution for the counts,
which could address questions of scientific interest. Figure 4 provides the estimated probability
distributions for the number of cotton bolls obtained under Poisson and Gamma-count models.
At the level zero of defoliation, the expected value is 8.93 cotton bolls per two plants for either
model; however, the probability distribution is more concentrated around the mean value under
the Gamma-count model.
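
The comparison at zero defoliation can be sketched from the estimates in Table 2, where the linear predictor reduces to the intercept. The Python code below is an illustration under the assumptions that T = 1, that the Gamma-count probabilities follow Equation (4), and that truncating the support at 59 bolls is harmless; it is not the code used to draw Figure 4.

```python
import numpy as np
from scipy.special import gammainc
from scipy.stats import poisson

# Intercept and dispersion estimates taken from Table 2 (zero defoliation: eta = gamma_0).
gamma0_poisson = 2.1896
gamma0_gcount, alpha_hat = 2.2342, 5.1120

n = np.arange(0, 60)  # support truncated at 59 bolls (assumption)

pmf_poisson = poisson.pmf(n, np.exp(gamma0_poisson))

beta = alpha_hat * np.exp(gamma0_gcount)  # from E(tau) = alpha / beta = exp(-eta)
lower = np.ones(n.size)                   # F_0(T) = 1 for n = 0
lower[1:] = gammainc(alpha_hat * n[1:], beta)
pmf_gcount = lower - gammainc(alpha_hat * (n + 1), beta)

mean_poisson = float(np.sum(n * pmf_poisson))
mean_gcount = float(np.sum(n * pmf_gcount))
var_poisson = float(np.sum((n - mean_poisson) ** 2 * pmf_poisson))
var_gcount = float(np.sum((n - mean_gcount) ** 2 * pmf_gcount))
print(mean_poisson, mean_gcount, var_poisson, var_gcount)
```

Both means should be close to the 8.93 bolls quoted in the text, while the Gamma-count variance is much smaller than the Poisson variance, reflecting the more concentrated distribution.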
In what follows we further explore aspects of the likelihood function. The profile
log-likelihood for α is slightly skewed (left panel, Figure 5). The 95% confidence interval based
on the χ² distribution is (3.89, 6.59) while the asymptotic interval is (3.76, 6.46). Both have the
same width (2.70) but are shifted by 0.13 units relative to each other. This is a small difference and
the quadratic approximation of the likelihood is considered satisfactory. Although the precision of
the intervals is similar, the interval based on the profile log-likelihood is preferred to describe the
uncertainty associated with α since it is able to detect possible asymmetries and has limits within
the (0, ∞) parameter space.
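
As a worked check, the asymptotic interval can be reproduced up to rounding from the entries for α in Table 2, assuming it is the usual Wald interval $\hat{\alpha} \pm 1.96\,\mathrm{se}(\hat{\alpha})$:

$$\mathrm{se}(\hat{\alpha}) = \frac{5.1120}{7.4228} \approx 0.6887, \qquad 5.1120 \pm 1.96 \times 0.6887 \approx (3.76,\ 6.46).$$

The profile-likelihood interval instead collects the values of α that satisfy

$$2\,\{\ell_p(\hat{\alpha}) - \ell_p(\alpha)\} \leq \chi^2_{1,\,0.95} \approx 3.84,$$

where $\ell_p$ is notation introduced here for the profile log-likelihood, that is, the log-likelihood maximized over γ for each fixed α.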
The right panel in Figure 5 shows the confidence regions for α and γ0 obtained via profile
likelihood and quadratic approximation of the likelihood. Axes of the confidence regions are nearly
parallel to the Cartesian axes, suggesting that the parameters are nearly orthogonal. Moreover,
covariances between α̂ and each of the other parameters γ̂ (not shown) are nearly zero, implying
that inferences about one parameter are not influenced by the other parameter. The confidence
regions are symmetric in the direction of γ0 and the asymptotic and profile likelihood-based
confidence intervals are therefore coincident.

Figure 5. Profile likelihood and quadratic approximation for: (left) α with arrows indicating the 95%
confidence intervals and (right) (90, 95, 99%) confidence regions for γ0 and α.

Computationally, the asymptotic confidence interval is easier to obtain since it simply requires
the inversion of the Hessian matrix around the maximum of the log-likelihood function. The
profile log-likelihood requires successive optimizations for a set of values of the parameter of
interest. For a larger number of parameters, obtaining individual intervals based on the profile
likelihoods will increase the computational burden.

5. Conclusion
The Poisson, Gamma-count and semi-parametric quasi-Poisson models were considered for the
analysis of underdispersed count responses from a greenhouse experiment with cotton plants
subjected to different artificial defoliation levels and growth stages.
Significance of experimental factors is the same for the Gamma-count and quasi-Poisson models,
whereas the Poisson model is more conservative, not identifying some experimental factors
as significant. The latter led to greater standard errors and wider prediction bands, being
unable to capture information contained in the data. The analysis suggests that, in the presence
of underdispersion, the standard Poisson model is inadequate and can lead to wrong conclusions
about the effects of experimental factors or covariates of interest.
Results under the Gamma-count model are comparable to the semi-parametric approach which
does not assume a specific probability distribution for the counts. The fully parametric approach
is advantageous since it allows for likelihood-based inference, deriving estimated prediction
probabilities besides enabling generalizations such as specifying a regression model structure
also for the dispersion parameter.
Likelihood analysis showed nearly quadratic behavior for the parameter α controlling the dis-
persion of the counts. This parameter has little influence upon point estimates of the regression
parameters, being responsible for stabilizing the estimates of variances of regression parameters,
which are often overestimated under the Poisson distribution.
Despite its advantages and potential for usage, the Gamma-count model remains uncommon. It is a
relevant addition to the suite of models to be considered for the analysis of experimental count
data. The model can be easily implemented in a statistical programming language as illustrated
by the supplementary material (see Note 1).
Possible topics for further investigation and extensions include assessment of the impacts of
misspecification under different levels of dispersion, an increase in flexibility, possibly by modeling
the dispersion parameter as a function of covariates, and the addition of random effects to account
for grouped data structures such as repeated and longitudinal measures.

Note

1. Supplementary Content may be viewed online at http://www.leg.ufpr.br/doku.php/publications:papercompanions:zeviani-jas2014.

References
[1] A.M. da Silva, P.E. Degrande, M.G. Fernandes, R. Suekane, and W.M. Zeviani, Impacto de diferentes níveis de
desfolha artificial nos estádios fenológicos do algodoeiro, Rev. Ciências Agrárias 35 (2012), pp. 163–172.
[2] A.H. El Shaarawi, R. Zhu, and H. Joe, Modelling species abundance using the Poisson-Tweedie family,
Environmetrics 22(2) (2011), pp. 152–164.
[3] U. Gonzales-Barron and F. Butler, Characterisation of within-batch and between-batch variability in
microbial counts in foods using Poisson-Gamma and Poisson-lognormal regression models, Food Control 22 (2011),
pp. 1268–1278.
[4] G.K. Grunwald, S.L. Bruce, L. Jiang, M. Strand, and N. Rabinovitch, A statistical model for under or overdispersed
clustered and longitudinal count data, Biom. J. 53 (2011), pp. 578–594.
[5] G. King, Variance specification in event count models: From restrictive assumptions to a generalized estimator,
Am. J. Polit. Sci. 33 (1989), pp. 762–784.
[6] D. Lord, S.R. Geedipally, and S.D. Guikema, Extension of the application of Conway-Maxwell-Poisson models:
Analyzing traffic crash data exhibiting underdispersion, Risk Anal. 30 (2010), pp. 1268–1276.
[7] D. Lord, S.D. Guikema, and S.R. Geedipally, Application of the Conway-Maxwell-Poisson generalized linear model
for analyzing motor vehicle crashes, Accid. Anal. Prev. 40 (2008), pp. 1123–1134.
[8] B. McShane, M. Adrian, E.T. Bradlow, and P.S. Fader, Count models based on Weibull interarrival times, J. Bus.
Econom. Statist. 26 (2008), pp. 369–378.
[9] J.A. Nelder and R.W.M. Wedderburn, Generalized linear models, J. Roy. Stat. Soc. Ser. A 135 (1972), pp. 370–384.
[10] M.S. Ridout and P. Besbeas, An empirical model for underdispersed count data, Stat. Model. 4 (2004), pp. 77–89.
[11] N. Toft, G.T. Innocent, D.J. Mellor, and S.W. Reid, The Gamma-Poisson model as a statistical method to determine
if micro-organisms are randomly distributed in a food matrix, Food Microbiol. 23 (2006), pp. 90–94.
[12] R. Winkelmann, Duration dependence and dispersion in count-data models, J. Bus. Econom. Statist. 13 (1995),
pp. 467–474.
[13] R. Winkelmann and K. Zimmermann, Count data models for demographic data, Math. Popul. Stud. 4 (1994),
pp. 205–221.
[14] R. Zhu and H. Joe, Modelling heavy-tailed count data using a generalised Poisson-inverse Gaussian family, Statist.
Probab. Lett. 79 (2009), pp. 1695–1703.
