
Journal of Wildlife Management 74(6):1175–1178; 2010; DOI: 10.2193/2009-367

Commentary

Uninformative Parameters and Model Selection Using Akaike's Information Criterion

TODD W. ARNOLD,1 Department of Fisheries, Wildlife and Conservation Biology, University of Minnesota, St. Paul, MN 55108, USA

ABSTRACT As use of Akaike's Information Criterion (AIC) for model selection has become increasingly common, so has a mistake involving interpretation of models that are within 2 AIC units (ΔAIC ≤ 2) of the top-supported model. Such models are <2 ΔAIC units because the penalty for one additional parameter is +2 AIC units, but model deviance is not reduced by an amount sufficient to overcome the 2-unit penalty and, hence, the additional parameter provides no net reduction in AIC. Simply put, the uninformative parameter does not explain enough variation to justify its inclusion in the model and it should not be interpreted as having any ecological effect. Models with uninformative parameters are frequently presented as being competitive in the Journal of Wildlife Management, including 72% of all AIC-based papers in 2008, and authors and readers need to be more aware of this problem and take appropriate steps to eliminate misinterpretation. I reviewed 5 potential solutions to this problem: 1) report all models but ignore or dismiss those with uninformative parameters, 2) use model averaging to ameliorate the effect of uninformative parameters, 3) use 95% confidence intervals to identify uninformative parameters, 4) perform all-possible subsets regression and use weight-of-evidence approaches to discriminate useful from uninformative parameters, or 5) adopt a methodological approach that allows models containing uninformative parameters to be culled from reported model sets. The first approach is preferable for small sets of a priori models, whereas the last 2 approaches should be used for large model sets or exploratory modeling.

KEY WORDS Akaike’s Information Criterion (AIC), Akaike-best model, model averaging, model selection, parameter
selection, uninformative parameters.

In the last decade, information-theoretic approaches have largely supplanted null hypothesis testing in the wildlife literature (Anderson and Burnham 2002, Burnham and Anderson 2002). Although this is a largely constructive paradigm shift, I nevertheless share concerns that one statistical ritual has replaced another and that comparative ranking of models now overshadows ecological interpretation of those models (Guthery et al. 2005, Chamberlain 2008, Guthery 2008). One small but incessantly common problem that contributes to this is the reporting and interpretation of models that are not truly competitive with top-ranking models, but appear competitive by virtue of low Akaike's Information Criterion (AIC) scores. This occurs whenever a variable with poor explanatory power is added to an otherwise good model and the result is a model with ΔAIC < 2, a distance widely interpreted as indicating a "substantial level of empirical support" (Burnham and Anderson 2002:170). However, this is an erroneous interpretation, and Burnham and Anderson (2002:131) found this issue important enough to put inside a text box (something they did only 29 times in 454 text pages):

    Models having Δi [ΔAIC] within about 0–2 units of the best model should be examined to see whether they differ from the best model by 1 parameter and have essentially the same values of the maximized log-likelihood as the best model. In this case, the larger model is not really supported or competitive, but rather is 'close' only because it adds 1 parameter and therefore will be within 2 Δi units, even though the fit, as measured by the log-likelihood value, is not improved.

Obviously, a similar caveat would apply to models with 2 extra parameters that fall within approximately 4 ΔAIC units of the best model, or 3 extra parameters that fall within approximately 6 ΔAIC units of the best model, distances that are often interpreted as meaningful.

A WORKED EXAMPLE

I illustrate the problem of uninformative parameters using a recently published data set on detection probabilities of breeding waterfowl pairs in North Dakota, USA (Pagano and Arnold 2009). Model selection in that study was based on AIC, which is defined as −2logL(θ|y) + 2K, where logL(θ|y) is the maximized log-likelihood of the model parameters given the data and K is the number of estimable parameters (Burnham and Anderson 2002:61). For any well-supported approximating model, it is possible to add any single parameter and achieve a new model that is ≤2 AIC units from the well-supported model, because even if the additional parameter has no explanatory ability whatsoever (i.e., log-likelihood is unchanged), AIC will only increase by 2 due to the 1-unit increase in K. For example, Pagano and Arnold (2009, table 2) reported a 16-parameter model where detection probabilities (p) of breeding duck pairs were described by a factorial combination of 2 observers (obs) and 8 species (spp). Pagano and Arnold (2009) considered additional covariates that might affect detection probabilities and modeled these covariates to have an additive effect over both observers and all species (i.e., ΔK = 1). Effective sample size (n) for this data set was 6,162, so the small sample adjustment to AICc of 17 versus 16 parameters is a nearly negligible 0.01. Hereafter I will use AIC and assume n/K large and overdispersion (ĉ) negligible, but these criticisms also apply to model selection based on AICc and QAICc, although the boundaries are no longer precisely restricted to <2 ΔAIC units, but may be somewhat larger depending on values of n/K. Based on their review of

1E-mail: [email protected]

Arnold • Uninformative Parameters and AIC 1175


Table 1. Models examining effects of various covariates on detection probabilities of indicated breeding pairs of waterfowl in North Dakota, USA (from Pagano and Arnold 2009). I added single parameters assuming an additive effect to the base model, which included K = 16 parameters (8 species × 2 observers). Three of these covariates were considered biologically feasible (total ducks, vegetative cover, and cover type), 6 were not (random 5, 4, 8, and 1; not Sunday, Monday, or Wednesday; and last duck seen a mallard), and I excluded 6 additional nonsense or random variables (ΔAIC = 0.64–2.00) from presentation. I evaluated all models compared to the base model using Akaike's Information Criterion (AIC), ΔAIC, and changes in model deviance (Dev).

Model                               AIC       ΔAIC     K    Dev
Total ducks                         4,426.71  −16.62   17   4,392.71
Random 5                            4,439.95   −3.38   17   4,405.95
Random 4                            4,442.20   −1.13   17   4,408.19
Vegetative cover                    4,442.65   −0.68   17   4,408.65
Random 8                            4,442.81   −0.52   17   4,408.80
Random 1                            4,442.90   −0.43   17   4,408.90
Base model                          4,443.34    0.00   16   4,411.33
Not Sunday, Monday, or Wednesday    4,445.09    1.75   17   4,411.08
Cover type                          4,445.25    1.92   17   4,411.25
Last duck seen a mallard            4,445.33    1.99   17   4,411.32
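The pattern in Table 1 is easy to reproduce in simulation. The Python sketch below is my own illustration, not code or data from Pagano and Arnold (2009); the model and numbers are invented, and only the sample size echoes the paper. It computes AIC = −2logL(θ|y) + 2K for least-squares fits and adds pure-noise covariates to a good base model: ΔAIC never exceeds the +2 penalty, and roughly 1 in 6 noise covariates produces a net reduction in AIC, just like random variables 1, 4, 5, and 8 in the table.

```python
import numpy as np

# A minimal sketch (simulated data, not from any paper cited here):
# Gaussian linear models fit by least squares, with AIC = -2*logL + 2K.
# Adding a pure-noise covariate can only lower the deviance, so Delta-AIC
# is at most the +2 penalty; it goes negative (the spurious variable looks
# "supported") whenever the deviance drops by more than 2, which for a
# chi-square_1 statistic happens about 15.7% of the time, roughly 1 in 6.

rng = np.random.default_rng(0)
n = 6162                               # effective sample size in the paper
x = rng.normal(size=n)                 # one genuinely useful covariate
y = 1.0 + 0.5 * x + rng.normal(size=n)

def gaussian_aic(y, X):
    """AIC for ordinary least squares with Gaussian errors; K counts the
    regression coefficients plus the error variance."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / len(y)    # ML estimate of the error variance
    loglik = -0.5 * len(y) * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return -2.0 * loglik + 2.0 * (X.shape[1] + 1)

base = np.column_stack([np.ones(n), x])
aic_base = gaussian_aic(y, base)

# Refit 1,000 times, each time appending one uninformative covariate.
deltas = np.array([
    gaussian_aic(y, np.column_stack([base, rng.normal(size=n)])) - aic_base
    for _ in range(1000)
])

print("max Delta-AIC:", round(float(deltas.max()), 3))      # bounded by +2
print("admitted (Delta-AIC < 0):", float(np.mean(deltas < 0)))  # near 0.157
```

Seeding the generator makes the run repeatable; with other seeds the admitted fraction fluctuates around 0.157, the 1-in-6 rate discussed in the text.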

the literature, Pagano and Arnold (2009) considered 12 additional covariates that they believed might affect detection probabilities and found that 7 of them were supported by net reductions in AICc, whereas all 12 variables produced models that were ≤1.92 ΔAICc units from model p[obs × spp]. Indeed, so were 4 nonsensical variables that I considered specifically for this commentary, such as whether the last duck seen was a mallard (Anas platyrhynchos), whether the next duck seen was a northern pintail (A. acuta), whether the survey was conducted on a day that included the letter n (i.e., Sunday, Monday, or Wednesday), and log[(standardized temp/standardized wind speed)²], plus 8 completely random variables generated using Z-distributions (Table 1; ΔAIC ≤ 2.00 for all 12 variables, with 4 of them leading to net reductions in AIC).

The ultimate objective of Pagano and Arnold (2009) was to assess whether double-observer methodologies provided enhanced prediction of breeding duck pairs. Selection of top-ranked models is only the first step in this process; biological interpretation of parameter effects is an essential second step. Total ducks had the largest influence on detection probabilities (ΔAIC = −16.62); model-based detection probabilities for mallards were 0.87 if there were no other ducks on the wetland, versus 0.75 if there were 60 other ducks on the wetland, which represents a substantial reduction in sightability, and this effect was even larger for cryptic species like ruddy ducks (Oxyura jamaicensis). Extent of vegetative cover on surveyed wetlands led to a much lower 0.68-unit reduction in AIC; mallards on wetlands completely ringed by tall emergent vegetation had 0.84 detection probabilities, whereas mallards on wetlands with no tall emergent vegetation had 0.86 detection probabilities, but wetlands with less than half of their perimeters surrounded by tall emergent comprised <20% of sampled wetlands. Clearly, vegetative cover could be ignored without introducing important bias, even though its effect was supported by lower AIC. But if we do include covariates such as vegetative cover, we would by the same ΔAIC criterion also include the clearly spurious random variable numbers 1, 8, 4, and 5 (Table 1). An underappreciated facet of AIC-based model selection is that it has about a 1 in 6 chance of admitting a spurious variable based on lower AIC, as opposed to a 1 in 20 chance based on traditional hypothesis testing at α = 0.05. When sample sizes are large as in Pagano and Arnold (2009), even AIC-supported variables can have minimal biological effect (Guthery 2008). Interpreting variables that are not supported by lower AIC would further exacerbate this problem.

EXTENT OF THE PROBLEM

I reviewed all papers published in Volume 72 (2008) of the Journal of Wildlife Management (JWM) looking for evidence that authors were interpreting models that were <2 ΔAIC units from the best-approximating model and differed only in having one additional parameter. Of 60 papers that provided tables of AIC-ranked models, 43 (72%) reported hierarchically more complex models (i.e., models containing ≥1 additional parameters not found in the best model) that were <2 ΔAIC units from the top-ranking model and 35 of these 43 papers (81%) contained interpretation errors involving these additional parameters. These errors ranged from egregious (e.g., 15 papers that drew biological inference from the additional parameters), to disconcerting (e.g., 30 papers that considered these models to be competitive with the top-ranked model), to benign (e.g., 18 papers that model-averaged these models with better supported models). If using valuable journal space to summarize noncompetitive models qualifies as an error (Guthery 2008), many additional papers could have been labeled erroneous. Only 4 papers explicitly identified the additional variables as uninformative (Bentzen et al. 2008, Devries et al. 2008, Koneff et al. 2008, Odell et al. 2008) without also resorting to a criterion such as 95% confidence intervals that could have also rejected legitimate parameters.

POTENTIAL SOLUTIONS

There are 5 potential solutions to the ≤2 ΔAIC problem, and authors of 2008 JWM articles employed all of them, oftentimes in combination.

Full reporting.—If a truly limited set of a priori models are considered from the outset, then it probably makes sense to report and discuss all models, including those with one additional but uninformative parameter. However, the reporting should not be that these models are competitive

1176 The Journal of Wildlife Management • 74(6)


with the higher ranked models, but rather that the additional variable(s) received little to no support, depending on the level of reduction in deviance versus the top-supported model (see also Anderson and Burnham 2002:916). For example, Odell et al. (2008) considered just 7 models to discriminate active versus inactive black-tailed prairie dog (Cynomys ludovicianus) colonies and although those authors included all 7 models in their table of results, they correctly ignored their second- and third-ranked models as being unsupported embellishments of their top-ranked model. Koneff et al. (2008) went one step further and devoted additional text to explain that an uninformative parameter in their analysis (group size of indicated waterfowl pairs) had no discernable effect on detection probabilities. If the a priori model set is small enough that information from all a priori models can be readily presented and authors also describe the lack of effect for uninformative parameters, then this seems like an ideal solution to the problem. Full reporting is also warranted for studies testing specific hypotheses about impacts of certain predictor variables, at least with respect to the variables of interest (e.g., Vercauteren et al. [2008] on testing efficacy of dogs at deterring deer from interacting with cattle). However, this approach becomes unworkable for model sets that are too large to justify full reporting, which includes most of the papers I reviewed.

Model averaging.—An especially common practice in JWM articles was to model average over all models, over all models within some cumulative weight (typically 90% or 95%), or over all models within some range of ΔAIC (typically 2, 4, or 7). One of the apparent benefits of model averaging was that it minimized the effect of uninformative parameters, particularly if coefficients for these variables were assumed to be zero in models where those variables were absent (Burnham and Anderson 2002:151–153). And if uninformative parameters are truly independent (i.e., uncorrelated with other, more useful variables), model averaging will typically have little impact on the bias and precision of the more useful parameter estimates (T. L. Shaffer, United States Geological Survey, personal communication). However, in many cases where investigators used model averaging, if models that included uninformative parameters had been ignored, the top model would have received 80–90% of model weight and there would have been little or no model-selection uncertainty. Model averaging is probably best employed as a tool to deal with legitimate model-selection uncertainty (e.g., ≥2 unnested models, all with substantial support) and when the primary goal is prediction rather than variable selection. Although several authors used model averaging to deal with model-selection uncertainty, I could only identify one instance where it seemed particularly useful (Saracco et al. 2008).

Confidence intervals.—Several authors discounted the importance of uninformative parameters, but only after determining that 95% confidence intervals included zero. The main problem with this solution is that it can also discard variables in best-approximating models that are supported by lower AIC values. For n/K > 40, AIC-based model selection will support additional variables whose approximately 85% confidence intervals exclude zero (i.e., if likelihood-ratio χ² > 2 on 1 degree of freedom, then P < 0.157). It makes little sense to select variables at P < 0.157 using AIC and then turn around and dismiss them at P > 0.05 using 95% confidence intervals. A couple of authors made an important step in the right direction by using 90% confidence intervals for their parameter estimates (Hein et al. 2008, Long et al. 2008); those authors just needed to take it 5% further and use 85% confidence intervals and they would have been fully AIC compatible. If an ability to generate 85% confidence intervals were widely available in computer programs like MARK (White and Burnham 1999), then this might be a more highly favored solution. But using 95% confidence intervals with information-theoretic approaches leads to variable-selection ambivalence when b/SE(b) = 1.4–2.0, and ambivalence is not a hallmark of good scientific writing.

Relative variable importance.—If the primary objective of modeling is to evaluate the relative importance of many potential predictor variables, such as in many habitat-selection studies, then summing Akaike model weights across all models that include that variable can be a useful approach (Burnham and Anderson 2002:167–169). When comparing summed model weights it is important that each of j variables be included in an equal number of models and the easiest way to achieve this is by considering all possible combinations of 2^j models (even more combinations are possible with interactions and quadratic terms). But unless all variables lead to lower AIC, this approach of considering all possible combinations will produce many models that are within 2–4 AIC units of the model with minimal AIC (i.e., ranges of ΔAIC that are frequently used as cut-offs for interpretation). But there is simply no compelling reason to put all of these models and their AIC scores into a table for publication, because finding an AIC-best model was not the objective. A table that includes a list of individual variables, their cumulative model weights, and model-averaged parameter estimates (or some other indication of biological effect size) is all that is really required (e.g., Tipton et al. 2008, table 1). However, if j is large, this approach misses much of the elegance of the modeling philosophy originally advocated by Burnham and Anderson (2002:147): "just because AIC was used as a selection criterion does not mean that valid inference can be expected. The primary mistake here is a common one: the failure to posit a small set of a priori models, each representing a plausible research hypothesis."

Discarding models with uninformative parameters.—When a sequential modeling approach is used to evaluate a large suite of potential models, as is often done in an exploratory context after first considering a more limited set of a priori models, some authors have adopted an a priori modeling approach that allows models with uninformative parameters to be discarded without further consideration. Fondell et al. (2008) adopted a hierarchical modeling approach wherein they retained only the AIC-best-ranked model from the previous step when they moved on to consider a new suite of covariates. Although models with uninformative parameters were reported at each stage (Fondell et al. 2008, table 1), they were not allowed to



propagate in subsequent steps. Devries et al. (2008:1793) included an even more eloquent recognition of the problem: "Among ranked models, we considered a model to be a competitor for drawing inference if parameters in the top model were not simply a subset of those in the competing model (Burnham and Anderson 2002)." Models that failed this test were excluded from tables of competitive models, but a careful reading of the methods of Devries et al. (2008) nevertheless allows identification of all models they considered. Pagano and Arnold (2009:394) conducted an exploratory analysis of covariates affecting detection probabilities by fitting a full model that included all covariates, from which those authors "sequentially eliminated the least important covariate (as identified by minimal absolute value of b/SE)… If eliminating a covariate led to a reduction in AICc we discarded the higher order model from our model set. We continued this approach, sequentially deleting the least important covariate, until no additional covariate could be eliminated without leading to an increase in AICc." Models that were hierarchically more complex versions of the top model were not reported, and valuable journal space was not wasted on models that were not actually competitive, nor were these models allowed to cannibalize model weight that legitimately belonged to the hierarchically simpler model. However, critiques of sequential model fitting include its ad hoc approach and the potential for model selection bias (Burnham and Anderson 2002:43–45).

RECOMMENDATIONS

Recognition of the ≤2 ΔAIC problem as it applies to uninformative parameters is an important first step, but published errors still abound even though Burnham and Anderson (2002) called explicit attention to this problem (see also Anderson and Burnham 2002, Guthery et al. 2005). I reviewed 5 potential solutions to this problem, but each solution had weaknesses, and none provided a universal solution to the problem. This is actually a beneficial outcome because it requires researchers to carefully consider which approach to use and does not allow statistical ritual to replace the practice of careful thinking (Guthery 2008).

For studies employing truly limited sets of a priori models (e.g., n ≤ 10), I recommend reporting all models and taking care to explain to readers that models with AIC scores near the top-ranked model might not be competitive as based on consideration of model deviance (Burnham and Anderson 2002:131). I also recommend full reporting of any models that represent experimental manipulations of key variables or tests of clearly articulated a priori objectives. In both cases, further discussion of parameter estimates, their uncertainty, and their biological interpretation is warranted and investigators might consider using 85% confidence intervals so that model-selection and parameter-evaluation criteria are more congruent. For exploratory approaches that involve many variables, I recommend using balanced variable sets and summed Akaike model weights if the primary goal is variable ranking and identification (Burnham and Anderson 2002:167–169) or a sequential modeling approach that allows unsupported variables to be eliminated without further reporting if the primary objective is to identify a most parsimonious model (Devries et al. 2008, Pagano and Arnold 2009). In either case, there is no need to include models with uninformative parameters in tables of model rankings. Whatever method is ultimately adopted, the primary objective should be to move beyond model ranking to model interpretation (Guthery et al. 2005), and having a smaller subset of models that are deemed to be competitive would represent a small but important step in the right direction.

ACKNOWLEDGMENTS

I thank F. S. Guthery, D. H. Johnson, P. M. Lukacs, T. L. Shaffer, and 2 anonymous reviewers for helpful comments on earlier drafts of this manuscript.

LITERATURE CITED

Anderson, D. R., and K. P. Burnham. 2002. Avoiding pitfalls when using information-theoretic methods. Journal of Wildlife Management 66:912–918.
Bentzen, R. L., A. N. Powell, and R. S. Suydam. 2008. Factors influencing nesting success of king eiders on northern Alaska's coastal plain. Journal of Wildlife Management 72:1781–1789.
Burnham, K. P., and D. R. Anderson. 2002. Model selection and multimodel inference: a practical information-theoretic approach. Second edition. Springer, New York, New York, USA.
Chamberlain, M. J. 2008. Are we sacrificing biology for statistics? Journal of Wildlife Management 72:1057–1058.
Devries, J. H., L. M. Armstrong, R. J. MacFarlane, L. Moats, and P. T. Thoroughgood. 2008. Waterfowl nesting in fall-seeded and spring-seeded cropland in Saskatchewan. Journal of Wildlife Management 72:1790–1797.
Fondell, T. F., D. A. Miller, J. B. Grand, and R. M. Anthony. 2008. Survival of dusky Canada goose goslings in relation to weather and annual nest success. Journal of Wildlife Management 72:1614–1621.
Guthery, F. S. 2008. Statistical ritual versus knowledge accrual in wildlife science. Journal of Wildlife Management 72:1872–1875.
Guthery, F. S., L. A. Brennan, M. J. Peterson, and J. J. Lusk. 2005. Information theory in wildlife science: critique and viewpoint. Journal of Wildlife Management 69:457–465.
Hein, C. D., S. B. Castleberry, and K. V. Miller. 2008. Male Seminole bat winter roost-site selection in a managed forest. Journal of Wildlife Management 72:1756–1764.
Koneff, M. D., J. A. Royle, M. C. Otto, J. S. Wortham, and J. K. Bidwell. 2008. A double-observer method to estimate detection rate during aerial waterfowl surveys. Journal of Wildlife Management 72:1641–1649.
Long, R. A., J. L. Rachlow, and J. G. Kie. 2008. Effects of season and scale on response of elk and mule deer to habitat manipulation. Journal of Wildlife Management 72:1133–1142.
Odell, E. A., F. M. Pusateri, and G. C. White. 2008. Estimation of occupied and unoccupied black-tailed prairie dog colony acreage in Colorado. Journal of Wildlife Management 72:1311–1317.
Pagano, A. M., and T. W. Arnold. 2009. Detection probabilities for ground-based breeding waterfowl surveys. Journal of Wildlife Management 73:392–398.
Saracco, J. F., D. F. DeSante, and D. R. Kaschube. 2008. Assessing landbird monitoring programs and demographic causes of population trends. Journal of Wildlife Management 72:1665–1673.
Tipton, H. C., V. J. Dreitz, and P. F. Doherty, Jr. 2008. Occupancy of mountain plover and burrowing owl in Colorado. Journal of Wildlife Management 72:1001–1006.
Vercauteren, K. C., M. J. Lavelle, and G. E. Phillips. 2008. Livestock protection dogs for deterring deer from cattle and feed. Journal of Wildlife Management 72:1443–1448.
White, G. C., and K. P. Burnham. 1999. Program MARK: survival estimation for populations of marked animals. Bird Study 46(Supplement):120–138.

Associate Editor: Brennan.
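As a closing illustration of the summed-weight bookkeeping described under Relative variable importance, the sketch below computes Akaike weights, w_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2), over all 2^j subsets of j = 3 hypothetical variables and sums them per variable. The AIC values are invented purely for illustration and do not come from any study cited above.

```python
import numpy as np

# Hypothetical AIC scores for all 2^j subsets of j = 3 candidate
# variables ("a", "b", "c"), keyed by the set of included variables.
# These numbers are invented for illustration only.
aic = {
    frozenset(): 110.0,
    frozenset({"a"}): 100.0,
    frozenset({"b"}): 109.0,
    frozenset({"c"}): 111.5,
    frozenset({"a", "b"}): 99.5,
    frozenset({"a", "c"}): 101.8,
    frozenset({"b", "c"}): 110.7,
    frozenset({"a", "b", "c"}): 101.2,
}

# Akaike weight of model i: w_i = exp(-Delta_i/2) / sum_j exp(-Delta_j/2)
models = list(aic)
delta = np.array([aic[m] for m in models]) - min(aic.values())
w = np.exp(-delta / 2.0)
w /= w.sum()
weights = dict(zip(models, w))

# Relative importance of a variable: summed weight of models containing it.
# With all 2^j subsets, every variable appears in the same number of models
# (here 4 of 8), so the summed weights are directly comparable.
importance = {v: sum(wt for m, wt in weights.items() if v in m)
              for v in ("a", "b", "c")}

for v, imp in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(v, round(imp, 3))
```

Because every variable appears in exactly half of the 2^j models, the summed weights satisfy the balance requirement noted in the text; with unbalanced model sets the comparison would be distorted.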
