Uninformative Parameters and Model Selection Using Akaike's Information Criterion
Commentary
ABSTRACT As use of Akaike’s Information Criterion (AIC) for model selection has become increasingly common, so has a mistake involving interpretation of models that are within 2 AIC units (ΔAIC ≤ 2) of the top-supported model. Such models are ≤2 ΔAIC units because the penalty for one additional parameter is +2 AIC units, but model deviance is not reduced by an amount sufficient to overcome the 2-unit penalty and, hence, the additional parameter provides no net reduction in AIC. Simply put, the uninformative parameter does not explain enough variation to justify its inclusion in the model and it should not be interpreted as having any ecological effect. Models with uninformative parameters are frequently presented as being competitive in the Journal of Wildlife Management, including 72% of all AIC-based papers in 2008, and authors and readers need to be more aware of this problem and take appropriate steps to eliminate misinterpretation. I reviewed 5 potential solutions to this problem: 1) report all models but ignore or dismiss those with uninformative parameters, 2) use model averaging to ameliorate the effect of uninformative parameters, 3) use 95% confidence intervals to identify uninformative parameters, 4) perform all-possible subsets regression and use weight-of-evidence approaches to discriminate useful from uninformative parameters, or 5) adopt a methodological approach that allows models containing uninformative parameters to be culled from reported model sets. The first approach is preferable for small sets of a priori models, whereas the last 2 approaches should be used for large model sets or exploratory modeling.

KEY WORDS Akaike’s Information Criterion (AIC), Akaike-best model, model averaging, model selection, parameter selection, uninformative parameters.
In the last decade, information-theoretic approaches have largely supplanted null hypothesis testing in the wildlife literature (Anderson and Burnham 2002, Burnham and Anderson 2002). Although this is a largely constructive paradigm shift, I nevertheless share concerns that one statistical ritual has replaced another and that comparative ranking of models now overshadows ecological interpretation of those models (Guthery et al. 2005, Chamberlain 2008, Guthery 2008). One small but incessantly common problem that contributes to this is the reporting and interpretation of models that are not truly competitive with top-ranking models, but appear competitive by virtue of low Akaike’s Information Criterion (AIC) scores. This occurs whenever a variable with poor explanatory power is added to an otherwise good model and the result is a model with ΔAIC < 2, a distance widely interpreted as indicating a ‘‘substantial level of empirical support’’ (Burnham and Anderson 2002:170). However, this is an erroneous interpretation, and Burnham and Anderson (2002:131) found this issue important enough to put inside a text box (something they did only 29 times in 454 text pages):

   Models having Δi [ΔAIC] within about 0–2 units of the best model should be examined to see whether they differ from the best model by 1 parameter and have essentially the same values of the maximized log-likelihood as the best model. In this case, the larger model is not really supported or competitive, but rather is ‘close’ only because it adds 1 parameter and therefore will be within 2 Δi units, even though the fit, as measured by the log-likelihood value, is not improved.

Obviously, a similar caveat would apply to models with 2 extra parameters that fall within approximately 4 ΔAIC units of the best model, or 3 extra parameters that fall within approximately 6 ΔAIC units of the best model, distances that are often interpreted as meaningful.

A WORKED EXAMPLE

I illustrate the problem of uninformative parameters using a recently published data set on detection probabilities of breeding waterfowl pairs in North Dakota, USA (Pagano and Arnold 2009). Model selection in that study was based on AIC, which is defined as −2logL(θ|y) + 2K, where logL(θ|y) is the maximized log-likelihood of the model parameters given the data and K is the number of estimable parameters (Burnham and Anderson 2002:61). For any well-supported approximating model, it is possible to add any single parameter and achieve a new model that is ≤2 AIC units from the well-supported model, because even if the additional parameter has no explanatory ability whatsoever (i.e., log-likelihood is unchanged), AIC will only increase by 2 due to the 1-unit increase in K. For example, Pagano and Arnold (2009, table 2) reported a 16-parameter model where detection probabilities (p) of breeding duck pairs were described by a factorial combination of 2 observers (obs) and 8 species (spp). Pagano and Arnold (2009) considered additional covariates that might affect detection probabilities and modeled these covariates to have an additive effect over both observers and all species (i.e., ΔK = 1). Effective sample size (n) for this data set was 6,162, so the small sample adjustment to AICc of 17 versus 16 parameters is a nearly negligible 0.01. Hereafter I will use AIC and assume n/K large and overdispersion (ĉ) negligible, but these criticisms also apply to model selection based on AICc and QAICc, although the boundaries are no longer precisely restricted to <2 ΔAIC units, but may be somewhat larger depending on values of n/K.
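The penalty arithmetic described above can be verified with a small, self-contained sketch. This is simulated Gaussian data, not the Pagano and Arnold (2009) data set (which is not reproduced here): it fits a base model, adds one spurious parameter via an arbitrary split of the observations, and confirms that the larger model can never be more than 2 AIC units worse, since added parameters cannot decrease the maximized log-likelihood. It also checks the AICc adjustment quoted above (K = 17 vs. 16 at n = 6,162) and the tail probability behind the "1 in 6" chance of admitting a spurious variable discussed later in this commentary.

```python
import math
import random

def gaussian_aic(sse, n, k):
    # Maximized Gaussian log-likelihood with sigma^2 = SSE/n:
    # logL = -(n/2) * (log(2*pi*SSE/n) + 1); AIC = -2*logL + 2K.
    loglik = -0.5 * n * (math.log(2.0 * math.pi * sse / n) + 1.0)
    return -2.0 * loglik + 2.0 * k

random.seed(42)
n = 200
y = [random.gauss(0.0, 1.0) for _ in range(n)]

# Base model: a single common mean (K = 2: mean + variance).
mean_all = sum(y) / n
sse0 = sum((v - mean_all) ** 2 for v in y)
aic0 = gaussian_aic(sse0, n, 2)

# Larger model: one spurious parameter, i.e., separate means for an
# arbitrary split of the data (K = 3). SSE can only shrink, so the
# log-likelihood can only improve.
first, second = y[: n // 2], y[n // 2:]
m1 = sum(first) / len(first)
m2 = sum(second) / len(second)
sse1 = (sum((v - m1) ** 2 for v in first)
        + sum((v - m2) ** 2 for v in second))
aic1 = gaussian_aic(sse1, n, 3)

# The uninformative parameter costs at most +2 AIC units.
delta_aic = aic1 - aic0
assert delta_aic <= 2.0

# AICc small-sample correction: AICc = AIC + 2K(K+1)/(n - K - 1).
# For n = 6,162, the K = 17 vs. K = 16 adjustment differs by ~0.01.
def aicc_term(n, k):
    return 2.0 * k * (k + 1) / (n - k - 1)

print(round(aicc_term(6162, 17) - aicc_term(6162, 16), 3))  # -> 0.011

# Adding 1 useless parameter lowers AIC whenever the likelihood-ratio
# statistic exceeds 2; under the null that is P(chi2_1 > 2) = erfc(1).
print(round(math.erfc(1.0), 3))  # -> 0.157, about 1 in 6
```

The function names (`gaussian_aic`, `aicc_term`) and the simulated data are illustrative inventions, not part of the original analysis; the two printed quantities match the 0.01 adjustment and the 1-in-6 admission rate stated in the text.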
Based on their review of the literature, Pagano and Arnold (2009) considered 12 additional covariates that they believed might affect detection probabilities and found that 7 of them were supported by net reductions in AICc, whereas all 12 variables produced models that were ≤1.92 ΔAICc units from model p[obs × spp]. Indeed, so were 4 nonsensical variables that I considered specifically for this commentary, such as whether the last duck seen was a mallard (Anas platyrhynchos), whether the next duck seen was a northern pintail (A. acuta), whether the survey was conducted on a day that included the letter n (i.e., Sunday, Monday, or Wednesday), and log[(standardized temp/standardized wind speed)²], plus 8 completely random variables generated using Z-distributions (Table 1; ΔAIC ≤ 2.00 for all 12 variables, with 4 of them leading to net reductions in AIC).

The ultimate objective of Pagano and Arnold (2009) was to assess whether double-observer methodologies provided enhanced prediction of breeding duck pairs. Selection of top-ranked models is only the first step in this process; biological interpretation of parameter effects is an essential second step. Total ducks had the largest influence on detection probabilities (ΔAIC = 16.62); model-based detection probabilities for mallards were 0.87 if there were no other ducks on the wetland, versus 0.75 if there were 60 other ducks on the wetland, which represents a substantial reduction in sightability, and this effect was even larger for cryptic species like ruddy ducks (Oxyura jamaicensis). Extent of vegetative cover on surveyed wetlands led to a much lower 0.68-unit reduction in AIC; mallards on wetlands completely ringed by tall emergent vegetation had 0.84 detection probabilities, whereas mallards on wetlands with no tall emergent vegetation had 0.86 detection probabilities, but wetlands with less than half of their perimeters surrounded by tall emergent comprised <20% of sampled wetlands. Clearly, vegetative cover could be ignored without introducing important bias, even though its effect was supported by lower AIC. But if we do include covariates such as vegetative cover, we would by the same ΔAIC criterion also include the clearly spurious random variable numbers 1, 8, 4, and 5 (Table 1). An underappreciated facet of AIC-based model selection is that it has about a 1 in 6 chance of admitting a spurious variable based on lower AIC, as opposed to a 1 in 20 chance based on traditional hypothesis testing at α = 0.05. When sample sizes are large as in Pagano and Arnold (2009), even AIC-supported variables can have minimal biological effect (Guthery 2008). Interpreting variables that are not supported by lower AIC would further exacerbate this problem.

EXTENT OF THE PROBLEM

I reviewed all papers published in Volume 72 (2008) of the Journal of Wildlife Management (JWM) looking for evidence that authors were interpreting models that were <2 ΔAIC units from the best-approximating model and differed only in having one additional parameter. Of 60 papers that provided tables of AIC-ranked models, 43 (72%) reported hierarchically more complex models (i.e., models containing ≥1 additional parameters not found in the best model) that were <2 ΔAIC units from the top-ranking model, and 35 of these 43 papers (81%) contained interpretation errors involving these additional parameters. These errors ranged from egregious (e.g., 15 papers that drew biological inference from the additional parameters), to disconcerting (e.g., 30 papers that considered these models to be competitive with the top-ranked model), to benign (e.g., 18 papers that model-averaged these models with better supported models). If using valuable journal space to summarize noncompetitive models qualifies as an error (Guthery 2008), many additional papers could have been labeled erroneous. Only 4 papers explicitly identified the additional variables as uninformative (Bentzen et al. 2008, Devries et al. 2008, Koneff et al. 2008, Odell et al. 2008) without also resorting to a criterion such as 95% confidence intervals that could have also rejected legitimate parameters.

POTENTIAL SOLUTIONS

There are 5 potential solutions to the <2 ΔAIC problem, and authors of 2008 JWM articles employed all of them, oftentimes in combination.

Full reporting.—If a truly limited set of a priori models is considered from the outset, then it probably makes sense to report and discuss all models, including those with one additional but uninformative parameter. However, the reporting should not be that these models are competitive