
European Journal of Work and Organizational Psychology
Journal homepage: www.tandfonline.com/journals/pewo20

To cite this article: Marvin Neumann, A. Susan M. Niessen, Maximilian Linde, Jorge N. Tendeiro & Rob R. Meijer (2024) "Adding an egg" in algorithmic decision making: improving stakeholder and user perceptions, and predictive validity by enhancing autonomy, European Journal of Work and Organizational Psychology, 33:3, 245-262, DOI: 10.1080/1359432X.2023.2260540

To link to this article: https://doi.org/10.1080/1359432X.2023.2260540

Published online: 26 Sep 2023.
EUROPEAN JOURNAL OF WORK AND ORGANIZATIONAL PSYCHOLOGY
2024, VOL. 33, NO. 3, 245–262
https://doi.org/10.1080/1359432X.2023.2260540

“Adding an egg” in algorithmic decision making: improving stakeholder and user perceptions, and predictive validity by enhancing autonomy

Marvin Neumann (a), A. Susan M. Niessen (b), Maximilian Linde (b), Jorge N. Tendeiro (c) and Rob R. Meijer (b)

(a) Department of Experimental and Applied Psychology, Faculty of Behavioural and Movement Sciences, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; (b) Department of Psychometrics and Statistics, Faculty of Behavioral and Social Sciences, University of Groningen, Groningen, The Netherlands; (c) Office of Research and Academia-Government-Community Collaboration, Education, Research Center for Artificial Intelligence and Data Innovation, Hiroshima University, Hiroshima, Japan

ABSTRACT
Decision makers often combine multiple pieces of information to make performance predictions and hiring decisions. More valid predictions are made when information is combined algorithmically (mechanical prediction) rather than in the decision-maker's mind (holistic prediction). Yet, decision makers rarely use algorithms in practice. One reason is that decision makers are worried about negative evaluations from other stakeholders such as colleagues when using algorithms. We hypothesized that such stakeholders evaluate decision makers more positively when they use autonomy-enhancing algorithmic procedures (AEAPs; holistically adjusting predictions from a prescribed algorithm or self-designing an algorithm) than when they use a prescribed algorithm. Relatedly, we hypothesized that decision makers who use AEAPs are less worried about negative stakeholder evaluations, and more likely to use algorithms in performance predictions. In Study 1 (N = 582), stakeholders evaluated decision makers more positively when they used AEAPs rather than a prescribed algorithm. In Study 2 (N = 269), decision makers were less worried about negative stakeholder evaluations and more likely to use AEAPs compared to a prescribed algorithm. Importantly, using AEAPs also resulted in substantially higher predictive validity than holistic prediction. We recommend the use of self-designed algorithms to improve perceptions and validity.

ARTICLE HISTORY
Received 23 March 2023; Accepted 13 September 2023

KEYWORDS
Algorithm use; personnel selection; decision making; human-automation collaboration; decision-support

Decision makers such as psychologists and managers often use multiple pieces of information (e.g., test scores and interview ratings) to make performance predictions and hiring decisions. Such information is typically combined in the decision-maker's mind (Neumann et al., 2023), which is known as holistic (or clinical, expert) prediction. Alternatively, information could be combined through an algorithm, which is known as mechanical (or statistical, actuarial, algorithmic) prediction (Meehl, 1954). Algorithms can take the form of very simple rules, although more advanced machine learning methods are possible (Highhouse & Brooks, 2023). A simple algorithm would be to give equal weight to a test score and an interview rating, and to hire the candidate(s) with the highest score(s). Predictor weights could also be different, and could be chosen based on decision-makers' judgement or statistical analyses (Kuncel, 2018).

A robust finding is that more valid performance predictions are made when information is combined algorithmically (Kuncel et al., 2013). The superiority of mechanical over holistic prediction has been demonstrated in various other contexts, including medical diagnoses, parole decisions, and clinical treatment selection (Ægisdóttir et al., 2006; Dawes et al., 1989; Guay & Parent, 2018). This finding also holds when considering experts' holistic predictions (Grove et al., 2000; Highhouse & Brooks, 2023; Yu & Kuncel, 2020), as expertise is hard to develop in these inherently uncertain contexts where outcome feedback is often substantially delayed or absent (Kahneman & Klein, 2009). The inconsistency of holistic predictions is a major reason why algorithms result in higher predictive validity; even algorithms with random positive weights that are consistently applied across applicants can outperform holistic prediction (Dawes & Corrigan, 1974; Yu & Kuncel, 2020). Relatedly, Yu and Kuncel (2022) found no evidence that decision makers can accurately tailor their performance predictions to specific organizational contexts. Although tailoring to the context could theoretically improve predictive validity, equally weighting the available information and hence ignoring organizational context outperformed "tailored" holistic predictions. So, even simple algorithms result in better performance predictions than holistic predictions.
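To make the simplest case concrete, here is a minimal R sketch of such a unit-weight rule; the applicant values are made up for illustration and are not data from the present studies.

```r
# Unit-weight mechanical prediction: standardize each predictor,
# sum with equal weights, and hire the candidate with the highest score.
applicants <- data.frame(
  id        = 1:4,
  test      = c(72, 85, 60, 90),      # hypothetical test scores
  interview = c(4.0, 3.5, 5.0, 2.0)   # hypothetical interview ratings
)
applicants$overall <- as.vector(scale(applicants$test)) +
                      as.vector(scale(applicants$interview))
applicants$id[which.max(applicants$overall)]  # candidate to hire
```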

CONTACT Marvin Neumann [email protected]
Supplemental data for this article can be accessed online at https://doi.org/10.1080/1359432X.2023.2260540
© 2023 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way. The terms on which this article has been published allow the posting of the Accepted Manuscript in a repository by the author(s) or with their consent.

Despite these robust research findings, decision makers rarely use algorithms for subjective tasks typically performed by humans like hiring (Castelo et al., 2019; Neumann et al., 2023). This results in suboptimal and untransparent hiring decisions (Meijer et al., 2020). Since algorithm use is a promising strategy to improve decision making (Milkman et al., 2009), the aim of this paper was to identify algorithmic decision-making procedures that are liked by stakeholders, liked and used by decision makers, and that result in higher predictive validity than holistic prediction (Kuncel, 2018; Neumann et al., 2021). We defined decision makers as people who make hiring decisions themselves (e.g., hiring managers, HR professionals), based on their own judgement or by using algorithms (Langer & Landers, 2021). In contrast, we defined stakeholders as people who may observe a decision maker making a hiring decision, and who may or may not be directly affected by the decision (e.g., higher management, direct colleagues, team members of the future employee, or (internal) applicants).1

One important reason why algorithms are rarely used is the "threat of technological unemployment" (TOTU, Meehl, 1986, p. 374): Decision makers worry that they provide less value to their organization when using algorithms because other stakeholders may no longer attribute the hiring process and outcome to their holistic judgement (Neumann et al., 2023; Nolan et al., 2016, 2020). Indeed, stakeholders give decision makers less credit for their hiring decisions (i.e., perceive them as less responsible for and less in control of hiring decisions) when they use algorithms rather than their holistic judgement (Nolan et al., 2016). Similarly, patients perceived physicians as less capable and competent when they made algorithmic (vs. holistic) diagnoses (Arkes et al., 2007). Furthermore, lay people perceive algorithm use in hiring as unprofessional and impersonal (Diab et al., 2011), and they have less trust in algorithmic than in human hiring decisions (Wesche et al., 2022). Importantly, Nolan et al. (2016) showed that decision makers themselves correctly believe that stakeholders give them less credit for their hiring decisions when using algorithms rather than their holistic judgement. These beliefs reduced decision-makers' intentions to use algorithms, and this relationship was mediated by TOTU (Nolan et al., 2016, 2020). What people believe others think of them is also called meta-perceptions (Frey & Tropp, 2006). In line with this literature, we henceforth call decision-makers' beliefs about how stakeholders evaluate them decision-makers' meta-beliefs.

So, both stakeholders (Langer & Landers, 2021) and decision makers (Burton et al., 2020) have more negative attitudes towards algorithmic compared to holistic hiring decisions. This is a problem because these perceptions prevent decision makers from using algorithms, which could improve the quality of hiring decisions (Kuncel et al., 2013). To solve this problem, interventions need to be identified that can improve stakeholder perceptions and decision-makers' meta-beliefs, and that still result in better predictions than holistic prediction. We expected that a useful intervention is enhancing decision-makers' autonomy in algorithmic decision making. Autonomy could be enhanced in at least two ways, compared to a fully automated approach where decision makers would have to use a prescribed algorithm consistently and hence have no autonomy at all. One approach is allowing decision makers to adjust a prescribed algorithm's job performance predictions holistically, upwards or downwards (Dietvorst et al., 2018; Kuncel, 2018). Another approach is to have decision makers design their own algorithm first, by explicitly choosing predictor weights (Neumann et al., 2022; Nolan & Highhouse, 2014), and then use the resulting predictions consistently, without the opportunity to adjust them. We henceforth refer to both approaches as autonomy-enhancing algorithmic procedures (AEAPs).

While teaching a workshop about these autonomy-enhancing algorithmic procedures, an attendee, Georgios Fraros, noted that the problem and solution presented were in line with a story about the introduction of an instant cake mix that only required adding water. This time-saving innovation didn't sell nearly as well as the proud manufacturers had expected. It turned out that presenting cake as homemade after just adding water and stirring made people feel guilty. They wanted to put in more effort and make it their own; they wanted to have to add an egg. While this entertaining story may be a myth (Mikkelson, 2008), our expectations are also in line with recent findings by Gonzalez et al. (2022). They showed that applicants' perceptions were similar when predictions were made holistically or when using an autonomy-enhancing algorithm (holistically adjusting predictions from a prescribed algorithm), but more positive compared to using a prescribed algorithm. Relatedly, Kares et al. (2023) found that decision makers showed similar trust in holistic prediction compared to an autonomy-enhancing algorithmic approach, and generally higher trust in an autonomy-enhancing algorithmic approach than in a prescribed algorithm. These results provide some first evidence that AEAPs can improve both stakeholders' and decision-makers' perceptions, compared to prescribed algorithms that restrict autonomy.

Contributions and aims of the present studies

As Langer and Landers (2021) discuss, existing studies in which AEAPs have been investigated have usually focused on either stakeholders (e.g., Newman et al., 2020) or decision makers (e.g., Nolan & Highhouse, 2014), and, importantly, almost all studies solely focused on perceptions, without investigating actual behaviour, such as effects of algorithm use on performance prediction accuracy (for exceptions in educational selection, see Dietvorst et al., 2018; Neumann et al., 2022). Furthermore, there is a large range of possibilities for how autonomy in algorithmic decision making could be enhanced. However, existing studies typically do not distinguish between different autonomy-enhancing approaches (Burton et al., 2020; Langer & Landers, 2021), and often introduce such approaches abstractly as "a combination between a human and an algorithm" (Gonzalez et al., 2022).

The aims of the present studies were to uniquely contribute to the literature by investigating whether AEAPs could improve both stakeholder perceptions and decision-makers' meta-beliefs. Importantly, we also investigated whether AEAPs result in higher predictive validity compared to holistic prediction, in the context of personnel selection. This is most important to investigate because improving perceptions is of no avail if AEAPs do not also improve validity. Yet, such an investigation is lacking in the literature, despite various calls for it (Burton et al., 2020; Highhouse & Brooks, 2023; Kuncel, 2018; Langer & Landers, 2021). Furthermore, we contribute to the literature by investigating the effect of two different ways to enhance autonomy in algorithmic decision making (holistically adjusting predictions from a prescribed algorithm or self-designing an algorithm) on perceptions and validity. Moreover, we contribute to the theoretical understanding of improving perceptions of algorithmic decision making through AEAPs, by investigating the underlying mechanisms of these perceptions.

Theoretical background and hypothesis development

Stakeholders should attribute hiring outcomes less to a decision maker who uses a prescribed algorithm to make predictions than to a decision maker who makes holistic predictions. This assumption is in line with attribution theory (Kelley, 1973), which addresses how people explain events and perceive causality. Two key dimensions of causal attribution are locus of causality and personal control (Meyer & Koelbl, 1982).2 Locus of causality is the extent to which stakeholders attribute a hiring decision to internal or external factors. Applied to personnel selection, holistic prediction can be considered an internal factor because the decision-making process takes place in the decision-maker's mind. In contrast, an algorithm may be considered an external factor because it constitutes a contextual decision aid that the decision maker uses to make hiring decisions. Personal control is the extent to which stakeholders perceive a decision maker as capable of changing the hiring decision. Perceived personal control should be higher when decision makers make holistic rather than algorithmic hiring decisions. Based on attribution theory, Nolan et al. (2016) indeed found that stakeholders perceived decision makers as less responsible for and less in control of the hiring decision when they used an algorithm rather than their holistic judgement. Accordingly, we had the following hypotheses, which replicate Nolan et al. (2016).

Hypothesis 1a: Stakeholders' beliefs about decision-makers' locus of causality will be more positive3 when holistic prediction is used, compared to when a prescribed algorithm is used.

Hypothesis 2a: Stakeholders' beliefs about decision-makers' personal control will be more positive when holistic prediction is used, compared to when a prescribed algorithm is used.

Many people suggest that practicing holistic prediction is what makes decision makers be seen as truly professional or competent (Meehl, 1986; Neumann et al., 2023). Indeed, Diab et al. (2011) showed that lay people considered the use of holistic prediction as moderately more professional than mechanical prediction when scores from interviews (but not standardized tests) were combined. Furthermore, Nolan (2013) found that decision makers felt they could better demonstrate their competence when making holistic rather than algorithmic decisions. Thus, it can be expected that stakeholders perceive decision makers as more competent when they determine or influence the hiring decision.

Hypothesis 3a: Stakeholders' beliefs about decision-makers' competence will be more positive when holistic prediction is used, compared to when a prescribed algorithm is used.

Autonomy in algorithm use

Stakeholders may perceive decision makers as having more control over hiring decisions when they retain autonomy in the decision process than when they have no autonomy (i.e., they must use a prescribed algorithm). Furthermore, stakeholders may ascribe the cause of a hiring decision more to the decision maker (internal factor) than to an algorithm (external factor) when decision makers retain autonomy. This is expected because hiring decisions would not exclusively be determined by an external decision aid (i.e., a prescribed algorithm), but partly by the decision-maker's internal cognitive processes. Based on this reasoning, stakeholders should evaluate decision makers more positively when they use AEAPs rather than a prescribed algorithm. We hypothesized that:

Hypothesis 1b: Stakeholders' beliefs about decision-makers' locus of causality will be more positive when AEAPs are used, compared to when a prescribed algorithm is used.

For brevity, we only present this hypothesis for locus of causality. We had the same hypotheses for personal control (H2b) and perceived competence (H3b).

We expected that stakeholders perceive decision makers who use AEAPs as more in control of and more responsible for the hiring decision than decision makers who use a prescribed algorithm. Since people commonly believe that the combination of algorithmic and human judgement results in the best decisions (Neumann et al., 2023), it may also be expected that stakeholders' positive beliefs about decision-makers' locus of causality and personal control translate into more positive competence perceptions. Our conceptual model is shown in Figure 1. We hypothesized that:

Hypothesis 4: Enhanced autonomy in making performance predictions will have a positive indirect effect on stakeholders' competence perceptions of decision makers via its effect on locus of causality and personal control.

Figure 1. Conceptual model in study 1. a AEAPs = autonomy-enhancing algorithmic procedures. AEAPs was coded as 1 and prescribed algorithm as 0.

Holistic prediction is popular in practice. Therefore, it is also relevant to investigate whether stakeholders perceive the use of AEAPs in a similar manner as holistic prediction. If this were not the case, AEAPs may not be a viable method to improve stakeholder perceptions. Therefore, we also explored how stakeholders evaluate decision-makers' use of AEAPs, compared to holistic prediction.

Pre-registrations including sample size justifications, analysis code, codebooks, study materials, and raw data for both studies are available on OSF: https://osf.io/brc9p/?view_only=211357b3a78446078715d47a633d8219.

Study 1

Method

Participants
We recruited 644 employees via Amazon MTurk who received $1 for their participation. As pre-registered, we excluded participants who failed at least one of three attention checks (see Appendix A) or finished the tasks in less than a minute. After inspecting the data, we also excluded participants who reported impossible values (e.g., longer role tenure than organizational tenure or an incorrect MTurk ID). We retained usable data from 582 participants. The mean age was M = 37.1 (SD = 10.6) and 55% were male. Most participants were U.S. citizens (77%). Other participants were Indian (17%), Brazilian (3%), or had other non-EU nationalities (3%) or EU nationalities (<0.01%). Furthermore, most participants were employed full-time (92%) and held a Bachelor or Master degree (86%). The median study completion time was 5 minutes. The ethics committee of the Heymans institute for psychological research approved this study (code: PSY-2021-S-0500).

Design and procedure

We employed a one-factorial (hiring approach) between-subjects design with four levels (holistic, holistic adjustment, self-designed algorithm, prescribed algorithm). Participants read an introduction of a hypothetical hiring scenario, similar to Nolan et al. (2016): "Imagine yourself in the following situation . . . The person responsible for hiring decisions at your company (e.g., a manager) just hired a new employee to fill an open position. Please read the description of how this decision was made and answer the questions that follow". Afterwards, participants were randomly assigned to one of four conditions and read a condition-specific description of the manager's hiring approach (see Table 1).

We varied how the manager combined a cognitive ability test score, a score from a personality questionnaire, and a rating from a traditional hiring interview to evaluate candidates. In the holistic condition, the manager used their intuition and expertise to combine the information and make a hiring decision. In the holistic adjustment condition, the manager saw the prediction from a prescribed algorithm and could adjust the algorithm's prediction using their intuition and expertise. In the self-designed algorithm condition, the manager designed an algorithm by choosing predictor weights that reflected the importance of the information. In the prescribed algorithm condition, the manager saw the prediction from a prescribed algorithm but could not influence the algorithm's weights or the resulting predictions. After reading the condition-specific descriptions, participants completed the dependent measures in random order.

Measures

All measures were assessed on a 5-point Likert scale (1 = strongly disagree, 5 = strongly agree, see OSF). We used an adapted six-item scale from Nolan and Highhouse (2014) as a manipulation check, to measure the extent to which participants perceived the manager as having autonomy in making hiring decisions (α = .88). Furthermore, we used Nolan et al.'s (2016) three-item scales to measure personal control and locus of causality (α = .84 and α = .83, respectively). Lastly, we used an adapted version of Nolan's (2013) six-item scale to measure perceived competence (α = .89).

Analytical approach

We applied Bayesian parameter estimation because estimating the magnitude of and uncertainty around the differences of interest was the most relevant to this study. We report posterior parameter estimates (standardized mean differences, denoted as d) with corresponding 95% highest density intervals (HDIs). An HDI allows for the intuitive interpretation that there is a 95% probability that the corresponding true parameter lies within the HDI, conditional on the model, the priors, and the observed data (Kruschke, 2015). We initialized a model in Stan (Stan Development Team, 2022) with chains = 7, thinning = 5, warmup samples = 5000, and actual samples = 50000, resulting in a total of 7 (chains) * 50000 (samples) / 5 (thinning) = 70000 samples. Furthermore, we used improper (very uninformative) priors for each parameter. For mediation results, we report the partially standardized effect size (Preacher & Kelley, 2011), which is the ratio of the (in)direct effect to the standard deviation of Y. It can be interpreted as the number of standard deviations of Y that two groups differ on average as a result of the (in)direct mechanism.
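The authors' pre-registered models and analysis code are on OSF. As a hedged sketch of the reported sampler settings only, the following R/rstan call fits a trivial two-group model on simulated data (Stan's default flat priors on unbounded parameters are improper, as in the text); the model, data, and variable names are stand-ins, not the actual analyses.

```r
library(rstan)

# Stand-in two-group model; d is a standardized mean difference.
model_code <- "
data {
  int<lower=1> n1; int<lower=1> n2;
  vector[n1] y1; vector[n2] y2;
}
parameters { real mu1; real mu2; real<lower=0> sigma; }
model {
  y1 ~ normal(mu1, sigma);
  y2 ~ normal(mu2, sigma);
}
generated quantities {
  real d = (mu1 - mu2) / sigma;  // standardized mean difference
}
"

fit <- stan(model_code = model_code,
            data = list(n1 = 50, n2 = 50,
                        y1 = rnorm(50, 4.0, 0.8),   # simulated ratings, group 1
                        y2 = rnorm(50, 3.5, 1.0)),  # simulated ratings, group 2
            chains = 7, warmup = 5000, iter = 55000, thin = 5)
# Retained draws: 7 chains * (55000 - 5000) / 5 = 70000, matching the text.
# (This is a long run; shrink iter for a quick test.)

# 95% HDI: the shortest interval containing 95% of the posterior draws.
hdi <- function(draws, mass = 0.95) {
  d <- sort(draws); n <- length(d); k <- floor(mass * n)
  widths <- d[(k + 1):n] - d[1:(n - k)]
  i <- which.min(widths)
  c(lower = d[i], upper = d[i + k])
}
hdi(extract(fit)$d)
```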

Table 1. Descriptions of the hiring approach per condition in study 1.

Condition Description of the hiring approach


Holistic The person responsible for hiring decisions used a cognitive ability test score, a score from a personality questionnaire, and a rating from
a traditional hiring interview to evaluate candidates. After this information had been obtained, the person responsible for hiring decisions
used his/her intuition and expertise to combine the information and make a hiring decision.
Holistic adjustment The person responsible for hiring decisions used a cognitive ability test score, a score from a personality questionnaire, and a rating from
a traditional hiring interview to evaluate candidates. After this information had been obtained, the information was combined into an
overall score according to the following decision rule: Applicant overall score = cognitive ability test score*weight1 + personality
questionnaire score*weight2 + traditional hiring interview rating*weight3. So, the scores and ratings of an applicant were multiplied by
weights (*) and then added up (+). The weights were chosen by an external assessment professional and reflected the relative importance
he/she gave to the cognitive ability test, the personality questionnaire, and the traditional hiring interview, respectively. The person
responsible for hiring decisions was shown the applicants’ overall scores and could use his/her intuition and expertise to adjust these
overall scores if desired. Eventually, the applicant with the highest overall score was hired.
Self-designed The person responsible for hiring decisions used a cognitive ability test score, a score from a personality questionnaire, and a rating from
algorithm a traditional hiring interview to evaluate candidates. After this information had been obtained, the information was combined into an
overall score according to the following decision rule: Applicant overall score = cognitive ability test score*weight1 + personality
questionnaire score*weight2 + traditional hiring interview rating*weight3. So, the scores and ratings of an applicant were multiplied by
weights (*) and then added up (+). The person responsible for hiring decisions chose the weights, which reflected the relative importance
he/she gave to the cognitive ability test, the personality questionnaire, and the traditional hiring interview, respectively. Eventually, the
applicant with the highest overall score was hired.
Prescribed The person responsible for hiring decisions used a cognitive ability test score, a score from a personality questionnaire, and a rating from
algorithm a traditional hiring interview to evaluate candidates. After this information had been obtained, the information was combined into an
overall score according to the following decision rule: Applicant overall score = cognitive ability test score*weight1 + personality
questionnaire score*weight2 + traditional hiring interview rating*weight3. So, the scores and ratings of an applicant were multiplied by
weights (*) and then added up (+). The weights were chosen by an external assessment professional and reflected the relative importance
he/she gave to the cognitive ability test, the personality questionnaire, and the traditional hiring interview, respectively. The person
responsible for hiring decisions had no influence on the decision-rule weights or the resulting applicant overall score. Eventually, the
applicant with the highest overall score was hired.

Results

Based on the results of a confirmatory factor analysis, Nolan et al. (2016) combined the measures locus of causality and personal control into one scale. Therefore, we investigated whether such a one-factor solution would show a better fit than the theoretically assumed two-factor solution by comparing the RMSEA values of the two models. In sum, the two-factor solution showed a better fit (see Supplement S1).

Table 2 shows the correlations between the measures. Figure 2 shows the observed means per condition and measure. Table 3 shows the exact observed means and standard deviations. Posterior distributions with 95% HDIs for the hypothesized effects are shown in Figure 3. As Table 3 and Figure 2 also show, holistically adjusting predictions from a prescribed algorithm and self-designing an algorithm resulted in very similar perceptions. Therefore, these two AEAPs were combined in the following analyses.

Autonomy (manipulation check)

As expected, stakeholders perceived managers who used AEAPs as having more autonomy than managers who used a prescribed algorithm (d = 0.53, 95% HDI [0.33, 0.73]). This suggests that our manipulation was successful.

Locus of causality, personal control, and perceived competence

Stakeholders perceived managers who made holistic predictions as substantially more responsible for and much more in control of the hiring decision than managers who used a prescribed algorithm (d = 0.88, 95% HDI [0.64, 1.12] and d = 0.91, 95% HDI [0.67, 1.15], respectively). Furthermore, managers were also perceived as more competent when making holistic predictions rather than using a prescribed algorithm (d = 0.51, 95% HDI [0.27, 0.74]). Therefore, hypotheses 1a, 2a, and 3a were fully supported.

Stakeholders also perceived managers who used AEAPs as more responsible for and more in control of the hiring decision than managers who used a prescribed algorithm (d = 0.58, 95% HDI [0.38, 0.78] and d = 0.76, 95% HDI [0.56, 0.96], respectively). Relatedly, managers who used AEAPs were also perceived as more competent compared to managers who used a prescribed algorithm, although the effect size was smaller (d = 0.30, 95% HDI [0.10, 0.50]). So, hypotheses 1b, 2b, and 3b were also fully supported.

To constitute a viable intervention, AEAPs should result in similarly positive stakeholder evaluations as holistic prediction, which is dominant in practice.

Table 2. Correlations between measures in study 1.
Measure                  1.   2.   3.
1. Autonomy a            -
2. Locus of causality    .76  -
3. Personal control      .74  .78  -
4. Perceived competence  .68  .65  .54
Correlations were calculated across conditions. N = 582. a Manipulation check.

Figure 2. Observed means and raw data per condition and measure in study 1. Error bars represent 95% confidence intervals. Some jittering in the x-axis direction was
added to the plot to improve readability.

Table 3. Observed means and standard deviations per condition and measure in study 1.
                          Autonomy a    Locus of causality  Personal control  Perceived competence
Condition                 M     SD      M     SD            M     SD          M     SD
Holistic                  4.12  0.50    4.21  0.58          4.11  0.65        4.18  0.56
Holistic adjustment       3.93  0.71    4.02  0.71          4.02  0.75        4.09  0.69
Self-designed algorithm   3.90  0.75    3.90  0.89          3.91  0.94        3.97  0.76
Prescribed algorithm      3.50  1.05    3.46  1.14          3.27  1.22        3.82  0.80
a Manipulation check.

Although stakeholders also perceived managers who made holistic predictions as somewhat more responsible for and somewhat more in control of the hiring decision than managers who used AEAPs (d = 0.29, 95% HDI [0.09, 0.50] and d = 0.15, 95% HDI [−0.05, 0.35], respectively), this difference was relatively small compared to the difference between using AEAPs and using a prescribed algorithm. We found similar results for perceived competence (d = 0.21, 95% HDI [0.00, 0.41]).

Mediation

We first conducted feasibility checks as outlined in Murphy (2021) to assess whether mediation was plausible (cf. hypothesis 4). These include checking whether the independent variable and the mediator(s) are indeed correlated with the dependent variable, and whether the semipartial correlation between the independent and dependent variable is close to zero when controlling for the mediator(s). Table 4 shows the correlations between the variables included in the proposed mediation model. Although the correlation between enhanced autonomy and perceived competence was rather small (r = .13), the semipartial correlation was close to zero when controlling for the mediators (r = −.04). This pattern of correlations suggests that mediation is plausible.
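As a hedged R sketch of this feasibility check on simulated data (the study's raw data are on OSF), the semipartial correlation below removes the mediators' variance from the independent variable only, which is one standard way to compute it; all variable names and effect sizes are illustrative.

```r
# Feasibility check in the spirit of Murphy (2021): the IV-DV semipartial
# correlation should be near zero once the mediators are controlled for.
set.seed(1)
n   <- 449
x   <- rbinom(n, 1, 2/3)                # enhanced autonomy (1) vs. prescribed algorithm (0)
loc <- 0.5 * x + rnorm(n)               # locus of causality (simulated)
pc  <- 0.6 * x + rnorm(n)               # personal control (simulated)
y   <- 0.6 * loc + 0.1 * pc + rnorm(n)  # perceived competence (simulated)

x_resid <- resid(lm(x ~ loc + pc))      # strip mediator variance from X
cor(x_resid, y)                         # near zero suggests mediation is plausible
```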
As Table 5 shows, we found some evidence that stakeholders perceived managers who used AEAPs as more competent than managers who used a prescribed algorithm via its effect on locus of causality (partially standardized effect size = 0.32). Furthermore, zero was a credible value for the direct effect of enhanced autonomy on perceived competence (partially standardized effect size = −0.09). We did not find enough evidence that personal control mediated the relation between enhanced autonomy and perceived competence (partially standardized effect size = 0.06). In sum, we found partial support for hypothesis 4.

Discussion

In line with findings by Nolan et al. (2016), stakeholders perceived decision makers who made holistic predictions as being more competent, more responsible for, and more in control of hiring decisions than decision makers who used a prescribed algorithm. Extending Nolan et al.'s (2016) findings, we found that stakeholders evaluated decision makers much more positively when they adjusted predictions from a prescribed algorithm or self-designed an algorithm, compared to when they strictly used a prescribed algorithm. Moreover, stakeholders evaluated decision makers who used AEAPs only slightly more negatively than decision makers who made holistic predictions.

Figure 3. Posterior distributions for the hypothesized effects in study 1. The grey area under each curve depicts the 95% most probable effect sizes, given the model
and the data. H = holistic prediction, AEAPs = autonomy-enhancing algorithmic procedures, and PA = prescribed algorithm.

Table 4. Correlations between variables included in the pre-registered mediation model in study 1.
Measure                  1.   2.   3.
1. Enhanced autonomy a   -
2. Locus of causality    .25  -
3. Personal control      .31  .78  -
4. Perceived competence  .13  .67  .54
a Enhanced autonomy was coded 1 for the two autonomy-enhancing conditions (holistic adjustment and self-designed algorithm) and 0 for the prescribed algorithm condition. Therefore, data from the holistic condition was disregarded, and hence, N = 449.

Table 5. Mediation results from study 1.

Paths Unstandardized effect estimate Partially standardized effect size


Enhanced autonomy → Locus of causality → Perceived competence (indirect) 0.28 [0.15, 0.36] .32
Enhanced autonomy → Personal control → Perceived competence (indirect) 0.04 [−0.03, 0.09] .06
Enhanced autonomy → Perceived competence (direct) −0.07 [−0.19, 0.05] −.09
95% highest density intervals are presented in brackets.

While this may suggest that stakeholders somewhat divided their attributions of causality, control, and competence between the human decision maker and the algorithm, these effects were small. Thus, there is some positive news here: Our results suggest that decision makers should be much less worried about negative stakeholder evaluations when using AEAPs. However, decision makers may be only more likely to use AEAPs when they also themselves believe that stakeholders appreciate their use of AEAPs more than their use of a prescribed algorithm. This was investigated in Study 2. Importantly, we also investigated whether AEAPs result in higher predictive validity than holistic prediction.

Study 2 makers’ intentions to use AEAPs. Our conceptual model is


shown in Figure 4. We hypothesized that:
Attribution theory addresses how people explain events and
thus provided a useful framework for explaining under which Hypothesis 5a: Decision-makers’ meta-beliefs about their
circumstances stakeholders perceive decision makers as the locus of causality will be more positive when holistic prediction
cause of a hiring decision. However, this theory does not explain is used, compared to when a prescribed algorithm is used.
decision-makers’ intentions to use algorithms as a consequence
of their beliefs about stakeholders’ perceptions. A useful frame­ For brevity, we only formulated a hypothesis for locus of caus­
work to cover the relation between these “meta-perceptions” ality. We formulated the same hypotheses for personal control
and algorithm use is the theory of planned behaviour (Ajzen, (H6a) and competence perceptions (H7a).
1991), which suggests that subjective norms (i.e., the expecta­
Hypothesis 5b: Decision-makers’ meta-beliefs about their
tions from others to perform a certain behaviour) determine
locus of causality will be more positive when AEAPs are used,
one’s intentions to perform such behaviour.4 Therefore, we
compared to when a prescribed algorithm is used.
drew in Study 2, in addition to attribution theory, primarily on
the theory of planned behaviour to explain decision-makers’ We formulated the same hypotheses for personal control (H6b)
intentions to use different decision-making procedures. These and competence perceptions (H7b).
theories are compatible in that attribution theory suggests the
constructs on which stakeholders evaluate decision makers (i.e., Hypothesis 8: Enhanced autonomy will have a positive indir­
locus of causality, personal control, perceived competence), ect effect on use intentions via its effect on locus of causality,
while the theory of planned behaviour is used to argue how personal control, competence perceptions, and TOTU.
decision makers think stakeholders evaluate them on these
constructs. In the context of personnel selection, the theory of
planned behaviour has been used to explain decision-makers’ Predictive validity
(un)willingness to use standardized information collection (e.g.,
Even if decision makers were more likely to use AEAPs, this
structured interviews, van der Zee et al., 2002), and standardized
should still result in more valid predictions than pure holis­
information combination (mechanical prediction, Nolan et al., tic predictions to improve decision making in practice.
2016). In the present study, the theory suggests that decision Holistically adjusting predictions from a prescribed algo­
makers should be more likely to use autonomy-enhancing than rithm decreases predictive validity compared to strict algo­
autonomy-restricting prediction procedures if they recognize rithm use (Hoffman et al., 2017; Neumann et al., 2022). Yet,
that the former results in more positive evaluations by stake­ adjusted predictions can still be more valid than holistic
holders. We expected that decision makers would recognize that predictions because decision makers anchor on the algo­
stakeholders ascribe more credit to them when using autonomy- rithm’s predictions (Dietvorst et al., 2018; Neumann et al.,
enhancing (vs. autonomy-restricting) algorithmic procedures, 2022). Moreover, Neumann et al. (2022) found mixed evi­
dence that self-designed algorithms resulted in slightly
and hence perceive them as more competent. When decision
more valid predictions than pure holistic prediction, even
makers believe that using AEAPs would increase stakeholders’
when participants knew predictor validities. When predic­
perceptions of their locus of causality and control, and hence tors with at least some and similar validities are used, even
result in higher perceived competence, decision-makers’ TOTU algorithms with random predictor weights resulted in more
(i.e., the worries that their professional judgement is needless) valid predictions than holistic prediction, due to increased
should be reduced. This, in turn, should increase decision- judgement consistency (Yu & Kuncel, 2020). However, when

Figure 4. Conceptual model in study 2. The hypothesized conceptual model. The dashed line indicates that we dropped TOTU in our exploratory analyses because it
showed correlations with other variables that made the hypothesized mediation model implausible. a AEAPs = autonomy-enhancing algorithmic procedures. AEAPs
was coded as 1 and prescribed algorithm as 0. b TOTU = threat of technological unemployment.

However, when predictors substantially differ in validity, accurate weighting becomes important as well, as overweighting invalid predictors would decrease predictive validity. The prediction task in this study contained predictors with near-zero to moderate validity. We should be able to assume that hiring professionals have some knowledge of predictor validities and hence would choose predictor weights that are more appropriate than random weights. We expected that holistically adjusted predictions from a prescribed algorithm and predictions from a self-designed algorithm would be more valid than pure holistic predictions.

Hypothesis 9: AEAPs will result in more valid performance predictions than pure holistic predictions.

Method

Participants

Employed people with hiring experience were recruited via Amazon MTurk and received $3.65 for their participation. As pre-registered, we excluded participants who (1) were not involved in making judgements or decisions at work, (2) did not make at least one hiring decision per year, and (3) failed at least one attention check (see Appendix B). After the first day of data collection, we noticed that some participants completed the study in such a short time that it was unlikely they read the instructions properly and responded sincerely. Therefore, we updated our pre-registration on OSF by adding that we would also exclude participants who responded in less than 10 minutes. In line with the pre-registered sample size, we retained usable data from 269 participants. The mean age was M = 36.0 (SD = 10.2) and 59% were male. Most participants were U.S. citizens (62%). Other participants were Indian (32%) or had other non-European nationalities (3%) or European nationalities (4%). Almost all participants worked full time (96%) and held a Bachelor or Master degree (90%). Lastly, 59% indicated making six or more hiring decisions per year. The median study completion time was 22 minutes. The ethics committee of the Heymans institute for psychological research approved this study (code: PSY-2021-S-0197).

Materials

Applicant data

Participants made performance predictions based on archival applicant data (N = 236) originally used in Kausel et al. (2016). An airline company was hiring ticket agents using a cognitive ability test, a conscientiousness questionnaire, and an unstructured interview, conducted by a line manager. All applicants were hired, except for those who obtained the lowest possible interview rating. Supervisors had rated the applicants' overall performance approximately three months after they were hired. It was not feasible to let participants make performance predictions for all 236 applicants in this study. Therefore, we selected 40 applicants such that the absolute differences in correlations between the correlation matrices of the reduced and the full dataset did not differ by more than .015 (see the R script "Applicant selection.R" on OSF). Table 6 shows the descriptive statistics and correlations of the variables of the reduced applicant dataset.
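The authors' exact selection procedure is in "Applicant selection.R" on OSF; the hedged sketch below, on simulated data, shows the criterion as described: search over random subsets of 40 applicants and keep the one whose correlation matrix deviates least from the full-sample matrix, aiming for a maximum absolute deviation of at most .015.

```r
# Simulated stand-in for the full applicant dataset (N = 236, 4 variables
# with correlations loosely mimicking Table 6; not the real data).
set.seed(3)
full_data <- as.data.frame(MASS::mvrnorm(
  236, mu = rep(0, 4),
  Sigma = matrix(c(1,   .11, .11, .31,
                   .11, 1,   .02, .23,
                   .11, .02, 1,   .04,
                   .31, .23, .04, 1), 4, 4)))
R_full <- cor(full_data)

# Best-of-N search over random 40-applicant subsets.
best <- NULL; best_dev <- Inf
for (i in 1:20000) {
  idx <- sample(nrow(full_data), 40)
  dev <- max(abs(cor(full_data[idx, ]) - R_full))
  if (dev < best_dev) { best_dev <- dev; best <- idx }
}
best_dev  # the authors required this maximum deviation to be at most .015
```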
Design and procedure

Participants indicated their primary work activities (see Appendix B) and the number of hiring decisions they make per year. Afterwards, participants read that they would predict the performance of 40 real applicants. Specifically, participants read that an airline was opening new offices and filled vacancies for the job of a ticket agent, and that applicants completed a cognitive ability test, a conscientiousness questionnaire, and an unstructured interview that was conducted by a line manager. Furthermore, participants read that all applicants except for those with the lowest possible interview rating were hired. The cognitive ability test score was presented as a percentage (proportion of correctly answered items on the test). The conscientiousness score was presented on a five-point scale, up to one decimal. The interview score was an integer and ranged from two to five. We henceforth call these scores predictor scores. Lastly, participants learned that they could earn up to five additional dollars for making accurate performance predictions (for details see the study material on OSF).

Again, we used a one-factorial (hiring approach) between-subjects design with four levels (holistic, holistic adjustment, self-designed algorithm, prescribed algorithm). Participants were randomly assigned to one of four conditions, and the hiring approach to making performance predictions was varied (see Table 7 for condition-specific instructions). In the holistic condition, participants saw each applicant's predictor scores and predicted their job performance up to one decimal on a five-point scale. Applicants were presented to participants one by one.

In the holistic adjustment condition, participants saw each applicant's predictor scores and the algorithm's prediction. To construct the algorithm, the standardized predictor scores were multiplied by regression weights that we derived from the meta-analytic correlation matrix presented in the top panel of Table 3 in Cortina et al. (2000), by using the setCor function from the psych package in R (see the R script "Applicant selection.R" on OSF).
Table 6. Descriptive statistics and Correlations between variables of the reduced applicant dataset.

Variable M SD 1. 2. 3.
1. Cognitive ability 0.68 0.14 -
2. Conscientiousness 3.92 0.42 .11 -
3. Interview 2.83 0.98 .11 .02 -
4. Job performance 3.15 0.40 .31 .23 .04
N = 40.

Table 7. Descriptions of the hiring approach per condition in study 2.

Condition Description
Holistic Please review the scores of the applicants and predict based on your intuition and expertise how the applicants will perform on the job.
Holistic adjustment In this approach, we show you, for each applicant, the performance prediction of a decision rule. An assessment professional designed this
decision-rule based on numerous empirical research findings. The decision rule looks like this:
Decision-rule prediction = General mental ability test score × 53 + Conscientiousness questionnaire score × 28 + Interview rating × 19.
As you can see above, the decision rule assigns the following weights to the information:

General Mental Ability Test: 53


Conscientiousness Questionnaire: 28
Interview Rating: 19

The chosen weights correspond to the importance assigned to each piece of the information. So, based on the numerous empirical research
findings, the assessment professional decided to give most weight to the general mental ability test score, and least weight to the interview
rating. Above you can see that the scores and ratings of an applicant were multiplied by weights (*) and then added up (+).
The higher the decision-rule prediction, the more likely it is that the applicant shows good job performance.
You can use the decision-rule prediction in different ways. You can review the scores of the applicants and consult the prediction of the
decision rule shown below and decide based on your intuition and expertise how you would like to combine the information to predict the
applicants’ job performance. If you only want to use the exact decision-rule prediction, you simply reproduce this prediction as your answer.
Although the decision rule will probably not result in perfect performance predictions, research showed that adjusting decision-rule
predictions based on one’s intuition and expertise usually decreases prediction accuracy.
Self-designed In this approach, we want you to design a decision rule. You will estimate how important each piece of information is for later job success and
algorithm create a rule that will be used to make performance predictions for all applicants. The higher the decision-rule prediction, the more likely it
is that the applicant shows good job performance.
Before you get descriptions of the applicants, you will decide how much weight you will assign to each piece of information: General Mental
Ability Test, Conscientiousness Questionnaire, and Interview Rating
The chosen weights correspond to the importance you assign to each piece of information. To illustrate this, if you think interview ratings are
most important in predicting job performance you should weigh it more heavily than the other information. Conversely, if you think the
general mental ability test or conscientiousness questionnaire will be most important in predicting later job success you should assign
more weight to them respectively.
On the basis of your designed decision rule an overall score for each applicant will be calculated. Below you can see that the scores and ratings
of an applicant are multiplied by weights (*) and then added up (+). The higher the overall score of the applicant the higher the chance of
job success.
The decision rule looks like this:
Decision-rule prediction = General mental ability test score * your chosen weight 1 + Conscientiousness questionnaire score * your chosen
weight 2 + Interview rating * your chosen weight 3
You will see the predictions made based on your decision rule. But once you designed the rule you cannot adjust its predictions.
Although your decision rule will probably not result in perfect performance predictions, research showed that using such a rule results in more
accurate performance predictions than using one’s intuition and expertise.
Prescribed In this approach, we show you, for each applicant, the performance prediction of a decision rule. An assessment professional designed this
algorithm decision rule based on numerous empirical research findings. The decision rule looks like this:
Decision-rule prediction = General mental ability test score × 53 + Conscientiousness questionnaire score × 28 + Interview rating × 19.
As you can see above, the decision rule assigns the following weights to the pieces of information:

General Mental Ability Test: 53


Conscientiousness Questionnaire: 28
Interview Rating: 19

The chosen weights correspond to the importance assigned to each piece of information. So, based on the numerous empirical research
findings, the assessment professional decided to give most weight to the general mental ability test score, and least weight to the interview
rating. Above you can see that the scores and ratings of an applicant were multiplied by weights (*) and then added up (+).
The higher the decision-rule prediction, the more likely it is that the applicant shows good job performance.
You will see the predictions made based on the decision rule. The applicants’ predicted job performance will be shown to you and you cannot
adjust this prediction based on your intuition and expertise. The decision rule will predict the performance of all 40 applicants. To
demonstrate how the rule operates, we will show you 10 performance predictions which you cannot adjust.
Although the decision rule will probably not result in perfect performance predictions, research showed that adjusting decision-rule
predictions based on one’s intuition and expertise usually decreases prediction accuracy.

To aid participants' interpretation of the predictions, we rescaled the algorithm's predictions to a five-point scale (1 = very bad job performance, 5 = very good job performance) using min-max scaling (see p. 114, formula 3.8 in Han et al., 2011). Participants could holistically adjust the algorithm's predictions if they wanted.
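The two numeric steps just described can be sketched in R; the correlation values below are placeholders rather than the actual Cortina et al. (2000) meta-analytic entries (the authors derived the weights with psych::setCor, see OSF), and the predictor scores are simulated.

```r
# Standardized regression weights from a (placeholder) predictor correlation
# matrix Rxx and predictor-criterion correlations rxy: beta = Rxx^{-1} rxy.
Rxx <- matrix(c(1.00, 0.10, 0.20,
                0.10, 1.00, 0.10,
                0.20, 0.10, 1.00), 3, 3,
              dimnames = list(c("gma", "consc", "int"),
                              c("gma", "consc", "int")))
rxy  <- c(gma = 0.45, consc = 0.20, int = 0.25)  # placeholder validities
beta <- solve(Rxx, rxy)

# Apply the rule to standardized predictor scores (simulated here) . . .
z    <- matrix(rnorm(40 * 3), nrow = 40, ncol = 3)
pred <- as.vector(z %*% beta)

# . . . and min-max rescale to the 1-5 response scale (cf. Han et al., 2011):
pred_1to5 <- (pred - min(pred)) / (max(pred) - min(pred)) * (5 - 1) + 1
```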
In the self-designed algorithm condition, participants chose percentage weights, which were multiplied by the respective standardized predictor scores. The resulting scores were summed to yield an applicant's job performance prediction. Participants were told that their self-designed algorithm would be used to predict the performance of all 40 applicants. To let participants experience their self-designed algorithm's predictions, we presented participants with 10 randomly selected applicants, their predictor scores, and the rescaled self-designed algorithm's predictions (1 = very bad job performance, 5 = very good job performance). Participants could not holistically adjust these predictions.

In the prescribed algorithm condition, participants had no autonomy as they had to strictly use the algorithm's predictions. The algorithm was constructed in the same way as in the holistic adjustment condition. As in the self-designed algorithm condition, participants saw the algorithm's predictions and the respective predictor scores for 10 applicants but were told that the algorithm was used to predict the job performance of all 40 applicants.

In the holistic adjustment and the prescribed algorithm condition, participants saw the algorithm's predictor weights as percentage weights before they made (or saw) any predictions. So, in all conditions where an algorithm was used (all except the holistic condition), participants saw the algorithm's predictor weights and were told that the higher the algorithm's prediction, the more likely it is that the applicant shows good job performance, according to the algorithm. Importantly, in all conditions except the holistic condition, participants were warned that although the algorithm's predictions would not be perfect, research showed that holistically adjusting the algorithm's predictions usually decreases prediction accuracy.

After the prediction task, participants imagined that the airline used the same approach to make performance predictions and hired the applicants with the highest scores. They also imagined being the manager who delivered the decision to applicants. Then, participants filled in the dependent measures in random order.

Measures

Attitudinal measures

All measures were assessed on a 5-point Likert scale (1 = strongly disagree, 5 = strongly agree, see OSF). We used the same measures for perceived autonomy (manipulation check, α = .90), locus of causality (α = .75), and personal control (α = .77) as in Study 1, but slightly adapted the wording to account for the decision-maker's perspective. Furthermore, we used Nolan et al.'s (2016) five-item TOTU scale (α = .93) and an adapted version of Nolan's (2013) six-item scale to measure competence perceptions (α = .89). Lastly, we used an adapted three-item scale from Nolan and Highhouse (2014) to measure participants' intention to use hiring approaches. The internal consistency of this scale was quite low (α = .59) and could be improved (α = .85) by removing the second, reverse-coded item, which read "If I could use a different approach to make hiring decisions, I would". We report results based on the original three-item scale and the scale after removing the second item. Removing the second item did not qualitatively change the study's conclusions.

Predictive validity

For each participant, we calculated the Fisher's z transformed correlation between their 40 performance predictions and the applicants' observed performance. So, the performance predictions were participants' holistic predictions (holistic condition), holistically adjusted predictions (holistic adjustment condition), or the predictions that resulted from participants' self-designed algorithms (self-designed algorithm condition).
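Computationally, the per-participant validity described above is a single correlation followed by the Fisher transform; the vectors below are simulated stand-ins, not study data.

```r
# Fisher's z transformed predictive validity for one participant.
set.seed(42)
performance <- rnorm(40, mean = 3.15, sd = 0.40)        # observed performance
predictions <- 0.4 * performance + rnorm(40, sd = 0.5)  # one participant's 40 predictions
z_validity  <- atanh(cor(predictions, performance))     # Fisher's z of r
```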
(2013) six-item scale to measure competence perceptions (α beliefs about their perceived competence when they made
= .89). Lastly, we used an adapted three-item scale from Nolan holistic predictions compared to when they used a prescribed
and Highhouse (2014) to measure participants’ intention to use algorithm (d = 0.94, 95% HDI [0.54, 1.35]). Therefore, hypoth­
hiring approaches. The internal consistency of this scale was quite eses 5a, 6a, and 7a were fully supported.
low (α = .59) and could be improved (α = .85) by removing Furthermore, decision makers had higher meta-beliefs
the second, reverse-coded item, which read “If I could use about their responsibility for making a hiring decision when
a different approach to make hiring decisions, I would”. We report they used AEAPs rather than a prescribed algorithm (d = 0.38,
results based on the original three-item scale and the scale after 95% HDI [0.05, 0.72]). Similarly, they had much higher meta-
removing the second item. Removing the second item did not beliefs about their control of the hiring decision and their
qualitatively change the study’s conclusions. perceived competence when they used AEAPs rather than
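For readers who want to reproduce this kind of item analysis, the reliability estimates and the effect of dropping an item can be obtained as follows (a minimal sketch in R, assuming the psych package and a data frame use_int whose three columns are the scale items; these object names are ours, not taken from the study materials):

  library(psych)

  # Internal consistency of the full three-item use intentions scale
  alpha_full <- psych::alpha(use_int, check.keys = TRUE)
  alpha_full$total$raw_alpha

  # Alpha after removing the second (reverse-coded) item
  alpha_reduced <- psych::alpha(use_int[, c(1, 3)])
  alpha_reduced$total$raw_alpha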

Predictive validity
For each participant, we calculated the Fisher's z transformed correlation between their 40 performance predictions and the applicants' observed performance. So, the performance predictions were participants' holistic predictions (holistic condition), holistically adjusted predictions (holistic adjustment condition), or the predictions that resulted from participants' self-designed algorithms (self-designed algorithm condition).
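As an illustration of this computation, per-participant validity and a back-transformed condition mean can be obtained as follows (a minimal sketch in R; predictions, performance, and z_values are our placeholder names, not objects from the study's scripts):

  # Correlation between one participant's 40 predicted and observed
  # performance scores, followed by Fisher's z transformation
  r <- cor(predictions, performance)
  z <- atanh(r)  # equivalent to 0.5 * log((1 + r) / (1 - r))

  # A condition-level mean validity can be back-transformed for reporting
  # (z_values holds the transformed correlations of all participants in
  # one condition)
  r_bar <- tanh(mean(z_values))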

Table 8. Correlations between measures in study 2.

Measure 1. 2. 3. 4. 5. 6.
1. Autonomy (a)  -  .50  .65  .70  −.04  .61
2. Locus of causality  .50  -  .61  .52  −.04  .39
3. Personal control  .65  .61  -  .64  .06  .46
4. Perceived competence  .70  .52  .64  -  −.22  .58
5. TOTU (b)  −.04  −.04  .06  −.22  -  −.01
6. Use intentions  .45  .29  .31  .46  −.23  -
Correlations below the diagonal are based on the original three-item use intentions scale. Correlations above the diagonal are based on the use intentions scale with the second item removed. Correlations were calculated across conditions. N = 269. (a) Manipulation check. (b) Threat of technological unemployment.

Figure 5. Observed means and raw data per condition and measure in study 2. Error bars represent 95% confidence intervals. Some jittering in the x-axis direction was
added to the plot to improve readability.

Results
Again, we treated locus of causality and personal control as separate variables, based on the results of confirmatory factor analyses (see Supplement S2). Table 8 shows the correlations between the measures. Figure 5 shows the observed means per condition and measure. Table 9 shows the exact observed means and standard deviations. Furthermore, Figure 6 shows the posterior distributions with 95% HDIs for the relevant effects. Again, AEAPs resulted in very similar perceptions and hence were combined in the following analyses.

Autonomy (manipulation check)
Decision makers experienced much more autonomy when using AEAPs than when using a prescribed algorithm (d = 1.10, 95% HDI [0.75, 1.45]). This suggests that our manipulation was successful.
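For illustration, a standardized mean difference with a 95% highest density interval of this kind can be estimated with Kruschke-style Bayesian estimation; a minimal sketch in R, assuming the BEST package and two vectors of autonomy ratings (the vector names are ours, and this is one possible implementation rather than the authors' exact model):

  library(BEST)

  # Posterior for the contrast between the AEAP and prescribed-algorithm
  # groups; the summary includes the effect size (effSz) with its 95% HDI
  fit <- BESTmcmc(autonomy_aeap, autonomy_prescribed)
  summary(fit)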

Locus of causality, personal control, and perceived competence
Decision makers had higher meta-beliefs about their responsibility for and control of the hiring decision when they made holistic predictions compared to when they used a prescribed algorithm (d = 0.43, 95% HDI [0.03, 0.83] and d = 0.92, 95% HDI [0.51, 1.33], respectively). They also had much higher meta-beliefs about their perceived competence when they made holistic predictions compared to when they used a prescribed algorithm (d = 0.94, 95% HDI [0.54, 1.35]). Therefore, hypotheses 5a, 6a, and 7a were fully supported.

Furthermore, decision makers had higher meta-beliefs about their responsibility for making a hiring decision when they used AEAPs rather than a prescribed algorithm (d = 0.38, 95% HDI [0.05, 0.72]). Similarly, they had much higher meta-beliefs about their control of the hiring decision and their perceived competence when they used AEAPs rather than a prescribed algorithm (d = 0.96, 95% HDI [0.62, 1.31] and d = 0.77, 95% HDI [0.43, 1.11], respectively). So, hypotheses 5b, 6b, and 7b were also fully supported.

Decision makers also had very similar meta-beliefs about their responsibility for and control of the hiring decision when making holistic predictions and when using AEAPs (d = 0.05, 95% HDI [−0.25, 0.36] and d = −0.04, 95% HDI [−0.35, 0.27], respectively), although small effects in both directions were also plausible. We found similar results for perceived competence (d = 0.16, 95% HDI [−0.15, 0.47]).

Mediation
We hypothesized that the effect of enhanced autonomy on use intentions would be mediated by locus of causality and personal control (in parallel), and competence perceptions and TOTU (serially). The correlations between the variables of the pre-registered mediation model are shown in Table 10. The correlations between perceived competence and TOTU, and between TOTU and use intentions, were small (r = −.21 and r = −.19). Therefore, the pre-registered mediation model was not plausible.

Furthermore, we did not find enough evidence that decision makers perceived less TOTU when using AEAPs rather than a prescribed algorithm (d = 0.14, 95% HDI [−0.19, 0.48]); the observed effect was in the opposite direction to what was expected. However, decision makers were somewhat more likely to use AEAPs than a prescribed algorithm (d = 0.28, 95% HDI [−0.06, 0.61]). Additionally, decision makers were more likely to use holistic prediction than a prescribed algorithm (d = 0.38, 95% HDI [−0.02, 0.78]), while the difference in intentions to use holistic prediction and AEAPs was negligible (d = 0.10, 95% HDI [−0.20, 0.41]), although holistic prediction was slightly favoured. Removing the second item of the use intentions scale resulted in slightly larger effect sizes. Decision makers were more likely to use holistic prediction (d = 0.41, 95% HDI [0.01, 0.82]) and AEAPs (d = 0.44, 95% HDI [0.10, 0.77]), compared to a prescribed algorithm. The difference between holistic prediction and AEAPs was near zero (d = −0.02, 95% HDI [−0.33, 0.29]).

Given that a mediation model including TOTU was implausible, we did not formally investigate hypothesis 8, but instead explored a mediation model without TOTU. Specifically, we estimated the indirect effect of enhanced autonomy on use intentions via locus of causality and personal control (in parallel), and competence perceptions. The results are shown in Table 11. We found some evidence that enhanced autonomy was related to use intentions via personal control and competence perceptions (partially standardized effect size = 0.18). Furthermore, zero was a credible value for the direct effect of enhanced autonomy on use intentions. Yet, we did not find much evidence that enhanced autonomy affected use intentions via locus of causality and competence perceptions (partially standardized effect size = 0.03).
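To make the structure of this exploratory model concrete, the path model can be written out as follows (a minimal sketch in R using lavaan as a frequentist stand-in for the Bayesian estimation reported here; the variable names are ours):

  library(lavaan)

  # Enhanced autonomy (ea, dummy-coded) -> locus of causality (lc) and
  # personal control (pc) in parallel -> competence perceptions (cp)
  # -> use intentions (ui), plus a direct path from ea to ui
  model <- '
    lc ~ a1 * ea
    pc ~ a2 * ea
    cp ~ b1 * lc + b2 * pc
    ui ~ c * ea + b3 * cp

    ind_lc := a1 * b1 * b3   # ea -> lc -> cp -> ui
    ind_pc := a2 * b2 * b3   # ea -> pc -> cp -> ui
  '
  fit <- sem(model, data = study2)
  summary(fit, standardized = TRUE)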

Table 9. Observed means and standard deviations per condition and measure in study 2.

Autonomy (a)  Locus of causality  Personal control  Perceived competence  TOTU (b)  Use intentions  Use intentions*

Condition  M  SD  M  SD  M  SD  M  SD  M  SD  M  SD  M  SD
Holistic  4.23  0.52  4.22  0.56  4.07  0.68  4.39  0.54  3.02  1.34  3.48  0.81  4.02  0.87
Holistic adjustment  4.15  0.69  4.15  0.55  4.11  0.75  4.27  0.60  3.31  1.12  3.42  0.61  4.08  0.76
Self-designed algorithm  4.05  0.68  4.22  0.64  4.10  0.67  4.30  0.49  3.00  1.16  3.38  0.73  4.00  0.92
Prescribed algorithm  3.25  1.24  3.93  1.04  3.33  1.20  3.77  1.08  2.99  1.11  3.19  0.89  3.65  1.08
(a) Manipulation check. (b) Threat of technological unemployment. * Means and standard deviations of the use intentions scale with the second item removed.

Figure 6. Posterior distributions for the hypothesized and exploratory effects in study 2. The grey area under each curve depicts the 95% most probable effect sizes,
given the model and the data. H = holistic prediction, AEAPs = autonomy-enhancing algorithmic procedures, and PA = prescribed algorithm.

Table 10. Correlations between variables included in the pre-registered mediation model in study 2.

Measure 1. 2. 3. 4. 5. 6.
1. Enhanced autonomy (a)  -  .14  .35  .29  .06  .17
2. Locus of causality  .14  -  .59  .50  −.02  .36
3. Personal control  .35  .59  -  .64  .08  .45
4. Perceived competence  .29  .50  .64  -  −.21  .56
5. TOTU (b)  .06  −.02  .08  −.21  -  −.01
6. Use intentions  .12  .25  .28  .43  −.19  -
Correlations below the diagonal are based on the original three-item use intentions scale. Correlations above the diagonal are based on the use intentions scale with the second item removed. (a) Enhanced autonomy was coded 1 for the two autonomy-enhancing conditions (holistic adjustment and self-designed algorithm condition) and 0 for the prescribed algorithm condition. Therefore, data from the holistic condition was disregarded, and hence, N = 215. (b) Threat of technological unemployment.

Table 11. Exploratory mediation results from study 2.

Enhanced autonomy → Locus of causality → Competence perceptions → Use intentions (indirect): unstandardized effect estimate = 0.02 [0.00, 0.05] *(0.03 [0.00, 0.07]); partially standardized effect size = .03 *(.03)
Enhanced autonomy → Personal control → Competence perceptions → Use intentions (indirect): unstandardized effect estimate = 0.13 [0.06, 0.21] *(0.17 [0.08, 0.27]); partially standardized effect size = .18 *(.19)
Enhanced autonomy → Use intentions (direct): unstandardized effect estimate = −0.01 [−0.25, 0.23] *(−0.02 [−0.29, 0.25]); partially standardized effect size = −.02 *(−.02)
95% highest density intervals are presented in brackets. * The effects when the second item of the use intentions scale is removed are presented in parentheses.

Predictive validity
Figure 7 shows the mean validity coefficient per condition. Holistic predictions were least valid (r̄ = .16), while predictions from the prescribed algorithm were most valid (r = .36). In support of hypothesis 9, AEAPs resulted in much higher predictive validity (r̄ = .28 in the holistic adjustment condition and r̄ = .31 in the self-designed algorithm condition, respectively) than holistic predictions (d = 1.10, 95% HDI [0.77, 1.42]). Importantly, these results suggest that AEAPs resulted in only somewhat less valid predictions than the prescribed algorithm. Interestingly, the variability in decision-makers' validity was highest in the holistic condition and lowest in the self-designed algorithm condition. In other words, the less decision makers were directly involved in combining information about individual applicants, the less it mattered who combined the information. Moreover, just using the cognitive ability test in a top-down manner would have resulted in higher predictive validity than the validity that was observed in the holistic and holistic adjustment condition (cf. Table 6 and Figure 7).

Figure 7. Mean predictive validity per condition. Each raw data point represents the correlation between a participant’s predicted and observed performance (N = 40).
No raw data is plotted in the prescribed algorithm condition because participants could not influence the algorithm’s prediction in any way. Therefore, no variance was
observed in this condition by design. Error bars represent 95% confidence intervals. Some jittering in the x-axis direction was added to the plot to improve readability.

Judgement consistency and relative weights analysis
We also explored how, and how consistently, decision makers weighted the information by regressing the participant's (or the algorithm's) predictions on the three predictors. The resulting multiple R reflects how consistently decision makers applied their own judgement policy and is also called response consistency or cognitive control (Hammond & Summers, 1972). We also calculated relative weights (Grömping, 2006), which reflect how important a predictor was in forming the participant's (or algorithm's) prediction. Table 12 shows participants' mean judgement consistency and mean relative weights for each of the three predictors per condition. Providing decision makers with a valid anchor (the algorithm's prediction) increased judgement consistency compared to the holistic condition, and resulted in relative weights that were similar to the relative weights obtained from the meta-analytically derived algorithm. Furthermore, despite similar relative weights, judgement consistency was much higher in the self-designed algorithm condition than in the holistic condition, which likely explains the markedly higher predictive validity.
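As an illustration, both quantities can be computed from an ordinary least squares regression of a participant's predictions on the three predictor scores (a minimal sketch in R, assuming the relaimpo package; the data frame and variable names are ours):

  library(relaimpo)

  # Regress one participant's 40 predictions on the three predictor scores
  fit <- lm(prediction ~ cognitive_ability + conscientiousness + interview,
            data = judgments)

  # Judgement consistency (response consistency): the multiple correlation R
  sqrt(summary(fit)$r.squared)

  # Relative weights: each predictor's share of the explained variance,
  # normalized to sum to 100% (rela = TRUE)
  calc.relimp(fit, type = "lmg", rela = TRUE)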

Discussion
In line with our hypotheses and earlier findings (Nolan et al., 2016), decision makers believed stakeholders would perceive them as more competent, more responsible for, and more in control of the hiring decision when making holistic predictions rather than using a prescribed algorithm. Extending earlier findings, we also found that decision-makers' meta-beliefs were much higher when they used AEAPs rather than a prescribed algorithm. Relatedly, decision-makers' meta-beliefs were very similar when making holistic predictions compared to using AEAPs. Importantly, we also found that using AEAPs substantially increased predictive validity compared to holistic prediction.

General Discussion
One reason why decision makers rarely use algorithms in hiring decisions is that they worry about negative stakeholder evaluations (Nolan et al., 2016). In line with existing research (Gonzalez et al., 2022; Kares et al., 2023), we expected and found that the use of AEAPs improved stakeholder evaluations and decision-makers' meta-beliefs, compared to the use of a prescribed algorithm. Furthermore, both stakeholder evaluations and decision-makers' meta-beliefs were similar for holistic prediction and AEAPs.

Table 12. Mean judgement consistency and mean relative predictor weights per condition.

Condition  Judgement consistency  RW (a) Cognitive ability  RW Conscientiousness  RW Interview

Holistic  .63  .47  .23  .30
Holistic adjustment  .79  .63  .20  .17
Self-designed algorithm  1  .44  .30  .26
Prescribed algorithm  1  .67  .21  .12
(a) RW = relative weight. Relative weights per condition sum to 100% and were computed using the relaimpo package in R (Grömping, 2006).

Extending earlier findings, we also showed that stakeholder and decision-makers' perceptions were relatively insensitive to how autonomy was enhanced (algorithm design vs. control over a prescribed algorithm's predictions). Lastly, we showed that AEAPs substantially improved predictive validity compared to holistic prediction. Yet, neither holistic prediction nor AEAPs resulted in higher predictive validity than the meta-analytically derived algorithm. In summary, our results on predictive validity are largely in line with findings by Dietvorst et al. (2018) and Neumann et al. (2022), and generally suggest that enhancing autonomy in algorithmic decision making effectively encourages algorithm use and improves validity compared to holistic prediction.

Intriguingly, our results also showed that strict top-down use of the single best predictor would have resulted in more valid predictions (r = .31) than holistic prediction (r = .16). This demonstrates again that "less can be more"; invalid information can distract decision makers from valid information and thereby reduce predictive validity, which is also known as the "dilution effect" (Niessen et al., 2022). Relatedly, conscientiousness and the unstructured interview had only slight incremental validity over and above cognitive ability when this information was combined through our meta-analytic, prescribed algorithm, but not when combined holistically or through a self-designed algorithm. This finding underscores that results from studies on combinations of predictors, in which predictors are typically combined using optimal regression (e.g., Schmidt & Hunter, 1998), may not generalize to practice (Murphy, 2019; Sackett et al., 2017), where optimal regression or other, suboptimal mechanical combination methods are rarely used to make hiring decisions (Neumann et al., 2023; Ryan & Sackett, 1987). Hence, we should think about personnel selection as judgement and decision science (Meijer & Niessen, 2022), and investigate not only the validity of (optimal combinations of) test scores, as is now often the case in research, but also the validity of our decisions. This change in attitude would greatly increase the usefulness of personnel selection research for practice.

Limitations and directions for future research
We conducted experiments to test the effect of the hiring approach on stakeholder perceptions, decision-makers' meta-beliefs, and predictive validity. Ideally, our findings would be replicated in the field, although this is very challenging, because very few organizations contain the required number of assessors for these studies. Furthermore, we relied on self-reported intentions to use a hiring approach for future decisions. Ideally, we would also observe actual behaviour (i.e., implementation of AEAPs). Lastly, we note that our cross-sectional design limits the conclusions that can be drawn from our mediation analyses (Spector, 2019).

As our results showed, retaining decision-makers' autonomy in algorithmic decision making can be a promising way to improve decision making. However, interventions that primarily focus on improving the validity of pure holistic predictions may also be fruitful. Such interventions may include providing decision makers with knowledge on the validity of predictors, although research on this intervention produced somewhat mixed findings (Balzer et al., 1989; Neumann et al., 2022). Relatedly, Kahneman et al. (2021) recently suggested delaying hiring decisions until some time after the information (e.g., interview impressions) has been collected, and presenting information sequentially to decision makers. We encourage researchers to test whether these interventions can improve decision making in a similar manner as enhancing autonomy in algorithmic decision making.

We presented decision makers with quantitative test scores and a quantified rating from an interview conducted by someone else. If decision makers had conducted interviews themselves, they may have overweighted such qualitative and "rich" information compared to a quantified interview rating from someone else (Kuncel, 2018; Niessen et al., 2022). Therefore, future research could vary decision-makers' involvement in the data collection and use applicant data that contains qualitative and quantitative information. Furthermore, it should be investigated whether decision makers are less likely to holistically adjust predictions from a self-designed rather than a prescribed algorithm. Decision makers may trust their own algorithm more than an algorithm presented by others (Kuncel, 2018). Relatedly, characteristics such as an algorithm's transparency and complexity, and decision-makers' understanding of an algorithm's design, could affect algorithm use (Langer & Landers, 2021). Although algorithms with equal and even random but consistent predictor weights can outperform holistic predictions when predictors are similar in validity (Yu & Kuncel, 2020), it is plausible that decision makers would holistically adjust predictions from such simple, plain algorithms more than predictions from more complex algorithms that may, for example, take predictor interactions and non-linearities into account. Moreover, future studies may present information on the same scale. We deliberately presented information on different scales because this is representative of practice. Yet, this may have made consistent information combination in the holistic and holistic adjustment condition more difficult.

Theoretical and practical implications
In line with the discounting principle from attribution theory and earlier research (Nolan et al., 2016), stakeholders ascribed more credit to decision makers for holistic compared to algorithmic hiring decisions. Importantly, decision makers who used AEAPs were still largely seen as responsible for and in control of the hiring decision by stakeholders, which was related to higher competence perceptions. We also found support for the hypothesis that decision-makers' meta-beliefs relate to their intentions to use algorithmic decision-making procedures, as proposed by the theory of planned behaviour. Decision-makers' concerns about negative stakeholder perceptions were largely alleviated when they retained autonomy in algorithmic decision making. Moreover, our results advance our understanding of whether stakeholders and decision makers are sensitive to different autonomy-enhancing features (Langer & Landers, 2021). Design control (by choosing an algorithm's predictor weights) resulted in very similar perceptions by both stakeholders and decision makers as outcome control (adjusting a prescribed algorithm's predictions holistically).

We tentatively conclude that enhancing autonomy in the first place seems more important for the acceptance of algorithmic decision making than how exactly autonomy is enhanced. Although we used a very simple and transparent algorithm in our studies, our results may also have implications for the design and acceptance of more complex AI systems. Lastly, another theoretical implication concerns the underlying mechanisms that explain the relationship between enhanced autonomy and use intentions. We found some support for the indirect effect of autonomy on use intentions, via personal control and competence perceptions (but not TOTU). Yet, we caution against overinterpreting our partially exploratory mediation results.

Our results also have important practical implications. To improve decision making, AEAPs should not only improve perceptions, but first and foremost result in higher predictive validity than pure holistic prediction. Based on our results, we recommend that decision makers combine valid predictors consistently according to a simple algorithm. In line with Dietvorst et al. (2018), holistically adjusting predictions from a prescribed algorithm also improved predictive validity and stakeholder perceptions. Yet, we found that this approach resulted in much more variability in predictive validity compared to the self-designed algorithm condition, in which holistic adjustment was not possible. Thus, in an ideal world, decision makers should design an algorithm and follow it consistently.

Conclusion
Using AEAPs improves perceptions and results in substantially higher validity than holistic prediction. Or, in other words, allowing room to "add an egg" to a prescribed recipe makes the baker feel and be more appreciated, makes the baker more likely to use a recipe in the first place, and makes the cake taste better than one made without a recipe!

Notes
1. We note that decision makers could also be stakeholders themselves, such as when observing other decision makers making hiring decisions.
2. Stability, which is another key dimension, is the extent to which an outcome is perceived as stable or consistent over time and across trials. We had no theoretical argument for how it would relate to stakeholder perceptions and only measured it for theoretical completeness.
3. The word "positive" reflects that stakeholders perceive decision-makers' locus of causality to be higher (i.e., more internal) when holistic prediction is used, compared to when a prescribed algorithm is used, in line with earlier research (Nolan et al., 2016). Similarly, the word "positive" reflects higher personal control and higher perceived competence in the following hypotheses.
4. Although the theory of planned behaviour also suggests attitude and control as two other independent determinants of intentions (van der Zee et al., 2002), we solely focused on subjective norms in this study because this antecedent is theoretically most relevant for stakeholder perceptions and decision-makers' meta-beliefs.

Disclosure statement
No potential conflict of interest was reported by the author(s).

ORCID
Marvin Neumann http://orcid.org/0000-0003-0193-8159
A. Susan M. Niessen http://orcid.org/0000-0001-8249-9295
Maximilian Linde http://orcid.org/0000-0001-8421-090X
Jorge N. Tendeiro http://orcid.org/0000-0003-1660-3642
Rob R. Meijer http://orcid.org/0000-0001-5368-992X

References
Ægisdóttir, S., White, M. J., Spengler, P. M., Maugherman, A. S., Anderson, L. A., Cook, R. S., Nichols, C. N., Lampropoulos, G. K., Walker, B. S., Cohen, G., & Rush, J. D. (2006). The meta-analysis of clinical judgment project: Fifty-six years of accumulated research on clinical versus statistical prediction. The Counseling Psychologist, 34(3), 341–382. https://doi.org/10.1177/0011000005285875
Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50(2), 179–211. https://doi.org/10.1016/0749-5978(91)90020-T
Arkes, H. R., Shaffer, V. A., & Medow, M. A. (2007). Patients derogate physicians who use a computer-assisted diagnostic aid. Medical Decision Making, 27(2), 189–202. https://doi.org/10.1177/0272989X06297391
Balzer, W. K., Doherty, M. E., & O'Connor, R., Jr. (1989). Effects of cognitive feedback on performance. Psychological Bulletin, 106(3), 410–433. https://doi.org/10.1037/0033-2909.106.3.410
Burton, J. W., Stein, M. K., & Jensen, T. B. (2020). A systematic review of algorithm aversion in augmented decision making. Journal of Behavioral Decision Making, 33(2), 220–239. https://doi.org/10.1002/bdm.2155
Castelo, N., Bos, M. W., & Lehmann, D. R. (2019). Task-dependent algorithm aversion. Journal of Marketing Research, 56(5), 809–825. https://doi.org/10.1177/0022243719851788
Cortina, J. M., Goldstein, N. B., Payne, S. C., Davison, H. K., & Gilliland, S. W. (2000). The incremental validity of interview scores over and above cognitive ability and conscientiousness scores. Personnel Psychology, 53(2), 325–351. https://doi.org/10.1111/j.1744-6570.2000.tb00204.x
Dawes, R. M., & Corrigan, B. (1974). Linear models in decision making. Psychological Bulletin, 81(2), 95–106. https://doi.org/10.1037/h0037613
Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243(4899), 1668–1674. https://doi.org/10.1126/science.2648573
Diab, D. L., Pui, S., Yankelevich, M., & Highhouse, S. (2011). Lay perceptions of selection decision aids in US and non-US samples. International Journal of Selection and Assessment, 19(2), 209–216. https://doi.org/10.1111/j.1468-2389.2011.00548.x
Dietvorst, B. J., Simmons, J. P., & Massey, C. (2018). Overcoming algorithm aversion: People will use imperfect algorithms if they can (even slightly) modify them. Management Science, 64(3), 1155–1170. https://doi.org/10.1287/mnsc.2016.2643
Frey, F. E., & Tropp, L. R. (2006). Being seen as individuals versus as group members: Extending research on metaperception to intergroup contexts. Personality and Social Psychology Review, 10(3), 265–280. https://doi.org/10.1207/s15327957pspr1003_5
Gonzalez, M. F., Liu, W., Shirase, L., Tomczak, D. L., Lobbe, C. E., Justenhoven, R., & Martin, N. R. (2022). Allying with AI? Reactions toward human-based, AI/ML-based, and augmented hiring processes. Computers in Human Behavior, 130, 107179. https://doi.org/10.1016/j.chb.2022.107179
Grömping, U. (2006). Relative importance for linear regression in R: The package relaimpo. Journal of Statistical Software, 17(1), 1–27. https://doi.org/10.18637/jss.v017.i01
Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., & Nelson, C. (2000). Clinical versus mechanical prediction: A meta-analysis. Psychological Assessment, 12(1), 19–30. https://doi.org/10.1037/1040-3590.12.1.19
Guay, J. P., & Parent, G. (2018). Broken legs, clinical overrides, and recidivism risk: An analysis of decisions to adjust risk levels with the LS/CMI. Criminal Justice and Behavior, 45(1), 82–100. https://doi.org/10.1177/0093854817719482
Hammond, K. R., & Summers, D. A. (1972). Cognitive control. Psychological Review, 79(1), 58–67. https://doi.org/10.1037/h0031851

Han, J., Pei, J., & Kamber, M. (2011). Data mining: Concepts and techniques (3rd ed.). Morgan Kaufmann.
Highhouse, S., & Brooks, M. E. (2023). Improving workplace judgments by reducing noise: Lessons learned from a century of selection research. Annual Review of Organizational Psychology and Organizational Behavior, 10(1), 519–533. https://doi.org/10.1146/annurev-orgpsych-120920-050708
Hoffman, M., Kahn, L. B., & Li, D. (2017). Discretion in hiring. The Quarterly Journal of Economics, 133(2), 765–800. https://doi.org/10.1093/qje/qjx042
Kahneman, D., & Klein, G. (2009). Conditions for intuitive expertise: A failure to disagree. American Psychologist, 64(6), 515–526. https://doi.org/10.1037/a0016755
Kahneman, D., Sibony, O., & Sunstein, C. R. (2021). Noise: A flaw in human judgment. Little, Brown Spark.
Kares, F., König, C. J., Bergs, R., Protzel, C., & Langer, M. (2023). Trust in hybrid human-automated decision-support. International Journal of Selection and Assessment, 31(3), 388–402. https://doi.org/10.1111/ijsa.12423
Kausel, E. E., Culbertson, S. S., & Madrid, H. P. (2016). Overconfidence in personnel selection: When and why unstructured interview information can hurt hiring decisions. Organizational Behavior and Human Decision Processes, 137, 27–44. https://doi.org/10.1016/j.obhdp.2016.07.005
Kelley, H. H. (1973). The processes of causal attribution. American Psychologist, 28(2), 107–128. https://doi.org/10.1037/h0034225
Kruschke, J. K. (2015). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan (2nd ed.). Academic Press. https://doi.org/10.1016/B978-0-12-405888-0.00008-8
Kuncel, N. R. (2018). Judgment and decision making in staffing research and practice. In D. S. Ones, N. Anderson, C. Viswesvaran, & H. K. Sinangil (Eds.), The SAGE handbook of industrial, work and organizational psychology (2nd ed., pp. 474–487). SAGE Publications Ltd. https://doi.org/10.4135/9781473914940.n17
Kuncel, N. R., Klieger, D. M., Connelly, B. S., & Ones, D. S. (2013). Mechanical versus clinical data combination in selection and admissions decisions: A meta-analysis. Journal of Applied Psychology, 98(6), 1060–1072. https://doi.org/10.1037/a0034156
Langer, M., & Landers, R. N. (2021). The future of artificial intelligence at work: A review on effects of decision automation and augmentation on workers targeted by algorithms and third-party observers. Computers in Human Behavior, 123, 106878. https://doi.org/10.1016/J.CHB.2021.106878
Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. University of Minnesota Press. https://doi.org/10.1037/11281-000
Meehl, P. E. (1986). Causes and effects of my disturbing little book. Journal of Personality Assessment, 50(3), 370–375. https://doi.org/10.1207/s15327752jpa5003_6
Meijer, R. R., Neumann, M., Hemker, B. T., & Niessen, A. S. M. (2020). A tutorial on mechanical decision-making for personnel and educational selection. Frontiers in Psychology, 10, Article 3002. https://doi.org/10.3389/fpsyg.2019.03002
Meijer, R. R., & Niessen, A. S. M. (2022). Personnel selection as judgment and decision science: An introduction. International Journal of Selection and Assessment, 30(2), 193–194. https://doi.org/10.1111/IJSA.12377
Meyer, J. P., & Koelbl, S. L. M. (1982). Students' test performances: Dimensionality of causal attributions. Personality and Social Psychology Bulletin, 8(1), 31–36. https://doi.org/10.1177/014616728281005
Mikkelson, D. (2008, January 31). Requiring an Egg Made Instant Cake Mixes Sell? Snopes. https://www.snopes.com/fact-check/something-eggstra/
Milkman, K. L., Chugh, D., & Bazerman, M. H. (2009). How can decision making be improved? Perspectives on Psychological Science, 4(4), 379–383. https://doi.org/10.1111/j.1745-6924.2009.01142.x
Murphy, K. R. (2019). Understanding how and why adding valid predictors can decrease the validity of selection composites: A generalization of Sackett, Dahlke, Shewach, and Kuncel (2017). International Journal of Selection and Assessment, 27(3), 249–255. https://doi.org/10.1111/ijsa.12253
Murphy, K. R. (2021). In praise of Table 1: The importance of making better use of descriptive statistics. Industrial and Organizational Psychology: Perspectives on Science and Practice, 14(4), 461–477. https://doi.org/10.1017/iop.2021.90
Neumann, M., Hengeveld, M., Niessen, A. S. M., Tendeiro, J. N., & Meijer, R. R. (2022). Education increases decision-rule use: An investigation of education and incentives to improve decision making. Journal of Experimental Psychology: Applied, 28(1), 166–178. https://doi.org/10.1037/xap0000372
Neumann, M., Niessen, A. S. M., Hurks, P. M., & Meijer, R. R. (2023). Holistic and mechanical combination in psychological assessment: Why algorithms are underutilized and what is needed to increase their use. International Journal of Selection and Assessment, 31(2), 267–285. https://doi.org/10.1111/ijsa.12416
Neumann, M., Niessen, A. S. M., & Meijer, R. R. (2021). Implementing evidence-based assessment and selection in organizations: A review and an agenda for future research. Organizational Psychology Review, 11(3), 205–239. https://doi.org/10.1177/2041386620983419
Neumann, M., Niessen, A. S. M., Tendeiro, J. N., & Meijer, R. R. (2022). The autonomy-validity dilemma in mechanical prediction procedures: The quest for a compromise. Journal of Behavioral Decision Making, 35(4), e2270. https://doi.org/10.1002/bdm.2270
Newman, D. T., Fast, N. J., & Harmon, D. J. (2020). When eliminating bias isn't fair: Algorithmic reductionism and procedural justice in human resource decisions. Organizational Behavior and Human Decision Processes, 160, 149–167. https://doi.org/10.1016/j.obhdp.2020.03.008
Niessen, A. S. M., Kausel, E. E., & Neumann, M. (2022). Using narratives and numbers in performance prediction: Attitudes, confidence, and validity. International Journal of Selection and Assessment, 30(2), 216–229. https://doi.org/10.1111/IJSA.12364
Nolan, K. P. (2013). Basic psychological need fulfillment and user resistance to objective and analytical decision-making practices in employee selection (Doctoral dissertation).
Nolan, K. P., Carter, N. T., & Dalal, D. K. (2016). Threat of technological unemployment: Are hiring managers discounted for using standardized employee selection practices? Personnel Assessment and Decisions, 2(1), 30–47. https://doi.org/10.25035/pad.2016.004
Nolan, K. P., Dalal, D. K., & Carter, N. (2020). Threat of technological unemployment, use intentions, and the promotion of structured interviews in personnel selection. Personnel Assessment and Decisions, 6(2), 38–53. https://doi.org/10.25035/pad.2020.02.006
Nolan, K. P., & Highhouse, S. (2014). Need for autonomy and resistance to standardized employee selection practices. Human Performance, 27(4), 328–346. https://doi.org/10.1080/08959285.2014.929691
Preacher, K. J., & Kelley, K. (2011). Effect size measures for mediation models: Quantitative strategies for communicating indirect effects. Psychological Methods, 16(2), 93–115. https://doi.org/10.1037/a0022658
Ryan, A. M., & Sackett, P. R. (1987). A survey of individual assessment practices by I/O psychologists. Personnel Psychology, 40(3), 455–488. https://doi.org/10.1111/j.1744-6570.1987.tb00610.x
Sackett, P. R., Dahlke, J. A., Shewach, O. R., & Kuncel, N. R. (2017). Effects of predictor weighting methods on incremental validity. Journal of Applied Psychology, 102(10), 1421–1434. https://doi.org/10.1037/apl0000235
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274. https://doi.org/10.1037/0033-2909.124.2.262
Spector, P. E. (2019). Do not cross me: Optimizing the use of cross-sectional designs. Journal of Business and Psychology, 34(2), 125–137. https://doi.org/10.1007/s10869-018-09613-8
van der Zee, K. I., Bakker, A. B., & Bakker, P. (2002). Why are structured interviews so rarely used in personnel selection? Journal of Applied Psychology, 87(1), 176–184. https://doi.org/10.1037/0021-9010.87.1.176
Wesche, J. S., Hennig, F., Kollhed, C. S., Quade, J., Kluge, S., & Sonderegger, A. (2022). People's reactions to decisions by human vs. algorithmic decision-makers: The role of explanations and type of selection tests. European Journal of Work and Organizational Psychology, 1–12. https://doi.org/10.1080/1359432X.2022.2132940
Yu, M. C., & Kuncel, N. R. (2020). Pushing the limits for judgmental consistency: Comparing random weighting schemes with expert judgments. Personnel Assessment and Decisions, 6(2), 1–10. https://doi.org/10.25035/pad.2020.02.002
Yu, M. C., & Kuncel, N. R. (2022). Testing the value of expert insight: Comparing local versus general expert judgment models. International Journal of Selection and Assessment, 30(2), 202–215. https://doi.org/10.1111/IJSA.12356

Appendix A. Attention checks in Study 1


Attention check 1: In all conditions, participants read that the hiring manager used a cognitive ability test, a score from a personality questionnaire, and
a rating from a traditional hiring interview to evaluate candidates. Upon reading the condition-specific hiring approach (see Table 1), participants were
presented with four assessment instruments (cognitive ability test, personality questionnaire, traditional hiring interview, curriculum vitae) and had to
indicate on the next slide which instruments the manager did not use to evaluate candidates (multiple answer options possible). Only participants who
solely selected “curriculum vitae” were included in the study.
Attention check 2: An attention check item was included in the perceived competence scale, and participants were asked to respond with “somewhat
disagree”. Participants who did not respond with “somewhat disagree” were excluded.
Attention check 3: Participants who did not respond to the statement “I have 17 fingers” with “no” were excluded.

Appendix B. Attention checks and other exclusion criteria in Study 2


We presented participants with 12 work activities (in random order) that we obtained from the O*NET (https://www.onetonline.org/find/
descriptor/browse/Work_Activities/). Participants had to indicate up to five work activities that they primarily engage in at work. Participants
were excluded if none of the options “Staffing organizational units”, “Making decisions and solving problems”, or “Judging the qualities of
things, services, or people” was chosen. Furthermore, we excluded participants who indicated not making a single hiring decision per year.
Moreover, we excluded participants who failed at least one of two attention checks. Participants were presented with predictor scores from
two applicants and indicated whether applicant A outperformed applicant B on all predictors (true or false). Participants who failed this
attention check were prompted to review the information again and to answer the question. Participants who failed the attention check
a second time were excluded. In the second attention check, participants were presented with descriptions of the prediction tasks of the four
conditions, and they had to indicate what type of prediction task they had completed. Participants who did not choose the correct description
were excluded.
