
European Journal of Work and Organizational Psychology
Journal homepage: www.tandfonline.com/journals/pewo20

To cite this article: Marvin Neumann, A. Susan M. Niessen, Maximilian Linde, Jorge N. Tendeiro & Rob R. Meijer (2024) "Adding an egg" in algorithmic decision making: improving stakeholder and user perceptions, and predictive validity by enhancing autonomy, European Journal of Work and Organizational Psychology, 33:3, 245-262, DOI: 10.1080/1359432X.2023.2260540

To link to this article: https://doi.org/10.1080/1359432X.2023.2260540

Published online: 26 Sep 2023.
EUROPEAN JOURNAL OF WORK AND ORGANIZATIONAL PSYCHOLOGY
2024, VOL. 33, NO. 3, 245–262
https://doi.org/10.1080/1359432X.2023.2260540

“Adding an egg” in algorithmic decision making: improving stakeholder and user perceptions, and predictive validity by enhancing autonomy

Marvin Neumann (a), A. Susan M. Niessen (b), Maximilian Linde (b), Jorge N. Tendeiro (c) and Rob R. Meijer (b)

(a) Department of Experimental and Applied Psychology, Faculty of Behavioural and Movement Sciences, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; (b) Department of Psychometrics and Statistics, Faculty of Behavioral and Social Sciences, University of Groningen, Groningen, The Netherlands; (c) Office of Research and Academia-Government-Community Collaboration, Education, Research Center for Artificial Intelligence and Data Innovation, Hiroshima University, Hiroshima, Japan

ABSTRACT
Decision makers often combine multiple pieces of information to make performance predictions and hiring decisions. More valid predictions are made when information is combined algorithmically (mechanical prediction) rather than in the decision-maker's mind (holistic prediction). Yet, decision makers rarely use algorithms in practice. One reason is that decision makers are worried about negative evaluations from other stakeholders such as colleagues when using algorithms. We hypothesized that such stakeholders evaluate decision makers more positively when they use autonomy-enhancing algorithmic procedures (AEAPs; holistically adjusting predictions from a prescribed algorithm or self-designing an algorithm) than when they use a prescribed algorithm. Relatedly, we hypothesized that decision makers who use AEAPs are less worried about negative stakeholder evaluations, and more likely to use algorithms in performance predictions. In Study 1 (N = 582), stakeholders evaluated decision makers more positively when they used AEAPs rather than a prescribed algorithm. In Study 2 (N = 269), decision makers were less worried about negative stakeholder evaluations and more likely to use AEAPs compared to a prescribed algorithm. Importantly, using AEAPs also resulted in substantially higher predictive validity than holistic prediction. We recommend the use of self-designed algorithms to improve perceptions and validity.

ARTICLE HISTORY
Received 23 March 2023; Accepted 13 September 2023

KEYWORDS
Algorithm use; personnel selection; decision making; human-automation collaboration; decision-support

Decision makers such as psychologists and managers often use multiple pieces of information (e.g., test scores and interview ratings) to make performance predictions and hiring decisions. Such information is typically combined in the decision-maker's mind (Neumann et al., 2023), which is known as holistic (or clinical, expert) prediction. Alternatively, information could be combined through an algorithm, which is known as mechanical (or statistical, actuarial, algorithmic) prediction (Meehl, 1954). Algorithms can take the form of very simple rules, although more advanced machine learning methods are possible (Highhouse & Brooks, 2023). A simple algorithm would be to give equal weight to a test score and an interview rating, and to hire the candidate(s) with the highest score(s). Predictor weights could also be different, and could be chosen based on decision-makers' judgement or statistical analyses (Kuncel, 2018).

A robust finding is that more valid performance predictions are made when information is combined algorithmically (Kuncel et al., 2013). The superiority of mechanical over holistic prediction has been demonstrated in various other contexts, including medical diagnoses, parole decisions, and clinical treatment selection (Ægisdóttir et al., 2006; Dawes et al., 1989; Guay & Parent, 2018). This finding also holds when considering experts' holistic predictions (Grove et al., 2000; Highhouse & Brooks, 2023; Yu & Kuncel, 2020), as expertise is hard to develop in these inherently uncertain contexts where outcome feedback is often substantially delayed or absent (Kahneman & Klein, 2009). The inconsistency of holistic predictions is a major reason why algorithms result in higher predictive validity; even algorithms with random positive weights that are consistently applied across applicants can outperform holistic prediction (Dawes & Corrigan, 1974; Yu & Kuncel, 2020). Relatedly, Yu and Kuncel (2022) found no evidence that decision makers can accurately tailor their performance predictions to specific organizational contexts. Although tailoring to the context could theoretically improve predictive validity, equally weighting the available information and hence ignoring organizational context outperformed "tailored" holistic predictions. So, even simple algorithms result in better performance predictions than holistic predictions.
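To make the simplest case concrete, here is a minimal R sketch of such a unit-weight rule; the applicant values are made up for illustration and are not data from the present studies.

```r
# Unit-weight mechanical prediction: standardize each predictor,
# sum with equal weights, and hire the candidate with the highest score.
applicants <- data.frame(
  id        = 1:4,
  test      = c(72, 85, 60, 90),      # hypothetical test scores
  interview = c(4.0, 3.5, 5.0, 2.0)   # hypothetical interview ratings
)
applicants$overall <- as.vector(scale(applicants$test)) +
                      as.vector(scale(applicants$interview))
applicants$id[which.max(applicants$overall)]  # candidate to hire
```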

CONTACT Marvin Neumann [email protected]
Supplemental data for this article can be accessed online at https://doi.org/10.1080/1359432X.2023.2260540
© 2023 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way. The terms on which this article has been published allow the posting of the Accepted Manuscript in a repository by the author(s) or with their consent.

Despite these robust research findings, decision makers rarely use algorithms for subjective tasks typically performed by humans like hiring (Castelo et al., 2019; Neumann et al., 2023). This results in suboptimal and untransparent hiring decisions (Meijer et al., 2020). Since algorithm use is a promising strategy to improve decision making (Milkman et al., 2009), the aim of this paper was to identify algorithmic decision-making procedures that are liked by stakeholders, liked and used by decision makers, and that result in higher predictive validity than holistic prediction (Kuncel, 2018; Neumann et al., 2021). We defined decision makers as people who make hiring decisions themselves (e.g., hiring managers, HR professionals), based on their own judgement or by using algorithms (Langer & Landers, 2021). In contrast, we defined stakeholders as people who may observe a decision maker making a hiring decision, and who may or may not be directly affected by the decision (e.g., higher management, direct colleagues, team members of the future employee, or (internal) applicants).1

One important reason why algorithms are rarely used is the "threat of technological unemployment" (TOTU, Meehl, 1986, p. 374): Decision makers worry that they provide less value to their organization when using algorithms because other stakeholders may no longer attribute the hiring process and outcome to their holistic judgement (Neumann et al., 2023; Nolan et al., 2016, 2020). Indeed, stakeholders give decision makers less credit for their hiring decisions (i.e., perceive them as less responsible for and less in control of hiring decisions) when they use algorithms rather than their holistic judgement (Nolan et al., 2016). Similarly, patients perceived physicians as less capable and competent when they made algorithmic (vs. holistic) diagnoses (Arkes et al., 2007). Furthermore, lay people perceive algorithm use in hiring as unprofessional and impersonal (Diab et al., 2011), and they have less trust in algorithmic than in human hiring decisions (Wesche et al., 2022). Importantly, Nolan et al. (2016) showed that decision makers themselves correctly believe that stakeholders give them less credit for their hiring decisions when using algorithms rather than their holistic judgement. These beliefs reduced decision-makers' intentions to use algorithms, and this relationship was mediated by TOTU (Nolan et al., 2016, 2020). What people believe others think of them is also called meta-perceptions (Frey & Tropp, 2006). In line with this literature, we henceforth call decision-makers' beliefs about how stakeholders evaluate them decision-makers' meta-beliefs.

So, both stakeholders (Langer & Landers, 2021) and decision makers (Burton et al., 2020) have more negative attitudes towards algorithmic compared to holistic hiring decisions. This is a problem because these perceptions prevent decision makers from using algorithms, which could improve the quality of hiring decisions (Kuncel et al., 2013). To solve this problem, interventions need to be identified that can improve stakeholder perceptions and decision-makers' meta-beliefs, and that still result in better predictions than holistic prediction. We expected that a useful intervention is enhancing decision-makers' autonomy in algorithmic decision making. Autonomy could be enhanced in at least two ways, compared to a fully automated approach where decision makers would have to use a prescribed algorithm consistently and hence have no autonomy at all. One approach is allowing decision makers to adjust a prescribed algorithm's job performance predictions holistically, upwards or downwards (Dietvorst et al., 2018; Kuncel, 2018). Another approach is to have decision makers design their own algorithm first, by explicitly choosing predictor weights (Neumann et al., 2022; Nolan & Highhouse, 2014), and then use the resulting predictions consistently, without the opportunity to adjust them. We henceforth refer to both approaches as autonomy-enhancing algorithmic procedures (AEAPs).

While teaching a workshop about these autonomy-enhancing algorithmic procedures, an attendee, Georgios Fraros, noted that the problem and solution presented were in line with a story about the introduction of an instant cake mix that only required adding water. This time-saving innovation didn't sell nearly as well as the proud manufacturers had expected. It turned out that presenting cake as homemade after just adding water and stirring made people feel guilty. They wanted to put in more effort and make it their own; they wanted to have to add an egg. While this entertaining story may be a myth (Mikkelson, 2008), our expectations are also in line with recent findings by Gonzalez et al. (2022). They showed that applicants' perceptions were similar when predictions were made holistically or when using an autonomy-enhancing algorithm (holistically adjusting predictions from a prescribed algorithm), but more positive compared to using a prescribed algorithm. Relatedly, Kares et al. (2023) found that decision makers showed similar trust in holistic prediction compared to an autonomy-enhancing algorithmic approach, and generally higher trust in an autonomy-enhancing algorithmic approach than in a prescribed algorithm. These results provide some first evidence that AEAPs can improve both stakeholders' and decision-makers' perceptions, compared to prescribed algorithms that restrict autonomy.

Contributions and aims of the present studies

As Langer and Landers (2021) discuss, existing studies in which AEAPs have been investigated have usually focused on either stakeholders (e.g., Newman et al., 2020) or decision makers (e.g., Nolan & Highhouse, 2014), and, importantly, almost all studies solely focused on perceptions, without investigating actual behaviour, such as effects of algorithm use on performance prediction accuracy (for exceptions in educational selection, see Dietvorst et al., 2018; Neumann et al., 2022). Furthermore, there is a large range of possibilities for how autonomy in algorithmic decision making could be enhanced. However, existing studies typically do not distinguish between different autonomy-enhancing approaches (Burton et al., 2020; Langer & Landers, 2021), and often introduce such approaches abstractly as "a combination between a human and an algorithm" (Gonzalez et al., 2022).

The aims of the present studies were to uniquely contribute to the literature by investigating whether AEAPs could improve both stakeholder perceptions and decision-makers' meta-beliefs. Importantly, we also investigated whether AEAPs result in higher predictive validity compared to holistic prediction, in the context of personnel selection. This is most important to investigate because improving perceptions is of no avail if AEAPs do not also improve validity. Yet, such an investigation is lacking in the literature, despite various calls for it (Burton et al., 2020; Highhouse & Brooks, 2023; Kuncel, 2018; Langer & Landers, 2021). Furthermore, we contribute to the literature by investigating the effect of two different ways to enhance autonomy in algorithmic decision making (holistically adjusting predictions from a prescribed algorithm or self-designing an algorithm) on perceptions and validity. Moreover, we contribute to the theoretical understanding of improving perceptions of algorithmic decision making through AEAPs, by investigating the underlying mechanisms of these perceptions.

Theoretical background and hypothesis development

Stakeholders should attribute hiring outcomes less to a decision maker who uses a prescribed algorithm to make predictions than to a decision maker who makes holistic predictions. This assumption is in line with attribution theory (Kelley, 1973), which addresses how people explain events and perceive causality. Two key dimensions of causal attribution are locus of causality and personal control (Meyer & Koelbl, 1982).2 Locus of causality is the extent to which stakeholders attribute a hiring decision to internal or external factors. Applied to personnel selection, holistic prediction can be considered an internal factor because the decision-making process takes place in the decision-maker's mind. In contrast, an algorithm may be considered an external factor because it constitutes a contextual decision aid that the decision maker uses to make hiring decisions. Personal control is the extent to which stakeholders perceive a decision maker as capable of changing the hiring decision. Perceived personal control should be higher when decision makers make holistic rather than algorithmic hiring decisions. Based on attribution theory, Nolan et al. (2016) indeed found that stakeholders perceived decision makers as less responsible for and less in control of the hiring decision when they used an algorithm rather than their holistic judgement. Accordingly, we had the following hypotheses, which replicate Nolan et al. (2016).

Hypothesis 1a: Stakeholders' beliefs about decision-makers' locus of causality will be more positive3 when holistic prediction is used, compared to when a prescribed algorithm is used.

Hypothesis 2a: Stakeholders' beliefs about decision-makers' personal control will be more positive when holistic prediction is used, compared to when a prescribed algorithm is used.

Many people suggest that practicing holistic prediction is what makes decision makers be seen as truly professional or competent (Meehl, 1986; Neumann et al., 2023). Indeed, Diab et al. (2011) showed that lay people considered the use of holistic prediction as moderately more professional than mechanical prediction when scores from interviews (but not standardized tests) were combined. Furthermore, Nolan (2013) found that decision makers felt they could better demonstrate their competence when making holistic rather than algorithmic decisions. Thus, it can be expected that stakeholders perceive decision makers as more competent when they determine or influence the hiring decision.

Hypothesis 3a: Stakeholders' beliefs about decision-makers' competence will be more positive when holistic prediction is used, compared to when a prescribed algorithm is used.

Autonomy in algorithm use

Stakeholders may perceive decision makers as having more control over hiring decisions when they retain autonomy in the decision process than when they have no autonomy (i.e., they must use a prescribed algorithm). Furthermore, stakeholders may ascribe the cause of a hiring decision more to the decision maker (internal factor) than to an algorithm (external factor) when decision makers retain autonomy. This is expected because hiring decisions would not exclusively be determined by an external decision aid (i.e., a prescribed algorithm), but partly by the decision-maker's internal cognitive processes. Based on this reasoning, stakeholders should evaluate decision makers more positively when they use AEAPs rather than a prescribed algorithm. We hypothesized that:

Hypothesis 1b: Stakeholders' beliefs about decision-makers' locus of causality will be more positive when AEAPs are used, compared to when a prescribed algorithm is used.

For brevity, we only present this hypothesis for locus of causality. We had the same hypotheses for personal control (H2b) and perceived competence (H3b).

We expected that stakeholders perceive decision makers who use AEAPs as more in control of and more responsible for the hiring decision than decision makers who use a prescribed algorithm. Since people commonly believe that the combination of algorithmic and human judgement results in the best decisions (Neumann et al., 2023), it may also be expected that stakeholders' positive beliefs about decision-makers' locus of causality and personal control translate into more positive competence perceptions. Our conceptual model is shown in Figure 1. We hypothesized that:

Hypothesis 4: Enhanced autonomy in making performance predictions will have a positive indirect effect on stakeholders' competence perceptions of decision makers via its effect on locus of causality and personal control.

Figure 1. Conceptual model in study 1. a AEAPs = autonomy-enhancing algorithmic procedures. AEAPs was coded as 1 and prescribed algorithm as 0.

Holistic prediction is popular in practice. Therefore, it is also relevant to investigate whether stakeholders perceive the use of AEAPs in a similar manner as holistic prediction. If this were not the case, AEAPs may not be a viable method to improve stakeholder perceptions. Therefore, we also explored how stakeholders evaluate decision-makers' use of AEAPs, compared to holistic prediction.

Pre-registrations including sample size justifications, analysis code, codebooks, study materials, and raw data for both studies are available on OSF: https://osf.io/brc9p/?view_only=211357b3a78446078715d47a633d8219.

Study 1

Method

Participants
We recruited 644 employees via Amazon MTurk who received $1 for their participation. As pre-registered, we excluded participants who failed at least one of three attention checks (see Appendix A) or finished the tasks in less than a minute. After inspecting the data, we also excluded participants who reported impossible values (e.g., longer role tenure than organizational tenure or an incorrect MTurk ID). We retained usable data from 582 participants. The mean age was M = 37.1 (SD = 10.6) and 55% were male. Most participants were U.S. citizens (77%). Other participants were Indian (17%), Brazilian (3%), or had other non-EU nationalities (3%) or EU nationalities (<0.01%). Furthermore, most participants were employed full-time (92%) and held a Bachelor or Master degree (86%). The median study completion time was 5 minutes. The ethics committee of the Heymans institute for psychological research approved this study (code: PSY-2021-S-0500).

Design and procedure

We employed a one-factorial (hiring approach) between-subjects design with four levels (holistic, holistic adjustment, self-designed algorithm, prescribed algorithm). Participants read an introduction of a hypothetical hiring scenario, similar to Nolan et al. (2016): "Imagine yourself in the following situation . . . The person responsible for hiring decisions at your company (e.g., a manager) just hired a new employee to fill an open position. Please read the description of how this decision was made and answer the questions that follow". Afterwards, participants were randomly assigned to one of four conditions and read a condition-specific description of the manager's hiring approach (see Table 1).

We varied how the manager combined a cognitive ability test score, a score from a personality questionnaire, and a rating from a traditional hiring interview to evaluate candidates. In the holistic condition, the manager used their intuition and expertise to combine the information and make a hiring decision. In the holistic adjustment condition, the manager saw the prediction from a prescribed algorithm and could adjust the algorithm's prediction using their intuition and expertise. In the self-designed algorithm condition, the manager designed an algorithm by choosing predictor weights that reflected the importance of the information. In the prescribed algorithm condition, the manager saw the prediction from a prescribed algorithm but could not influence the algorithm's weights or the resulting predictions. After reading the condition-specific descriptions, participants completed the dependent measures in random order.

Measures

All measures were assessed on a 5-point Likert scale (1 = strongly disagree, 5 = strongly agree, see OSF). We used an adapted six-item scale from Nolan and Highhouse (2014) as a manipulation check, to measure the extent to which participants perceived the manager as having autonomy in making hiring decisions (α = .88). Furthermore, we used Nolan et al.'s (2016) three-item scales to measure personal control and locus of causality (α = .84 and α = .83, respectively). Lastly, we used an adapted version of Nolan's (2013) six-item scale to measure perceived competence (α = .89).

Analytical approach

We applied Bayesian parameter estimation because estimating the magnitude of and uncertainty around the differences of interest was the most relevant to this study. We report posterior parameter estimates (standardized mean differences, denoted as d) with corresponding 95% highest density intervals (HDIs). An HDI allows for the intuitive interpretation that there is a 95% probability that the corresponding true parameter lies within the HDI, conditional on the model, the priors, and the observed data (Kruschke, 2015). We initialized a model in Stan (Stan Development Team, 2022) with chains = 7, thinning = 5, warmup samples = 5000, and actual samples = 50000, resulting in a total of 7 (chains) * 50000 (samples) / 5 (thinning) = 70000 samples. Furthermore, we used improper (very uninformative) priors for each parameter. For mediation results, we report the partially standardized effect size (Preacher & Kelley, 2011), which is the ratio of the (in)direct effect to the standard deviation of Y. It can be interpreted as the number of standard deviations of Y that two groups differ on average as a result of the (in)direct mechanism.
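The authors' pre-registered models and analysis code are on OSF. As a hedged sketch of the reported sampler settings only, the following R/rstan call fits a trivial two-group model on simulated data (Stan's default flat priors on unbounded parameters are improper, as in the text); the model, data, and variable names are stand-ins, not the actual analyses.

```r
library(rstan)

# Stand-in two-group model; d is a standardized mean difference.
model_code <- "
data {
  int<lower=1> n1; int<lower=1> n2;
  vector[n1] y1; vector[n2] y2;
}
parameters { real mu1; real mu2; real<lower=0> sigma; }
model {
  y1 ~ normal(mu1, sigma);
  y2 ~ normal(mu2, sigma);
}
generated quantities {
  real d = (mu1 - mu2) / sigma;  // standardized mean difference
}
"

fit <- stan(model_code = model_code,
            data = list(n1 = 50, n2 = 50,
                        y1 = rnorm(50, 4.0, 0.8),   # simulated ratings, group 1
                        y2 = rnorm(50, 3.5, 1.0)),  # simulated ratings, group 2
            chains = 7, warmup = 5000, iter = 55000, thin = 5)
# Retained draws: 7 chains * (55000 - 5000) / 5 = 70000, matching the text.
# (This is a long run; shrink iter for a quick test.)

# 95% HDI: the shortest interval containing 95% of the posterior draws.
hdi <- function(draws, mass = 0.95) {
  d <- sort(draws); n <- length(d); k <- floor(mass * n)
  widths <- d[(k + 1):n] - d[1:(n - k)]
  i <- which.min(widths)
  c(lower = d[i], upper = d[i + k])
}
hdi(extract(fit)$d)
```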

Table 1. Descriptions of the hiring approach per condition in study 1.

Condition Description of the hiring approach


Holistic The person responsible for hiring decisions used a cognitive ability test score, a score from a personality questionnaire, and a rating from
a traditional hiring interview to evaluate candidates. After this information had been obtained, the person responsible for hiring decisions
used his/her intuition and expertise to combine the information and make a hiring decision.
Holistic adjustment The person responsible for hiring decisions used a cognitive ability test score, a score from a personality questionnaire, and a rating from
a traditional hiring interview to evaluate candidates. After this information had been obtained, the information was combined into an
overall score according to the following decision rule: Applicant overall score = cognitive ability test score*weight1 + personality
questionnaire score*weight2 + traditional hiring interview rating*weight3. So, the scores and ratings of an applicant were multiplied by
weights (*) and then added up (+). The weights were chosen by an external assessment professional and reflected the relative importance
he/she gave to the cognitive ability test, the personality questionnaire, and the traditional hiring interview, respectively. The person
responsible for hiring decisions was shown the applicants’ overall scores and could use his/her intuition and expertise to adjust these
overall scores if desired. Eventually, the applicant with the highest overall score was hired.
Self-designed The person responsible for hiring decisions used a cognitive ability test score, a score from a personality questionnaire, and a rating from
algorithm a traditional hiring interview to evaluate candidates. After this information had been obtained, the information was combined into an
overall score according to the following decision rule: Applicant overall score = cognitive ability test score*weight1 + personality
questionnaire score*weight2 + traditional hiring interview rating*weight3. So, the scores and ratings of an applicant were multiplied by
weights (*) and then added up (+). The person responsible for hiring decisions chose the weights, which reflected the relative importance
he/she gave to the cognitive ability test, the personality questionnaire, and the traditional hiring interview, respectively. Eventually, the
applicant with the highest overall score was hired.
Prescribed The person responsible for hiring decisions used a cognitive ability test score, a score from a personality questionnaire, and a rating from
algorithm a traditional hiring interview to evaluate candidates. After this information had been obtained, the information was combined into an
overall score according to the following decision rule: Applicant overall score = cognitive ability test score*weight1 + personality
questionnaire score*weight2 + traditional hiring interview rating*weight3. So, the scores and ratings of an applicant were multiplied by
weights (*) and then added up (+). The weights were chosen by an external assessment professional and reflected the relative importance
he/she gave to the cognitive ability test, the personality questionnaire, and the traditional hiring interview, respectively. The person
responsible for hiring decisions had no influence on the decision-rule weights or the resulting applicant overall score. Eventually, the
applicant with the highest overall score was hired.

Results

Based on the results of a confirmatory factor analysis, Nolan et al. (2016) combined the measures locus of causality and personal control into one scale. Therefore, we investigated whether such a one-factor solution would show a better fit than the theoretically assumed two-factor solution by comparing the RMSEA values of the two models. In sum, the two-factor solution showed a better fit (see Supplement S1).

Table 2 shows the correlations between the measures. Figure 2 shows the observed means per condition and measure. Table 3 shows the exact observed means and standard deviations. Posterior distributions with 95% HDIs for the hypothesized effects are shown in Figure 3. As Table 3 and Figure 2 also show, holistically adjusting predictions from a prescribed algorithm and self-designing an algorithm resulted in very similar perceptions. Therefore, these two AEAPs were combined in the following analyses.

Autonomy (manipulation check)

As expected, stakeholders perceived managers who used AEAPs as having more autonomy than managers who used a prescribed algorithm (d = 0.53, 95% HDI [0.33, 0.73]). This suggests that our manipulation was successful.

Locus of causality, personal control, and perceived competence

Stakeholders perceived managers who made holistic predictions as substantially more responsible for and much more in control of the hiring decision than managers who used a prescribed algorithm (d = 0.88, 95% HDI [0.64, 1.12] and d = 0.91, 95% HDI [0.67, 1.15], respectively). Furthermore, managers were also perceived as more competent when making holistic predictions rather than using a prescribed algorithm (d = 0.51, 95% HDI [0.27, 0.74]). Therefore, hypotheses 1a, 2a, and 3a were fully supported.

Stakeholders also perceived managers who used AEAPs as more responsible for and more in control of the hiring decision than managers who used a prescribed algorithm (d = 0.58, 95% HDI [0.38, 0.78] and d = 0.76, 95% HDI [0.56, 0.96], respectively). Relatedly, managers who used AEAPs were also perceived as more competent compared to managers who used a prescribed algorithm, although the effect size was smaller (d = 0.30, 95% HDI [0.10, 0.50]). So, hypotheses 1b, 2b, and 3b were also fully supported.

To constitute a viable intervention, AEAPs should result in similarly positive stakeholder evaluations as holistic prediction, which is dominant in practice.

Table 2. Correlations between measures in study 1.
Measure                  1.   2.   3.
1. Autonomy a            -
2. Locus of causality    .76  -
3. Personal control      .74  .78  -
4. Perceived competence  .68  .65  .54
Correlations were calculated across conditions. N = 582. a Manipulation check.

Figure 2. Observed means and raw data per condition and measure in study 1. Error bars represent 95% confidence intervals. Some jittering in the x-axis direction was
added to the plot to improve readability.

Table 3. Observed means and standard deviations per condition and measure in study 1.
                          Autonomy a    Locus of causality  Personal control  Perceived competence
Condition                 M     SD      M     SD            M     SD          M     SD
Holistic                  4.12  0.50    4.21  0.58          4.11  0.65        4.18  0.56
Holistic adjustment       3.93  0.71    4.02  0.71          4.02  0.75        4.09  0.69
Self-designed algorithm   3.90  0.75    3.90  0.89          3.91  0.94        3.97  0.76
Prescribed algorithm      3.50  1.05    3.46  1.14          3.27  1.22        3.82  0.80
a Manipulation check.

Although stakeholders also perceived managers who made holistic predictions as somewhat more responsible for and somewhat more in control of the hiring decision than managers who used AEAPs (d = 0.29, 95% HDI [0.09, 0.50] and d = 0.15, 95% HDI [−0.05, 0.35], respectively), this difference was relatively small compared to the difference between using AEAPs and using a prescribed algorithm. We found similar results for perceived competence (d = 0.21, 95% HDI [0.00, 0.41]).

Mediation

We first conducted feasibility checks as outlined in Murphy (2021) to assess whether mediation was plausible (cf. hypothesis 4). These include checking whether the independent variable and the mediator(s) are indeed correlated with the dependent variable, and whether the semipartial correlation between the independent and dependent variable is close to zero when controlling for the mediator(s). Table 4 shows the correlations between the variables included in the proposed mediation model. Although the correlation between enhanced autonomy and perceived competence was rather small (r = .13), the semipartial correlation was close to zero when controlling for the mediators (r = −.04). This pattern of correlations suggests that mediation is plausible.
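As a hedged R sketch of this feasibility check on simulated data (the study's raw data are on OSF), the semipartial correlation below removes the mediators' variance from the independent variable only, which is one standard way to compute it; all variable names and effect sizes are illustrative.

```r
# Feasibility check in the spirit of Murphy (2021): the IV-DV semipartial
# correlation should be near zero once the mediators are controlled for.
set.seed(1)
n   <- 449
x   <- rbinom(n, 1, 2/3)                # enhanced autonomy (1) vs. prescribed algorithm (0)
loc <- 0.5 * x + rnorm(n)               # locus of causality (simulated)
pc  <- 0.6 * x + rnorm(n)               # personal control (simulated)
y   <- 0.6 * loc + 0.1 * pc + rnorm(n)  # perceived competence (simulated)

x_resid <- resid(lm(x ~ loc + pc))      # strip mediator variance from X
cor(x_resid, y)                         # near zero suggests mediation is plausible
```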
As Table 5 shows, we found some evidence that stakeholders perceived managers who used AEAPs as more competent than managers who used a prescribed algorithm via its effect on locus of causality (partially standardized effect size = 0.32). Furthermore, zero was a credible value for the direct effect of enhanced autonomy on perceived competence (partially standardized effect size = −0.09). We did not find enough evidence that personal control mediated the relation between enhanced autonomy and perceived competence (partially standardized effect size = 0.06). In sum, we found partial support for hypothesis 4.

Discussion

In line with findings by Nolan et al. (2016), stakeholders perceived decision makers who made holistic predictions as being more competent, more responsible for, and more in control of hiring decisions than decision makers who used a prescribed algorithm. Extending Nolan et al.'s (2016) findings, we found that stakeholders evaluated decision makers much more positively when they adjusted predictions from a prescribed algorithm or self-designed an algorithm, compared to when they strictly used a prescribed algorithm. Moreover, stakeholders evaluated decision makers who used AEAPs only slightly more negatively than decision makers who made holistic predictions.

Figure 3. Posterior distributions for the hypothesized effects in study 1. The grey area under each curve depicts the 95% most probable effect sizes, given the model
and the data. H = holistic prediction, AEAPs = autonomy-enhancing algorithmic procedures, and PA = prescribed algorithm.

Table 4. Correlations between variables included in the pre-registered mediation model in study 1.
Measure                  1.   2.   3.
1. Enhanced autonomy a   -
2. Locus of causality    .25  -
3. Personal control      .31  .78  -
4. Perceived competence  .13  .67  .54
a Enhanced autonomy was coded 1 for the two autonomy-enhancing conditions (holistic adjustment and self-designed algorithm) and 0 for the prescribed algorithm condition. Therefore, data from the holistic condition was disregarded, and hence, N = 449.

Table 5. Mediation results from study 1.

Paths Unstandardized effect estimate Partially standardized effect size


Enhanced autonomy → Locus of causality → Perceived competence (indirect) 0.28 [0.15, 0.36] .32
Enhanced autonomy → Personal control → Perceived competence (indirect) 0.04 [−0.03, 0.09] .06
Enhanced autonomy → Perceived competence (direct) −0.07 [−0.19, 0.05] −.09
95% highest density intervals are presented in brackets.

While this may suggest that stakeholders somewhat divided their attributions of causality, control, and competence between the human decision maker and the algorithm, these effects were small. Thus, there is some positive news here: Our results suggest that decision makers should be much less worried about negative stakeholder evaluations when using AEAPs. However, decision makers may be only more likely to use AEAPs when they also themselves believe that stakeholders appreciate their use of AEAPs more than their use of a prescribed algorithm. This was investigated in Study 2. Importantly, we also investigated whether AEAPs result in higher predictive validity than holistic prediction.

Study 2 makers’ intentions to use AEAPs. Our conceptual model is


shown in Figure 4. We hypothesized that:
Attribution theory addresses how people explain events and
thus provided a useful framework for explaining under which Hypothesis 5a: Decision-makers’ meta-beliefs about their
circumstances stakeholders perceive decision makers as the locus of causality will be more positive when holistic prediction
cause of a hiring decision. However, this theory does not explain is used, compared to when a prescribed algorithm is used.
decision-makers’ intentions to use algorithms as a consequence
of their beliefs about stakeholders’ perceptions. A useful frame­ For brevity, we only formulated a hypothesis for locus of caus­
work to cover the relation between these “meta-perceptions” ality. We formulated the same hypotheses for personal control
and algorithm use is the theory of planned behaviour (Ajzen, (H6a) and competence perceptions (H7a).
1991), which suggests that subjective norms (i.e., the expecta­
Hypothesis 5b: Decision-makers’ meta-beliefs about their
tions from others to perform a certain behaviour) determine
locus of causality will be more positive when AEAPs are used,
one’s intentions to perform such behaviour.4 Therefore, we
compared to when a prescribed algorithm is used.
drew in Study 2, in addition to attribution theory, primarily on
the theory of planned behaviour to explain decision-makers’ We formulated the same hypotheses for personal control (H6b)
intentions to use different decision-making procedures. These and competence perceptions (H7b).
theories are compatible in that attribution theory suggests the
constructs on which stakeholders evaluate decision makers (i.e., Hypothesis 8: Enhanced autonomy will have a positive indir­
locus of causality, personal control, perceived competence), ect effect on use intentions via its effect on locus of causality,
while the theory of planned behaviour is used to argue how personal control, competence perceptions, and TOTU.
decision makers think stakeholders evaluate them on these
constructs. In the context of personnel selection, the theory of
planned behaviour has been used to explain decision-makers’ Predictive validity
(un)willingness to use standardized information collection (e.g.,
Even if decision makers were more likely to use AEAPs, this
structured interviews, van der Zee et al., 2002), and standardized
should still result in more valid predictions than pure holis­
information combination (mechanical prediction, Nolan et al., tic predictions to improve decision making in practice.
2016). In the present study, the theory suggests that decision Holistically adjusting predictions from a prescribed algo­
makers should be more likely to use autonomy-enhancing than rithm decreases predictive validity compared to strict algo­
autonomy-restricting prediction procedures if they recognize rithm use (Hoffman et al., 2017; Neumann et al., 2022). Yet,
that the former results in more positive evaluations by stake­ adjusted predictions can still be more valid than holistic
holders. We expected that decision makers would recognize that predictions because decision makers anchor on the algo­
stakeholders ascribe more credit to them when using autonomy- rithm’s predictions (Dietvorst et al., 2018; Neumann et al.,
enhancing (vs. autonomy-restricting) algorithmic procedures, 2022). Moreover, Neumann et al. (2022) found mixed evi­
dence that self-designed algorithms resulted in slightly
and hence perceive them as more competent. When decision
more valid predictions than pure holistic prediction, even
makers believe that using AEAPs would increase stakeholders’
when participants knew predictor validities. When predic­
perceptions of their locus of causality and control, and hence tors with at least some and similar validities are used, even
result in higher perceived competence, decision-makers’ TOTU algorithms with random predictor weights resulted in more
(i.e., the worries that their professional judgement is needless) valid predictions than holistic prediction, due to increased
should be reduced. This, in turn, should increase decision- judgement consistency (Yu & Kuncel, 2020). However, when

Figure 4. Conceptual model in study 2. The hypothesized conceptual model. The dashed line indicates that we dropped TOTU in our exploratory analyses because it
showed correlations with other variables that made the hypothesized mediation model implausible. a AEAPs = autonomy-enhancing algorithmic procedures. AEAPs
was coded as 1 and prescribed algorithm as 0. b TOTU = threat of technological unemployment.

However, when predictors substantially differ in validity, accurate weighting becomes important as well, as overweighting invalid predictors would decrease predictive validity. The prediction task in this study contained predictors with near-zero to moderate validity. We should be able to assume that hiring professionals have some knowledge of predictor validities and hence would choose predictor weights that are more appropriate than random weights. We expected that holistically adjusted predictions from a prescribed algorithm and predictions from a self-designed algorithm would be more valid than pure holistic predictions.

Hypothesis 9: AEAPs will result in more valid performance predictions than pure holistic predictions.

Method

Participants

Employed people with hiring experience were recruited via Amazon MTurk and received $3.65 for their participation. As pre-registered, we excluded participants who (1) were not involved in making judgements or decisions at work, (2) did not make at least one hiring decision per year, and (3) failed at least one attention check (see Appendix B). After the first day of data collection, we noticed that some participants completed the study in such a short time that it was unlikely they read the instructions properly and responded sincerely. Therefore, we updated our pre-registration on OSF by adding that we would also exclude participants who responded in less than 10 minutes. In line with the pre-registered sample size, we retained usable data from 269 participants. The mean age was M = 36.0 (SD = 10.2) and 59% were male. Most participants were U.S. citizens (62%). Other participants were Indian (32%) or had other non-European nationalities (3%) or European nationalities (4%). Almost all participants worked full time (96%) and held a Bachelor or Master degree (90%). Lastly, 59% indicated making six or more hiring decisions per year. The median study completion time was 22 minutes. The ethics committee of the Heymans institute for psychological research approved this study (code: PSY-2021-S-0197).

Materials

Applicant data

Participants made performance predictions based on archival applicant data (N = 236) originally used in Kausel et al. (2016). An airline company was hiring ticket agents using a cognitive ability test, a conscientiousness questionnaire, and an unstructured interview, conducted by a line manager. All applicants were hired, except for those who obtained the lowest possible interview rating. Supervisors had rated the applicants' overall performance approximately three months after they were hired. It was not feasible to let participants make performance predictions for all 236 applicants in this study. Therefore, we selected 40 applicants such that the absolute differences in correlations between the correlation matrices of the reduced and the full dataset did not differ by more than .015 (see the R script "Applicant selection.R" on OSF). Table 6 shows the descriptive statistics and correlations of the variables of the reduced applicant dataset.
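The authors' exact selection procedure is in "Applicant selection.R" on OSF; the hedged sketch below, on simulated data, shows the criterion as described: search over random subsets of 40 applicants and keep the one whose correlation matrix deviates least from the full-sample matrix, aiming for a maximum absolute deviation of at most .015.

```r
# Simulated stand-in for the full applicant dataset (N = 236, 4 variables
# with correlations loosely mimicking Table 6; not the real data).
set.seed(3)
full_data <- as.data.frame(MASS::mvrnorm(
  236, mu = rep(0, 4),
  Sigma = matrix(c(1,   .11, .11, .31,
                   .11, 1,   .02, .23,
                   .11, .02, 1,   .04,
                   .31, .23, .04, 1), 4, 4)))
R_full <- cor(full_data)

# Best-of-N search over random 40-applicant subsets.
best <- NULL; best_dev <- Inf
for (i in 1:20000) {
  idx <- sample(nrow(full_data), 40)
  dev <- max(abs(cor(full_data[idx, ]) - R_full))
  if (dev < best_dev) { best_dev <- dev; best <- idx }
}
best_dev  # the authors required this maximum deviation to be at most .015
```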
Design and procedure

Participants indicated their primary work activities (see Appendix B) and the number of hiring decisions they make per year. Afterwards, participants read that they would predict the performance of 40 real applicants. Specifically, participants read that an airline was opening new offices and filled vacancies for the job of a ticket agent, and that applicants completed a cognitive ability test, a conscientiousness questionnaire, and an unstructured interview that was conducted by a line manager. Furthermore, participants read that all applicants except for those with the lowest possible interview rating were hired. The cognitive ability test score was presented as a percentage (proportion of correctly answered items on the test). The conscientiousness score was presented on a five-point scale, up to one decimal. The interview score was an integer and ranged from two to five. We henceforth call these scores predictor scores. Lastly, participants learned that they could earn up to five additional dollars for making accurate performance predictions (for details see the study material on OSF).

Again, we used a one-factorial (hiring approach) between-subjects design with four levels (holistic, holistic adjustment, self-designed algorithm, prescribed algorithm). Participants were randomly assigned to one of four conditions, and the hiring approach to making performance predictions was varied (see Table 7 for condition-specific instructions). In the holistic condition, participants saw each applicant's predictor scores and predicted their job performance up to one decimal on a five-point scale. Applicants were presented to participants one by one.

In the holistic adjustment condition, participants saw each applicant's predictor scores and the algorithm's prediction. To construct the algorithm, the standardized predictor scores were multiplied by regression weights that we derived from the meta-analytic correlation matrix presented in the top panel of Table 3 in Cortina et al. (2000), by using the setCor function from the psych package in R (see the R script "Applicant selection.R" on OSF).
Table 6. Descriptive statistics and Correlations between variables of the reduced applicant dataset.

Variable M SD 1. 2. 3.
1. Cognitive ability 0.68 0.14 -
2. Conscientiousness 3.92 0.42 .11 -
3. Interview 2.83 0.98 .11 .02 -
4. Job performance 3.15 0.40 .31 .23 .04
N = 40.

Table 7. Descriptions of the hiring approach per condition in study 2.

Condition Description
Holistic Please review the scores of the applicants and predict based on your intuition and expertise how the applicants will perform on the job.
Holistic adjustment In this approach, we show you, for each applicant, the performance prediction of a decision rule. An assessment professional designed this
decision-rule based on numerous empirical research findings. The decision rule looks like this:
Decision-rule prediction = General mental ability test score × 53 + Conscientiousness questionnaire score × 28 + Interview rating × 19.
As you can see above, the decision rule assigns the following weights to the information:

General Mental Ability Test: 53


Conscientiousness Questionnaire: 28
Interview Rating: 19

The chosen weights correspond to the importance assigned to each piece of the information. So, based on the numerous empirical research
findings, the assessment professional decided to give most weight to the general mental ability test score, and least weight to the interview
rating. Above you can see that the scores and ratings of an applicant were multiplied by weights (*) and then added up (+).
The higher the decision-rule prediction, the more likely it is that the applicant shows good job performance.
You can use the decision-rule prediction in different ways. You can review the scores of the applicants and consult the prediction of the
decision rule shown below and decide based on your intuition and expertise how you would like to combine the information to predict the
applicants’ job performance. If you only want to use the exact decision-rule prediction, you simply reproduce this prediction as your answer.
Although the decision rule will probably not result in perfect performance predictions, research showed that adjusting decision-rule
predictions based on one’s intuition and expertise usually decreases prediction accuracy.
Self-designed In this approach, we want you to design a decision rule. You will estimate how important each piece of information is for later job success and
algorithm create a rule that will be used to make performance predictions for all applicants. The higher the decision-rule prediction, the more likely it
is that the applicant shows good job performance.
Before you get descriptions of the applicants, you will decide how much weight you will assign to each piece of information: General Mental
Ability Test, Conscientiousness Questionnaire, and Interview Rating
The chosen weights correspond to the importance you assign to each piece of information. To illustrate this, if you think interview ratings are
most important in predicting job performance you should weigh it more heavily than the other information. Conversely, if you think the
general mental ability test or conscientiousness questionnaire will be most important in predicting later job success you should assign
more weight to them respectively.
On the basis of your designed decision rule an overall score for each applicant will be calculated. Below you can see that the scores and ratings
of an applicant are multiplied by weights (*) and then added up (+). The higher the overall score of the applicant the higher the chance of
job success.
The decision rule looks like this:
Decision-rule prediction = General mental ability test score * your chosen weight 1 + Conscientiousness questionnaire score * your chosen
weight 2 + Interview rating * your chosen weight 3
You will see the predictions made based on your decision rule. But once you designed the rule you cannot adjust its predictions.
Although your decision rule will probably not result in perfect performance predictions, research showed that using such a rule results in more
accurate performance predictions than using one’s intuition and expertise.
Prescribed In this approach, we show you, for each applicant, the performance prediction of a decision rule. An assessment professional designed this
algorithm decision rule based on numerous empirical research findings. The decision rule looks like this:
Decision-rule prediction = General mental ability test score × 53 + Conscientiousness questionnaire score × 28 + Interview rating × 19.
As you can see above, the decision rule assigns the following weights to the pieces of information:

General Mental Ability Test: 53


Conscientiousness Questionnaire: 28
Interview Rating: 19

The chosen weights correspond to the importance assigned to each piece of information. So, based on the numerous empirical research
findings, the assessment professional decided to give most weight to the general mental ability test score, and least weight to the interview
rating. Above you can see that the scores and ratings of an applicant were multiplied by weights (*) and then added up (+).
The higher the decision-rule prediction, the more likely it is that the applicant shows good job performance.
You will see the predictions made based on the decision rule. The applicants’ predicted job performance will be shown to you and you cannot
adjust this prediction based on your intuition and expertise. The decision rule will predict the performance of all 40 applicants. To
demonstrate how the rule operates, we will show you 10 performance predictions which you cannot adjust.
Although the decision rule will probably not result in perfect performance predictions, research showed that adjusting decision-rule
predictions based on one’s intuition and expertise usually decreases prediction accuracy.

To aid participants' interpretation of the predictions, we rescaled the algorithm's predictions to a five-point scale (1 = very bad job performance, 5 = very good job performance) using min-max scaling (see p. 114, formula 3.8 in Han et al., 2011). Participants could holistically adjust the algorithm's predictions if they wanted.
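The two numeric steps just described can be sketched in R; the correlation values below are placeholders rather than the actual Cortina et al. (2000) meta-analytic entries (the authors derived the weights with psych::setCor, see OSF), and the predictor scores are simulated.

```r
# Standardized regression weights from a (placeholder) predictor correlation
# matrix Rxx and predictor-criterion correlations rxy: beta = Rxx^{-1} rxy.
Rxx <- matrix(c(1.00, 0.10, 0.20,
                0.10, 1.00, 0.10,
                0.20, 0.10, 1.00), 3, 3,
              dimnames = list(c("gma", "consc", "int"),
                              c("gma", "consc", "int")))
rxy  <- c(gma = 0.45, consc = 0.20, int = 0.25)  # placeholder validities
beta <- solve(Rxx, rxy)

# Apply the rule to standardized predictor scores (simulated here) . . .
z    <- matrix(rnorm(40 * 3), nrow = 40, ncol = 3)
pred <- as.vector(z %*% beta)

# . . . and min-max rescale to the 1-5 response scale (cf. Han et al., 2011):
pred_1to5 <- (pred - min(pred)) / (max(pred) - min(pred)) * (5 - 1) + 1
```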
In the self-designed algorithm condition, participants chose percentage weights, which were multiplied by the respective standardized predictor scores. The resulting scores were summed to yield an applicant's job performance prediction. Participants were told that their self-designed algorithm would be used to predict the performance of all 40 applicants. To let participants experience their self-designed algorithm's predictions, we presented participants with 10 randomly selected applicants, their predictor scores, and the rescaled self-designed algorithm's predictions (1 = very bad job performance, 5 = very good job performance). Participants could not holistically adjust these predictions.

In the prescribed algorithm condition, participants had no autonomy as they had to strictly use the algorithm's predictions. The algorithm was constructed in the same way as in the holistic adjustment condition. As in the self-designed algorithm condition, participants saw the algorithm's predictions and the respective predictor scores for 10 applicants but were told that the algorithm was used to predict the job performance of all 40 applicants.

In the holistic adjustment and the prescribed algorithm condition, participants saw the algorithm's predictor weights as percentage weights before they made (or saw) any predictions. So, in all conditions where an algorithm was used (all except the holistic condition), participants saw the algorithm's predictor weights and were told that the higher the algorithm's prediction, the more likely it is that the applicant shows good job performance, according to the algorithm. Importantly, in all conditions except the holistic condition, participants were warned that although the algorithm's predictions would not be perfect, research showed that holistically adjusting the algorithm's predictions usually decreases prediction accuracy.

After the prediction task, participants imagined that the airline used the same approach to make performance predictions and hired the applicants with the highest scores. They also imagined being the manager who delivered the decision to applicants. Then, participants filled in the dependent measures in random order.

Measures

Attitudinal measures

All measures were assessed on a 5-point Likert scale (1 = strongly disagree, 5 = strongly agree, see OSF). We used the same measures for perceived autonomy (manipulation check, α = .90), locus of causality (α = .75), and personal control (α = .77) as in Study 1, but slightly adapted the wording to account for the decision-maker's perspective. Furthermore, we used Nolan et al.'s (2016) five-item TOTU scale (α = .93) and an adapted version of Nolan's (2013) six-item scale to measure competence perceptions (α = .89). Lastly, we used an adapted three-item scale from Nolan and Highhouse (2014) to measure participants' intention to use hiring approaches. The internal consistency of this scale was quite low (α = .59) and could be improved (α = .85) by removing the second, reverse-coded item, which read "If I could use a different approach to make hiring decisions, I would". We report results based on the original three-item scale and the scale after removing the second item. Removing the second item did not qualitatively change the study's conclusions.

Predictive validity

For each participant, we calculated the Fisher's z transformed correlation between their 40 performance predictions and the applicants' observed performance. So, the performance predictions were participants' holistic predictions (holistic condition), holistically adjusted predictions (holistic adjustment condition), or the predictions that resulted from participants' self-designed algorithms (self-designed algorithm condition).
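Computationally, the per-participant validity described above is a single correlation followed by the Fisher transform; the vectors below are simulated stand-ins, not study data.

```r
# Fisher's z transformed predictive validity for one participant.
set.seed(42)
performance <- rnorm(40, mean = 3.15, sd = 0.40)        # observed performance
predictions <- 0.4 * performance + rnorm(40, sd = 0.5)  # one participant's 40 predictions
z_validity  <- atanh(cor(predictions, performance))     # Fisher's z of r
```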
(2013) six-item scale to measure competence perceptions (α beliefs about their perceived competence when they made
= .89). Lastly, we used an adapted three-item scale from Nolan holistic predictions compared to when they used a prescribed
and Highhouse (2014) to measure participants’ intention to use algorithm (d = 0.94, 95% HDI [0.54, 1.35]). Therefore, hypoth­
hiring approaches. The internal consistency of this scale was quite eses 5a, 6a, and 7a were fully supported.
low (α = .59) and could be improved (α = .85) by removing Furthermore, decision makers had higher meta-beliefs
the second, reverse-coded item, which read “If I could use about their responsibility for making a hiring decision when
a different approach to make hiring decisions, I would”. We report they used AEAPs rather than a prescribed algorithm (d = 0.38,
results based on the original three-item scale and the scale after 95% HDI [0.05, 0.72]). Similarly, they had much higher meta-
removing the second item. Removing the second item did not beliefs about their control of the hiring decision and their
qualitatively change the study’s conclusions. perceived competence when they used AEAPs rather than
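For readers who want to reproduce this kind of item analysis, the reliability estimates and the effect of dropping an item can be obtained as follows (a minimal sketch in R, assuming the psych package and a data frame use_int whose three columns are the scale items; these object names are ours, not taken from the study materials):

  library(psych)

  # Internal consistency of the full three-item use intentions scale
  alpha_full <- psych::alpha(use_int, check.keys = TRUE)
  alpha_full$total$raw_alpha

  # Alpha after removing the second (reverse-coded) item
  alpha_reduced <- psych::alpha(use_int[, c(1, 3)])
  alpha_reduced$total$raw_alpha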

Predictive validity
For each participant, we calculated the Fisher's z transformed correlation between their 40 performance predictions and the applicants' observed performance. So, the performance predictions were participants' holistic predictions (holistic condition), holistically adjusted predictions (holistic adjustment condition), or the predictions that resulted from participants' self-designed algorithms (self-designed algorithm condition).
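As an illustration of this computation, per-participant validity and a back-transformed condition mean can be obtained as follows (a minimal sketch in R; predictions, performance, and z_values are our placeholder names, not objects from the study's scripts):

  # Correlation between one participant's 40 predicted and observed
  # performance scores, followed by Fisher's z transformation
  r <- cor(predictions, performance)
  z <- atanh(r)  # equivalent to 0.5 * log((1 + r) / (1 - r))

  # A condition-level mean validity can be back-transformed for reporting
  # (z_values holds the transformed correlations of all participants in
  # one condition)
  r_bar <- tanh(mean(z_values))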

Table 8. Correlations between measures in study 2.

Measure 1. 2. 3. 4. 5. 6.
1. Autonomy (a)  -  .50  .65  .70  −.04  .61
2. Locus of causality  .50  -  .61  .52  −.04  .39
3. Personal control  .65  .61  -  .64  .06  .46
4. Perceived competence  .70  .52  .64  -  −.22  .58
5. TOTU (b)  −.04  −.04  .06  −.22  -  −.01
6. Use intentions  .45  .29  .31  .46  −.23  -
Correlations below the diagonal are based on the original three-item use intentions scale. Correlations above the diagonal are based on the use intentions scale with the second item removed. Correlations were calculated across conditions. N = 269. (a) Manipulation check. (b) Threat of technological unemployment.

Figure 5. Observed means and raw data per condition and measure in study 2. Error bars represent 95% confidence intervals. Some jittering in the x-axis direction was
added to the plot to improve readability.

Results
Again, we treated locus of causality and personal control as separate variables, based on the results of confirmatory factor analyses (see Supplement S2). Table 8 shows the correlations between the measures. Figure 5 shows the observed means per condition and measure. Table 9 shows the exact observed means and standard deviations. Furthermore, Figure 6 shows the posterior distributions with 95% HDIs for the relevant effects. Again, AEAPs resulted in very similar perceptions and hence were combined in the following analyses.

Autonomy (manipulation check)
Decision makers experienced much more autonomy when using AEAPs than when using a prescribed algorithm (d = 1.10, 95% HDI [0.75, 1.45]). This suggests that our manipulation was successful.
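For illustration, a standardized mean difference with a 95% highest density interval of this kind can be estimated with Kruschke-style Bayesian estimation; a minimal sketch in R, assuming the BEST package and two vectors of autonomy ratings (the vector names are ours, and this is one possible implementation rather than the authors' exact model):

  library(BEST)

  # Posterior for the contrast between the AEAP and prescribed-algorithm
  # groups; the summary includes the effect size (effSz) with its 95% HDI
  fit <- BESTmcmc(autonomy_aeap, autonomy_prescribed)
  summary(fit)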

Locus of causality, personal control, and perceived competence
Decision makers had higher meta-beliefs about their responsibility for and control of the hiring decision when they made holistic predictions compared to when they used a prescribed algorithm (d = 0.43, 95% HDI [0.03, 0.83] and d = 0.92, 95% HDI [0.51, 1.33], respectively). They also had much higher meta-beliefs about their perceived competence when they made holistic predictions compared to when they used a prescribed algorithm (d = 0.94, 95% HDI [0.54, 1.35]). Therefore, hypotheses 5a, 6a, and 7a were fully supported.

Furthermore, decision makers had higher meta-beliefs about their responsibility for making a hiring decision when they used AEAPs rather than a prescribed algorithm (d = 0.38, 95% HDI [0.05, 0.72]). Similarly, they had much higher meta-beliefs about their control of the hiring decision and their perceived competence when they used AEAPs rather than a prescribed algorithm (d = 0.96, 95% HDI [0.62, 1.31] and d = 0.77, 95% HDI [0.43, 1.11], respectively). So, hypotheses 5b, 6b, and 7b were also fully supported.

Decision makers also had very similar meta-beliefs about their responsibility for and control of the hiring decision when making holistic predictions and when using AEAPs (d = 0.05, 95% HDI [−0.25, 0.36] and d = −0.04, 95% HDI [−0.35, 0.27], respectively), although small effects in both directions were also plausible. We found similar results for perceived competence (d = 0.16, 95% HDI [−0.15, 0.47]).

Mediation
We hypothesized that the effect of enhanced autonomy on use intentions would be mediated by locus of causality and personal control (in parallel), and competence perceptions and TOTU (serially). The correlations between the variables of the pre-registered mediation model are shown in Table 10. The correlations between perceived competence and TOTU, and between TOTU and use intentions, were small (r = −.21 and r = −.19). Therefore, the pre-registered mediation model was not plausible.

Furthermore, we did not find enough evidence that decision makers perceived less TOTU when using AEAPs rather than a prescribed algorithm (d = 0.14, 95% HDI [−0.19, 0.48]); the observed effect was in the opposite direction to what was expected. However, decision makers were somewhat more likely to use AEAPs than a prescribed algorithm (d = 0.28, 95% HDI [−0.06, 0.61]). Additionally, decision makers were more likely to use holistic prediction than a prescribed algorithm (d = 0.38, 95% HDI [−0.02, 0.78]), while the difference in intentions to use holistic prediction and AEAPs was negligible (d = 0.10, 95% HDI [−0.20, 0.41]), although holistic prediction was slightly favoured. Removing the second item of the use intentions scale resulted in slightly larger effect sizes. Decision makers were more likely to use holistic prediction (d = 0.41, 95% HDI [0.01, 0.82]) and AEAPs (d = 0.44, 95% HDI [0.10, 0.77]), compared to a prescribed algorithm. The difference between holistic prediction and AEAPs was near zero (d = −0.02, 95% HDI [−0.33, 0.29]).

Given that a mediation model including TOTU was implausible, we did not formally investigate hypothesis 8, but instead explored a mediation model without TOTU. Specifically, we estimated the indirect effect of enhanced autonomy on use intentions via locus of causality and personal control (in parallel), and competence perceptions. The results are shown in Table 11. We found some evidence that enhanced autonomy was related to use intentions via personal control and competence perceptions (partially standardized effect size = 0.18). Furthermore, zero was a credible value for the direct effect of enhanced autonomy on use intentions. Yet, we did not find much evidence that enhanced autonomy affected use intentions via locus of causality and competence perceptions (partially standardized effect size = 0.03).
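To make the structure of this exploratory model concrete, the path model can be written out as follows (a minimal sketch in R using lavaan as a frequentist stand-in for the Bayesian estimation reported here; the variable names are ours):

  library(lavaan)

  # Enhanced autonomy (ea, dummy-coded) -> locus of causality (lc) and
  # personal control (pc) in parallel -> competence perceptions (cp)
  # -> use intentions (ui), plus a direct path from ea to ui
  model <- '
    lc ~ a1 * ea
    pc ~ a2 * ea
    cp ~ b1 * lc + b2 * pc
    ui ~ c * ea + b3 * cp

    ind_lc := a1 * b1 * b3   # ea -> lc -> cp -> ui
    ind_pc := a2 * b2 * b3   # ea -> pc -> cp -> ui
  '
  fit <- sem(model, data = study2)
  summary(fit, standardized = TRUE)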

Table 9. Observed means and standard deviations per condition and measure in study 2.

Autonomy (a)  Locus of causality  Personal control  Perceived competence  TOTU (b)  Use intentions  Use intentions*

Condition  M  SD  M  SD  M  SD  M  SD  M  SD  M  SD  M  SD
Holistic  4.23  0.52  4.22  0.56  4.07  0.68  4.39  0.54  3.02  1.34  3.48  0.81  4.02  0.87
Holistic adjustment  4.15  0.69  4.15  0.55  4.11  0.75  4.27  0.60  3.31  1.12  3.42  0.61  4.08  0.76
Self-designed algorithm  4.05  0.68  4.22  0.64  4.10  0.67  4.30  0.49  3.00  1.16  3.38  0.73  4.00  0.92
Prescribed algorithm  3.25  1.24  3.93  1.04  3.33  1.20  3.77  1.08  2.99  1.11  3.19  0.89  3.65  1.08
(a) Manipulation check. (b) Threat of technological unemployment. * Means and standard deviations of the use intentions scale with the second item removed.

Figure 6. Posterior distributions for the hypothesized and exploratory effects in study 2. The grey area under each curve depicts the 95% most probable effect sizes,
given the model and the data. H = holistic prediction, AEAPs = autonomy-enhancing algorithmic procedures, and PA = prescribed algorithm.

Table 10. Correlations between variables included in the pre-registered mediation model in study 2.

Measure 1. 2. 3. 4. 5. 6.
1. Enhanced autonomy (a)  -  .14  .35  .29  .06  .17
2. Locus of causality  .14  -  .59  .50  −.02  .36
3. Personal control  .35  .59  -  .64  .08  .45
4. Perceived competence  .29  .50  .64  -  −.21  .56
5. TOTU (b)  .06  −.02  .08  −.21  -  −.01
6. Use intentions  .12  .25  .28  .43  −.19  -
Correlations below the diagonal are based on the original three-item use intentions scale. Correlations above the diagonal are based on the use intentions scale with the second item removed. (a) Enhanced autonomy was coded 1 for the two autonomy-enhancing conditions (holistic adjustment and self-designed algorithm condition) and 0 for the prescribed algorithm condition. Therefore, data from the holistic condition was disregarded, and hence, N = 215. (b) Threat of technological unemployment.

Table 11. Exploratory mediation results from study 2.

Enhanced autonomy → Locus of causality → Competence perceptions → Use intentions (indirect): unstandardized effect estimate = 0.02 [0.00, 0.05] *(0.03 [0.00, 0.07]); partially standardized effect size = .03 *(.03)
Enhanced autonomy → Personal control → Competence perceptions → Use intentions (indirect): unstandardized effect estimate = 0.13 [0.06, 0.21] *(0.17 [0.08, 0.27]); partially standardized effect size = .18 *(.19)
Enhanced autonomy → Use intentions (direct): unstandardized effect estimate = −0.01 [−0.25, 0.23] *(−0.02 [−0.29, 0.25]); partially standardized effect size = −.02 *(−.02)
95% highest density intervals are presented in brackets. * The effects when the second item of the use intentions scale is removed are presented in parentheses.

Predictive validity
Figure 7 shows the mean validity coefficient per condition. Holistic predictions were least valid (r̄ = .16), while predictions from the prescribed algorithm were most valid (r = .36). In support of hypothesis 9, AEAPs resulted in much higher predictive validity (r̄ = .28 in the holistic adjustment condition and r̄ = .31 in the self-designed algorithm condition, respectively) than holistic predictions (d = 1.10, 95% HDI [0.77, 1.42]). Importantly, these results suggest that AEAPs resulted in only somewhat less valid predictions than the prescribed algorithm. Interestingly, the variability in decision-makers' validity was highest in the holistic condition and lowest in the self-designed algorithm condition. In other words, the less decision makers were directly involved in combining information about individual applicants, the less it mattered who combined the information. Moreover, just using the cognitive ability test in a top-down manner would have resulted in higher predictive validity than the validity that was observed in the holistic and holistic adjustment condition (cf. Table 6 and Figure 7).

Figure 7. Mean predictive validity per condition. Each raw data point represents the correlation between a participant’s predicted and observed performance (N = 40).
No raw data is plotted in the prescribed algorithm condition because participants could not influence the algorithm’s prediction in any way. Therefore, no variance was
observed in this condition by design. Error bars represent 95% confidence intervals. Some jittering in the x-axis direction was added to the plot to improve readability.

Judgement consistency and relative weights analysis
We also explored how, and how consistently, decision makers weighted the information by regressing the participant's (or the algorithm's) predictions on the three predictors. The resulting multiple R reflects how consistently decision makers applied their own judgement policy and is also called response consistency or cognitive control (Hammond & Summers, 1972). We also calculated relative weights (Grömping, 2006), which reflect how important a predictor was in forming the participant's (or algorithm's) prediction. Table 12 shows participants' mean judgement consistency and mean relative weights for each of the three predictors per condition. Providing decision makers with a valid anchor (the algorithm's prediction) increased judgement consistency compared to the holistic condition, and resulted in relative weights that were similar to the relative weights obtained from the meta-analytically derived algorithm. Furthermore, despite similar relative weights, judgement consistency was much higher in the self-designed algorithm condition than in the holistic condition, which likely explains the markedly higher predictive validity.
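As an illustration, both quantities can be computed from an ordinary least squares regression of a participant's predictions on the three predictor scores (a minimal sketch in R, assuming the relaimpo package; the data frame and variable names are ours):

  library(relaimpo)

  # Regress one participant's 40 predictions on the three predictor scores
  fit <- lm(prediction ~ cognitive_ability + conscientiousness + interview,
            data = judgments)

  # Judgement consistency (response consistency): the multiple correlation R
  sqrt(summary(fit)$r.squared)

  # Relative weights: each predictor's share of the explained variance,
  # normalized to sum to 100% (rela = TRUE)
  calc.relimp(fit, type = "lmg", rela = TRUE)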

Discussion
In line with our hypotheses and earlier findings (Nolan et al., 2016), decision makers believed stakeholders would perceive them as more competent, more responsible for, and more in control of the hiring decision when making holistic predictions rather than using a prescribed algorithm. Extending earlier findings, we also found that decision-makers' meta-beliefs were much higher when they used AEAPs rather than a prescribed algorithm. Relatedly, decision-makers' meta-beliefs were very similar when making holistic predictions compared to using AEAPs. Importantly, we also found that using AEAPs substantially increased predictive validity compared to holistic prediction.

General Discussion
One reason why decision makers rarely use algorithms in hiring decisions is that they worry about negative stakeholder evaluations (Nolan et al., 2016). In line with existing research (Gonzalez et al., 2022; Kares et al., 2023), we expected and found that the use of AEAPs improved stakeholder evaluations and decision-makers' meta-beliefs, compared to the use of a prescribed algorithm. Furthermore, both stakeholder evaluations and decision-makers' meta-beliefs were similar for holistic prediction and AEAPs.

Table 12. Mean judgement consistency and mean relative predictor weights per condition.

Condition  Judgement consistency  RW (a) Cognitive ability  RW Conscientiousness  RW Interview

Holistic  .63  .47  .23  .30
Holistic adjustment  .79  .63  .20  .17
Self-designed algorithm  1  .44  .30  .26
Prescribed algorithm  1  .67  .21  .12
(a) RW = relative weight. Relative weights per condition sum to 100% and were computed using the relaimpo package in R (Grömping, 2006).

Extending earlier findings, we also showed that stakeholder and decision-makers' perceptions were relatively insensitive to how autonomy was enhanced (algorithm design vs. control over a prescribed algorithm's predictions). Lastly, we showed that AEAPs substantially improved predictive validity compared to holistic prediction. Yet, neither holistic prediction nor AEAPs resulted in higher predictive validity than the meta-analytically derived algorithm. In summary, our results on predictive validity are largely in line with findings by Dietvorst et al. (2018) and Neumann et al. (2022), and generally suggest that enhancing autonomy in algorithmic decision making effectively encourages algorithm use and improves validity compared to holistic prediction.

Intriguingly, our results also showed that strict top-down use of the single best predictor would have resulted in more valid predictions (r = .31) than holistic prediction (r = .16). This demonstrates again that "less can be more"; invalid information can distract decision makers from valid information and thereby reduce predictive validity, which is also known as the "dilution effect" (Niessen et al., 2022). Relatedly, conscientiousness and the unstructured interview had only slight incremental validity over and above cognitive ability when this information was combined through our meta-analytic, prescribed algorithm, but not when combined holistically or through a self-designed algorithm. This finding underscores that results from studies on combinations of predictors, in which predictors are typically combined using optimal regression (e.g., Schmidt & Hunter, 1998), may not generalize to practice (Murphy, 2019; Sackett et al., 2017), where optimal regression or other, suboptimal mechanical combination methods are rarely used to make hiring decisions (Neumann et al., 2023; Ryan & Sackett, 1987). Hence, we should think about personnel selection as judgement and decision science (Meijer & Niessen, 2022), and investigate not only the validity of (optimal combinations of) test scores, as is now often the case in research, but also the validity of our decisions. This change in attitude would greatly increase the usefulness of personnel selection research for practice.

Limitations and directions for future research
We conducted experiments to test the effect of the hiring approach on stakeholder perceptions, decision-makers' meta-beliefs, and predictive validity. Ideally, our findings would be replicated in the field, although this is very challenging, because very few organizations contain the required number of assessors for these studies. Furthermore, we relied on self-reported intentions to use a hiring approach for future decisions. Ideally, we would also observe actual behaviour (i.e., implementation of AEAPs). Lastly, we note that our cross-sectional design limits the conclusions that can be drawn from our mediation analyses (Spector, 2019).

As our results showed, retaining decision-makers' autonomy in algorithmic decision making can be a promising way to improve decision making. However, interventions that primarily focus on improving the validity of pure holistic predictions may also be fruitful. Such interventions may include providing decision makers with knowledge on the validity of predictors, although research on this intervention produced somewhat mixed findings (Balzer et al., 1989; Neumann et al., 2022). Relatedly, Kahneman et al. (2021) recently suggested delaying hiring decisions until some time after the information (e.g., interview impressions) has been collected, and presenting information sequentially to decision makers. We encourage researchers to test whether these interventions can improve decision making in a similar manner as enhancing autonomy in algorithmic decision making.

We presented decision makers with quantitative test scores and a quantified rating from an interview conducted by someone else. If decision makers had conducted interviews themselves, they may have overweighted such qualitative and "rich" information compared to a quantified interview rating from someone else (Kuncel, 2018; Niessen et al., 2022). Therefore, future research could vary decision-makers' involvement in the data collection and use applicant data that contains qualitative and quantitative information. Furthermore, it should be investigated whether decision makers are less likely to holistically adjust predictions from a self-designed rather than a prescribed algorithm. Decision makers may trust their own algorithm more than an algorithm presented by others (Kuncel, 2018). Relatedly, characteristics such as an algorithm's transparency and complexity, and decision-makers' understanding of an algorithm's design, could affect algorithm use (Langer & Landers, 2021). Although algorithms with equal and even random but consistent predictor weights can outperform holistic predictions when predictors are similar in validity (Yu & Kuncel, 2020), it is plausible that decision makers would holistically adjust predictions from such simple, plain algorithms more than predictions from more complex algorithms that may, for example, take predictor interactions and non-linearities into account. Moreover, future studies may present information on the same scale. We deliberately presented information on different scales because this is representative of practice. Yet, this may have made consistent information combination in the holistic and holistic adjustment condition more difficult.

Theoretical and practical implications
In line with the discounting principle from attribution theory and earlier research (Nolan et al., 2016), stakeholders ascribed more credit to decision makers for holistic compared to algorithmic hiring decisions. Importantly, decision makers who used AEAPs were still largely seen as responsible for and in control of the hiring decision by stakeholders, which was related to higher competence perceptions. We also found support for the hypothesis that decision-makers' meta-beliefs relate to their intentions to use algorithmic decision-making procedures, as proposed by the theory of planned behaviour. Decision-makers' concerns about negative stakeholder perceptions were largely alleviated when they retained autonomy in algorithmic decision making. Moreover, our results advance our understanding of whether stakeholders and decision makers are sensitive to different autonomy-enhancing features (Langer & Landers, 2021). Design control (by choosing an algorithm's predictor weights) resulted in very similar perceptions by both stakeholders and decision makers as outcome control (adjusting a prescribed algorithm's predictions holistically).

We tentatively conclude that enhancing autonomy in the first place seems more important for the acceptance of algorithmic decision making than how exactly autonomy is enhanced. Although we used a very simple and transparent algorithm in our studies, our results may also have implications for the design and acceptance of more complex AI systems. Lastly, another theoretical implication concerns the underlying mechanisms that explain the relationship between enhanced autonomy and use intentions. We found some support for the indirect effect of autonomy on use intentions, via personal control and competence perceptions (but not TOTU). Yet, we caution against overinterpreting our partially exploratory mediation results.

Our results also have important practical implications. To improve decision making, AEAPs should not only improve perceptions, but first and foremost result in higher predictive validity than pure holistic prediction. Based on our results, we recommend that decision makers combine valid predictors consistently according to a simple algorithm. In line with Dietvorst et al. (2018), holistically adjusting predictions from a prescribed algorithm also improved predictive validity and stakeholder perceptions. Yet, we found that this approach resulted in much more variability in predictive validity compared to the self-designed algorithm condition, in which holistic adjustment was not possible. Thus, in an ideal world, decision makers should design an algorithm and follow it consistently.

Conclusion
Using AEAPs improves perceptions and results in substantially higher validity than holistic prediction. Or, in other words, allowing room to "add an egg" to a prescribed recipe makes the baker feel and be more appreciated, makes the baker more likely to use a recipe in the first place, and makes the cake taste better than one made without a recipe!

Notes
1. We note that decision makers could also be stakeholders themselves, such as when observing other decision makers making hiring decisions.
2. Stability, which is another key dimension, is the extent to which an outcome is perceived as stable or consistent over time and across trials. We had no theoretical argument for how it would relate to stakeholder perceptions and only measured it for theoretical completeness.
3. The word "positive" reflects that stakeholders perceive decision-makers' locus of causality to be higher (i.e., more internal) when holistic prediction is used, compared to when a prescribed algorithm is used, in line with earlier research (Nolan et al., 2016). Similarly, the word "positive" reflects higher personal control and higher perceived competence in the following hypotheses.
4. Although the theory of planned behaviour also suggests attitude and control as two other independent determinants of intentions (van der Zee et al., 2002), we solely focused on subjective norms in this study because this antecedent is theoretically most relevant for stakeholder perceptions and decision-makers' meta-beliefs.

Disclosure statement
No potential conflict of interest was reported by the author(s).

ORCID
Marvin Neumann http://orcid.org/0000-0003-0193-8159
A. Susan M. Niessen http://orcid.org/0000-0001-8249-9295
Maximilian Linde http://orcid.org/0000-0001-8421-090X
Jorge N. Tendeiro http://orcid.org/0000-0003-1660-3642
Rob R. Meijer http://orcid.org/0000-0001-5368-992X

References
Ægisdóttir, S., White, M. J., Spengler, P. M., Maugherman, A. S., Anderson, L. A., Cook, R. S., Nichols, C. N., Lampropoulos, G. K., Walker, B. S., Cohen, G., & Rush, J. D. (2006). The meta-analysis of clinical judgment project: Fifty-six years of accumulated research on clinical versus statistical prediction. The Counseling Psychologist, 34(3), 341–382. https://doi.org/10.1177/0011000005285875
Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50(2), 179–211. https://doi.org/10.1016/0749-5978(91)90020-T
Arkes, H. R., Shaffer, V. A., & Medow, M. A. (2007). Patients derogate physicians who use a computer-assisted diagnostic aid. Medical Decision Making, 27(2), 189–202. https://doi.org/10.1177/0272989X06297391
Balzer, W. K., Doherty, M. E., & O'Connor, R., Jr. (1989). Effects of cognitive feedback on performance. Psychological Bulletin, 106(3), 410–433. https://doi.org/10.1037/0033-2909.106.3.410
Burton, J. W., Stein, M. K., & Jensen, T. B. (2020). A systematic review of algorithm aversion in augmented decision making. Journal of Behavioral Decision Making, 33(2), 220–239. https://doi.org/10.1002/bdm.2155
Castelo, N., Bos, M. W., & Lehmann, D. R. (2019). Task-dependent algorithm aversion. Journal of Marketing Research, 56(5), 809–825. https://doi.org/10.1177/0022243719851788
Cortina, J. M., Goldstein, N. B., Payne, S. C., Davison, H. K., & Gilliland, S. W. (2000). The incremental validity of interview scores over and above cognitive ability and conscientiousness scores. Personnel Psychology, 53(2), 325–351. https://doi.org/10.1111/j.1744-6570.2000.tb00204.x
Dawes, R. M., & Corrigan, B. (1974). Linear models in decision making. Psychological Bulletin, 81(2), 95–106. https://doi.org/10.1037/h0037613
Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243(4899), 1668–1674. https://doi.org/10.1126/science.2648573
Diab, D. L., Pui, S., Yankelevich, M., & Highhouse, S. (2011). Lay perceptions of selection decision aids in US and non-US samples. International Journal of Selection and Assessment, 19(2), 209–216. https://doi.org/10.1111/j.1468-2389.2011.00548.x
Dietvorst, B. J., Simmons, J. P., & Massey, C. (2018). Overcoming algorithm aversion: People will use imperfect algorithms if they can (even slightly) modify them. Management Science, 64(3), 1155–1170. https://doi.org/10.1287/mnsc.2016.2643
Frey, F. E., & Tropp, L. R. (2006). Being seen as individuals versus as group members: Extending research on metaperception to intergroup contexts. Personality and Social Psychology Review, 10(3), 265–280. https://doi.org/10.1207/s15327957pspr1003_5
Gonzalez, M. F., Liu, W., Shirase, L., Tomczak, D. L., Lobbe, C. E., Justenhoven, R., & Martin, N. R. (2022). Allying with AI? Reactions toward human-based, AI/ML-based, and augmented hiring processes. Computers in Human Behavior, 130, 107179. https://doi.org/10.1016/j.chb.2022.107179
Grömping, U. (2006). Relative importance for linear regression in R: The package relaimpo. Journal of Statistical Software, 17(1), 1–27. https://doi.org/10.18637/jss.v017.i01
Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., & Nelson, C. (2000). Clinical versus mechanical prediction: A meta-analysis. Psychological Assessment, 12(1), 19–30. https://doi.org/10.1037/1040-3590.12.1.19
Guay, J. P., & Parent, G. (2018). Broken legs, clinical overrides, and recidivism risk: An analysis of decisions to adjust risk levels with the LS/CMI. Criminal Justice and Behavior, 45(1), 82–100. https://doi.org/10.1177/0093854817719482
Hammond, K. R., & Summers, D. A. (1972). Cognitive control. Psychological Review, 79(1), 58–67. https://doi.org/10.1037/h0031851

Han, J., Pei, J., & Kamber, M. (2011). Data mining: Concepts and techniques (3rd ed.). Morgan Kaufmann.
Highhouse, S., & Brooks, M. E. (2023). Improving workplace judgments by reducing noise: Lessons learned from a century of selection research. Annual Review of Organizational Psychology and Organizational Behavior, 10(1), 519–533. https://doi.org/10.1146/annurev-orgpsych-120920-050708
Hoffman, M., Kahn, L. B., & Li, D. (2017). Discretion in hiring. The Quarterly Journal of Economics, 133(2), 765–800. https://doi.org/10.1093/qje/qjx042
Kahneman, D., & Klein, G. (2009). Conditions for intuitive expertise: A failure to disagree. American Psychologist, 64(6), 515–526. https://doi.org/10.1037/a0016755
Kahneman, D., Sibony, O., & Sunstein, C. R. (2021). Noise: A flaw in human judgment. Little, Brown Spark.
Kares, F., König, C. J., Bergs, R., Protzel, C., & Langer, M. (2023). Trust in hybrid human-automated decision-support. International Journal of Selection and Assessment, 31(3), 388–402. https://doi.org/10.1111/ijsa.12423
Kausel, E. E., Culbertson, S. S., & Madrid, H. P. (2016). Overconfidence in personnel selection: When and why unstructured interview information can hurt hiring decisions. Organizational Behavior and Human Decision Processes, 137, 27–44. https://doi.org/10.1016/j.obhdp.2016.07.005
Kelley, H. H. (1973). The processes of causal attribution. American Psychologist, 28(2), 107–128. https://doi.org/10.1037/h0034225
Kruschke, J. K. (2015). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan (2nd ed.). Academic Press. https://doi.org/10.1016/B978-0-12-405888-0.00008-8
Kuncel, N. R. (2018). Judgment and decision making in staffing research and practice. In D. S. Ones, N. Anderson, C. Viswesvaran, & H. K. Sinangil (Eds.), The SAGE handbook of industrial, work and organizational psychology (2nd ed., pp. 474–487). SAGE Publications Ltd. https://doi.org/10.4135/9781473914940.n17
Kuncel, N. R., Klieger, D. M., Connelly, B. S., & Ones, D. S. (2013). Mechanical versus clinical data combination in selection and admissions decisions: A meta-analysis. Journal of Applied Psychology, 98(6), 1060–1072. https://doi.org/10.1037/a0034156
Langer, M., & Landers, R. N. (2021). The future of artificial intelligence at work: A review on effects of decision automation and augmentation on workers targeted by algorithms and third-party observers. Computers in Human Behavior, 123, 106878. https://doi.org/10.1016/J.CHB.2021.106878
Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. University of Minnesota Press. https://doi.org/10.1037/11281-000
Meehl, P. E. (1986). Causes and effects of my disturbing little book. Journal of Personality Assessment, 50(3), 370–375. https://doi.org/10.1207/s15327752jpa5003_6
Meijer, R. R., Neumann, M., Hemker, B. T., & Niessen, A. S. M. (2020). A tutorial on mechanical decision-making for personnel and educational selection. Frontiers in Psychology, 10, Article 3002. https://doi.org/10.3389/fpsyg.2019.03002
Meijer, R. R., & Niessen, A. S. M. (2022). Personnel selection as judgment and decision science: An introduction. International Journal of Selection and Assessment, 30(2), 193–194. https://doi.org/10.1111/IJSA.12377
Meyer, J. P., & Koelbl, S. L. M. (1982). Students' test performances: Dimensionality of causal attributions. Personality and Social Psychology Bulletin, 8(1), 31–36. https://doi.org/10.1177/014616728281005
Mikkelson, D. (2008, January 31). Requiring an Egg Made Instant Cake Mixes Sell? Snopes. https://www.snopes.com/fact-check/something-eggstra/
Milkman, K. L., Chugh, D., & Bazerman, M. H. (2009). How can decision making be improved? Perspectives on Psychological Science, 4(4), 379–383. https://doi.org/10.1111/j.1745-6924.2009.01142.x
Murphy, K. R. (2019). Understanding how and why adding valid predictors can decrease the validity of selection composites: A generalization of Sackett, Dahlke, Shewach, and Kuncel (2017). International Journal of Selection and Assessment, 27(3), 249–255. https://doi.org/10.1111/ijsa.12253
Murphy, K. R. (2021). In praise of Table 1: The importance of making better use of descriptive statistics. Industrial and Organizational Psychology: Perspectives on Science and Practice, 14(4), 461–477. https://doi.org/10.1017/iop.2021.90
Neumann, M., Hengeveld, M., Niessen, A. S. M., Tendeiro, J. N., & Meijer, R. R. (2022). Education increases decision-rule use: An investigation of education and incentives to improve decision making. Journal of Experimental Psychology: Applied, 28(1), 166–178. https://doi.org/10.1037/xap0000372
Neumann, M., Niessen, A. S. M., Hurks, P. M., & Meijer, R. R. (2023). Holistic and mechanical combination in psychological assessment: Why algorithms are underutilized and what is needed to increase their use. International Journal of Selection and Assessment, 31(2), 267–285. https://doi.org/10.1111/ijsa.12416
Neumann, M., Niessen, A. S. M., & Meijer, R. R. (2021). Implementing evidence-based assessment and selection in organizations: A review and an agenda for future research. Organizational Psychology Review, 11(3), 205–239. https://doi.org/10.1177/2041386620983419
Neumann, M., Niessen, A. S. M., Tendeiro, J. N., & Meijer, R. R. (2022). The autonomy-validity dilemma in mechanical prediction procedures: The quest for a compromise. Journal of Behavioral Decision Making, 35(4), e2270. https://doi.org/10.1002/bdm.2270
Newman, D. T., Fast, N. J., & Harmon, D. J. (2020). When eliminating bias isn't fair: Algorithmic reductionism and procedural justice in human resource decisions. Organizational Behavior and Human Decision Processes, 160, 149–167. https://doi.org/10.1016/j.obhdp.2020.03.008
Niessen, A. S. M., Kausel, E. E., & Neumann, M. (2022). Using narratives and numbers in performance prediction: Attitudes, confidence, and validity. International Journal of Selection and Assessment, 30(2), 216–229. https://doi.org/10.1111/IJSA.12364
Nolan, K. P. (2013). Basic psychological need fulfillment and user resistance to objective and analytical decision-making practices in employee selection (Doctoral dissertation).
Nolan, K. P., Carter, N. T., & Dalal, D. K. (2016). Threat of technological unemployment: Are hiring managers discounted for using standardized employee selection practices? Personnel Assessment and Decisions, 2(1), 30–47. https://doi.org/10.25035/pad.2016.004
Nolan, K. P., Dalal, D. K., & Carter, N. (2020). Threat of technological unemployment, use intentions, and the promotion of structured interviews in personnel selection. Personnel Assessment and Decisions, 6(2), 38–53. https://doi.org/10.25035/pad.2020.02.006
Nolan, K. P., & Highhouse, S. (2014). Need for autonomy and resistance to standardized employee selection practices. Human Performance, 27(4), 328–346. https://doi.org/10.1080/08959285.2014.929691
Preacher, K. J., & Kelley, K. (2011). Effect size measures for mediation models: Quantitative strategies for communicating indirect effects. Psychological Methods, 16(2), 93–115. https://doi.org/10.1037/a0022658
Ryan, A. M., & Sackett, P. R. (1987). A survey of individual assessment practices by I/O psychologists. Personnel Psychology, 40(3), 455–488. https://doi.org/10.1111/j.1744-6570.1987.tb00610.x
Sackett, P. R., Dahlke, J. A., Shewach, O. R., & Kuncel, N. R. (2017). Effects of predictor weighting methods on incremental validity. Journal of Applied Psychology, 102(10), 1421–1434. https://doi.org/10.1037/apl0000235
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274. https://doi.org/10.1037/0033-2909.124.2.262
Spector, P. E. (2019). Do not cross me: Optimizing the use of cross-sectional designs. Journal of Business and Psychology, 34(2), 125–137. https://doi.org/10.1007/s10869-018-09613-8
van der Zee, K. I., Bakker, A. B., & Bakker, P. (2002). Why are structured interviews so rarely used in personnel selection? Journal of Applied Psychology, 87(1), 176–184. https://doi.org/10.1037/0021-9010.87.1.176
Wesche, J. S., Hennig, F., Kollhed, C. S., Quade, J., Kluge, S., & Sonderegger, A. (2022). People's reactions to decisions by human vs. algorithmic decision-makers: The role of explanations and type of selection tests. European Journal of Work and Organizational Psychology, 1–12. https://doi.org/10.1080/1359432X.2022.2132940
Yu, M. C., & Kuncel, N. R. (2020). Pushing the limits for judgmental consistency: Comparing random weighting schemes with expert judgments. Personnel Assessment and Decisions, 6(2), 1–10. https://doi.org/10.25035/pad.2020.02.002
Yu, M. C., & Kuncel, N. R. (2022). Testing the value of expert insight: Comparing local versus general expert judgment models. International Journal of Selection and Assessment, 30(2), 202–215. https://doi.org/10.1111/IJSA.12356

Appendix A. Attention checks in Study 1


Attention check 1: In all conditions, participants read that the hiring manager used a cognitive ability test, a score from a personality questionnaire, and
a rating from a traditional hiring interview to evaluate candidates. Upon reading the condition-specific hiring approach (see Table 1), participants were
presented with four assessment instruments (cognitive ability test, personality questionnaire, traditional hiring interview, curriculum vitae) and had to
indicate on the next slide which instruments the manager did not use to evaluate candidates (multiple answer options possible). Only participants who
solely selected “curriculum vitae” were included in the study.
Attention check 2: An attention check item was included in the perceived competence scale, and participants were asked to respond with “somewhat
disagree”. Participants who did not respond with “somewhat disagree” were excluded.
Attention check 3: Participants who did not respond to the statement “I have 17 fingers” with “no” were excluded.

Appendix B. Attention checks and other exclusion criteria in Study 2


We presented participants with 12 work activities (in random order) that we obtained from the O*NET (https://www.onetonline.org/find/
descriptor/browse/Work_Activities/). Participants had to indicate up to five work activities that they primarily engage in at work. Participants
were excluded if none of the options “Staffing organizational units”, “Making decisions and solving problems”, or “Judging the qualities of
things, services, or people” was chosen. Furthermore, we excluded participants who indicated not making a single hiring decision per year.
Moreover, we excluded participants who failed at least one of two attention checks. Participants were presented with predictor scores from
two applicants and indicated whether applicant A outperformed applicant B on all predictors (true or false). Participants who failed this
attention check were prompted to review the information again and to answer the question. Participants who failed the attention check
a second time were excluded. In the second attention check, participants were presented with descriptions of the prediction tasks of the four
conditions, and they had to indicate what type of prediction task they had completed. Participants who did not choose the correct description
were excluded.
