"Adding an egg" in algorithmic decision making: Improving stakeholder and user perceptions, and predictive validity by enhancing autonomy

To cite this article: Marvin Neumann, A. Susan M. Niessen, Maximilian Linde, Jorge N. Tendeiro & Rob R. Meijer (2024). "Adding an egg" in algorithmic decision making: Improving stakeholder and user perceptions, and predictive validity by enhancing autonomy. European Journal of Work and Organizational Psychology, 33(3), 245–262. https://doi.org/10.1080/1359432X.2023.2260540
Decision makers such as psychologists and managers often use multiple pieces of information (e.g., test scores and interview ratings) to make performance predictions and hiring decisions. Such information is typically combined in the decision-maker's mind (Neumann et al., 2023), which is known as holistic (or clinical, expert) prediction. Alternatively, information could be combined through an algorithm, which is known as mechanical (or statistical, actuarial, algorithmic) prediction (Meehl, 1954). Algorithms can take the form of very simple rules, although more advanced machine learning methods are possible (Highhouse & Brooks, 2023). A simple algorithm would be to give equal weight to a test score and an interview rating, and to hire the candidate(s) with the highest score(s). Predictor weights could also differ, and could be chosen based on decision-makers' judgement or statistical analyses (Kuncel, 2018).
A robust finding is that more valid performance predictions are made when information is combined algorithmically (Kuncel et al., 2013). The superiority of mechanical over holistic prediction has been demonstrated in various other contexts, including medical diagnoses, parole decisions, and clinical treatment selection (Ægisdóttir et al., 2006; Dawes et al., 1989; Guay & Parent, 2018). This finding also holds when considering experts' holistic predictions (Grove et al., 2000; Highhouse & Brooks, 2023; Yu & Kuncel, 2020), as expertise is hard to develop in these inherently uncertain contexts where outcome feedback is often substantially delayed or absent (Kahneman & Klein, 2009). The inconsistency of holistic predictions is a major reason why algorithms result in higher predictive validity; even algorithms with random positive weights that are consistently applied across applicants can outperform holistic prediction (Dawes & Corrigan, 1974; Yu & Kuncel, 2020). Relatedly, Yu and Kuncel (2022) found no evidence that decision makers can accurately tailor their performance predictions to specific organizational contexts. Although tailoring to the context could theoretically improve predictive validity, equally weighting the available information, and hence ignoring organizational context, outperformed "tailored" holistic predictions. So, even simple algorithms result in better performance predictions than holistic predictions.

Despite these robust research findings, decision makers rarely use algorithms for subjective tasks typically performed by humans, such as hiring (Castelo et al., 2019; Neumann et al., 2023). This results in suboptimal and untransparent hiring decisions (Meijer et al., 2020). Since algorithm use is a promising strategy to improve decision making (Milkman et al., 2009), the aim of this paper was to identify algorithmic decision-making procedures that are liked by stakeholders, liked and used by decision makers, and that result in higher predictive validity than holistic prediction (Kuncel, 2018; Neumann et al., 2021). We defined decision makers as people who make hiring decisions themselves (e.g., hiring managers, HR professionals), based on their own judgement or by using algorithms (Langer & Landers, 2021). In contrast, we defined stakeholders as people who may observe a decision maker making a hiring decision, and who may or may not be directly affected by the decision (e.g., higher management, direct colleagues, team members of the future employee, or (internal) applicants).¹

One important reason why algorithms are rarely used is the "threat of technological unemployment" (TOTU; Meehl, 1986, p. 374): decision makers worry that they provide less value to their organization when using algorithms, because other stakeholders may no longer attribute the hiring process and outcome to their holistic judgement (Neumann et al., 2023; Nolan et al., 2016, 2020). Indeed, stakeholders give decision makers less credit for their hiring decisions (i.e., perceive them as less responsible for and less in control of hiring decisions) when they use algorithms rather than their holistic judgement (Nolan et al., 2016). Similarly, patients perceived physicians as less capable and competent when they made algorithmic (vs. holistic) diagnoses (Arkes et al., 2007). Furthermore, lay people perceive algorithm use in hiring as unprofessional and impersonal (Diab et al., 2011), and they have less trust in algorithmic than in human hiring decisions (Wesche et al., 2022). Importantly, Nolan et al. (2016) showed that decision makers themselves correctly believe that stakeholders give them less credit for their hiring decisions when they use algorithms rather than their holistic judgement. These beliefs reduced decision-makers' intentions to use algorithms, and this relationship was mediated by TOTU (Nolan et al., 2016, 2020). What people believe others think of them is also called meta-perceptions (Frey & Tropp, 2006). In line with this literature, we henceforth call decision-makers' beliefs about how stakeholders evaluate them decision-makers' meta-beliefs.

So, both stakeholders (Langer & Landers, 2021) and decision makers (Burton et al., 2020) have more negative attitudes towards algorithmic than towards holistic hiring decisions. This is a problem because these perceptions prevent decision makers from using algorithms, which could improve the quality of hiring decisions (Kuncel et al., 2013). To solve this problem, interventions need to be identified that can improve stakeholder perceptions and decision-makers' meta-beliefs, and that still result in better predictions than holistic prediction. We expected that a useful intervention is enhancing decision-makers' autonomy in algorithmic decision making. Autonomy could be enhanced in at least two ways, compared to a fully automated approach in which decision makers would have to use a prescribed algorithm consistently and hence have no autonomy at all. One approach is allowing decision makers to adjust a prescribed algorithm's job performance predictions holistically, upwards or downwards (Dietvorst et al., 2018; Kuncel, 2018). Another approach is to have decision makers design their own algorithm first, by explicitly choosing predictor weights (Neumann et al., 2022; Nolan & Highhouse, 2014), and then use the resulting predictions consistently, without the opportunity to adjust them. We henceforth refer to both approaches as autonomy-enhancing algorithmic procedures (AEAPs).

While we were teaching a workshop about these autonomy-enhancing algorithmic procedures, an attendee, Georgios Fraros, noted that the problem and solution presented were in line with a story about the introduction of an instant cake mix that only required adding water. This time-saving innovation did not sell nearly as well as the proud manufacturers had expected. It turned out that presenting a cake as homemade after just adding water and stirring made people feel guilty. They wanted to put in more effort and make it their own; they wanted to have to add an egg. While this entertaining story may be a myth (Mikkelson, 2008), our expectations are also in line with recent findings by Gonzalez et al. (2022). They showed that applicants' perceptions were similar when predictions were made holistically or when using an autonomy-enhancing algorithm (holistically adjusting predictions from a prescribed algorithm), but more positive compared to using a prescribed algorithm. Relatedly, Kares et al. (2023) found that decision makers showed similar trust in holistic prediction compared to an autonomy-enhancing algorithmic approach, and generally higher trust in an autonomy-enhancing algorithmic approach than in a prescribed algorithm. These results provide some first evidence that AEAPs can improve both stakeholders' and decision-makers' perceptions, compared to prescribed algorithms that restrict autonomy.

Contributions and aims of the present studies

As Langer and Landers (2021) discuss, existing studies investigating AEAPs have usually focused on either stakeholders (e.g., Newman et al., 2020) or decision makers (e.g., Nolan & Highhouse, 2014), and, importantly, almost all studies solely focused on perceptions, without investigating actual behaviour, such as effects of algorithm use on performance prediction accuracy (for exceptions in educational selection, see Dietvorst et al., 2018; Neumann et al., 2022). Furthermore, there is a large range of possibilities for how autonomy in algorithmic decision making could be enhanced. However, existing studies typically do not distinguish between different autonomy-enhancing approaches (Burton et al., 2020; Langer & Landers, 2021), and often introduce such approaches abstractly as "a combination between a human and an algorithm" (Gonzalez et al., 2022).

The aims of the present studies were to uniquely contribute to the literature by investigating whether AEAPs could improve both stakeholder perceptions and decision-makers' meta-beliefs. Importantly, we also investigated whether AEAPs result in higher predictive validity compared to holistic prediction, in the context of personnel selection. This is most important to investigate because improving perceptions is of no avail if AEAPs do not also improve validity. Yet, such an investigation is lacking in the literature, despite various calls for it (Burton et al., 2020; Highhouse & Brooks, 2023; Kuncel, 2018; Langer & Landers, 2021). Furthermore, we contribute to the literature by investigating the effect of two different ways to enhance autonomy in algorithmic decision making (holistically adjusting predictions from a prescribed algorithm or self-designing an algorithm) on perceptions and validity. Moreover, we contribute to the theoretical understanding of improving perceptions of algorithmic decision making through AEAPs, by investigating the underlying mechanisms of these perceptions.
Theoretical background and hypothesis development

Stakeholders should attribute hiring outcomes less to a decision maker who uses a prescribed algorithm to make predictions than to a decision maker who makes holistic predictions. This assumption is in line with attribution theory (Kelley, 1973), which addresses how people explain events and perceive causality. Two key dimensions of causal attribution are locus of causality and personal control (Meyer & Koelbl, 1982).² Locus of causality is the extent to which stakeholders attribute a hiring decision to internal or external factors. Applied to personnel selection, holistic prediction can be considered an internal factor because the decision-making process takes place in the decision-maker's mind. In contrast, an algorithm may be considered an external factor because it constitutes a contextual decision aid that the decision maker uses to make hiring decisions. Personal control is the extent to which stakeholders perceive a decision maker as capable of changing the hiring decision. Perceived personal control should be higher when decision makers make holistic rather than algorithmic hiring decisions. Based on attribution theory, Nolan et al. (2016) indeed found that stakeholders perceived decision makers as less responsible for and less in control of the hiring decision when they used an algorithm rather than their holistic judgement. Accordingly, we had the following hypotheses, which replicate Nolan et al. (2016).

Hypothesis 1a: Stakeholders' beliefs about decision-makers' locus of causality will be more positive³ when holistic prediction is used, compared to when a prescribed algorithm is used.

[…] the decision process, than when they have no autonomy (i.e., they must use a prescribed algorithm). Furthermore, stakeholders may ascribe the cause of a hiring decision more to the decision maker (internal factor) than to an algorithm (external factor) when decision makers retain autonomy. This is expected because hiring decisions would not exclusively be determined by an external decision aid (i.e., a prescribed algorithm), but partly by the decision-maker's internal cognitive processes. Based on this reasoning, stakeholders should evaluate decision makers more positively when they use AEAPs rather than a prescribed algorithm. We hypothesized that:

Hypothesis 1b: Stakeholders' beliefs about decision-makers' locus of causality will be more positive when AEAPs are used, compared to when a prescribed algorithm is used.

For brevity, we only present this hypothesis for locus of causality. We had the same hypotheses for personal control (H2b) and perceived competence (H3b).

We expected that stakeholders perceive decision makers who use AEAPs as more in control of and more responsible for the hiring decision than decision makers who use a prescribed algorithm. Since people commonly believe that the combination of algorithmic and human judgement results in the best decisions (Neumann et al., 2023), it may also be expected that stakeholders' positive beliefs about decision-makers' locus of causality and personal control translate into more positive competence perceptions. Our conceptual model is shown in Figure 1. We hypothesized that: […]
Figure 1. Conceptual model in study 1. ^a AEAPs = autonomy-enhancing algorithmic procedures. AEAPs was coded as 1 and the prescribed algorithm as 0.
[…] (see Appendix A) or finished the tasks in less than a minute. After inspecting the data, we also excluded participants who reported impossible values (e.g., longer role tenure than organizational tenure or an incorrect MTurk ID). We retained usable data from 582 participants. The mean age was M = 37.1 (SD = 10.6), and 55% were male. Most participants were U.S. citizens (77%). Other participants were Indian (17%), Brazilian (3%), or had other non-EU nationalities (3%) or EU nationalities (<0.01%). Furthermore, most participants were employed full-time (92%) and held a Bachelor or Master degree (86%). The median study completion time was 5 minutes. The ethics committee of the Heymans Institute for Psychological Research approved this study (code: PSY-2021-S-0500).

Design and procedure

We employed a one-factorial (hiring approach) between-subjects design with four levels (holistic, holistic adjustment, self-designed algorithm, prescribed algorithm). Participants read an introduction of a hypothetical hiring scenario, similar to Nolan et al. (2016): "Imagine yourself in the following situation . . . The person responsible for hiring decisions at your company (e.g., a manager) just hired a new employee to fill an open position. Please read the description of how this decision was made and answer the questions that follow". Afterwards, participants were randomly assigned to one of four conditions and read a condition-specific description of the manager's hiring approach (see Table 1).

We varied how the manager combined a cognitive ability test score, a score from a personality questionnaire, and a rating from a traditional hiring interview to evaluate candidates. In the holistic condition, the manager used their intuition and expertise to combine the information and make a hiring decision. In the holistic adjustment condition, the manager saw the prediction from a prescribed algorithm and could adjust the algorithm's prediction using their intuition and expertise. In the self-designed algorithm condition, the manager designed an algorithm by choosing predictor weights that reflected the importance of the information. In the prescribed algorithm condition, the manager saw the prediction from a prescribed algorithm but could not influence the algorithm's weights or the resulting predictions. After reading the condition-specific descriptions, participants completed the dependent measures in random order.

Measures

All measures were assessed on a 5-point Likert scale (1 = strongly disagree, 5 = strongly agree; see OSF). We used an adapted six-item scale from Nolan and Highhouse (2014) as a manipulation check, to measure the extent to which participants perceived the manager as having autonomy in making hiring decisions (α = .88). Furthermore, we used Nolan et al.'s (2016) three-item scales to measure personal control and locus of causality (α = .84 and α = .83, respectively). Lastly, we used an adapted version of Nolan's (2013) six-item scale to measure perceived competence (α = .89).

Analytical approach

We applied Bayesian parameter estimation because estimating the magnitude of, and the uncertainty around, the differences of interest was most relevant to this study. We report posterior parameter estimates (standardized mean differences, denoted as d) with corresponding 95% highest density intervals (HDIs). An HDI allows for the intuitive interpretation that there is a 95% probability that the corresponding true parameter lies within the HDI, conditional on the model, the priors, and the observed data (Kruschke, 2015). We initialized a model in Stan (Stan Development Team, 2022) with chains = 7, thinning = 5, warmup samples = 5,000, and actual samples = 50,000, resulting in a total of 7 (chains) × 50,000 (samples) / 5 (thinning) = 70,000 samples. Furthermore, we used improper (very uninformative) priors for each parameter. For mediation results, we report the partially standardized effect size (Preacher & Kelley, 2011), which is the ratio of the (in)direct effect to the standard deviation of Y. It can be interpreted as the number of standard deviations of Y by which two groups differ on average as a result of the (in)direct mechanism.
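To illustrate the reported quantities, the sketch below (ours; the posterior draws are simulated rather than taken from the study's Stan model) computes a 95% highest density interval from a vector of posterior samples of d:

```python
import numpy as np

def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the posterior draws."""
    s = np.sort(samples)
    n_in = int(np.ceil(mass * len(s)))              # number of draws inside the interval
    widths = s[n_in - 1:] - s[: len(s) - n_in + 1]  # width of every candidate window
    i = int(np.argmin(widths))                      # left edge of the narrowest window
    return float(s[i]), float(s[i + n_in - 1])

rng = np.random.default_rng(1)
d_draws = rng.normal(loc=0.29, scale=0.10, size=70_000)  # simulated posterior of d
print(hdi(d_draws))                                      # roughly (0.09, 0.49)
```

Unlike an equal-tailed interval, the HDI is the shortest interval with the requested posterior mass, which is why it supports the direct probability interpretation described above.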
Table 2. Correlations between the measures in study 1.
Measure                  1.    2.    3.
1. Autonomy^a            -
2. Locus of causality    .76   -
3. Personal control      .74   .78   -
4. Perceived competence  .68   .65   .54
Note. Correlations were calculated across conditions. N = 582. ^a Manipulation check.
Figure 2. Observed means and raw data per condition and measure in study 1. Error bars represent 95% confidence intervals. Some jittering in the x-axis direction was
added to the plot to improve readability.
Table 3. Observed means and standard deviations per condition and measure in study 1.
                         Autonomy^a    Locus of causality  Personal control  Perceived competence
Condition                M     SD      M     SD            M     SD          M     SD
Holistic                 4.12  0.50    4.21  0.58          4.11  0.65        4.18  0.56
Holistic adjustment      3.93  0.71    4.02  0.71          4.02  0.75        4.09  0.69
Self-designed algorithm  3.90  0.75    3.90  0.89          3.91  0.94        3.97  0.76
Prescribed algorithm     3.50  1.05    3.46  1.14          3.27  1.22        3.82  0.80
^a Manipulation check.
[…] more responsible for and somewhat more in control of the hiring decision than managers who used AEAPs (d = 0.29, 95% HDI [0.09, 0.50] and d = 0.15, 95% HDI [−0.05, 0.35], respectively), this difference was relatively small compared to the difference between using AEAPs and using a prescribed algorithm. We found similar results for perceived competence (d = 0.21, 95% HDI [0.00, 0.41]).

Mediation

We first conducted feasibility checks as outlined in Murphy (2021) to assess whether mediation was plausible (cf. hypothesis 4). These include checking whether the independent variable and the mediator(s) are indeed correlated with the dependent variable, and whether the semipartial correlation between the independent and dependent variable is close to zero when controlling for the mediator(s). Table 4 shows the correlations between the variables included in the proposed mediation model. Although the correlation between enhanced autonomy and perceived competence was rather small (r = .13), the semipartial correlation was close to zero when controlling for the mediators (r = −.04). This pattern of correlations suggests that mediation is plausible.
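As a sketch of this feasibility check (ours, not the authors' analysis code; all data below are simulated), one way to compute such a semipartial correlation is to residualize the independent variable on the mediators and correlate the residual with the outcome:

```python
import numpy as np

def semipartial_r(x, y, mediators):
    """Correlate y with the part of x left after regressing x on the mediators."""
    design = np.column_stack([np.ones(len(x)), mediators])
    beta, *_ = np.linalg.lstsq(design, x, rcond=None)
    x_resid = x - design @ beta
    return float(np.corrcoef(x_resid, y)[0, 1])

rng = np.random.default_rng(0)
condition = rng.integers(0, 2, size=449).astype(float)   # AEAPs (1) vs. prescribed (0)
locus = 0.5 * condition + rng.normal(size=449)           # mediator 1
control = 0.6 * condition + rng.normal(size=449)         # mediator 2
competence = 0.4 * locus + 0.2 * control + rng.normal(size=449)
print(round(semipartial_r(condition, competence,
                          np.column_stack([locus, control])), 2))
```

In this simulated setup the condition affects competence only through the mediators, so the semipartial correlation comes out near zero — the same pattern that made mediation plausible in the study.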
As Table 5 shows, we found some evidence that stakeholders perceived managers who used AEAPs as more competent than managers who used a prescribed algorithm via its effect on locus of causality (partially standardized effect size = 0.32). Furthermore, zero was a credible value for the direct effect of enhanced autonomy on perceived competence (partially standardized effect size = −0.09). We did not find enough evidence that personal control mediated the relation between enhanced autonomy and perceived competence (partially standardized effect size = 0.06). In sum, we found partial support for hypothesis 4.

Discussion

In line with findings by Nolan et al. (2016), stakeholders perceived decision makers who made holistic predictions as more competent, more responsible for, and more in control of hiring decisions than decision makers who used a prescribed algorithm. Extending Nolan et al.'s (2016) findings, we found that stakeholders evaluated decision makers much more positively when they adjusted predictions from a prescribed algorithm or self-designed an algorithm, compared to when they strictly used a prescribed algorithm. Moreover, stakeholders evaluated decision makers who used AEAPs only slightly more negatively than decision makers who made holistic predictions. While this may suggest that stakeholders somewhat divided their attributions of causality, control, and competence between the human decision maker and the algorithm, these effects were small. Thus, there is some positive news here: our results suggest that decision makers should be much less worried about negative stakeholder evaluations when using AEAPs. However, decision makers may only be more likely to use AEAPs when they also themselves believe that stakeholders appreciate their use of AEAPs more than their use of a prescribed algorithm. This was investigated in Study 2. Importantly, we also investigated whether AEAPs result in higher predictive validity than holistic prediction.

Figure 3. Posterior distributions for the hypothesized effects in study 1. The grey area under each curve depicts the 95% most probable effect sizes, given the model and the data. H = holistic prediction, AEAPs = autonomy-enhancing algorithmic procedures, and PA = prescribed algorithm.

Table 4. Correlations between variables included in the pre-registered mediation model in study 1.
Measure                  1.    2.    3.
1. Enhanced autonomy^a   -
2. Locus of causality    .25   -
3. Personal control      .31   .78   -
4. Perceived competence  .13   .67   .54
^a Enhanced autonomy was coded 1 for the two autonomy-enhancing conditions (holistic adjustment and self-designed algorithm) and 0 for the prescribed algorithm condition. Therefore, data from the holistic condition was disregarded, and hence, N = 449.
Figure 4. Conceptual model in study 2 (the hypothesized conceptual model). The dashed line indicates that we dropped TOTU in our exploratory analyses because it showed correlations with other variables that made the hypothesized mediation model implausible. ^a AEAPs = autonomy-enhancing algorithmic procedures. AEAPs was coded as 1 and the prescribed algorithm as 0. ^b TOTU = threat of technological unemployment.
[…] predictors substantially differ in validity, accurate weighting becomes important as well, as overweighting invalid predictors would decrease predictive validity. The prediction task in this study contained predictors with near-zero to moderate validity. We should be able to assume that hiring professionals have some knowledge of predictor validities and hence would choose predictor weights that are more appropriate than random weights. We expected that holistically adjusted predictions from a prescribed algorithm and predictions from a self-designed algorithm would be more valid than pure holistic predictions.

[…] interview, conducted by a line manager. All applicants were hired, except for those who obtained the lowest possible interview rating. Supervisors had rated the applicants' overall performance approximately three months after they were hired. It was not feasible to let participants make performance predictions for all 236 applicants in this study. Therefore, we selected 40 applicants such that the correlations in the reduced dataset differed from those in the full dataset by no more than .015 in absolute value (see the R script "Applicant selection.R" on OSF). Table 6 shows the descriptive statistics and correlations of the variables of the reduced applicant dataset.
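The authors' R script ("Applicant selection.R" on OSF) implements this selection; the following is our rough Python sketch of the same idea, with simulated data standing in for the 236 applicants:

```python
import numpy as np

rng = np.random.default_rng(42)
full = rng.normal(size=(236, 4))                 # stand-in for the full applicant dataset
target = np.corrcoef(full, rowvar=False)

best_idx, best_diff = None, np.inf
for _ in range(20_000):                          # naive random search over 40-applicant subsets
    idx = rng.choice(236, size=40, replace=False)
    diff = np.max(np.abs(np.corrcoef(full[idx], rowvar=False) - target))
    if diff < best_diff:
        best_idx, best_diff = idx, diff

print(f"best max |delta r| = {best_diff:.3f}")   # accept if at most .015, the paper's criterion
```

The goal is simply to find a small subset whose correlation matrix mirrors the full sample, so that conclusions about weighting and validity carry over.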
Table 6. Descriptive statistics and correlations between variables of the reduced applicant dataset.
Variable              M     SD    1.   2.   3.
1. Cognitive ability  0.68  0.14  -
2. Conscientiousness  3.92  0.42  .11  -
3. Interview          2.83  0.98  .11  .02  -
4. Job performance    3.15  0.40  .31  .23  .04
Note. N = 40.
Condition: Holistic
Description: Please review the scores of the applicants and predict based on your intuition and expertise how the applicants will perform on the job.

Condition: Holistic adjustment
Description: In this approach, we show you, for each applicant, the performance prediction of a decision rule. An assessment professional designed this decision rule based on numerous empirical research findings. The decision rule looks like this:
Decision-rule prediction = General mental ability test score × 53 + Conscientiousness questionnaire score × 28 + Interview rating × 19.
As you can see above, the decision rule assigns the following weights to the information: The chosen weights correspond to the importance assigned to each piece of the information. So, based on the numerous empirical research findings, the assessment professional decided to give most weight to the general mental ability test score, and least weight to the interview rating. Above you can see that the scores and ratings of an applicant were multiplied by weights (×) and then added up (+).
The higher the decision-rule prediction, the more likely it is that the applicant shows good job performance.
You can use the decision-rule prediction in different ways. You can review the scores of the applicants and consult the prediction of the decision rule shown below and decide based on your intuition and expertise how you would like to combine the information to predict the applicants' job performance. If you only want to use the exact decision-rule prediction, you simply reproduce this prediction as your answer.
Although the decision rule will probably not result in perfect performance predictions, research showed that adjusting decision-rule predictions based on one's intuition and expertise usually decreases prediction accuracy.

Condition: Self-designed algorithm
Description: In this approach, we want you to design a decision rule. You will estimate how important each piece of information is for later job success and create a rule that will be used to make performance predictions for all applicants. The higher the decision-rule prediction, the more likely it is that the applicant shows good job performance.
Before you get descriptions of the applicants, you will decide how much weight you will assign to each piece of information: General Mental Ability Test, Conscientiousness Questionnaire, and Interview Rating.
The chosen weights correspond to the importance you assign to each piece of information. To illustrate this, if you think interview ratings are most important in predicting job performance you should weigh it more heavily than the other information. Conversely, if you think the general mental ability test or conscientiousness questionnaire will be most important in predicting later job success you should assign more weight to them respectively.
On the basis of your designed decision rule an overall score for each applicant will be calculated. Below you can see that the scores and ratings of an applicant are multiplied by weights (×) and then added up (+). The higher the overall score of the applicant the higher the chance of job success.
The decision rule looks like this:
Decision-rule prediction = General mental ability test score × your chosen weight 1 + Conscientiousness questionnaire score × your chosen weight 2 + Interview rating × your chosen weight 3
You will see the predictions made based on your decision rule. But once you designed the rule you cannot adjust its predictions.
Although your decision rule will probably not result in perfect performance predictions, research showed that using such a rule results in more accurate performance predictions than using one's intuition and expertise.

Condition: Prescribed algorithm
Description: In this approach, we show you, for each applicant, the performance prediction of a decision rule. An assessment professional designed this decision rule based on numerous empirical research findings. The decision rule looks like this:
Decision-rule prediction = General mental ability test score × 53 + Conscientiousness questionnaire score × 28 + Interview rating × 19.
As you can see above, the decision rule assigns the following weights to the pieces of information: The chosen weights correspond to the importance assigned to each piece of information. So, based on the numerous empirical research findings, the assessment professional decided to give most weight to the general mental ability test score, and least weight to the interview rating. Above you can see that the scores and ratings of an applicant were multiplied by weights (×) and then added up (+).
The higher the decision-rule prediction, the more likely it is that the applicant shows good job performance.
You will see the predictions made based on the decision rule. The applicants' predicted job performance will be shown to you and you cannot adjust this prediction based on your intuition and expertise. The decision rule will predict the performance of all 40 applicants. To demonstrate how the rule operates, we will show you 10 performance predictions which you cannot adjust.
Although the decision rule will probably not result in perfect performance predictions, research showed that adjusting decision-rule predictions based on one's intuition and expertise usually decreases prediction accuracy.
[…] predictions to a five-point scale (1 = very bad job performance, 5 = very good job performance) using min-max scaling (see p. 114, formula 3.8, in Han et al., 2011). Participants could holistically adjust the algorithm's predictions if they wanted.
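The cited min-max formula maps a value from its observed range onto a target range; below is a minimal sketch of such a rescaling (our code, with made-up predictions), assuming the observed minimum and maximum were used as the source range:

```python
import numpy as np

def min_max_rescale(v, new_min=1.0, new_max=5.0):
    """v' = (v - min) / (max - min) * (new_max - new_min) + new_min."""
    return (v - v.min()) / (v.max() - v.min()) * (new_max - new_min) + new_min

raw = np.array([37.2, 51.9, 44.0, 61.3])   # hypothetical decision-rule predictions
print(min_max_rescale(raw).round(2))       # smallest value maps to 1.0, largest to 5.0
```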
In the self-designed algorithm condition, participants chose percentage weights, which were multiplied by the respective standardized predictor scores. The resulting scores were summed to yield an applicant's job performance prediction. Participants were told that their self-designed algorithm would be used to predict the performance of all 40 applicants. To let participants experience their self-designed algorithm's predictions, we presented participants with 10 randomly selected applicants, their predictor scores, and the rescaled self-designed algorithm's predictions (1 = very bad job performance, 5 = very good job performance). Participants could not holistically adjust these predictions.
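A sketch of that computation (ours; the weights and scores are hypothetical): percentage weights are applied to standardized predictor scores and summed per applicant.

```python
import numpy as np

predictors = np.array([            # hypothetical raw scores: GMA, conscientiousness, interview
    [0.71, 3.8, 3.0],
    [0.55, 4.1, 2.0],
    [0.80, 3.5, 4.0],
    [0.62, 4.4, 3.0],
])
z = (predictors - predictors.mean(axis=0)) / predictors.std(axis=0)

weights = np.array([0.50, 0.30, 0.20])   # a participant's chosen percentage weights
predictions = z @ weights                # one fixed rule, applied to every applicant
print(predictions.round(2))
```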
In the prescribed algorithm condition, participants had no autonomy, as they had to strictly use the algorithm's predictions. The algorithm was constructed in the same way as in the holistic adjustment condition. As in the self-designed algorithm condition, participants saw the algorithm's predictions and the respective predictor scores for 10 applicants, but were told that the algorithm was used to predict the job performance of all 40 applicants. In the holistic adjustment and the prescribed algorithm condition, participants saw the algorithm's predictor weights as percentage weights before they made (or saw) any predictions. So, in all conditions where an algorithm was used (all except the holistic condition), participants saw the algorithm's predictor weights and were told that the higher the algorithm's prediction, the more likely it is that the applicant shows good job performance, according to the algorithm. Importantly, in all conditions except the holistic condition, participants were warned that although the algorithm's predictions would not be perfect, research showed that holistically adjusting the algorithm's predictions usually decreases prediction accuracy.

After the prediction task, participants imagined that the airline used the same approach to make performance predictions and hired the applicants with the highest scores. They also imagined being the manager who delivered the decision to applicants. Then, participants filled in the dependent measures in random order.

Measures

Attitudinal measures
All measures were assessed on a 5-point Likert scale (1 = strongly disagree, 5 = strongly agree; see OSF). We used the same measures for perceived autonomy (manipulation check, α = .90), locus of causality (α = .75), and personal control (α = .77) as in Study 1, but slightly adapted the wording to account for the decision-maker's perspective. Furthermore, we used Nolan et al.'s (2016) five-item TOTU scale (α = .93) and an adapted version of Nolan's (2013) six-item scale to measure competence perceptions (α = .89). Lastly, we used an adapted three-item scale from Nolan and Highhouse (2014) to measure participants' intention to use hiring approaches. The internal consistency of this scale was quite low (α = .59) and could be improved (α = .85) by removing the second, reverse-coded item, which read "If I could use a different approach to make hiring decisions, I would". We report results based on the original three-item scale and the scale after removing the second item. Removing the second item did not qualitatively change the study's conclusions.

Predictive validity
For each participant, we calculated the Fisher's z transformed correlation between their 40 performance predictions and the applicants' observed performance. So, the performance predictions were participants' holistic predictions (holistic condition), holistically adjusted predictions (holistic adjustment condition), or the predictions that resulted from participants' self-designed algorithms (self-designed algorithm condition).
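A sketch of this validity computation (ours, with simulated data standing in for one participant's predictions):

```python
import numpy as np

rng = np.random.default_rng(7)
observed = rng.normal(size=40)                      # the 40 applicants' rated performance
predicted = 0.3 * observed + rng.normal(size=40)    # one participant's 40 predictions

r = np.corrcoef(predicted, observed)[0, 1]          # that participant's predictive validity
z = np.arctanh(r)                                   # Fisher's z transformation
print(f"r = {r:.2f}, z = {z:.2f}")
```

The z transformation makes the correlations roughly normally distributed, so they can be averaged and compared across conditions.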
Table 8. Correlations between the measures in study 2.
Measure                  1.    2.    3.    4.    5.    6.
1. Autonomy^a            -     .50   .65   .70   −.04  .61
2. Locus of causality    .50   -     .61   .52   −.04  .39
3. Personal control      .65   .61   -     .64   .06   .46
4. Perceived competence  .70   .52   .64   -     −.22  .58
5. TOTU^b                −.04  −.04  .06   −.22  -     −.01
6. Use intentions        .45   .29   .31   .46   −.23  -
Note. Correlations below the diagonal are based on the original three-item use intentions scale. Correlations above the diagonal are based on the use intentions scale with the second item removed. Correlations were calculated across conditions. N = 269. ^a Manipulation check. ^b Threat of technological unemployment.
Figure 5. Observed means and raw data per condition and measure in study 2. Error bars represent 95% confidence intervals. Some jittering in the x-axis direction was
added to the plot to improve readability.
Results

Again, we treated locus of causality and personal control as separate variables, based on the results of confirmatory factor analyses (see Supplement S2). Table 8 shows the correlations between the measures. Figure 5 shows the observed means per condition and measure. Table 9 shows the exact observed means and standard deviations. Furthermore, Figure 6 shows the posterior distributions with 95% HDIs for the relevant effects. Again, AEAPs resulted in very similar perceptions and hence were combined in the following analyses.

Autonomy (manipulation check)
Decision makers experienced much more autonomy when using AEAPs than when using a prescribed algorithm (d = 1.10, 95% HDI [0.75, 1.45]). This suggests that our manipulation was successful.

Locus of causality, personal control, and perceived competence
Decision makers had higher meta-beliefs about their responsibility for and control of the hiring decision when they made holistic predictions compared to when they used a prescribed algorithm (d = 0.43, 95% HDI [0.03, 0.83] and d = 0.92, 95% HDI [0.51, 1.33], respectively). They also had much higher meta-beliefs about their perceived competence when they made holistic predictions compared to when they used a prescribed algorithm (d = 0.94, 95% HDI [0.54, 1.35]). Therefore, hypotheses 5a, 6a, and 7a were fully supported.

Furthermore, decision makers had higher meta-beliefs about their responsibility for making a hiring decision when they used AEAPs rather than a prescribed algorithm (d = 0.38, 95% HDI [0.05, 0.72]). Similarly, they had much higher meta-beliefs about their control of the hiring decision and their perceived competence when they used AEAPs rather than a prescribed algorithm (d = 0.96, 95% HDI [0.62, 1.31] and d = 0.77, 95% HDI [0.43, 1.11], respectively). So, hypotheses 5b, 6b, and 7b were also fully supported.

Decision makers also had very similar meta-beliefs about their responsibility for and control of the hiring decision when making holistic predictions and when using AEAPs (d = 0.05, 95% HDI [−0.25, 0.36] and d = −0.04, 95% HDI [−0.35, 0.27], respectively), although small effects in both directions were also plausible. We found similar results for perceived competence (d = 0.16, 95% HDI [−0.15, 0.47]).

Mediation
We hypothesized that the effect of enhanced autonomy on use intentions would be mediated by locus of causality and personal control (in parallel), and competence perceptions and TOTU (serially). The correlations between the variables of the pre-registered mediation model are shown in Table 10. The correlations between perceived competence and TOTU, and TOTU and use intentions, were small (r = −.21 and r = −.19). Therefore, the pre-registered mediation model was not plausible.

Furthermore, we did not find enough evidence that decision makers perceived less TOTU when using AEAPs rather than a prescribed algorithm (d = 0.14, 95% HDI [−0.19, 0.48]). The observed effect was in the opposite direction than was expected. However, decision makers were somewhat more likely to use AEAPs than a prescribed algorithm (d = 0.28, 95% HDI [−0.06, 0.61]). Additionally, decision makers were more likely to use holistic prediction than a prescribed algorithm (d = 0.38, 95% HDI [−0.02, 0.78]), while the difference in intentions to use holistic prediction and AEAPs was negligible (d = 0.10, 95% HDI [−0.20, 0.41]), although holistic prediction was slightly favoured. Removing the second item of the use intentions scale resulted in slightly larger effect sizes. Decision makers were more likely to use holistic prediction (d = 0.41, 95% HDI [0.01, 0.82]) and AEAPs (d = 0.44, 95% HDI [0.10, 0.77]), compared to a prescribed algorithm. The difference between holistic prediction and AEAPs was near zero (d = −0.02, 95% HDI [−0.33, 0.29]).

Given that a mediation model including TOTU was implausible, we did not formally investigate hypothesis 8, but instead explored a mediation model without TOTU. Specifically, we estimated the indirect effect of enhanced autonomy on use intentions via locus of causality and personal control (in parallel), and competence perceptions. The results are shown in Table 11. We found some evidence that enhanced autonomy was related to use intentions via personal control and competence perceptions (partially standardized effect size = 0.18). Furthermore, zero was a credible value for the direct effect of enhanced autonomy on use intentions. Yet, we did not find much evidence that enhanced autonomy affected use intentions via locus of causality and competence perceptions (partially standardized effect size = 0.03).
Table 9. Observed means and standard deviations per condition and measure in study 2.
                         Autonomy^a    Locus of causality  Personal control  Perceived competence  TOTU^b       Use intentions  Use intentions*
Condition                M     SD      M     SD            M     SD          M     SD              M     SD     M     SD        M     SD
Holistic                 4.23  0.52    4.22  0.56          4.07  0.68        4.39  0.54            3.02  1.34   3.48  0.81      4.02  0.87
Holistic adjustment      4.15  0.69    4.15  0.55          4.11  0.75        4.27  0.60            3.31  1.12   3.42  0.61      4.08  0.76
Self-designed algorithm  4.05  0.68    4.22  0.64          4.10  0.67        4.30  0.49            3.00  1.16   3.38  0.73      4.00  0.92
Prescribed algorithm     3.25  1.24    3.93  1.04          3.33  1.20        3.77  1.08            2.99  1.11   3.19  0.89      3.65  1.08
^a Manipulation check. ^b Threat of technological unemployment. * Means and standard deviations of the use intentions scale with the second item removed.
Figure 6. Posterior distributions for the hypothesized and exploratory effects in study 2. The grey area under each curve depicts the 95% most probable effect sizes,
given the model and the data. H = holistic prediction, AEAPs = autonomy-enhancing algorithmic procedures, and PA = prescribed algorithm.
Table 10. Correlations between variables included in the pre-registered mediation model in study 2.
Measure                  1.    2.    3.    4.    5.    6.
1. Enhanced autonomy^a   -     .14   .35   .29   .06   .17
2. Locus of causality    .14   -     .59   .50   −.02  .36
3. Personal control      .35   .59   -     .64   .08   .45
4. Perceived competence  .29   .50   .64   -     −.21  .56
5. TOTU^b                .06   −.02  .08   −.21  -     −.01
6. Use intentions        .12   .25   .28   .43   −.19  -
Note. Correlations below the diagonal are based on the original three-item use intentions scale. Correlations above the diagonal are based on the use intentions scale with the second item removed. ^a Enhanced autonomy was coded 1 for the two autonomy-enhancing conditions (holistic adjustment and self-designed algorithm condition) and 0 for the prescribed algorithm condition. Therefore, data from the holistic condition was disregarded, and hence, N = 215. ^b Threat of technological unemployment.
Predictive validity
Figure 7 shows the mean validity coefficient per condition. Holistic predictions were least valid (r̄ = .16), while predictions from the prescribed algorithm were most valid (r = .36). In support of hypothesis 9, AEAPs resulted in much higher predictive validity (r̄ = .28 in the holistic adjustment condition and r̄ = .31 in the self-designed algorithm condition, respectively) than holistic predictions (d = 1.10, 95% HDI [0.77, 1.42]). Importantly, these results suggest that AEAPs resulted in only somewhat less valid predictions than the prescribed algorithm.
Figure 7. Mean predictive validity per condition. Each raw data point represents the correlation between a participant’s predicted and observed performance (N = 40).
No raw data is plotted in the prescribed algorithm condition because participants could not influence the algorithm’s prediction in any way. Therefore, no variance was
observed in this condition by design. Error bars represent 95% confidence intervals. Some jittering in the x-axis direction was added to the plot to improve readability.
Interestingly, the variability in decision-makers' validity was highest in the holistic condition and lowest in the self-designed algorithm condition. In other words, the less decision makers were directly involved in combining information about individual applicants, the less it mattered who combined the information. Moreover, just using the cognitive ability test in a top-down manner would have resulted in higher predictive validity than the validity that was observed in the holistic and holistic adjustment conditions (cf. Table 6 and Figure 7).

Judgement consistency and relative weights analysis
We also explored how, and how consistently, decision makers weighted the information, by regressing the participant's (or the algorithm's) predictions on the three predictors. The resulting multiple R reflects how consistently decision makers apply their own judgement policy and is also called response consistency or cognitive control (Hammond & Summers, 1972). We also calculated relative weights (Grömping, 2006), which reflect how important a predictor was in forming the participant's (or algorithm's) prediction. Table 12 shows participants' mean judgement consistency and mean relative weights for each of the three predictors per condition. Providing decision makers with a valid anchor (the algorithm's prediction) increased judgement consistency compared to the holistic condition, and resulted in relative weights that were similar to the relative weights obtained from the meta-analytically derived algorithm. Furthermore, despite similar relative weights, judgement consistency was much higher in the self-designed algorithm condition than in the holistic condition, which likely explains the markedly higher predictive validity.
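As a sketch of the consistency index (ours, with hypothetical data; the relative-weights decomposition of Grömping, 2006, is omitted here), judgement consistency is the multiple R from regressing a participant's 40 predictions on the three predictor scores:

```python
import numpy as np

def judgement_consistency(predictions, predictors):
    """Multiple R from regressing a participant's predictions on the predictor scores."""
    design = np.column_stack([np.ones(len(predictions)), predictors])
    beta, *_ = np.linalg.lstsq(design, predictions, rcond=None)
    fitted = design @ beta
    ss_res = np.sum((predictions - fitted) ** 2)
    ss_tot = np.sum((predictions - predictions.mean()) ** 2)
    return float(np.sqrt(1.0 - ss_res / ss_tot))

rng = np.random.default_rng(3)
predictors = rng.normal(size=(40, 3))               # the 40 applicants' three scores
predictions = predictors @ np.array([0.5, 0.3, 0.2]) + rng.normal(scale=0.4, size=40)
print(round(judgement_consistency(predictions, predictors), 2))
```

A multiple R of 1 would mean the participant applied one fixed linear weighting policy to every applicant; lower values indicate inconsistent, applicant-by-applicant weighting.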
Table 12. Mean judgement consistency and mean relative predictor weights per condition.

Discussion

In line with our hypotheses and earlier findings (Nolan et al., 2016), decision makers believed stakeholders would perceive them as more competent, more responsible for, and more in control of the hiring decision when making holistic predictions rather than using a prescribed algorithm. Extending earlier findings, we also found that decision-makers' meta-beliefs were much higher when they used AEAPs rather than a prescribed algorithm. Relatedly, decision-makers' meta-beliefs were very similar when making holistic predictions compared to using AEAPs. Importantly, we also found that using AEAPs substantially increased predictive validity compared to holistic prediction.

General Discussion

One reason why decision makers rarely use algorithms in hiring decisions is that they worry about negative stakeholder evaluations (Nolan et al., 2016). In line with existing research (Gonzalez et al., 2022; Kares et al., 2023), we expected and found that the use of AEAPs improved stakeholder evaluations and decision-makers' meta-beliefs, compared to the use of a prescribed algorithm. Furthermore, both stakeholder evaluations and decision-makers' meta-beliefs were similar for holistic prediction and AEAPs.
Extending earlier findings, we also showed that stakeholder and decision-makers' perceptions were relatively indifferent to how autonomy was enhanced (algorithm design vs. control over a prescribed algorithm's predictions). Lastly, we showed that AEAPs substantially improved predictive validity compared to holistic prediction. Yet, neither holistic prediction nor AEAPs resulted in higher predictive validity than the meta-analytically derived algorithm. In summary, our results on predictive validity are largely in line with findings by Dietvorst et al. (2018) and Neumann et al. (2022), and generally suggest that enhancing autonomy in algorithmic decision making effectively encourages algorithm use and improves validity compared to holistic prediction.

Intriguingly, our results also showed that strict top-down use of the single best predictor would have resulted in more valid predictions (r = .31) than holistic prediction (r = .16). This demonstrates again that "less can be more"; invalid information can distract decision makers from valid information and thereby reduce predictive validity, which is also known as the "dilution effect" (Niessen et al., 2022). Relatedly, conscientiousness and the unstructured interview had only slight incremental validity over and above cognitive ability when this information was combined through our meta-analytic, prescribed algorithm, but not when combined holistically or through a self-designed algorithm. This finding underscores that results from studies on combinations of predictors, in which predictors are typically combined using optimal regression (e.g., Schmidt & Hunter, 1998), may not generalize to practice (Murphy, 2019; Sackett et al., 2017), where optimal regression or other, suboptimal mechanical combination methods are rarely used to make hiring decisions (Neumann et al., 2023; Ryan & Sackett, 1987). Hence, we should think about personnel selection as judgement and decision science (Meijer & Niessen, 2022), and investigate not only the validity of (optimal combinations of) test scores, as is now often the case in research, but also the validity of our decisions. This changing attitude would greatly increase the usefulness of personnel selection research for practice.

Limitations and directions for future research

We conducted experiments to test the effect of the hiring approach on stakeholder perceptions, decision-makers' meta-beliefs, and predictive validity. Ideally, our findings would be replicated in the field, although this is very challenging, because very few organizations contain the required number of assessors for these studies. Furthermore, we relied on self-reported intentions to use a hiring approach for future decisions. Ideally, we would also observe actual behaviour (i.e., implementation of AEAPs). Lastly, we note that our cross-sectional design limits the conclusions that can be drawn from our mediation analyses (Spector, 2019).

As our results showed, retaining decision-makers' autonomy in algorithmic decision making can be a promising way to improve decision making. However, interventions that primarily focus on improving the validity of pure holistic predictions may also be fruitful. Such interventions may include providing decision makers with knowledge on the validity of predictors, although research on this intervention produced somewhat mixed findings (Balzer et al., 1989; Neumann et al., 2022). Relatedly, Kahneman et al. (2021) recently suggested delaying the time between collecting information (e.g., interview impressions) and making hiring decisions, and presenting information sequentially to decision makers. We encourage researchers to test whether these interventions can improve decision making in a similar manner as enhancing autonomy in algorithmic decision making.

We presented decision makers with quantitative test scores and a quantified rating from an interview conducted by someone else. If decision makers had conducted interviews themselves, they may have overweighted such qualitative and "rich" information compared to a quantified interview rating from someone else (Kuncel, 2018; Niessen et al., 2022). Therefore, future research could vary decision-makers' involvement in the data collection and use applicant data that contains qualitative and quantitative information. Furthermore, it should be investigated whether decision makers are less likely to holistically adjust predictions from a self-designed rather than a prescribed algorithm. Decision makers may trust their own algorithm more than an algorithm presented by others (Kuncel, 2018). Relatedly, characteristics such as an algorithm's transparency and complexity, and decision-makers' understanding of an algorithm's design, could affect algorithm use (Langer & Landers, 2021). Although algorithms with equal and even random but consistent predictor weights can outperform holistic predictions when predictors are similar in validity (Yu & Kuncel, 2020), it is plausible that decision makers would holistically adjust predictions from such simple and unfancy algorithms more, compared to more complex algorithms that may, for example, take predictor interactions and non-linearities into account. Moreover, future studies may present information on the same scale. We deliberately presented information on different scales because this is representative of practice. Yet, this may have made consistent information combination in the holistic and holistic adjustment condition more difficult.

Theoretical and practical implications

In line with the discounting principle from attribution theory and earlier research (Nolan et al., 2016), stakeholders ascribed more credit to decision makers for holistic compared to algorithmic hiring decisions. Importantly, decision makers who used AEAPs were still largely seen as responsible for and in control of the hiring decision by stakeholders, which was related to higher competence perceptions. We also found support for the hypothesis that decision-makers' meta-beliefs relate to their intentions to use algorithmic decision-making procedures, as proposed by the theory of planned behaviour. Decision-makers' concerns about negative stakeholder perceptions were largely alleviated when they retained autonomy in algorithmic decision making. Moreover, our results advance our understanding of whether stakeholders and decision makers are sensitive to different autonomy-enhancing features (Langer & Landers, 2021). Design control (by choosing an algorithm's predictor weights) resulted in very similar perceptions by both stakeholders and decision makers as outcome control (adjusting a prescribed algorithm's predictions holistically). We […]
Han, J., Pei, J., & Kamber, M. (2011). Data mining: Concepts and techniques: education and incentives to improve decision making. Journal of
Vol. 3rd ed. Morgan Kaufmann. Experimental Psychology: Applied, 28(1), 166–178. https://doi.org/10.
Highhouse, S., & Brooks, M. E. (2023). Improving workplace judgments by 1037/xap0000372
reducing noise: Lessons learned from a century of selection research. Neumann, M., Niessen, A. S. M., Hurks, P. M., & Meijer, R. R. (2023). Holistic
Annual Review of Organizational Psychology and Organizational Behavior, and mechanical combination in psychological assessment: Why algo
10(1), 519–533. https://doi.org/10.1146/annurev-orgpsych-120920- rithms are underutilized and what is needed to increase their use.
050708 International Journal of Selection and Assessment, 31(2), 267–285.
Hoffman, M., Kahn, L. B., & Li, D. (2017). Discretion in hiring. The Quarterly Journal https://doi.org/10.1111/ijsa.12416
of Economics, 133(2), 765–800. https://doi.org/10.1093/qje/qjx042 Neumann, M., Niessen, A. S. M., & Meijer, R. R. (2021). Implementing
Kahneman, D., & Klein, G. (2009). Conditions for intuitive expertise: A failure evidence-based assessment and selection in organizations: A review
to disagree. American Psychologist, 64(6), 515–526. https://doi.org/10. and an agenda for future research. Organizational Psychology Review,
1037/a0016755 11(3), 205–239. https://doi.org/10.1177/2041386620983419
Kahneman, D., Sibony, O., & Sunstein, C. R. (2021). Noise: A flaw in human Neumann, M., Niessen, A. S. M., Tendeiro, J. N., & Meijer, R. R. (2022). The
judgment. Little, Brown Spark. autonomy-validity dilemma in mechanical prediction procedures: The
Kares, F., König, C. J., Bergs, R., Protzel, C., & Langer, M. (2023). Trust in hybrid quest for a compromise. Journal of Behavioral Decision Making, 35(4),
human-automated decision-support. International Journal of Selection e2270. https://doi.org/10.1002/bdm.2270
and Assessment, 31(3), 388–402. https://doi.org/10.1111/ijsa.12423 Newman, D. T., Fast, N. J., & Harmon, D. J. (2020). When eliminating bias isn’t
Kausel, E. E., Culbertson, S. S., & Madrid, H. P. (2016). Overconfidence in fair: Algorithmic reductionism and procedural justice in human resource
personnel selection: When and why unstructured interview information decisions. Organizational Behavior and Human Decision Processes, 160,
can hurt hiring decisions. Organizational Behavior and Human Decision 149–167. https://doi.org/10.1016/j.obhdp.2020.03.008
Processes, 137, 27–44. https://doi.org/10.1016/j.obhdp.2016.07.005 Niessen, A. S. M., Kausel, E. E., & Neumann, M. (2022). Using narratives and
Kelley, H. H. (1973). The processes of causal attribution. American numbers in performance prediction: Attitudes, confidence, and validity.
Psychologist, 28(2), 107–128. https://doi.org/10.1037/h0034225 International Journal of Selection and Assessment, 30(2), 216–229. https://
Kruschke, J. K. (2015). Doing bayesian data analysis: A tutorial with R, JAGS, doi.org/10.1111/IJSA.12364
and Stan (2nd ed.). Academic Press. https://doi.org/10.1016/B978-0-12- Nolan, K. P. (2013). Basic psychological need fulfillment and user resistance to
405888-0.00008-8 objective and analytical decision-making practices in employee selection
Kuncel, N. R. (2018). Judgment and decision making in staffing research and (Doctoral dissertation).
practice. In D. S. Ones, N. Anderson, C. Viswesvaran, & H. K. Sinangil Nolan, K. P., Carter, N. T., & Dalal, D. K. (2016). Threat of technological
(Eds.), The sage handbook of industrial, work and organizational psychol unemployment: Are hiring managers discounted for using standardized
ogy (2nd ed., pp. 474–487). SAGE Publications Ltd. https://doi.org/10. employee selection practices? Personnel Assessment and Decisions, 2(1),
4135/9781473914940.n17 30–47. https://doi.org/10.25035/pad.2016.004
Kuncel, N. R., Klieger, D. M., Connelly, B. S., & Ones, D. S. (2013). Mechanical Nolan, K. P., Dalal, D. K., & Carter, N. (2020). Threat of technological unem
versus clinical data combination in selection and admissions decisions: A ployment, use intentions, and the promotion of structured interviews in
meta-analysis. Journal of Applied Psychology, 98(6), 1060–1072. https:// personnel selection. Personnel Assessment and Decisions, 6(2), 38–53.
doi.org/10.1037/a0034156 https://doi.org/10.25035/pad.2020.02.006
Langer, M., & Landers, R. N. (2021). The future of artificial intelligence at work: Nolan, K. P., & Highhouse, S. (2014). Need for autonomy and resistance to
A review on effects of decision automation and augmentation on workers standardized employee selection practices. Human Performance, 27(4),
targeted by algorithms and third-party observers. Computers in Human 328–346. https://doi.org/10.1080/08959285.2014.929691
Behavior, 123, 106878. https://doi.org/10.1016/J.CHB.2021.106878 Preacher, K. J., & Kelley, K. (2011). Effect size measures for mediation models:
Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis Quantitative strategies for communicating indirect effects. Psychological
and a review of the evidence. University of Minnesota Press. https://doi. Methods, 16(2), 93–115. https://doi.org/10.1037/a0022658
org/10.1037/11281-000 Ryan, A. M., & Sackett, P. R. (1987). A survey of individual assessment
Meehl, P. E. (1986). Causes and effects of my disturbing little book. Journal practices by I/O psychologists. Personnel Psychology, 40(3), 455–488.
of Personality Assessment, 50(3), 370–375. https://doi.org/10.1207/ https://doi.org/10.1111/j.1744-6570.1987.tb00610.x
s15327752jpa5003_6 Sackett, P. R., Dahlke, J. A., Shewach, O. R., & Kuncel, N. R. (2017). Effects of
Meijer, R. R., Neumann, M., Hemker, B. T., & Niessen, A. S. M. (2020). A tutorial on predictor weighting methods on incremental validity. Journal of Applied
mechanical decision-making for personnel and educational selection. Psychology, 102(10), 1421–1434. https://doi.org/10.1037/apl0000235
Frontiers in Psychology, 10. Article 3002. https://doi.org/10.3389/fpsyg.2019. Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection
03002 methods in personnel psychology: Practical and theoretical implications
Meijer, R. R., & Niessen, A. S. M. (2022). Personnel selection as judgment and of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.
decision science: An introduction. International Journal of Selection and https://doi.org/10.1037/0033-2909.124.2.262
Assessment, 30(2), 193–194. https://doi.org/10.1111/IJSA.12377 Spector, P. E. (2019). Do not cross me: Optimizing the use of cross-sectional
Meyer, J. P., & Koelbl, S. L. M. (1982). Students’ test performances: designs. Journal of Business and Psychology, 34(2), 125–137. https://doi.
Dimensionality of causal attributions. Personality and Social Psychology org/10.1007/s10869-018-09613-8
Bulletin, 8(1), 31–36. https://doi.org/10.1177/014616728281005 van der Zee, K. I., Bakker, A. B., & Bakker, P. (2002). Why are structured
Mikkelson, D. (2008, January 31). Requiring an Egg Made Instant Cake Mixes interviews so rarely used in personnel selection? Journal of Applied
Sell? Snopes. https://www.snopes.com/fact-check/something-eggstra/ Psychology, 87(1), 176–184. https://doi.org/10.1037/0021-9010.87.1.176
Milkman, K. L., Chugh, D., & Bazerman, M. H. (2009). How can decision Wesche, J. S., Hennig, F., Kollhed, C. S., Quade, J., Kluge, S., & Sonderegger, A.
making be improved? Perspectives on Psychological Science, 4(4), (2022). People’s reactions to decisions by human vs. algorithmic deci
379–383. https://doi.org/10.1111/j.1745-6924.2009.01142.x sion-makers: The role of explanations and type of selection tests.
Murphy, K. R. (2019). Understanding how and why adding valid predictors can European Journal of Work and Organizational Psychology, 1–12. https://
decrease the validity of selection composites: A generalization of Sackett, doi.org/10.1080/1359432X.2022.2132940
Dahlke, shewach, and Kuncel (2017). International Journal of Selection and Yu, M. C., & Kuncel, N. R. (2020). Pushing the limits for judgmental consis
Assessment, 27(3), 249–255. https://doi.org/10.1111/ijsa.12253 tency: Comparing random weighting schemes with expert judgments.
Murphy, K. R. (2021). In praise of table 1: The importance of making better Personnel Assessment and Decisions, 6(2), 1–10. https://doi.org/10.25035/
use of descriptive statistics. Industrial and Organizational Psychology: pad.2020.02.002
Perspectives on Science and Practice, 14(4), 461–477. https://doi.org/10. Yu, M. C., & Kuncel, N. R. (2022). Testing the value of expert insight:
1017/iop.2021.90 Comparing local versus general expert judgment models. International
Neumann, M., Hengeveld, M., Niessen, A. S. M., Tendeiro, J. N., & Meijer, R. R. Journal of Selection and Assessment, 30(2), 202–215. https://doi.org/10.
(2022). Education increases decision-rule use: An investigation of 1111/IJSA.12356