


Journal of Experimental Psychology: Applied
© 2021 American Psychological Association
ISSN: 1076-898X
Online First Publication, March 22, 2021. https://doi.org/10.1037/xap0000294

Citation: Bonezzi, A., & Ostinelli, M. (2021, March 22). Can Algorithms Legitimize Discrimination? Journal of Experimental Psychology: Applied. Advance online publication. https://doi.org/10.1037/xap0000294

Can Algorithms Legitimize Discrimination?


Andrea Bonezzi¹ and Massimiliano Ostinelli²
¹ Stern School of Business, New York University
² College of Business Administration, Winthrop University

Algorithms have been the subject of a heated debate regarding their potential to yield biased decisions. Prior research has focused on documenting algorithmic bias and discussing its origins from a technical standpoint. We look at algorithmic bias from a psychological perspective, raising a fundamental question that has received little attention: are people more or less likely to perceive decisions that yield disparities as biased when such decisions stem from algorithms as opposed to humans? We find that algorithmic decisions that yield gender or racial disparities are less likely to be perceived as biased than human decisions. This occurs because people believe that algorithms, unlike humans, decontextualize decision-making by neglecting individual characteristics and blindly applying rules and procedures irrespective of whom they are judging. In situations that entail the potential for discrimination, this belief leads people to think that algorithms are more likely than humans to treat everyone equally, and thus less likely to yield biased decisions. This asymmetrical perception of bias, which occurs both in the general population and among members of stigmatized groups, leads people to endorse stereotypical beliefs that fuel discrimination and reduces their willingness to act against potentially discriminatory outcomes.

Public Significance Statement


This research suggests that replacing human with algorithmic decision-making might contribute to legitimizing discrimination. In situations that entail the potential for discrimination, algorithmic decisions that yield disparities are less likely than human decisions to be perceived as biased. The presumed objectivity of algorithms might foster stereotypical beliefs about stigmatized groups and make people less likely to take action against disparities that could be discriminatory.

Keywords: algorithmic bias, discrimination, disparities, inequality, algorithm aversion

Supplemental materials: https://doi.org/10.1037/xap0000294.supp

Author Note
Andrea Bonezzi: https://orcid.org/0000-0002-5624-8406. Massimiliano Ostinelli: https://orcid.org/0000-0002-9515-9131.
Data and materials for all studies are available on OSF: https://osf.io/276gm/
Correspondence concerning this article should be addressed to Andrea Bonezzi, Stern School of Business, New York University, 40 West 4th Street, New York, NY 10012, United States. Email: abonezzi@stern.nyu.edu

Imagine that a white man and a black man are arrested for stealing a car. After setting the defendants' court dates, a judge needs to decide whether they should be detained while awaiting trial. To make this decision, the judge assesses the risk that the two men will reoffend if released pre-trial. The defendants committed the same crime and have similar criminal histories. Yet, the judge deems the black man at higher risk to reoffend than the white man, so decides to detain the black man and release the white man. Most people would perceive such a decision as biased. Now take a step back. Imagine that an algorithm, rather than a judge, had appraised the black defendant at higher risk to reoffend than the white defendant, dooming the black man to be detained and the white man to be released. Would people still perceive the decision as biased?

Algorithms are increasingly being used to make decisions that profoundly impact people's lives, such as who gets incarcerated (Dressel & Farid, 2018), hired (Bogen & Rieke, 2018), admitted to universities (Pangburn, 2019), provided access to healthcare (Bates et al., 2014), and extended financial credit (Gomber et al., 2018). The use of algorithms is predicated on the premise that statistical models can be more accurate and efficient than human decision-makers (Dawes et al., 1989; Grove et al., 2000; Meehl, 1954). Yet, recent evidence suggests that algorithms can also be biased, just like the human counterparts they are meant to replace (e.g., Eubanks, 2018; Noble, 2018; O'Neil, 2016; Schwemmer et al., 2020).

In its most general sense, algorithmic bias refers to systematic error. An algorithm is considered biased if it incorrectly assigns better outcomes to certain individuals or groups of individuals over others, yielding systematic disparities that have no valid grounds (Friedman & Nissenbaum, 1996). As such, the term algorithmic bias can have different connotations. Algorithmic bias assumes a discriminatory connotation when it yields systematic disparities associated with variables that are protected by law, such as gender or race. To illustrate, an algorithm used to screen job applicants was found to be biased because it consistently assigned higher employability scores to men over women with comparable qualifications (Dastin, 2018). Similarly, an algorithm used in the criminal justice system to assess defendants' risk of recidivism was considered biased because it consistently classified black defendants at higher risk than they actually were, and white defendants at lower risk than they actually were (Angwin et al., 2016). Analogous instances of algorithmic bias have been documented in domains such as healthcare (Obermeyer et al., 2019), education (Schwartz, 2019), and credit lending (Bartlett et al., 2019).

Algorithmic bias, however, ought not necessarily imply discrimination. An algorithm can also be considered biased if it makes systematic errors that yield disparities that are not associated with variables that are protected by law. For example, consider an algorithm designed to screen job applicants that considers the analytical skills of the applicants but ignores their qualitative skills, although both are important for the job. Such an algorithm will systematically favor candidates with stronger analytical skills over those with stronger qualitative skills. In this case, the algorithm would be considered biased because it introduces a systematic error into the selection process. Yet, to the extent that such error is not associated with variables that are protected by law, the bias does not assume a discriminatory connotation.

Algorithmic bias is pervasive and hard to eradicate because it can stem from many sources (Hao, 2019). Bias might stem from the data used to train the algorithm. When trained on historical data that reflect pre-existing biases, algorithms might learn to make predictions based on associations that reflect such biases (Hajian et al., 2016). For example, an algorithm that scans resumes to screen job applicants might mistakenly screen out female applicants if the data used to train the algorithm reflect biased decisions made by humans in the past that resulted in men being consistently preferred over equally qualified women. In addition, bias can stem from the way computer scientists program algorithms. Programmers can intentionally or unintentionally ingrain their own views and beliefs into an algorithm by deciding which variables the algorithm should consider versus ignore, or which objective function an algorithm should optimize (Kleinberg et al., 2018a). Bias can also stem from improper use of algorithms, such as when an algorithm calibrated on data that are representative of a specific and homogeneous population is deployed to make predictions about a wider and more heterogeneous set of audiences (Danks & London, 2017).

Prior research on algorithmic bias has, for the most part, assumed a technical connotation, in that it has focused on documenting and defining algorithmic bias from a statistical standpoint (e.g., Dieterich et al., 2016; Kleinberg et al., 2016; Larson et al., 2016). The present research aims to contribute a psychological perspective to the ongoing debate on algorithmic bias, raising a fundamental question that has thus far received little attention: are people more or less likely to perceive decisions that yield disparities as biased when such decisions stem from algorithms rather than humans?

Conceptual Development

We propose that the answer to this question hinges on the fundamental belief that algorithms decontextualize decision-making because they neglect the unique characteristics of the individual being judged (Longoni et al., 2019; Newman et al., 2020; Sloan & Warner, 2018). People think of algorithms as reductionist tools that standardize decision-making by blindly applying predetermined rules and procedures irrespective of whom they are judging, because they lack the cognitive flexibility necessary to tailor decision-making to each individual (Haslam, 2006; Loughnan & Haslam, 2007; Nissenbaum & Walker, 1998). In contrast, human decision-makers possess cognitive flexibility and are therefore considered better able to recognize and consider the unique characteristics of the target they are judging and, consciously or unconsciously, can tailor decision-making to each individual.

We further propose that the belief that algorithms decontextualize decision-making can sway the perception of bias in opposite directions, because decontextualization can be either detrimental or beneficial to the judgment at hand. On the one hand, unique individual characteristics can provide information that is relevant and can improve judgment (e.g., individual motivations and mental states); neglecting such information can undermine the accuracy of a judgment, yielding decisions that are more biased. On the other hand, individual characteristics can provide information that introduces irrelevant elements that can distort judgment (e.g., race and gender); neglecting such information can improve the accuracy of a judgment, yielding decisions that are less biased. Based on this logic, we argue that the belief that algorithms decontextualize decision-making can foster two different inferences, which sway the perception of bias in opposite directions, depending on whether the situation entails the potential for discrimination or not.

In situations that do not raise concerns about discrimination, the belief that algorithms decontextualize decision-making might foster the perception that algorithmic decisions are more biased than human decisions. This is because the belief that algorithms are unable to recognize the unique characteristics of an individual might lead people to think that algorithms are more likely than humans to ignore information about the individual that can be relevant to the judgment at hand, and doing so can lead to systematic errors (Sloan & Warner, 2018). For example, consider a situation that entails assessing a white defendant's risk of recidivism, a situation that is less likely to raise concerns about discrimination, as the defendant does not belong to a stigmatized population. In this situation, people might perceive an algorithmic assessment to be more biased than a judge's assessment because they might think that the algorithm does not consider, to the same extent that a judge would, information about the defendant that is relevant to make an accurate assessment (e.g., mitigating circumstances), therefore yielding systematic errors. This notion is consistent with prior literature on algorithm aversion (e.g., Dietvorst et al., 2015), according to which people often think that humans can make more accurate decisions than algorithms.

In situations that entail the potential for discrimination, in contrast, we propose that the belief that algorithms decontextualize decision-making might instead foster the perception that algorithmic decisions are less biased than human decisions. This is because the belief that algorithms are unable to recognize the unique characteristics of an individual might lead people to think that algorithms are more likely than humans to ignore information about the individual that could be grounds for discrimination, such as demographic information that denotes social group membership. As a consequence, people might think that algorithms are more likely than humans to treat everyone equally, avoiding systematic errors. For example, consider a situation that entails assessing a black defendant's risk of recidivism, a situation more likely to raise concerns about discrimination, as the defendant belongs to a stigmatized population. In this situation, people might perceive an algorithmic assessment to be less biased than a judge's assessment because they might think that the algorithm, unlike the judge, does not consider information about the defendant that might be grounds for discrimination, such as the defendant's race.
In sum, we propose that the perception of bias for human as compared to algorithmic decisions is driven by the fundamental belief that algorithms, unlike humans, decontextualize decision-making because they are unable to recognize the unique characteristics of the individual being judged (Longoni et al., 2019; Newman et al., 2020). We further argue that this fundamental belief can foster two different inferences, depending on whether the situation entails the potential for discrimination or not. In the absence of potential for discrimination (e.g., a white defendant), the belief that algorithms ignore the unique characteristics of the individual might lead people to infer that algorithms are more likely than humans to miss information that can be relevant to the judgment at hand, and doing so might lead to systematic errors (Sloan & Warner, 2018). As a consequence, algorithmic decisions could be perceived more biased than human decisions. In contrast, in the presence of potential for discrimination (e.g., a black defendant), the belief that algorithms ignore the unique characteristics of the individual might lead people to infer that algorithms are more likely than humans to treat everyone equally because they are more likely than humans to neglect information that might be grounds for discrimination, and doing so avoids systematic errors. As a consequence, algorithmic decisions should be perceived less biased than human decisions.

Our key hypothesis that, in situations that entail the potential for discrimination, algorithmic decisions that yield disparities are less likely than human decisions to be perceived as biased leads to three predictions about potential consequences of key societal relevance. First, perceiving that algorithmic decisions that yield disparities are less biased than human decisions can make people more likely to erroneously think that disparities stemming from algorithmic decisions are an accurate reflection of actual differences in dispositions and abilities, potentially reinforcing stereotypes that fuel discrimination. Second, perceiving that algorithmic decisions that yield disparities are less biased than human decisions can mislead members of stigmatized groups into preferring algorithmic over human evaluations, expecting that they will receive a more just treatment when decisions stem from algorithms rather than humans. Third, perceiving that algorithmic decisions that yield disparities are less biased than human decisions might thwart people's willingness to support actions against such disparities when they stem from algorithms rather than humans. In the remainder of the article, we report nine studies that systematically test how people perceive algorithmic versus human decisions that yield disparities that entail the potential for discrimination, as well as the proposed psychological mechanism and downstream consequences.

Overview of the Studies

Studies 1a–c examine how people perceive algorithmic versus human decisions that yield disparities that entail the potential for discrimination. In particular, we test whether algorithmic decisions that yield gender or racial disparities are less likely than human decisions to be perceived as biased. Studies 2 and 3 test the proposed underlying mechanism. Study 2 tests the idea that the fundamental belief that algorithms, unlike humans, ignore the unique characteristics of the individual being judged fosters different inferences, which sway the perception of bias in opposite directions, depending on whether the situation entails the potential for discrimination or not. Study 3 aims to provide convergent evidence by testing whether, in a situation that entails the potential for discrimination, people think that algorithms are more likely than humans to ignore individual characteristics that could be grounds for discrimination. Studies 4, 5, and 6 explore societally relevant consequences. In particular, Study 4 tests the hypothesis that perceiving algorithmic decisions as less biased than human decisions can make people more likely to erroneously think that disparities stemming from algorithms are a reflection of actual differences in dispositions and abilities. Studies 5a–b examine responses to algorithmic decisions by those who are most likely to be negatively impacted by algorithmic bias, namely, people who belong to stigmatized groups that are the target of discrimination. In particular, we test the hypothesis that members of stigmatized groups might prefer algorithmic over human evaluations in situations that entail the potential for discrimination. Finally, Study 6 explores people's propensity to take action against disparities generated by algorithmic as opposed to human decisions. In particular, we test the hypothesis that people might be less likely to support actions aimed at removing disparities when decisions stem from algorithms rather than humans.

With respect to data practices, we report all conditions, manipulations, measures, and data exclusions. Unless reported, no participant was excluded. An attention check was included at the beginning of each study; respondents who failed the attention check did not qualify for the study. In all studies, the sample size was predetermined, and we analyzed the data only after all responses were collected. A sensitivity power analysis (Faul et al., 2009) indicated that the studies had the power to detect effects of a size considered to be of practical relevance (Ferguson, 2009), with a significance level α of .05 and a power (1 − β) of .80 (Studies 1a–c: d = .46; Study 2 and Study 3: ηp2 = .02; Study 4: d = .46; Study 5a: d = .42; Study 5b: w = .20; Study 6: d = .46). Experimental stimuli are provided in the Supplementary Materials.
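The reported sensitivity values can be approximated with standard power routines. The sketch below is a minimal illustration, not the authors' analysis script: it assumes the Python statsmodels package and two cells of roughly 75 respondents each (the situation in the two-condition studies), and solves for the smallest Cohen's d detectable at α = .05 with power = .80.

```python
# Sensitivity power analysis for a two-cell, between-subjects design:
# solve for the smallest Cohen's d detectable at alpha = .05, power = .80.
# The cell size (75 per cell, ~150 respondents total) is illustrative.
from statsmodels.stats.power import TTestIndPower

min_d = TTestIndPower().solve_power(
    effect_size=None,        # parameter to solve for
    nobs1=75,                # respondents in the first cell
    ratio=1.0,               # equal cell sizes
    alpha=0.05,
    power=0.80,
    alternative="two-sided",
)
print(f"Minimum detectable Cohen's d: {min_d:.2f}")  # about .46
```

FTestAnovaPower and GofChisquarePower in the same module cover the factorial and choice designs in an analogous way (Cohen's f can be converted to ηp2 via f² = ηp2 / (1 − ηp2)).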
Studies 1a–c

We open our empirical investigation by examining how people perceive algorithmic versus human decisions that yield disparities that entail the potential for discrimination. We examine decisions that yield gender and racial disparities across three domains where algorithms are increasingly being used to replace human decision-making, yet have been shown to have the potential to perpetrate bias: education, hiring, and criminal justice. Specifically, the first three studies test our basic hypothesis that algorithmic decisions that yield gender or racial disparities are less likely than human decisions to be perceived as biased.

Study 1a: University Admission Decisions

Method
Study 1a had a one-factor, between-subjects design: algorithm versus human decision-maker. We aimed to recruit 150 respondents. One hundred fifty respondents (59% females; age: M = 39, SD = 12) recruited on Mturk read about a university where admission decisions were based on an analysis of applicants conducted either by an admission team or by an artificial intelligence (AI) software. To ensure that respondents understood the information presented, they were asked to indicate who/what conducted the analysis of the applicants before moving on to the next screen. All but two respondents answered this question correctly. Removing these observations does not change the conclusion of the analysis reported below. Respondents were then presented with information about the university's acceptance rates for different ethnic groups, which showed a clear racial disparity, in that white candidates were three times more likely to be accepted than black and Hispanic candidates (acceptance rates: 47% for white applicants, 15% for black applicants, 15% for Hispanic applicants). Respondents then rated the likelihood that the decisions made by the admission team/AI software were biased (1 = very unlikely; 7 = very likely).

Results

As predicted, respondents perceived admission decisions that yielded a racial disparity as less biased when they were made by an algorithm (M = 3.97, SD = 2.06) than by a human (M = 4.82, SD = 1.49), t(148) = 2.88, p < .01, d = .47.
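The analysis behind this result is an independent-samples t-test on the 7-point bias ratings together with a standardized effect size. A minimal Python sketch follows; the ratings passed to the function are illustrative placeholders rather than the study data, and numpy/scipy are assumed to be available.

```python
# Independent-samples t-test of perceived bias (algorithm vs. human)
# with Cohen's d computed from the pooled standard deviation.
import numpy as np
from scipy import stats

def compare_bias_ratings(algorithm, human):
    algorithm = np.asarray(algorithm, dtype=float)
    human = np.asarray(human, dtype=float)
    t, p = stats.ttest_ind(algorithm, human)  # equal variances assumed, as in t(148)
    n1, n2 = len(algorithm), len(human)
    pooled_sd = np.sqrt(((n1 - 1) * algorithm.var(ddof=1) +
                         (n2 - 1) * human.var(ddof=1)) / (n1 + n2 - 2))
    d = (algorithm.mean() - human.mean()) / pooled_sd
    return t, p, d

# Illustrative call with fake 7-point ratings (1 = very unlikely, 7 = very likely)
t, p, d = compare_bias_ratings([3, 4, 2, 5, 4, 3], [5, 6, 4, 5, 6, 5])
print(f"t = {t:.2f}, p = {p:.3f}, d = {d:.2f}")
```

The same type of test underlies the two-condition comparisons reported in Studies 1b, 1c, 4, 5a, and 6.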
Results
Study 1b: Hiring Decisions

Method

Study 1b had a one-factor, between-subjects design: algorithm versus human decision-maker. We aimed to recruit 150 respondents. One hundred fifty-one respondents (51% females; age: M = 38, SD = 11) recruited on Mturk read that a local company had recently posted a job announcement for four store manager positions at four local restaurants. Respondents further read that the hiring decision was based on the results of an analysis of the candidates conducted by a recruiter/AI software. In both cases, it was specified that, to predict the success of a candidate as a store manager, the recruiter/AI software took into consideration background information about the candidates, along with answers to attitudinal questions. To ensure that respondents understood the information presented, they were asked to indicate who/what conducted the analysis of the candidates before moving on to the next screen. All but one participant answered this question correctly. Removing the observations from this participant does not change the conclusion of the analysis reported below. Respondents then read that 100 people applied for the four positions and that the demographics of the applicants were as follows: 45 white men, 23 white women, 20 black men, and 12 black women. Respondents then learned that four white men were hired, an outcome indicating both racial and gender disparities. Respondents then rated the likelihood that the decision made by the recruiter/AI software was biased (1 = very unlikely; 7 = very likely).

Results

As predicted, respondents perceived hiring decisions that yielded a racial and gender disparity as less biased when they were made by an algorithm (M = 4.18, SD = 1.96) rather than by a human (M = 4.89, SD = 1.85), t(149) = 2.29, p = .02, d = .37.

Study 1c: Parole Decisions

Method

Study 1c had a one-factor, between-subjects design: algorithm versus human decision-maker. We aimed to recruit 150 respondents. One hundred forty-nine respondents (59% females; age: M = 39, SD = 12) recruited on Mturk read that parole decisions often rely on an assessment of the risk that a defendant will reoffend, made by a judge/AI software. They further read that risk assessments are expressed with a numerical score ranging from 1 to 10, where lower scores indicate lower risk. To ensure that respondents understood the information presented, they were asked to indicate who/what performs the risk assessments before moving on to the next screen. Eight respondents did not answer this question correctly. Removing these observations does not change the conclusion of the analysis reported below. Respondents were then presented with the average risk assessment score for two ethnic groups, which showed a clear racial disparity: the average risk score for black male defendants (8 out of 10) was 60% higher than the average risk score for white male defendants (5 out of 10). Respondents then rated the likelihood that the risk assessments made by the judge/AI software were biased (1 = very unlikely; 7 = very likely).

Results

As predicted, respondents perceived judicial risk assessments that yielded a racial disparity as less biased when they were made by an algorithm (M = 4.14, SD = 1.92) rather than by a human (M = 4.95, SD = 1.47), t(147) = 2.85, p < .01, d = .47.

Discussion

Overall, Studies 1a–c show that decisions that yield racial and gender disparities are less likely to be perceived as biased when they stem from algorithms rather than humans. This result was robust across three domains where algorithms are increasingly used to replace human decision-makers and algorithmic bias has been documented: education, hiring, and criminal justice. Our findings provide preliminary evidence that, in situations that entail the potential for discrimination, people might not perceive algorithmic bias to the same extent they perceive human bias. In the next studies, we explore the psychological mechanism that drives such differential perception of bias (Studies 2 and 3), as well as downstream consequences of societal relevance (Studies 4–6).

Study 2

We proposed that differences in the perception of bias for human versus algorithmic decisions are driven by the belief that algorithms, unlike humans, decontextualize decision-making because they are unable to recognize the unique characteristics of the individual being judged and blindly apply predetermined rules and procedures in a rigid way, irrespective of whom they are judging (Haslam, 2006; Loughnan & Haslam, 2007). We further argued that this fundamental belief can foster two inferences, which sway the perception of bias in opposite directions, depending on whether the situation entails the potential for discrimination or not.

When the situation entails the potential for discrimination (e.g., a black defendant), the belief that algorithms ignore the unique characteristics of the individual being judged might lead people to infer that algorithms are more likely than humans to treat everyone equally because they ignore information about the individual that might be grounds for discrimination. As a consequence, algorithmic decisions should be perceived less biased than human decisions.

In contrast, when the situation does not entail the potential for discrimination (e.g., a white defendant), the belief that algorithms ignore the unique characteristics of the individual being judged might lead people to infer that algorithms are more likely than humans to miss information about the defendant that can be relevant to the judgment at hand, and doing so might be detrimental to the accuracy of an assessment (Sloan & Warner, 2018). As a consequence, algorithmic decisions should be perceived more biased than human decisions.
Method

Study 2 had a 2 (decision-maker: human vs. algorithm) × 2 (defendant race: white vs. black) between-subjects design. We aimed to recruit 400 respondents. Three hundred ninety-nine respondents (58% females; age: M = 40, SD = 13) recruited on Mturk were presented with a news article about a man being sentenced to 5 years in prison for stealing a car, based on a risk assessment that deemed him at a high risk of recidivism. Between subjects, we manipulated the potential for discrimination by changing the race of the defendant (white vs. black). Moreover, we manipulated whether the risk assessment was conducted by a judge or by an algorithm. Respondents rated the likelihood that the risk assessment was biased (bias: 1 = very unlikely; 7 = very likely); based on our proposed theoretical account, we expected to observe a Race × Decision-maker interaction on this variable. Then, they rated the extent to which they thought that the judge/algorithm was blind to the unique characteristics of the defendant (blindness to individual characteristics: 1 = strongly disagree; 7 = strongly agree), the judge/algorithm missed information that was relevant to make an accurate assessment (missing relevant information: 1 = strongly disagree; 7 = strongly agree), and the judge/algorithm treated the defendant like any other defendant (treating everyone equally: 1 = strongly disagree; 7 = strongly agree). Based on our proposed theoretical account, we expected to observe a main effect of decision-maker on blindness to individual characteristics, and a Race × Decision-maker interaction on missing relevant information and treating everyone equally.

Results

A 2 × 2 ANOVA on bias revealed a significant main effect of race [F(1, 395) = 32.76, p < .001, ηp2 = .08], a non-significant main effect of decision-maker [F(1, 395) < 1], and a significant Race × Decision-maker interaction [F(1, 395) = 20.54, p < .001, ηp2 = .05]. As predicted, when the defendant was black, respondents considered the risk assessment less biased when it was performed by an algorithm (M = 3.64, SD = 1.97) rather than by a judge (M = 4.55, SD = 1.74), F(1, 395) = 14.73, p < .001, ηp2 = .04. When the defendant was white, the effect reversed, and respondents considered the risk assessment more biased when it was performed by an algorithm (M = 3.44, SD = 1.53) rather than by a judge (M = 2.83, SD = 1.42), F(1, 395) = 6.62, p = .01, ηp2 = .02.
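Under the stated design, the focal test is a 2 × 2 between-subjects ANOVA on bias ratings followed by simple effects of decision-maker within each race condition. The Python sketch below illustrates this kind of analysis with statsmodels; the tiny data frame and column names are assumptions, and the simple effects are refit per cell rather than tested against the full-model error term, as in the article.

```python
# 2 (decision-maker) x 2 (defendant race) between-subjects ANOVA on bias
# ratings, with follow-up simple effects of decision-maker within race.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Illustrative data frame; each row is one respondent's 7-point bias rating
df = pd.DataFrame({
    "bias":           [3, 5, 4, 4, 2, 5, 3, 4, 2, 3, 5, 6, 4, 5, 3, 2],
    "decision_maker": ["algorithm", "judge"] * 8,
    "race":           ["black"] * 8 + ["white"] * 8,
})

# Main effects of decision-maker and race plus their interaction
model = ols("bias ~ C(decision_maker) * C(race)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))

# Simple effect of decision-maker within each race condition (per-cell refit)
for race, cell in df.groupby("race"):
    print(race, sm.stats.anova_lm(ols("bias ~ C(decision_maker)", data=cell).fit(), typ=2))
```

The ANOVAs on the three mediator measures reported next follow the same template, with the dependent variable swapped out.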
blindness to individual characteristics → equal treatment → bias;
A 2 × 2 ANOVA on blindness to individual characteristics revealed a significant main effect of decision-maker [F(1, 395) = 46.28, p < .001, ηp2 = .11], a significant main effect of race [F(1, 395) = 5.73, p = .02, ηp2 = .01], and a non-significant Race × Decision-maker interaction [F(1, 395) < 1]. As predicted, respondents perceived an algorithm to be more blind to the individual characteristics of the defendant than a judge, both when the defendant was black (algorithm: M = 4.63, SD = 1.80; judge: M = 3.66, SD = 1.39; F(1, 395) = 19.84, p < .001, ηp2 = .05) and when the defendant was white (algorithm: M = 4.34, SD = 1.51; judge: M = 3.21, SD = 1.43; F(1, 395) = 26.69, p < .001, ηp2 = .06).

A 2 × 2 ANOVA on missing relevant information revealed a significant main effect of decision-maker [F(1, 395) = 26.29, p < .001, ηp2 = .06], a non-significant main effect of race [F(1, 395) = 2.24, p = .14, ηp2 = .01], and a significant Race × Decision-maker interaction [F(1, 395) = 13.19, p < .001, ηp2 = .03]. As predicted, when the defendant was white, respondents perceived that the algorithm (M = 4.25, SD = 1.45) missed information that was relevant to make an accurate assessment to a greater extent than the judge did (M = 3.02, SD = 1.20), F(1, 395) = 38.26, p < .001, ηp2 = .09, whereas, when the defendant was black, there was no difference between the algorithm (M = 3.95, SD = 1.55) and the judge (M = 3.74, SD = 1.38), F(1, 395) = 1.12, p = .29, ηp2 = .00.

A 2 × 2 ANOVA on treating everyone equally revealed a significant main effect of race [F(1, 395) = 14.68, p < .001, ηp2 = .04], a non-significant main effect of decision-maker [F(1, 395) = 2.00, p = .16, ηp2 = .01], and a significant Race × Decision-maker interaction [F(1, 395) = 9.08, p < .01, ηp2 = .02]. As predicted, when the defendant was black, respondents perceived that the defendant was treated like any other defendant more by an algorithm (M = 5.11, SD = 1.71) than by a judge (M = 4.46, SD = 1.59), F(1, 395) = 9.83, p < .01, ηp2 = .02, whereas, when the defendant was white, there was no difference between the algorithm (M = 5.23, SD = 1.25) and the judge (M = 5.46, SD = 1.25), F(1, 395) = 1.27, p = .26, ηp2 = .00.

To test our proposed mechanism, we conducted a moderated serial mediation analysis using the custom model depicted in Figure 1, with 5,000 bootstraps (Hayes, 2018). Decision-maker served as the independent variable (−1 = judge and 1 = algorithm), bias as the dependent variable, blindness to individual characteristics as the first mediator, equal treatment and missing relevant information as competing subsequent mediators, and race as the moderator (−1 = white and 1 = black).

In line with our theorizing, when the defendant was black, the results showed a significant indirect effect via blindness to individual characteristics and equal treatment (i.e., decision-maker → blindness to individual characteristics → equal treatment → bias; b = −.06, 95% CI: −.11 to −.02), whereas there was no evidence for a significant indirect effect via blindness to individual characteristics and missing relevant information (i.e., decision-maker → blindness to individual characteristics → missing relevant information → bias), as the confidence interval included zero (b = .04, 95% CI: .00 to .09). This result is consistent with our hypothesis that, when the situation entails the potential for discrimination (i.e., a black defendant), the belief that algorithms are more blind than humans to individual characteristics leads to the perception that algorithms are more likely than humans to treat everyone equally. As a consequence, algorithmic decisions are perceived less biased than human decisions.
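The moderated serial mediation reported here can be illustrated with an ordinary percentile bootstrap over the product of path coefficients. The sketch below is a simplified stand-in for the PROCESS-style model used in the article: it estimates the serial path within one race condition at a time rather than through interaction terms, and the variable names (decision_maker coded −1/1, blindness, equal_treatment, missing_info, bias) are assumptions.

```python
# Bootstrapped serial indirect effect:
# decision-maker -> blindness to individual characteristics
#                -> treating everyone equally -> bias,
# estimated within a single race condition.
import numpy as np
import statsmodels.formula.api as smf

def serial_indirect(df):
    """Product of the three path coefficients for one (re)sample."""
    a = smf.ols("blindness ~ decision_maker", df).fit().params["decision_maker"]
    b = smf.ols("equal_treatment ~ blindness + decision_maker", df).fit().params["blindness"]
    c = smf.ols("bias ~ equal_treatment + missing_info + blindness + decision_maker",
                df).fit().params["equal_treatment"]
    return a * b * c

def bootstrap_ci(df, n_boot=5000, seed=0):
    """95% percentile confidence interval from n_boot resamples."""
    rng = np.random.default_rng(seed)
    n = len(df)
    draws = [serial_indirect(df.iloc[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    return np.percentile(draws, [2.5, 97.5])

# Usage with a hypothetical data frame holding the Study 2 measures:
# black_df = data[data["race"] == "black"]
# print(serial_indirect(black_df), bootstrap_ci(black_df))
```

An indirect effect is treated as significant when the bootstrap confidence interval excludes zero, which is the criterion applied to the intervals reported above.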
Figure 1
Moderated Serial Mediation Model in Study 2
[Path diagram: decision-maker → blindness to individual characteristics → treating everyone equally / missing relevant information → bias, with defendant race moderating the paths from blindness to the two mediators.]
Note. NS = not significant. * p < .05. ** p < .01. *** p < .001.

In contrast, when the defendant was white, the results showed a significant indirect effect via blindness to individual characteristics and missing relevant information (i.e., decision-maker → blindness to individual characteristics → missing relevant information → bias; b = .07, 95% CI: .02 to .12), whereas there was no evidence for a significant indirect effect via blindness to individual characteristics and equal treatment, as the confidence interval included zero (i.e., decision-maker → blindness to individual characteristics → equal treatment → bias; b = .01, 95% CI: −.02 to .04). This result is consistent with our hypothesis that, when the situation does not entail the potential for discrimination (i.e., a white defendant), the belief that algorithms are more blind than humans to individual characteristics leads to the perception that algorithms are more likely than humans to miss information that is relevant to the judgment at hand. As a consequence, algorithmic decisions are perceived more biased than human decisions.

Discussion

While we acknowledge the correlational nature of this analysis (Fiedler et al., 2018), these findings provide support for our proposed psychological mechanism. Consistent with our theorizing, the results suggest that people believe that algorithms are more likely than humans to ignore the unique characteristics of the individual being judged. This fundamental belief fosters different inferences, depending on whether individual characteristics entail the potential for discrimination or not. When a target's individual characteristics entail the potential for discrimination (i.e., a black defendant), the belief that algorithms ignore the unique characteristics of the individual leads people to infer that algorithms are more likely than humans to treat everyone equally. Consequently, algorithmic decisions are perceived less biased than human decisions. In contrast, when a target's individual characteristics do not entail the potential for discrimination (i.e., a white defendant), the belief that algorithms ignore the unique characteristics of the individual leads people to infer that algorithms are more likely than humans to miss information that is relevant to the judgment at hand. Consequently, algorithmic decisions are perceived more biased than human decisions.

In Study 3, we aim to provide convergent evidence for the proposed underlying mechanism via moderation. In particular, we aim to show that, when individual characteristics entail the potential for discrimination (e.g., a black defendant), the belief that algorithms ignore the unique characteristics of the individual being judged leads people to think that algorithms are more likely than humans to ignore information that might be grounds for discrimination.

Study 3
The results of Study 2 show that people believe that algorithms are more likely than humans to ignore the individual characteristics of the target being judged. We argued that, in situations that entail the potential for discrimination, this belief leads people to think that algorithms are more likely than humans to ignore individual characteristics that could be grounds for discrimination. In Study 3, we test this hypothesis by directly manipulating whether or not respondents were informed that a decision-maker (i.e., human versus algorithm) considered information about social group membership that can be grounds for discrimination (i.e., the demographic profile of an individual). If our hypothesis is correct, in a situation that entails the potential for discrimination (e.g., a black defendant), human decisions should be perceived more biased than algorithmic decisions, unless people are informed that the algorithm considered the demographic profile of the individual, that is, information about social group membership that can be grounds for discrimination.

Method

Study 3 had a 2 (decision-maker: human vs. algorithm) × 2 (demographics: control vs. considered) between-subjects design. We aimed to recruit 400 respondents. Four hundred respondents (54% females; age: M = 40, SD = 13) recruited on Mturk were presented with the same news article used in Study 2, about a black man who was sentenced to five years in prison for stealing a car, based on a risk assessment that deemed him at a high risk of recidivism. Between subjects, we manipulated whether the risk assessment was conducted by a judge or by an algorithm, and whether respondents were informed that the judge/algorithm considered the defendant's demographic profile (e.g., information about race, gender, and age). Respondents rated the likelihood that the risk assessment was biased (1 = very unlikely; 7 = very likely).

Results

A 2 × 2 ANOVA on likelihood of bias revealed a significant main effect of decision-maker [F(1, 396) = 4.16, p = .04, ηp2 = .01], a non-significant main effect of demographics [F(1, 396) = 3.14, p = .08, ηp2 = .01], and a significant Decision-maker × Demographics interaction [F(1, 396) = 6.12, p = .01, ηp2 = .02]. As predicted, when the decision-maker was not specified to have considered the demographic profile of the defendant, respondents perceived the risk assessment to be less biased when performed by an algorithm (M = 3.44, SD = 1.82) than by a judge (M = 4.29, SD = 1.73), F(1, 396) = 10.13, p < .01, ηp2 = .03, thus replicating the results of Study 2. In contrast, when the decision-maker was specified to have considered the demographic profile of the defendant, respondents perceived the risk assessment performed by the algorithm (M = 4.24, SD = 2.05) to be as biased as the risk assessment performed by the judge (M = 4.16, SD = 1.89), F(1, 396) < 1.

From a different angle, when the risk assessment was performed by the judge, respondents' perception of bias did not differ as a function of whether the judge was specified to have considered the demographic profile of the defendant (M = 4.16, SD = 1.89) or not (M = 4.29, SD = 1.73), F(1, 396) < 1. This finding suggests that respondents assumed that the judge considered the demographic profile of the defendant, even when such information was not provided. In contrast, when the risk assessment was performed by the algorithm, respondents perceived it as more biased when the algorithm was specified to have considered the demographic profile of the defendant (M = 4.24, SD = 2.05) than when such information was not provided (M = 3.44, SD = 1.82), F(1, 396) = 8.97, p < .01, ηp2 = .02. This finding suggests that respondents assumed that the algorithm did not consider the demographic profile of the defendant when such information was not explicitly provided.

Discussion

The results of Study 3 provide further evidence for our proposed underlying mechanism by showing that, in situations that entail the potential for discrimination, people perceive algorithmic decisions to be less biased than human decisions because they think that algorithms are more likely than humans to ignore individual characteristics that could be grounds for discrimination. Taken together, the first three studies provide support for our hypothesis that people perceive decisions that yield outcomes that entail the potential for discrimination (e.g., racial or gender disparities) as less biased when they stem from an algorithm rather than a human. In the next three studies, we test three consequences that should stem from this asymmetrical perception of bias, and that have societal relevance.

Study 4

We proposed that, if decisions that yield disparities are less likely to be perceived as biased when they stem from algorithms rather than humans, people might then be more likely to think that such disparities reflect actual differences in dispositions and abilities. In Study 4, we test this hypothesis in the context of gender disparities in hiring. Specifically, we expect that people might be more likely to rationalize gender disparities as a reflection of actual differences in dispositions and abilities between genders when such disparities stem from algorithmic rather than human decisions. If supported, this prediction implies that replacing human with algorithmic decision-making might contribute to fostering stereotypical beliefs about stigmatized groups.

Method

Study 4 had a one-factor, between-subjects design: algorithm versus human decision-maker. We aimed to recruit 150 respondents. One hundred fifty respondents (56% females; age: M = 40, SD = 13) recruited on Mturk read that a company was looking to hire an executive director, and that the company had used either a recruiter or an AI software to screen applicants and select a few candidates to interview for the position. To ensure that respondents understood the information presented, they were asked to indicate who/what conducted the analysis of the candidates before moving on to the next screen. Four respondents did not answer this question correctly. Removing these observations does not change the conclusions of the analysis reported below. Respondents then read that 68 males and 32 females applied for the position and that five males were selected to be interviewed, an outcome indicative of a gender disparity. Respondents were then asked to rate the likelihood that the decision made by the recruiter/AI software was biased (1 = very unlikely; 7 = very likely). Then, respondents indicated their agreement with four items, adapted from Cundiff and Vescio (2016), that explained the decision in terms of dispositional differences between genders (e.g., the women who applied possessed skills that are better suited for other positions; α = .89).

Results
As predicted, respondents perceived a hiring decision that yielded a gender disparity as less biased when it was made by an algorithm (M = 4.28, SD = 1.98) rather than by a human (M = 5.04, SD = 1.60), t(148) = 2.59, p = .01, d = .42. Moreover, respondents were more likely to explain the decision in terms of dispositional differences between genders when it was made by an algorithm (M = 4.33, SD = 1.37) rather than by a recruiter (M = 3.70, SD = 1.47), t(146) = 2.68, p = .01, d = .44 (two missing values on this measure). This result suggests that respondents considered female candidates less qualified for the executive position when the gender disparity stemmed from an algorithmic rather than a human decision.

A mediation analysis (Hayes, 2018; Model 4) revealed a significant indirect effect via perception of bias (i.e., decision-maker → bias → dispositional differences; b = .28, 95% CI: .08 to .52). Moreover, when controlling for perception of bias, the main effect of decision-maker on dispositional differences became non-significant [t(145) = 1.60, p = .11], thus indicating full mediation.
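The Model 4 mediation reported above follows the same bootstrap logic as the Study 2 analysis but with a single mediator. A minimal sketch, under the assumption of a data frame with decision_maker (coded −1/1), bias, and dispositional columns, is shown below; it also includes the c′-path check that the direct effect of decision-maker weakens once perceived bias is controlled for.

```python
# Simple mediation (decision-maker -> perceived bias -> dispositional
# explanations) with a percentile bootstrap of the indirect effect.
import numpy as np
import statsmodels.formula.api as smf

def indirect_effect(df):
    a = smf.ols("bias ~ decision_maker", df).fit().params["decision_maker"]
    b = smf.ols("dispositional ~ bias + decision_maker", df).fit().params["bias"]
    return a * b

def mediation_summary(df, n_boot=5000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(df)
    draws = [indirect_effect(df.iloc[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    ci = np.percentile(draws, [2.5, 97.5])
    # Direct (c') path: effect of decision-maker controlling for perceived bias
    direct = smf.ols("dispositional ~ bias + decision_maker", df).fit()
    return {"indirect": indirect_effect(df), "indirect_95ci": ci,
            "direct_t": direct.tvalues["decision_maker"],
            "direct_p": direct.pvalues["decision_maker"]}
```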

Discussion

The results of Study 4 support our hypothesis that, because algorithmic decisions that yield disparities are perceived less biased than human decisions, people are more likely to rationalize such disparities as a reflection of actual differences in dispositions and abilities when they stem from algorithms rather than humans. In our study, when an algorithm, as opposed to a recruiter, made a hiring decision that favored men over women, respondents rationalized the decision by thinking that female applicants were actually less qualified than male applicants. This result has important societal implications because such rationalizations can reinforce stereotypes that inhibit efforts aimed at combating discrimination (Reyna, 2000), thus contributing to perpetuating inequalities (Cundiff & Vescio, 2016). We elaborate further on this point in the general discussion.

Studies 5a–b

In the first four studies, respondents judged situations where disparities affected others. In Studies 5a–b, we test whether our findings hold even when members of stigmatized groups are directly affected. It is possible that members of stigmatized groups, who are more often subject to discrimination (Operario & Fiske, 2001), might be more sensitive to the presence of disparities, and thus more likely to perceive decisions that yield disparities as biased, regardless of whether they stem from algorithms or from humans. However, members of stigmatized groups might also share the same fundamental beliefs about how algorithms operate as those who do not belong to stigmatized groups. Consequently, even members of stigmatized groups should be more likely to perceive algorithmic decisions that yield disparities as less biased than human decisions. In Studies 5a–b, we test this hypothesis by asking female participants to evaluate two situations that entail the potential for gender discrimination in hiring. Specifically, in Study 5a, we examine to what extent female participants perceive as biased an allegedly discriminatory decision that leads to the rejection of a job application, as a function of whether the decision is made by a human versus an algorithm. We expected that women would perceive an allegedly discriminatory decision as less biased when made by an algorithm rather than a human. In Study 5b, we explore whether female participants prefer to be evaluated by a human versus an algorithm when concerns about gender discrimination are more versus less salient. We expected that women would prefer to be evaluated by an algorithm rather than a human when concerns about discrimination are more versus less salient.

Study 5a: Perception of Bias About a Job Rejection

Method

Study 5a had a one-factor, between-subjects design: algorithm versus human decision-maker. We aimed to recruit 150 female respondents. The final sample size was determined by the following procedure. We recruited 300 respondents on Mturk. At the beginning of the study, respondents were asked to indicate their gender (male, female, and prefer not to answer). Only female respondents were redirected to our study. We expected that at least 50% of respondents would be female and that this procedure would therefore yield the desired sample size. The final sample consisted of 180 female respondents (age: M = 38, SD = 12).

Respondents were asked to imagine that they had applied for a job at a company that was rumored to discriminate against female applicants. Respondents were then presented with an extract from an email from the company stating that they had not been selected for a follow-up interview. The email mentioned that the review process was conducted by a recruiter/algorithm. To ensure that respondents understood the information presented, they were asked to indicate who/what screened the application before moving on to the next screen. All but two respondents answered the question correctly. Removing these observations does not change the conclusion of the analysis reported below. Respondents then rated the likelihood that the decision made by the recruiter/algorithm was biased (1 = very unlikely; 7 = very likely).

Results

As predicted, respondents perceived the decision as less biased when it was made by an algorithm (M = 4.28, SD = 1.85) than by a human (M = 5.10, SD = 1.54), t(178) = 3.23, p = .002, d = .48.

Study 5b: Choosing How to Be Evaluated

Method

Study 5b had a one-factor, between-subjects design: potential for discrimination more versus less salient. We aimed to recruit 200 female respondents. For this study, we recruited only female respondents by specifying gender as a recruiting criterion on TurkPrime (Litman et al., 2017). Two hundred female respondents (age: M = 40, SD = 12) were asked to imagine that they were applying for a job in an industry where women are underrepresented, for which they knew they were well-qualified. They further read that they had the option to have their application screened either by an algorithm or by a recruiter. In one condition, we made the potential for discrimination more salient by drawing respondents' attention to the possibility that they could be discriminated against because of their gender. In the other condition, no concern about gender discrimination was raised, and thus the potential for discrimination was less salient. Respondents indicated whether they would prefer to have their application screened by a recruiter or by an algorithm. To confirm that our manipulation worked as intended, we then asked respondents to rate the extent to which, when making their decision, they were concerned about the possibility of being discriminated against because of their gender (1 = not at all; 7 = very much).
Results

Respondents were more likely to be concerned about the possibility of being discriminated against because of their gender when the potential for discrimination was more salient (M = 5.25, SD = 1.70) than when it was less salient (M = 4.60, SD = 2.07), t(198) = 2.43, p = .02, d = .34, confirming the effectiveness of our manipulation. More importantly, as predicted, participants' choices differed significantly between conditions, χ2(1) = 15.69, p < .001, w = .28. When the potential for discrimination was less salient, only 37% of respondents chose to be evaluated by the algorithm, a choice share significantly below the point of indifference (i.e., 50%), Z = −2.60, p = .01. In contrast, when the potential for discrimination was more salient, 65% of respondents chose to be evaluated by the algorithm, a choice share significantly above the point of indifference, Z = 3.00, p < .01.
indifference, Z = 3.00, p < .01. take action against decisions that yield disparities when such
decisions stem from algorithms rather than humans. This result
has important societal implications, in that it suggests that replacing
Discussion
humans with algorithmic decision-making might make people less
Overall, Studies 5a–b show that even members of stigmatized likely to take action against decisions that might, in fact, be
groups might perceive algorithmic decisions that entail the potential discriminatory.
for discrimination less biased than human decisions. In our studies,
women considered an allegedly discriminatory employment deci-
General Discussion
sion less biased when it stemmed from an algorithm rather than a
recruiter. And they preferred to be evaluated by an algorithm rather Across nine studies, we document a tendency to perceive algo-
than a recruiter when the potential for discrimination was salient. rithmic decisions that yield disparities that entail the potential for
These findings contribute to understanding how individuals that are discrimination less biased than human decisions. In Study 1,
more susceptible to the negative consequences of discrimination algorithmic decisions that yielded gender and racial disparities
perceive algorithmic as opposed to human decisions. In so doing, were perceived less biased than human decisions. This result was
these findings warn again the risk that systematic discrimination robust across three domains of key societal importance, where
might go more undetected when perpetrated by algorithms rather algorithms are increasingly replacing human decision-makers. Stud-
than humans, even by those who stand to be most impacted by it. In ies 2 and 3 showed that this differential perception of bias is driven
Study 6, we aim to explore an additional societal implication of our by the fundamental belief that algorithms, unlike humans, decon-
findings, namely, whether algorithmic, as opposed by human de- textualize decision-making because they ignore the unique char-
cisions, affect people’s willingness to take actions against disparities acteristics of the individual being judged. This fundamental belief
that might be discriminatory. fosters different inferences, which sway the perception of bias in
opposite directions, depending on whether the situation entails the
potential for discrimination or not. Studies 4, 5, and 6 showed that
Study 6
this differential perception of bias has societally relevant conse-
Prior research suggests that perceiving decisions that yield dis- quences. Perceiving algorithmic decisions less biased than human
parities as biased is a necessary condition to mobilize people to decisions can (a) induce people to erroneously think that disparities
support actions aimed to remove unjust inequalities in our society that stem from algorithms are an accurate reflection of actual
(Corcoran et al., 2015; Earl, 2004, 2006). If people perceive algo- differences in dispositions and abilities (Study 4), (b) lead members
rithmic decisions that yield disparities less biased than human of stigmatized groups into being more accepting of algorithmic
decisions, they might be less likely to support actions aimed to decisions that might, in fact, be discriminatory (Study 5a), (c)
remove such disparities when they stem from algorithms rather than preferring algorithmic over human evaluations (Study 5b), and
humans. We test this hypothesis in Study 6. (4) reduce people’s propensity to take actions against disparities
that might be discriminatory, when such disparities are perpetrated
by algorithms (Study 6). Altogether, these findings have both
Method
theoretical and practical implications.
The Study had a one-factor, between-subjects design: algorithm
versus human decision-maker. We aimed to recruit 150 respondents.
Theoretical Contributions
One hundred fifty respondents (55% females; age: M = 40,
SD = 13) recruited on Mturk read that a reporter investigated a Our findings contribute to research on algorithmic bias. Prior
local company and discovered that female employees were less investigations have predominantly focused on documenting the
likely to receive bonus payments than male employees. They then existence of algorithmic bias and defining it from a technical
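As a quick check on how these summary statistics map onto the reported tests, the sketch below (ours, not part of the original materials) recomputes them in Python with SciPy. The per-condition cell sizes (100 per salience condition in Study 5b, 75 per decision-maker condition in Study 6) are assumptions inferred from the reported degrees of freedom, and small discrepancies from the published values reflect rounding of the reported means and standard deviations.

```python
# Minimal sketch (not the authors' analysis script): recomputing the key tests
# from the reported summary statistics. Cell sizes are assumed from the dfs.
from math import sqrt
from scipy import stats

# Study 5b: choice share of the algorithm versus the 50% point of indifference,
# using the normal approximation to the binomial.
n_cell = 100                                        # assumed respondents per condition
for share in (0.37, 0.65):
    z = (share - 0.5) / sqrt(0.25 / n_cell)         # yields roughly -2.6 and 3.0
    p = 2 * stats.norm.sf(abs(z))                   # two-sided p value
    print(f"share = {share:.2f}: Z = {z:.2f}, p = {p:.3f}")

# Study 6: independent-samples t test on willingness to sign the petition,
# computed from the reported means and SDs (df = 148 implies ~75 per condition).
t, p = stats.ttest_ind_from_stats(mean1=4.03, std1=1.43, nobs1=75,
                                  mean2=3.41, std2=1.65, nobs2=75)

# Cohen's d from the pooled standard deviation.
sd_pooled = sqrt(((75 - 1) * 1.43**2 + (75 - 1) * 1.65**2) / (75 + 75 - 2))
d = (4.03 - 3.41) / sd_pooled
print(f"t(148) = {t:.2f}, p = {p:.3f}, d = {d:.2f}")
```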
Discussion

The results of Study 6 suggest that people might be less likely to take action against decisions that yield disparities when such decisions stem from algorithms rather than humans. This result has important societal implications, in that it suggests that replacing humans with algorithmic decision-making might make people less likely to take action against decisions that might, in fact, be discriminatory.

General Discussion

Across nine studies, we document a tendency to perceive algorithmic decisions that yield disparities entailing the potential for discrimination as less biased than human decisions. In Study 1, algorithmic decisions that yielded gender and racial disparities were perceived as less biased than human decisions. This result was robust across three domains of key societal importance, where algorithms are increasingly replacing human decision-makers. Studies 2 and 3 showed that this differential perception of bias is driven by the fundamental belief that algorithms, unlike humans, decontextualize decision-making because they ignore the unique characteristics of the individual being judged. This fundamental belief fosters different inferences, which sway the perception of bias in opposite directions, depending on whether the situation entails the potential for discrimination or not. Studies 4, 5, and 6 showed that this differential perception of bias has societally relevant consequences. Perceiving algorithmic decisions as less biased than human decisions can (a) induce people to erroneously think that disparities that stem from algorithms are an accurate reflection of actual differences in dispositions and abilities (Study 4), (b) lead members of stigmatized groups to be more accepting of algorithmic decisions that might, in fact, be discriminatory (Study 5a), (c) lead them to prefer algorithmic over human evaluations (Study 5b), and (d) reduce people's propensity to take action against disparities that might be discriminatory, when such disparities are perpetrated by algorithms (Study 6). Altogether, these findings have both theoretical and practical implications.

Theoretical Contributions

Our findings contribute to research on algorithmic bias. Prior investigations have predominantly focused on documenting the existence of algorithmic bias and defining it from a technical standpoint (e.g., Friedman & Nissenbaum, 1996; Kleinberg et al., 2016). We add to this line of research by contributing a psychological perspective, exploring whether people perceive algorithmic bias. Our research provides one of the first attempts to understand people's perception of algorithmic bias, unpacking the duality inherent in the notion of bias as a systematic error that can have a discriminatory or nondiscriminatory connotation. Our research shows that when confronted with decisions that yield outcomes that are not discriminatory, people may be prone to overestimate algorithmic bias, a tendency that might induce people to over-rely on human judgment. Yet, when confronted with decisions that yield outcomes that can be discriminatory, people may be prone to underestimate algorithmic bias, a tendency that might induce people to over-rely on algorithmic judgment. In doing so, our work enriches the current debate on algorithmic bias by broadening its scope and highlighting the importance of understanding people's perception of bias and its consequences. We contend that, although the statistical detection of bias is of paramount importance, understanding people's perceptions of algorithmic bias is equally important, as efforts aimed at removing bias in society are often driven by whether or not people perceive such bias (Corcoran et al., 2015; Earl, 2004).

Our research also contributes to the literature on clinical versus statistical judgments, in two distinct ways. After Meehl's seminal contribution (Meehl, 1954), a growing body of evidence has shown that people tend to trust human more than algorithmic decisions (Grove & Meehl, 1996), a phenomenon referred to as algorithm aversion (Dietvorst et al., 2015). This behavior can be suboptimal in light of evidence suggesting that algorithms can outperform human intuition (Dawes et al., 1989; Grove et al., 2000). Our research suggests that algorithm aversion might not manifest, and might even reverse, in situations that entail the potential for discrimination. For example, in Study 5b, female respondents preferred to be evaluated by a human rather than an algorithm only when the potential for discrimination was not salient. When the potential for discrimination was salient, the effect reversed, such that female respondents preferred to be evaluated by an algorithm rather than a human. Thus, our findings add to recent research suggesting that algorithm aversion might not be as universal as previously thought (Castelo et al., 2019; Logg et al., 2019).

Our research further contributes to the literature on clinical versus statistical judgments by providing a more nuanced understanding of a key psychological driver of algorithm aversion, namely, algorithms' perceived inability to contextualize decision-making (Newman et al., 2020; Sloan & Warner, 2018). Prior research argues that algorithms' perceived inability to contextualize decision-making leads people to infer that algorithms are more prone to error than humans, because they are more likely than humans to neglect information that is relevant to the judgment at hand, a belief that drives algorithm aversion (Longoni et al., 2019). In contrast, our results suggest that, in situations that entail the potential for discrimination, algorithms' perceived inability to contextualize decision-making triggers different inferences, leading to a reversal of algorithm aversion. In these situations, this fundamental belief can lead people to infer that algorithms are less prone to error than humans, because they are more likely than humans to neglect information that might be grounds for discrimination. Thus, our findings add to prior research by showing that perceived decontextualization can trigger different inferential processes. Depending on the situation, decontextualization can be perceived to lead to a loss of information that is detrimental rather than beneficial to the judgment at hand, potentially resulting in algorithm appreciation rather than aversion.

Finally, our findings contribute to research on social inequalities. Past research (Cundiff & Vescio, 2016; Yzerbyt et al., 1997) has examined how stereotypical beliefs affect how people perceive disparities. Specifically, Cundiff and Vescio (2016) showed that those who endorse gender stereotypes are more likely to attribute gender disparities to dispositional differences between men and women, and less to discrimination. We contribute to this literature by identifying a novel factor that sways how people make sense of disparities, namely, whether disparities stem from decisions made by humans versus algorithms. Our research shows that people are more likely to rationalize gender disparities as a reflection of actual differences in dispositions and abilities between genders when such disparities stem from algorithmic rather than human decisions, a conclusion that can be dangerously erroneous.

Practical Implications

If people are less likely to perceive decisions that yield disparities as biased when they stem from algorithms rather than humans, then replacing human with algorithmic decision-making can potentially contribute to legitimizing discrimination. For example, the results of Study 4 suggest that algorithmic decision-making may increase the risk that disparities might be erroneously interpreted as an accurate reflection of differences in people's dispositions and abilities. As a result, algorithms might reinforce stereotypical beliefs that contribute to perpetuating discrimination. Indeed, a society's first line of defense against discrimination lies in people's ability to recognize that disparities might stem from biased decisions (Crosby, 1993; Spring et al., 2018). Perceiving bias is a necessary condition to mobilize efforts aimed at removing inequalities in society (Corcoran et al., 2015; Earl, 2004). The results of Study 6 suggest that algorithmic decision-making may make people less likely to take action against decisions that might, in fact, be discriminatory. We believe that algorithms have tremendous potential to unveil and eventually rid decision-making of human biases (Kleinberg et al., 2020), leading to more accurate and equitable results (Gates et al., 2002). Yet, to the extent that algorithmic bias exists and is hard to detect and eradicate (Hao, 2019), the use of algorithms might have societal implications that should not be ignored. More research is needed to explore these implications.

By showing that differential perceptions of bias for human versus algorithmic decision-making are driven by erroneous beliefs about how algorithms operate, our research highlights the importance of fostering algorithmic literacy. Recent investigations show that algorithmic bias can go undetected by the public, even when it affects the well-being of millions of people (Obermeyer et al., 2019). Our findings suggest that education can alleviate the danger that the general public might be deceived by the apparent objectivity of algorithmic decision-making. For example, the results of Study 3 show that the asymmetric perception of bias was eliminated when participants were informed that an algorithm considered information about the demographic profile of an individual. Yet, organizations that use these algorithms often reassure the general public and policymakers that their tools do not rely on such information (Puente, 2019). We argue that such claims may exploit people's misconceptions about algorithms. Most instances of algorithmic bias are not due to the direct use of protected variables as inputs into algorithmic decision-making, but to the use of non-protected "proxies" that are highly associated with protected variables (Barocas & Selbst, 2016; Hajian et al., 2016). For example, an algorithm that uses SAT scores to screen students' applications might introduce bias in the selection process even if it ignores race as a variable, because of the historical association between SAT scores and race (Geiser, 2015). Thus, the fact that an algorithm does not use protected variables does not guarantee that the algorithm is unbiased, yet people might erroneously conclude that it is. In fact, technical debates about how to debias algorithms have pointed to the need for algorithms to actually use protected variables in their predictions as a way to detect and correct for bias (Gillis & Spiess, 2019; Kleinberg et al., 2018b). Our research calls for establishing checks and balances aimed not only at detecting and alleviating algorithmic bias, but also at educating the general public to think critically about the way algorithms operate.
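To make the proxy mechanism described above concrete, the following simulation sketch (ours, with purely hypothetical numbers; it is not drawn from this article or from any real screening system) shows how a selection rule that never sees a protected attribute can still produce unequal selection rates when it relies on a proxy that is historically associated with that attribute.

```python
# Illustrative simulation of proxy-based disparity (hypothetical parameters).
# The screening rule below never uses `group`, yet selection rates differ
# across groups because the proxy score is associated with group membership.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

group = rng.integers(0, 2, size=n)          # protected attribute (0 or 1)
ability = rng.normal(0.0, 1.0, size=n)      # identical across groups by construction
# Proxy (e.g., a test score) tracks ability but is shifted downward for group 1,
# standing in for a historical association such as unequal access to preparation.
proxy = ability - 0.5 * group + rng.normal(0.0, 0.5, size=n)

# "Blind" rule: select the top 20% on the proxy, ignoring the protected attribute.
selected = proxy > np.quantile(proxy, 0.80)

for g in (0, 1):
    print(f"group {g}: selection rate = {selected[group == g].mean():.1%}")
# Typical output: group 0 is selected at roughly twice the rate of group 1,
# even though mean ability is equal across groups.
```

Note that the disparity in the final loop can only be measured because the protected attribute was recorded, which is one reason technical debiasing proposals favor using protected variables to detect and correct bias over simply omitting them.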
Limitations and Future Research

Our research provides robust evidence that perceptions of bias differ as a function of whether decisions that yield disparities stem from algorithms versus humans. Yet, our research also has limitations that open avenues for future research. We focus our discussion on five key areas that we believe offer particularly fruitful opportunities to extend the current work.

First, our investigation was by nature limited in scope, and as such, it does not provide an exhaustive analysis of all the possible instances of algorithmic bias. For example, we focused only on gender and racial disparities. Additional research is needed to examine perceptions of bias for decisions that yield disparities related to other protected variables, such as age, sexual orientation, or religion. Moreover, the disparities investigated in our studies are not representative of the full range of disparities people may be confronted with. Facing disparities that are more versus less extreme might either thwart or magnify perceptions of bias. Indeed, in our studies, respondents were relatively ambivalent about the perception of algorithmic bias, as suggested by responses around the midpoint of the measure of bias. Thus, future research is needed to generalize our findings to a broader range of situations, where disparities might be more versus less pronounced.

Second, it is possible that differential perceptions of bias for algorithmic versus human decisions might be, at least in part, contingent on the general public being misinformed about the existence of algorithmic bias. That is, based on everyday experience, people might be more familiar with instances of human rather than algorithmic bias. As such, the phenomenon we document might progressively attenuate as people become more educated about instances of algorithmic bias. Although we agree that awareness of algorithmic bias is key to mitigating the erroneous perception that algorithms are unbiased decision-makers, our results might point to a more fundamental issue than simple misinformation. In this regard, it is important to mention that our empirical investigation was conducted at a time of extensive media coverage of instances of algorithmic bias. More research is needed to investigate what kind of educational interventions might be most effective at eradicating the erroneous beliefs that drive the phenomenon we document.

Third, to probe the generalizability of our results, we tested our hypothesis across different domains where algorithms are widely used to replace human decision-making, and modeled our stimuli after documented cases of algorithmic bias. For example, in the hiring domain, where algorithms are increasingly being used at all stages of the hiring process (for a review, see Bogen & Rieke, 2018), our stimuli were based on a case in which an algorithm developed to screen applicants' resumes was found to systematically favor male over female applicants (Dastin, 2018). Similarly, in the criminal justice domain, where algorithms are extensively deployed to assess offenders' risk of recidivism (for a review, see Desmarais et al., 2016), our stimuli were based on a case where an algorithm used to predict the risk of recidivism was allegedly found to systematically assess white defendants at a lower risk than black defendants (Angwin et al., 2016; for a rebuttal see Dieterich et al., 2016). Yet, we acknowledge that in our studies, respondents were exposed to simple scenarios designed to resemble what people may encounter in everyday news. Future research is needed to explore the robustness of our results in field settings.

Fourth, we suggested that our findings point to an important societal implication, namely, the risk that by hindering perceptions of bias, algorithmic decision-making might reinforce stereotypes and make people less likely to take action against decisions that might be discriminatory. For example, the results of Study 4 suggest that algorithmic decisions may foster stereotypical beliefs, in that people might be more likely to rationalize disparities as a reflection of actual differences in dispositions and abilities, when disparities stem from algorithmic rather than human decisions. Similarly, the results of Study 6 suggest that people might be less willing to take action against alleged discrimination when algorithms, rather than humans, make decisions. However, we acknowledge that our empirical investigation is only the first step toward investigating these important societal implications. More research is needed to explore these and other responses to disparities generated by algorithmic as opposed to human decisions.

Finally, it is important to note that even if people perceive algorithmic decisions to be less biased than human decisions, they might not discount the role of institutional discrimination. To illustrate, even if people think that algorithms are less biased than humans in making university admissions decisions, they may still believe that institutional discrimination makes it more difficult for marginalized students to achieve the standards set for admission. In fact, the perceived impartiality of algorithms might make institutional discrimination even more salient. Future research is needed to explore whether algorithmic decision-making impairs or magnifies the role of institutional discrimination, and whether this impacts support for affirmative action.
References

Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine bias: There's software used across the country to predict future criminals. And it's biased against blacks. ProPublica. www.propublica.org/article/machine-bias-risk-assessments-in-criminalsentencing
Barocas, S., & Selbst, A. D. (2016). Big data's disparate impact. California Law Review, 104, 671–732.
Bartlett, R., Morse, A., Stanton, R., & Wallace, N. (2019). Consumer-lending discrimination in the FinTech era. National Bureau of Economic Research, Working Paper No. 25943. https://doi.org/10.3386/w25943
Bates, D. W., Saria, S., Ohno-Machado, L., Shah, A., & Escobar, G. (2014). Big data in health care: Using analytics to identify and manage high-risk and high-cost patients. Health Affairs, 33, 1123–1131. https://doi.org/10.1377/hlthaff.2014.0041
Bogen, M., & Rieke, A. (2018). Help wanted: An exploration of hiring algorithms, equity, and bias. Technical report. Upturn.
Bonezzi, A., & Ostinelli, M. (2020). Can algorithms legitimize discrimination? https://doi.org/10.17605/OSF.IO/276GM
Castelo, N., Bos, M., & Lehmann, D. (2019). Task-dependent algorithm aversion. Journal of Marketing Research, 56(5), 809–825. https://doi.org/10.1177/0022243719851788
Corcoran, K., Pettinicchio, D., & Young, J. (2015). Perceptions of structural injustice and efficacy: Participation in low/moderate/high cost forms of collective action. Sociological Inquiry, 85, 429–461. https://doi.org/10.1111/soin.12082
Crosby, F. J. (1993). Why complain? Journal of Social Issues, 49, 169–184. https://doi.org/10.1111/j.1540-4560.1993.tb00916.x
Cundiff, J. L., & Vescio, T. K. (2016). Gender stereotypes influence how people explain gender disparities in the workplace. Sex Roles: A Journal of Research, 75, 126–138. https://doi.org/10.1007/s11199-016-0593-2
Danks, D., & London, A. J. (2017). Algorithmic bias in autonomous systems. Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 17, 4691–4697.
Dastin, J. (2018). Amazon scraps secret AI recruiting tool that showed bias against women. Reuters Technology News. www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G
Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243, 1668–1674. https://doi.org/10.1126/science.2648573
Desmarais, S. L., Johnson, K. L., & Singh, J. P. (2016). Performance of recidivism risk assessment instruments in US correctional settings. Psychological Services, 13(3), 206–222. https://doi.org/10.1037/ser0000075
Dieterich, W., Mendoza, C., & Brennan, T. (2016). COMPAS risk scales: Demonstrating accuracy equity and predictive parity. Technical report. Northpointe.
Dietvorst, B. J., Simmons, J. P., & Massey, C. (2015). Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General, 144(1), 114–126. https://doi.org/10.1037/xge0000033
Dressel, J., & Farid, H. (2018). The accuracy, fairness, and limits of predicting recidivism. Science Advances, 4(1), eaao5580. https://doi.org/10.1126/sciadv.aao5580
Earl, J. (2006). Pursuing social change online: The use of four protest tactics on the internet. Social Science Computer Review, 24, 362–377. https://doi.org/10.1177/0894439305284627
Earl, J. (2004). The cultural consequences of social movements. In D. A. Snow, S. A. Soule, & H. Kriesi (Eds.), The Blackwell companion to social movements (pp. 508–530). Blackwell.
Eubanks, V. (2018). Automating inequality: How high-tech tools profile, police, and punish the poor. St. Martin's Press.
Faul, F., Erdfelder, E., Buchner, A., & Lang, A. (2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41, 1149–1160. https://doi.org/10.3758/BRM.41.4.1149
Ferguson, C. J. (2009). An effect size primer: A guide for clinicians and researchers. Professional Psychology: Research and Practice, 40(5), 532–538.
Fiedler, K., Harris, C., & Schott, M. (2018). Unwarranted inferences from statistical mediation tests—An analysis of articles published in 2015. Journal of Experimental Social Psychology, 75, 95–102. https://doi.org/10.1016/j.jesp.2017.11.008
Friedman, B., & Nissenbaum, H. (1996). Bias in computer systems. ACM Transactions on Information Systems, 14(3), 330–347. https://doi.org/10.1145/230538.230561
Gates, S. W., Perry, V. G., & Zorn, P. M. (2002). Automated underwriting in mortgage lending: Good news for the underserved? Housing Policy Debate, 13(2), 369–391. https://doi.org/10.1080/10511482.2002.9521447
Geiser, S. (2015). The growing correlation between race and SAT scores: New findings from California. University of California, Berkeley Center for Studies in Higher Education.
Gillis, T., & Spiess, J. (2019). Big data and discrimination. The University of Chicago Law Review, 89(2), 459–488.
Gomber, P., Kauffman, R. J., Parker, C., & Weber, B. W. (2018). On the fintech revolution: Interpreting the forces of innovation, disruption, and transformation in financial services. Journal of Management Information Systems, 35, 220–265. https://doi.org/10.1080/07421222.2018.1440766
Grove, W. M., & Meehl, P. E. (1996). Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical, algorithmic) prediction procedures: The clinical–statistical controversy. Psychology, Public Policy, and Law, 2, 293–323. https://doi.org/10.1037/1076-8971.2.2.293
Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., & Nelson, C. (2000). Clinical versus mechanical prediction: A meta-analysis. Psychological Assessment, 12, 19–30. https://doi.org/10.1037/1040-3590.12.1.19
Hajian, S., Bonchi, F., & Castillo, C. (2016). Algorithmic bias: From discrimination discovery to fairness-aware data mining. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 16, 2125–2126. https://doi.org/10.1145/2939672.2945386
Hao, K. (2019). This is how AI bias really happens—and why it's so hard to fix. MIT Technology Review. www.technologyreview.com/s/612876/this-is-how-ai-bias-really-happensand-why-its-so-hard-to-fix
Haslam, N. (2006). Dehumanization: An integrative review. Personality and Social Psychology Review, 10, 252–264. https://doi.org/10.1207/s15327957pspr1003_4
Hayes, A. (2018). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. The Guilford Press.
Kleinberg, J., Ludwig, J., Mullainathan, S., & Rambachan, A. (2018b). Algorithmic fairness. American Economic Association Papers and Proceedings, 108, 22–27.
Kleinberg, J., Ludwig, J., Mullainathan, S., & Sunstein, C. R. (2018a). Discrimination in the age of algorithms. Journal of Legal Analysis, 10, 113–174. https://doi.org/10.1093/jla/Laz001
Kleinberg, J., Ludwig, J., Mullainathan, S., & Sunstein, C. R. (2020). Algorithms as discrimination detectors. Proceedings of the National Academy of Sciences of the United States of America, 117(48), 30096–30100. https://doi.org/10.1073/pnas.1912790117
Kleinberg, J., Mullainathan, S., & Raghavan, M. (2016). Inherent trade-offs in the fair determination of risk scores. https://arxiv.org/abs/1609.05807v2
Larson, J., Mattu, S., Kirchner, L., & Angwin, J. (2016). How we analyzed the COMPAS recidivism algorithm. ProPublica. www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm
Litman, L., Robinson, J., & Abberbock, T. (2017). TurkPrime.com: A versatile crowdsourcing data acquisition platform for the behavioral sciences. Behavior Research Methods, 49(2), 433–442. https://doi.org/10.3758/s13428-016-0727-z
Logg, J. M., Minson, J. A., & Moore, D. A. (2019). Algorithm appreciation: People prefer algorithmic to human judgment. Organizational Behavior and Human Decision Processes, 151, 90–103. https://doi.org/10.1016/j.obhdp.2018.12.005
Longoni, C., Bonezzi, A., & Morewedge, C. (2019). Resistance to medical artificial intelligence. Journal of Consumer Research, 46(4), 629–650. https://doi.org/10.1093/jcr/ucz013
Loughnan, S., & Haslam, N. (2007). Animals and androids: Implicit associations between social categories and nonhumans. Psychological Science, 18(2), 116–121. https://doi.org/10.1111/j.1467-9280.2007.01858.x
Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. University of Minnesota Press. https://doi.org/10.1037/11281-000
Newman, D., Fast, N., & Harmon, D. (2020). When eliminating bias isn't fair: Algorithmic reductionism and procedural justice in human resource decisions. Organizational Behavior and Human Decision Processes, 160, 149–167. https://doi.org/10.1016/j.obhdp.2020.03.008
Nissenbaum, H., & Walker, D. (1998). Will computers dehumanize education? A grounded approach to values at risk. Technology in Society, 20, 237–273. https://doi.org/10.1016/S0160-791X(98)00011-6
Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. New York University Press. https://doi.org/10.2307/j.ctt1pwt9w5
O'Neil, C. (2016). Weapons of math destruction: How big data increases inequality and threatens democracy. Crown Publishers.
Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366, 447–453. https://doi.org/10.1126/science.aax2342
Operario, D., & Fiske, S. T. (2001). Ethnic identity moderates perceptions of prejudice: Judgments of personal versus group discrimination and subtle versus blatant bias. Personality and Social Psychology Bulletin, 27, 550–561. https://doi.org/10.1177/0146167201275004
Pangburn, D. J. (2019). Schools are using software to help pick who gets in. What could go wrong? Fast Company. www.fastcompany.com/90342596/schools-are-quietly-turning-toai-to-help-pick-who-gets-in-what-could-go-wrong
Puente, M. (2019). LAPD official behind controversial data programs to retire after winning lucrative contract. https://www.baltimoresun.com/La-me-sean-malinowski-predictive-policing-20190508-story.html
Reyna, C. (2000). Lazy, dumb, or industrious: When stereotypes convey attributional information in the classroom. Educational Psychology Review, 12, 85–110. https://doi.org/10.1023/A:1009037101170
Schwartz, O. (2019). Untold history of AI: Algorithmic bias was born in the 1980s. IEEE Spectrum. https://spectrum.ieee.org/tech-talk/tech-history/dawn-of-electronics/untold-history-of-ai-the-birth-of-machine-bias
Schwemmer, C., Knight, C., Bello-Pardo, E. D., Oklobdzija, S., Schoonvelde, M., & Lockhart, J. W. (2020). Diagnosing gender bias in image recognition systems. Socius: Sociological Research for a Dynamic World, 6, 1–17. https://doi.org/10.1177/2378023120967171
Sloan, R. H., & Warner, R. (2018). When is an algorithm transparent? Predictive analytics, privacy, and public policy. IEEE Security and Privacy, 16(3), 18–25. https://doi.org/10.1109/MSP.2018.2701166
Spring, V. L., Cameron, C. D., & Cikara, M. (2018). The upside of outrage. Trends in Cognitive Sciences, 22, 1067–1069. https://doi.org/10.1016/j.tics.2018.09.006
Yzerbyt, V., Rocher, S., & Schadron, G. (1997). Stereotypes as explanations: A subjective essentialistic view of group perception. In R. Spears, P. J. Oakes, N. Ellemers, & S. A. Haslam (Eds.), The social psychology of stereotyping and group life (pp. 20–50). Blackwell.

Received April 20, 2020
Revision received December 14, 2020
Accepted December 24, 2020 ▪
