
DOI: 10.1111/risa.14108
Risk Analysis. 2023;43:423–428. wileyonlinelibrary.com/journal/risa © 2023 Society for Risk Analysis

BOOK REVIEW

Book Review of The Alignment Problem: Machine Learning and Human Values by Brian Christian

Louis Anthony Cox Jr.

Cox Associates, MoirAI, Entanglement, and University of Colorado, Denver, Colorado, USA
Correspondence
Louis Anthony Cox, Jr., 503 N. Franklin Street, Denver, CO 80218, USA.
Email: [email protected]

Risk analysis seeks to provide useful frameworks for thinking about how to manage both known risks and uncertain risks, taking into account that knowledge of consequence probabilities for different choices, as well as current perceptions of risks and beliefs about their causes and likely consequences, is usually limited and may turn out to be inaccurate. Modern artificial intelligence (AI) and machine learning (ML) wrestle with many of the same challenges in guiding the decisions of robots and autonomous agents and teams of such agents operating under uncertainty or under novel conditions. The field of AI/ML has raised questions that have not been much addressed in risk analysis, yet that might be of great interest to many risk analysts. Among these are the following.

a. Is it possible to design risk-scoring systems that are both equitable and accurate, meaning that they yield well-calibrated risk predictions while giving all participants equal (preferably small) probabilities of false positives and also equal probabilities of false negatives?
b. What role, if any, should curiosity play in deciding what to try doing next in new, uncertain, and hazardous environments?
c. Which is preferable for an AI agent that manages risks on behalf of humans:
∙ do exactly what it is instructed to do;
∙ do what it infers that its users (probably) want it to do, even if they have not articulated it perfectly;
∙ do what it judges is best for them (e.g., what it deems they should want it to do, or what it predicts they will want it to have done in hindsight), even if that is not what they want now.

These are some of the questions explored in Brian Christian's thought-provoking and readable new book The Alignment Problem. Similar questions can be asked for human risk analysts seeking to identify what is best to do when making or recommending risk management decisions and policies on behalf of others. The Alignment Problem does an outstanding job of explaining insights and progress from recent technical AI/ML literature for a general audience. For risk analysts, it provides both a fascinating exploration of foundational issues about how data analysis and algorithms can best be used to serve human needs and goals and also a perceptive examination of how they can fail to do so.

The book consists of a Prologue and Introduction followed by nine chapters organized into three parts (titled Prophecy, Agency, and Normativity, each consisting of three chapters) and a Conclusion. All are worth reading. The three-page Prologue describes the seminal work and famous 1943 paper of McCulloch and Pitts introducing artificial neural networks and hints that finding out just what "mechanical brains" built from simplified logical models of neurons could do would soon become an exciting field.

The Introduction explains that "This is a book about machine learning and human values; about systems that learn from data without being explicitly programmed, and about how exactly—and what exactly—we are trying to teach them." It presents several examples of applications in which AI/ML systems fail to perform as desired or intended. It begins in 2013 with the introduction of Google's open-source "word2vec," which uses modern neural networks to encode words as vectors that can be added and subtracted. This leads to both insightful equations of word arithmetic (e.g., Paris – France + Italy = Rome) and more problematic ones (e.g., shopkeeper – man + woman = housewife) that reflect biases built into our language that we do not necessarily want our machine assistants to inherit. Other challenges include video game-playing programs that learn to optimize the reward functions specified by their designers while failing to exhibit the behaviors those rewards were meant to elicit, image-recognition software that performs better for racial groups that were well represented in the data used to train it than for other groups, and crime risk assessment software programs for supporting parole decisions that turn out to have different error rates for Blacks and Whites.
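To make the word-arithmetic idea concrete, here is a small illustrative sketch. The three-dimensional vectors below are invented purely for illustration (real word2vec embeddings are learned from large text corpora and have hundreds of dimensions); the analogy is answered by vector addition and subtraction followed by a nearest-neighbor lookup under cosine similarity.

```python
import numpy as np

# Toy 3-dimensional "embeddings" invented for illustration only; real word2vec
# vectors have hundreds of dimensions and are learned from large text corpora.
vectors = {
    "paris":  np.array([0.9, 0.1, 0.8]),
    "france": np.array([0.8, 0.1, 0.1]),
    "italy":  np.array([0.7, 0.2, 0.1]),
    "rome":   np.array([0.8, 0.2, 0.8]),
    "berlin": np.array([0.1, 0.9, 0.8]),
}

def nearest(query, exclude):
    """Return the word whose vector has the highest cosine similarity to `query`."""
    def cosine(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    candidates = {w: v for w, v in vectors.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(vectors[w], query))

# "Paris - France + Italy" should land near "Rome" in a well-trained embedding.
query = vectors["paris"] - vectors["france"] + vectors["italy"]
print(nearest(query, exclude={"paris", "france", "italy"}))  # -> rome (with these toy vectors)
```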

Chapter 1 (Representation) traces the early history of neural nets, starting with the 1958 perceptron, a single artificial "neuron" or logic gate that outputs a value of 1 if and only if a weighted sum of inputs is above a threshold and outputs 0 otherwise, where 0 and 1 are typically interpreted as two classes. The chapter describes the "stochastic gradient descent" algorithm (which Christian explains in eight simple lines of plain English) for automatically adjusting the weights to reduce the classification errors that the system makes on a training set for which the correct outputs (i.e., classifications) are already known. This adjustment process can be considered a form of "supervised learning" in which the examples in the training set are used to adjust weights for the inputs until the system classifies cases in the training set accurately. The final set of weights can be interpreted as implicitly representing the knowledge needed to predict the output class from the inputs. Although the perceptron can only classify some patterns accurately (e.g., "left" vs. "right" for which side of a card a shape appears on, but not whether the number of dots on a card is odd or even), networks with multiple layers of perceptron-like artificial neurons arranged so that the outputs from the neurons at one level are inputs to neurons at the next layer can learn any desired input-output function exemplified in a training set of examples. Such multi-layer ("deep") artificial neural networks can be trained to map input signals to output decisions. Applications range from classifying images (is this tumor malignant, are those pictures of enemy tanks, does this photo show a cat?) to deciding what to do next in response to what is sensed (should this mortgage application or job application be accepted? Should an autonomous vehicle steer left or right or go straight to stay on the road?). Variations of deep neural nets can also be used to detect anomalies and novelty (is this input not mapped with high confidence to any of the output classes learned about in the training set?) and to win a variety of games and control a wide variety of industrial processes while avoiding dangerous or uncertain conditions that make achieving desired outcomes too unpredictable for safe operation. These dramatic accomplishments are achieved via deep learning and other "supervised learning" algorithms that iteratively adjust weights to reduce errors in classification, prediction, and control rules for the cases in a "training set" of examples for which correct or desired responses are known for a variety of input conditions. The rules so learned typically perform very well as long as the new cases or situations to which they are applied are statistically similar to those in the training set. But they may perform poorly when applied to cases different from those in the training set. A face-detection or face-recognition system trained only on White male faces may perform poorly if applied to Black female faces. No amount of sophistication in the training algorithms and representation of decision rules can overcome limitations and biases created by training data that are non-representative of the cases to which the decision rules are applied. In a world where useful software propagates quickly, biases and limitations in training sets may get locked into classification systems that are then widely deployed. This is perhaps especially true for natural language processing (NLP) systems that represent word meanings as vectors (e.g., Google's "word2vec" algorithm for embedding words into vector spaces based on how frequently they appear near other words). Such systems use deep learning to extract the most predictively useful representations of words as vectors. They enable numerous useful applications, from sentence-completion software to AI for question-answering and retrieval of relevant information from the web on smartphones. But the resulting systems are trained on past language use. Therefore, they reflect the biases, assumptions, and historical conditions built into past usage. This can lead to false, obsolete, or undesirable inferences in AI NLP systems, such as about whether a doctor is more likely to be male or female, or about whether a job applicant is likely to succeed at a company based on inferred superficial (e.g., demographic) similarities to past hires. Moreover, the distances between words (represented as points in a word2vec-type vector space embedding) turn out to correspond quite closely to human reaction times in tasks that require pairing words, such as implicit bias or implicit association tests: pairs of words that are more distant take longer to pair. Studying how word embeddings shift over time can help to identify social trends that are reflected in language use, including changing perceptions of identity groups and of hazards such as pandemics, climate change, or terrorism risks.
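The weight-adjustment idea that Christian explains in plain English can be sketched in a few lines of code. The example below uses the classic per-example perceptron update (a simple relative of the stochastic gradient descent procedure described above) on an invented, linearly separable toy data set; the data, learning rate, and number of passes are all assumptions for illustration.

```python
import numpy as np

# Toy training set (invented for illustration): classify points by which side
# of a line they fall on. Each row is (x1, x2); labels are 0 or 1.
X = np.array([[0.2, 0.9], [0.4, 0.7], [0.8, 0.2], [0.9, 0.4], [0.1, 0.6], [0.7, 0.1]])
y = np.array([1, 1, 0, 0, 1, 0])

rng = np.random.default_rng(0)
w = rng.normal(size=2)   # input weights, adjusted during training
b = 0.0                  # threshold (bias), also adjusted
lr = 0.1                 # learning rate: how far to move the weights per error

def predict(x):
    """Output 1 if the weighted sum of inputs exceeds the threshold, else 0."""
    return int(x @ w + b > 0)

# Stochastic updates: visit one training example at a time and nudge the
# weights whenever the predicted class disagrees with the known class.
for epoch in range(20):
    for xi, yi in zip(X, y):
        error = yi - predict(xi)          # -1, 0, or +1
        w += lr * error * xi              # move the decision boundary toward the example
        b += lr * error

print([predict(xi) for xi in X])  # should match y after training on this separable toy set
```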
Chapter 2 (Fairness) looks at how predictive risk-scoring algorithms have been used to inform decisions about which inmates should be classified as safe to parole or to release early based on predicted risks to society. It cites the dramatic swings in public and media perceptions of such systems, such as the New York Times veering from urging wider acceptance of risk assessment tools in parole in 2014 because "they have been proved to work" to writing in 2016 about "a backlash against using data to foretell defendants' futures" because, in the words of an ACLU Director, it is "kind of rushing into the world of tomorrow with big-data risk assessment." A potent catalyst for the backlash was a May 2016 article by ProPublica entitled "Machine Bias: There's software used across the country to predict future criminals. And it's biased against blacks." The chapter then moves into a fascinating discussion of recent mathematical research on exactly what algorithmic "fairness" and "bias" mean, and of the discovery that the standards that ProPublica advocated—essentially, that an acceptable risk assessment tool should not only be well-calibrated, rendering statistically accurate predictions of reoffense rates, but should also have the same misclassification rates (i.e., false-positive and false-negative rates) for different groups with different base rates of reoffense—are mathematically impossible to satisfy simultaneously. Theorems on the "impossibility of fairness" shine new analytic light on trade-offs and on what "fairness" and "bias" can and cannot mean. Chapter 2 also discusses the important distinction between predicting and preventing crime: being able to predict who is most likely to have an undesired outcome under the conditions for which data have been collected does not necessarily reveal how outcome probabilities would change under new policies or conditions. Yet this—the province of causal artificial intelligence rather than predictive machine learning—is typically what reformers and policymakers most want to know.
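A small numerical illustration (with made-up counts) shows why calibration and equal error rates pull apart when base rates differ, which is the tension the impossibility theorems formalize. Here a single risk score is perfectly calibrated within each score band for both groups, yet the same decision threshold produces very different false-positive and false-negative rates for the two groups.

```python
from collections import Counter

# Made-up counts for two hypothetical groups, chosen only to illustrate the
# arithmetic. Within each score band the score is perfectly calibrated
# (e.g., 80% of people scored 0.8 actually reoffend), but the groups have
# different base rates of reoffense.
# Each entry: (group, risk_score, n_reoffend, n_no_reoffend)
bands = [
    ("A", 0.8, 48, 12), ("A", 0.2,  8, 32),   # group A base rate = 56/100
    ("B", 0.8, 16,  4), ("B", 0.2, 16, 64),   # group B base rate = 32/100
]

threshold = 0.5  # classify "high risk" if score > 0.5
counts = Counter()
for group, score, n_pos, n_neg in bands:
    flagged = score > threshold
    counts[group, "FP"] += n_neg if flagged else 0   # flagged but did not reoffend
    counts[group, "TN"] += 0 if flagged else n_neg
    counts[group, "FN"] += 0 if flagged else n_pos   # not flagged but reoffended
    counts[group, "TP"] += n_pos if flagged else 0

for g in ("A", "B"):
    fpr = counts[g, "FP"] / (counts[g, "FP"] + counts[g, "TN"])
    fnr = counts[g, "FN"] / (counts[g, "FN"] + counts[g, "TP"])
    print(f"group {g}: false-positive rate = {fpr:.2f}, false-negative rate = {fnr:.2f}")

# The same calibrated score and threshold yield FPR of about 0.27 vs. 0.06 and
# FNR of about 0.14 vs. 0.50 for the two groups, illustrating the trade-off that
# the impossibility theorems formalize.
```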
Chapter 3 (Transparency) discusses the challenge of creating safe, trustworthy AI advisory systems that provide clear reasons for recommended decisions or courses of action. It starts with the cautionary tale of a neural net-based system that learned from data a predictive rule stating that patients with asthma have lower risks of developing pneumonia than other patients. The pattern was correct, but the system did not model the causes for it: patients with asthma have lower risks precisely because they would have much higher risks if they were treated the same as other patients, and they are therefore given more intensive care to prevent the onset of pneumonia. A system that simply classifies patients with asthma as low risk and therefore allocates limited care resources elsewhere would be disastrous for patients with asthma. This reinforces the larger methodological point that risk models that accurately predict risks under current conditions do not necessarily offer insight into how risks would change under new conditions or following interventions intended to improve the current system. Much of Chapter 3 is therefore devoted to "explainable AI" that seeks to explain the basis for algorithmic decision recommendations, such as why a borrower is turned down for a loan or why a course of treatment for a patient is recommended. It reviews the striking finding from decades of research that even simple linear models (applying equal weights to several causally relevant factors) typically make more accurate risk predictions than expert judgments or more complex statistical models. Human expertise in the form of knowing what variables to look at—what is likely to be causally relevant for predicting an outcome—together with simple objective quantitative models for combining information from these variables typically greatly outperforms human expert judgment alone. Other recent developments, such as multitask learning—that is, using models to predict multiple causally related outcomes simultaneously instead of just one (such as disease, hospitalization costs and duration, and mortality risks instead of just mortality risk)—have not only improved predictive accuracy compared to predicting a single dependent variable at a time but have also allowed greater visibility into the features that allow accurate predictions. Studying the relative times that an ML model spends processing different features to make its predictions ("saliency" analysis) helps to identify which features it treats as most informative for purposes of prediction. This has led to unexpected and useful discoveries, such as that age and sex can be identified with astonishing accuracy from retinal scans.
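The "equal weights on causally relevant factors" finding can be illustrated with a minimal scoring sketch. The factors, their assumed risk directions, and the applicant records below are invented; the point is only that expert knowledge enters through the choice and direction of variables, while the combination rule stays deliberately simple.

```python
import statistics

# A sketch of the "equal weights on causally relevant factors" idea (sometimes
# called an improper linear model). Factors, directions, and applicant data are
# all invented for illustration.
applicants = [
    {"income": 72, "debt": 10, "late_payments": 0},
    {"income": 38, "debt": 55, "late_payments": 4},
    {"income": 55, "debt": 30, "late_payments": 1},
]
# +1 if larger values suggest lower risk, -1 if larger values suggest higher risk.
directions = {"income": +1, "debt": -1, "late_payments": -1}

def zscores(values):
    """Standardize a list of values so each factor contributes on the same scale."""
    mu, sd = statistics.mean(values), statistics.pstdev(values)
    return [(v - mu) / sd for v in values]

# Standardize each factor across applicants, then add them with equal weights.
standardized = {k: zscores([a[k] for a in applicants]) for k in directions}
scores = [
    sum(directions[k] * standardized[k][i] for k in directions)
    for i in range(len(applicants))
]
print(scores)  # higher score = lower predicted risk under this simple rule
```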
The second part of the book (Agency) begins with Chapter 4 on reinforcement learning (RL) in animals, people, and machines. All three can learn from experience by noticing when actions are followed by desirable or undesirable results and trying to infer decision rules that specify what actions to take in different situations to make preferred outcomes more likely. If results are long delayed and only learned after long sequences of actions, then the "credit-assignment" problem arises of attributing causation of outcomes to specific choices that preceded them. Since the 1950s, when IBM unveiled a checkers-playing program that learned from experience how to improve its game by adjusting its parameters based on wins and losses, machine learning researchers have learned how to use prediction errors—the differences between predicted and experienced future rewards following an action (e.g., moving to a state with a higher expected value, assuming optimal decision-making ever after that transition)—to simultaneously adjust action-selection probabilities and estimates of the expected rewards from taking each possible action in each possible state until no further improvements can be made. The resulting RL algorithms appear to reflect the biology of learning how to act effectively in initially uncertain environments. In such biological learning, the neurotransmitter dopamine acts as a signal of prediction error that guides learning in the brains of a wide variety of species. In environments where the causal rules linking actions to probabilities of outcomes remain fixed, such as games ranging from checkers or backgammon to video games, RL has produced impressive levels of mastery in machines, including many examples of super-human skill. Chapter 4 concludes by discussing research linking RL, dopamine, exploration, and happiness, suggesting that happiness comes less from satisfaction that things have gone well, or even from anticipation that things are about to go well, than from being pleasantly surprised that things are going better than expected. From this standpoint, "complete mastery of any domain seems necessarily correlated with boredom" in humans and animals. Risk, exploration, and surprise are key requirements for their flourishing. Subsequent chapters explore the follow-up questions of how to determine what is valued (i.e., whether surprises are evaluated as pleasant or unpleasant and when events are evaluated as going better than expected) and how to structure rewards to elicit desired behaviors in machines as well as in animals or people.
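A minimal sketch of the prediction-error idea described above is tabular Q-learning on a tiny invented two-state problem: the agent's estimate for each state-action pair is nudged by the difference between what it experienced (reward plus a discounted estimate of the future) and what it had predicted. None of this corresponds to any specific system discussed in the book.

```python
import random

# A tiny, invented environment: two states; from each state the agent chooses
# "safe" (small certain reward) or "risky" (larger reward but it may backfire).
random.seed(0)
states, actions = [0, 1], ["safe", "risky"]
Q = {(s, a): 0.0 for s in states for a in actions}   # estimated value of each (state, action)
alpha, gamma, epsilon = 0.1, 0.9, 0.1                # learning rate, discount, exploration rate

def step(state, action):
    """Return (reward, next_state) under made-up dynamics."""
    if action == "safe":
        return 1.0, state
    return (5.0, 1) if random.random() < 0.5 else (-4.0, 0)

state = 0
for _ in range(5000):
    # Mostly act greedily on current estimates, but explore occasionally.
    action = random.choice(actions) if random.random() < epsilon else max(actions, key=lambda a: Q[state, a])
    reward, next_state = step(state, action)
    # Prediction error: experienced reward plus discounted estimate of the future,
    # minus what was previously predicted for this state-action pair.
    td_error = reward + gamma * max(Q[next_state, a] for a in actions) - Q[state, action]
    Q[state, action] += alpha * td_error
    state = next_state

print({k: round(v, 2) for k, v in Q.items()})
```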
Chapter 5 (Shaping) examines how to train animals or machines to exhibit desired complex stimulus-response behaviors by rewarding successive approximations of the desired behaviors. It emphasizes the importance of creating both a good curriculum and appropriate incentives, that is, designing rewards that lead a reward-maximizing learner to master a sequence of progressively more difficult and more accurate approximations of desired complex behaviors. These principles are illustrated by examples that range from Skinner's seminal work with animals and behaviorism in the 1950s, to DeepMind's use of automated curriculum design to train AlphaGo and more recent world champion-level Go-playing programs, to child psychology and implications for parenting. Children in families, adults in organizations, and AIs endowed with RL algorithms are all adept at gaming the systems in which they are placed to maximize their rewards, often discovering loopholes and ways to exploit rules and incentives that were not intended or desired by those who created them. Design principles discovered in ML research, such as (a) rewarding states rather than actions (e.g., rewarding achievement of a goal state rather than behaviors that we hope might lead to it), (b) paying as much attention to movement away from goals as movement toward them, and (c) distinguishing between what is desired and what is rewarded (since rewards shape behaviors in ways that are not necessarily simply related to what is desired or intended) may be useful for improving the performance of learning individuals and organizations as well as the performance of learning AIs. The chapter ends with discussions of the interaction between evolution and learning in which evolutionary pressures shape what we value and count as positive rewards, and gamification in which well-designed curricula and incentives are used to make acquiring real-world skills and knowledge as compelling—or even addictive—as playing well-designed video games.
Chapter 6 (Curiosity) opens with a discussion of the integration of deep learning (discussed in Chapter 1) with RL (discussed in Chapters 4 and 5) to automate first the construction of higher level features relevant for gameplay from raw pixel-level data in dozens of Atari video games, and then the process of learning to play and win the games. The resulting "deep RL" technology pioneered by DeepMind in 2015 learned to play most of the video games on which it was tested with super-human skill (often more than 10 times more skillful than human expert game players). For a small minority of games in which no rewards or feedback (other than the death of the game character) occurred until far into the game, however, deep RL could not learn to play: feedback was too sparse for RL to gain traction. It turned out that what was needed for AIs to succeed in these high-risk, low-feedback environments was something parallel to human and animal curiosity and intrinsic motivation: the desire to explore new environments and master new skills not for any reward, but from curiosity and love of novelty and surprise. The recognition that curiosity and novelty-seeking were important for both infants and AIs to learn about how to act effectively in new situations led ML researchers to the further insight that the "newness" of situations could be defined and measured by the unpredictability of what would be observed next (e.g., using estimated inverse log probabilities) and used to reward novelty-seeking. Novelty-seeking together with surprise-seeking—experimenting further when actions produce unexpected results until predictions are corrected and the initially surprising anomalies are no longer surprising—provides a computationally practicable version of machine curiosity. Amazingly, such curiosity-driven exploration performs well in video games and other tasks even when scores or rewards are not revealed. That is, learning to predict the effects of actions in a new environment is often a successful heuristic for guiding behavior even when extrinsic rewards and incentives are removed. The chapter closes with reflections on human gambling addiction, boredom, and the downside of novelty-seeking, such as novelty-seeking AIs that abandon useful tasks to surf through TV channels when given the opportunity to do so.
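One simple way to operationalize the "newness" bonus described above is a count-based stand-in for the learned density models used in practice: estimate how probable each observation has been so far and pay an intrinsic reward equal to its negative log probability. The observation stream and scaling factor below are invented for illustration.

```python
import math
from collections import defaultdict

# Count-based novelty bonus: pay more for observations that have so far been
# improbable. The observation stream is invented for illustration.
visit_counts = defaultdict(int)
total_visits = 0

def novelty_bonus(observation):
    """Return -log(estimated probability of this observation so far)."""
    global total_visits
    visit_counts[observation] += 1
    total_visits += 1
    estimated_prob = visit_counts[observation] / total_visits
    return -math.log(estimated_prob)

for obs in ["room_a", "room_a", "room_a", "room_b", "room_a", "room_c"]:
    extrinsic_reward = 0.0                      # sparse-reward setting: nothing from the game itself
    shaped_reward = extrinsic_reward + 0.1 * novelty_bonus(obs)
    print(obs, round(shaped_reward, 3))

# Repeated visits to "room_a" earn little or no bonus; the first visits to
# "room_b" and "room_c" earn the largest ones, encouraging exploration.
```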
The third part of the book (Normativity) consists of three chapters on Imitation, Inference, and Uncertainty. Chapter 7 (Imitation) discusses imitation learning. Compared to learning by trial and error (including RL) and learning via explicit instructions (including being programmed, for machines), imitation learning—that is, learning by imitating with increasing accuracy the successful behaviors and skills that others have already mastered—has the distinct advantages of efficiency; safety of imitating known successful behaviors instead of risking exploration of potentially disastrous ones; and the possibility of learning skills that cannot easily be described, but that are easier to show than to tell. Whether the task is learning to walk without falling or ride a bicycle or drive an autonomous vehicle safely in traffic under changing conditions or control a complex industrial facility safely and efficiently, imitation learning can help AIs (as well as infants, children, and new employees) learn from people who already have the needed experience and skills to accomplish these tasks. But such learning is vulnerable to the fact that experts seldom make mistakes, so crucial lessons about how to recover quickly from errors are unlikely to be learned by imitation of successful behaviors. Shared control, in which the learner is allowed to make decisions and try out partially acquired skills and a human expert can override to correct mistakes, provides dramatic improvements in safe imitation learning, including error-recovery skills. When a master has skills that a novice lacks, however, imitation learning may be impracticable: the learner simply cannot imitate the master's behaviors. For an imperfect agent, the question of what constitutes "optimal" behavior deserves close consideration: should the value of reaching a state be defined as the value earned by acting optimally from that point forward, or as the value earned by acting as well as the imperfect agent can from that point forward? These two concepts (referred to in ML as off-policy vs. on-policy methods, respectively) can yield quite different decision recommendations. For example, a self-driving car trained with on-policy methods might stay away from a cliff edge even if driving quickly and without error along the cliff's edge would in principle be a slightly more efficient route.
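The off-policy versus on-policy distinction comes down to which estimate of the future enters the learning update, as the following sketch of the two update targets (Q-learning style versus SARSA style) illustrates. The cliff-edge values and exploration rate are invented for illustration.

```python
import random

# Invented numbers: a state next to a cliff where moving along the edge is
# worth 1.0 but slipping off is worth -100, and the agent explores randomly
# 20% of the time.
random.seed(1)
actions = ["along_edge", "into_cliff"]
Q = {("edge_state", "along_edge"): 1.0, ("edge_state", "into_cliff"): -100.0}
gamma, epsilon, reward = 0.9, 0.2, 0.0

def behavior_action(state):
    """What the imperfect agent actually does: mostly greedy, sometimes random."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[state, a])

# Off-policy (Q-learning style) target: assume flawless, optimal behavior next.
off_policy_target = reward + gamma * max(Q["edge_state", a] for a in actions)

# On-policy (SARSA style) target: use the action the agent will actually take,
# exploration slips included. Average many draws to see the effect.
samples = [reward + gamma * Q["edge_state", behavior_action("edge_state")] for _ in range(10000)]
on_policy_target = sum(samples) / len(samples)

print(round(off_policy_target, 2), round(on_policy_target, 2))
# Roughly 0.9 vs. roughly -8: valuing the edge as if play were perfect makes the
# cliff route look attractive; valuing it as the fallible agent will actually
# behave makes a safer, slightly longer route preferable.
```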
Imitation learning raises the challenge of how an AI can learn to outperform the experts from which it learns. An architecture reminiscent of the dual-process "thinking, fast and slow" in people has proved successful in creating AI/ML systems such as DeepMind's AlphaGo Zero that taught itself to play world champion-level Go in 3 days without any examples of human games or instructions, guidance, or advice from human experts. The key idea in this approach is to have a system repeatedly play against itself and learn how to reliably imitate its own most successful strategies. To do so, a "fast thinking" component (implemented as a "value network" that estimates the value, i.e., probability of a win, for each position, together with a "policy network" that estimates the probability of selecting each possible move after further evaluation) is paired with a "slow thinking" component (implemented using a Monte Carlo Tree Search decision-optimization heuristic algorithm) that simulates possible future plays for the most promising-looking possible next moves to help decide what to do next. Each component improves the performance of the other over time. Beyond the context of board games—for example, in urban planning or transportation system design—this approach may yield super-human decision-making and design skills that reflect the values of users but the search and optimization capabilities of machines as people and AIs work together to create options and select among them.

Chapter 8 (Inference) deals primarily with inferring the goals, beliefs, and intentions of others from their observed behaviors and then using these inferences to help them overcome obstacles and achieve their inferred goals. Infants as young as 18 months engage in collaborative behaviors requiring such sophisticated cognition. "Inverse reinforcement learning" (IRL) algorithms endow AI with similar capacities for inferring goals and values (modeled as reward functions) from observed behaviors, including inferring goals (e.g., for the safe operation of drones) even from imperfect human attempts to achieve them. IRL has the advantage that goals are often much simpler to infer and describe than the complex actions and plans that might be undertaken in trying to reach them. An AI that infers human goals as explanations for their observed behaviors can use this understanding and its own skills to help achieve the inferred goals. In collaborations between AIs and humans, the AIs may need to learn about human goals as the two cooperate and interact to complete goal-directed tasks.
For risky applications ranging from diagnosing patients to recommending actions for increasing the safety and efficiency of industrial processes, it is essential that an AI's predictions, classifications, and recommendations be accompanied by indications of confidence. Chapter 9 (Uncertainty) introduces techniques for training multiple ML models on available data and then using the extent of disagreement among members of this model ensemble to estimate the uncertainty in its predictions and recommendations. Autonomous vehicles guided by such methods automatically slow down and drive more cautiously when they encounter unfamiliar conditions and the multiple models make highly variable predictions. The chapter discusses the challenges of developing safe AI, noting that the precautionary principle may fail to deliver useful recommendations when the possibility of harm is unavoidable. AI safety research has discovered that keeping options open—taking only actions that will allow several other goals (even randomly generated ones) to be pursued in the future—is often a valuable heuristic for avoiding premature commitment to actions that will be regretted in hindsight. The chapter also touches on effective altruism and on philosophical issues such as how to behave (or how machines should be designed to behave) when there is "moral uncertainty" about what is the right thing to do and when the interests of potential far-future generations are considered in present decision-making.
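A minimal sketch of the ensemble idea described above: fit several simple models to bootstrap resamples of the same synthetic data and treat the spread of their predictions as a confidence signal. The data and the choice of a straight-line model are assumptions for illustration; real systems ensemble far richer models.

```python
import numpy as np

# Ensemble-based uncertainty on synthetic data: models that agree near the
# training data disagree sharply on unfamiliar inputs.
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=60)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=60)   # noisy linear relationship

ensemble = []
for _ in range(30):
    idx = rng.integers(0, len(x), size=len(x))          # bootstrap resample of the data
    ensemble.append(np.polyfit(x[idx], y[idx], deg=1))  # fit a slope and intercept

def predict_with_uncertainty(x_new):
    """Return the ensemble's mean prediction and the spread (disagreement)."""
    preds = np.array([np.polyval(coeffs, x_new) for coeffs in ensemble])
    return preds.mean(), preds.std()

for x_new in (5.0, 25.0):   # one familiar input, one far outside the training range
    mean, spread = predict_with_uncertainty(x_new)
    print(f"x = {x_new:5.1f}: prediction {mean:6.2f} +/- {spread:.2f}")

# Disagreement (the +/- term) is small near the training data and grows for the
# unfamiliar input, which is the cue a system can use to act more cautiously.
```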
The book's final chapter (Conclusion) discusses lessons and themes from the previous chapters. It notes that "Research on bias, fairness, transparency, and the myriad dimensions of safety now forms a substantial portion of all the work presented at major AI and machine-learning conferences." Reflecting on these themes, the chapter reminds us that no decision system, human or machine, can simultaneously satisfy the various proposed criteria for "fairness" that may seem intuitive and desirable; that "humans place greater trust in transparent models even when these models are wrong and ought not to be trusted;" and that RL and other ML methods inevitably make modeling assumptions (such as that selecting actions does not reshape our goals and values) that may prove to be erroneous. The modeling assumptions and data reflected in predictive and prescriptive models may also become outdated even while the models based on them continue to be used. However, human–machine cooperation and collaboration can ameliorate these limitations as people and AIs learn enough to work together safely and productively to achieve human goals with super-human efficiency and effectiveness. The book concludes with 64 pages of notes and a 50-page bibliography providing sources and technical literature references for the preceding chapters.

For risk analysts, a useful aspect of The Alignment Problem is its focus on clearly explaining technical challenges and possible solutions (or, in some cases, the mathematical impossibility of solutions) for creating fair, transparent, trustworthy data-driven prediction and decision-support models aligned with human values despite realistic limitations in available data and knowledge. The challenges and possibilities for developing and using trustworthy AI/ML algorithms are similar in many ways to those of developing and applying trustworthy risk analyses. (Indeed, the sentence from the concluding chapter quoted above could well be rewritten as "Research on bias, fairness, transparency, and the myriad dimensions of safety now forms a substantial portion of all the work presented at risk analysis conferences.") AI/ML suggests some possible ways forward that have been little discussed in risk analysis to date. Deep learning teaches that accuracy of risk perceptions and risk assessment in multi-layer artificial neural networks, as measured by average prediction or misclassification errors, depends on extracting a hierarchy of predictively relevant higher level features from low-level (e.g., sensor) input data, as well as on learning mappings from abstract features to risk predictions that minimize prediction error (e.g., using stochastic gradient descent). Such feature extraction is also widely used, though less commonly discussed, in risk modeling and risk assessment. An important part of the art of successful risk assessment is understanding and using the relevant features of a situation to predict risks.

AI/ML algorithms in applications such as classifying a tumor as benign or malignant, or a transaction as legitimate or fraudulent, or estimating the probability of heart attack for a patient or of default for a borrower with stated levels of confidence, use model ensemble techniques to gauge uncertainty about their own best predictions and to identify anomalous and novel situations for which confident predictions cannot be made. Such uncertainty characterization is also a key part of good practice in quantitative risk assessment. Likewise, the roles of curiosity-driven exploration and intrinsic motivation, trial-and-error (reinforcement) learning, and shaping of incentives to align behaviors with goals in AI/ML also have parallels in human and animal psychology and individual and organizational risk management. The possibility of using imitation learning and IRL to infer goals and value trade-offs that are hard to articulate and teach explicitly suggests a fascinating constructive approach for dealing with the inexpressible and ineffable in risk analysis—a topic not often emphasized in past discussions of risk communication, but perhaps timely to consider in a world where clearly stated and defended, widely accepted values for use in risk analysis often seem increasingly hard to find, and yet collective decisions about threats to life, health, and well-being must still be made. Finally, the questions of how AI/ML agents can best serve human interests—for example, by doing what they are told, or what they infer is intended, or what they predict will be most beneficial, whether or not it is what is asked for—are analogous to questions that arise in risk governance.

The insights that The Alignment Problem offers into how AI/ML systems are being designed to tackle these challenging questions may prove useful in thinking about how to improve human risk analyses. Both risk analysis and AI/ML must confront similar challenges in using realistically imperfect data and knowledge to make, explain, and defend predictions and risk management recommendations on behalf of people who may not care about the underlying technical details, but who want trustworthy predictions and recommendations with rationales that can be clearly explained if desired. The Alignment Problem shows that recent and ongoing advances in AI/ML are likely to be part of the solution to these challenges, as well as increasing the urgency of finding pragmatic solutions that society can apply and accept.

The Alignment Problem will appeal to readers who want to understand the main ideas of how AI/ML algorithms work and the challenges that they must overcome in more detail than is covered in other recent popular books. An easier introduction to the field is Polson and Scott's 2019 book AIQ: How Artificial Intelligence Works and How We Can Harness Its Power for a Better World. Henry Kissinger et al.'s 2021 book The Age of AI and Our Human Future discusses, at a less technical level, the implications of current and emerging AI/ML technologies (including the revolutionary GPT-3 language model) for AI governance and the future co-evolution of AI and humanity. The Alignment Problem is distinguished from these and other recent books by the clarity and depth of its exposition of technical topics. It gives readers a real understanding of key research issues and insights in dealing with uncertainty and biases in AI/ML at a level not usually found in popular books. This is a triumph of exposition, making accessible to general readers key ideas and breakthroughs in AI/ML that are transforming our technological world. The author interviewed many of the innovators at the forefront of recent advances in AI/ML and took careful notes. This extensive research has paid off in clear plain-English explanations of intellectually and technically exciting topics that are usually discussed only in the technical literature. The exposition is intended for novices—there are no equations or mathematical symbols—but it successfully conveys the challenge and progress that make the area enthralling to participants, explaining both why obstacles are hard to overcome and how various ingenious ideas have been developed to overcome them.

The Alignment Problem is an affordable (the paperback edition costs about $20), accessible introduction to how modern AI/ML deals with prediction and decision-making under uncertainty. It would make an ideal complement to technical textbooks for an undergraduate or graduate course on AI/ML methods in risk analysis. Long after such a course is over, students are likely to remember the research challenges and applications, the people who tackled them, and the research ideas that inspired them as explained in The Alignment Problem.
