Lecture 1
AI Transformation
It is hard these days to escape hearing about AI — in the news, on social
media, in cafe conversations. We see reports of superhuman
performance both in games such as Jeopardy! (IBM Watson, 2011) and Go
(DeepMind’s AlphaGo, 2016), and on benchmark tasks such as reading
comprehension, speech recognition, face recognition, and medical imaging
(though it is important to realize that these results are about performance on a
single benchmark, which is a far cry from the general problem).
AI Speculation
We also see speculation about the future: that it will bring about
sweeping societal change due to automation, resulting in
massive job loss, not unlike the industrial revolution, or that AI
could even surpass human-level intelligence and seek to take
control.
While media hype is real, it is true that both
companies and governments are heavily
investing in AI. Both see AI as an integral part of
their competitive strategy.
The AI Index is an effort to track the progress of AI over time. In 2017, the AI Index published a report, showing
essentially that all curves go up and to the right. Here are a few representative samples.
• Why might one think that it is even possible to capture this rich behavior?
• While AI is a relatively young field, one can trace some of its roots back to
Aristotle, who formulated a system of syllogisms that captures the reasoning
process: how one can mechanically apply syllogisms to derive new conclusions.
• In the 1940s, devices that could actually carry out these computations
started emerging. So perhaps one might be able to capture intelligent behavior via
a computer. But how do we define success?
• Can machines think? This is a question that has occupied philosophers since
Descartes. But even the definitions of “thinking” and “machine” are not clear. Alan
Turing, the renowned mathematician and code breaker who laid the foundations of
computing, posed a simple test to sidestep these philosophical concerns.
• In the test, an interrogator converses with a man and a machine via a text-based
channel. If the interrogator fails to guess which one is the machine, then the
machine is said to have passed the Turing test. (This is a simplification but it suffices
for our present purposes.)
• Although the Turing test is not without flaws (e.g., failure to capture visual and
physical abilities, emphasis on deception), the beauty of the Turing test is its
simplicity and objectivity. It is only a test of behavior, not of the internals of the
machine. It doesn’t care whether the machine is using logical methods or neural
networks. This decoupling of what to solve from how to solve is an important
theme in this class.
• AI started out with a bang. People were ambitious and tried to develop systems like the
General Problem Solver that could solve anything. Despite some successes, certain tasks
such as machine translation were complete failures, which led to the cutting of funding
and the first AI winter. It happened again in the 1980s, this time with expert systems,
though the aims were scoped more towards industrial impact. But again, expectations
exceeded reality, leading to another AI winter. During these AI winters, people eschewed
the phrase “artificial intelligence” so as not to be labeled hype-driven lunatics.
• In the latest rebirth, we have new machine learning techniques, tons of data, and tons
of computation. So each cycle, we are actually making progress. Will this time be
different?
• We should be optimistic and inspired about the potential impact that advances in AI can
bring. But at the same time, we need to be grounded and not be blown away by hype.
This class is about providing that grounding, showing how AI problems can be treated
rigorously and mathematically. After all, this class is called “Artificial Intelligence:
Principles and Techniques”.
• There are two ways to look at AI philosophically.
• The first is what one would normally associate with AI: the science and
engineering of building “intelligent” agents. The inspiration for what constitutes
intelligence comes from the types of capabilities that humans possess: the ability
to perceive a very complex world and make enough sense of it to be able to
manipulate it.
• The second views AI as a set of tools. We are simply trying to solve problems in
the world, and AI techniques happen to be quite useful for that.
• While both views boil down to many of the same day-to-day activities (e.g.,
collecting data and optimizing a training objective), the philosophical differences
do change the way AI researchers approach and talk about their work. Moreover,
the conflation of these two views can generate a lot of confusion.
• The same computer vision techniques used to recognize objects can be used to tackle
social problems. Poverty is a huge problem, and even identifying the areas of need is
hard because reliable survey data is difficult to obtain. Recent work has shown that one
can take satellite images (which are readily available) and predict various poverty
indicators.
• Machine learning can also be used to optimize the energy efficiency of datacenters, which, given the hunger for
compute these days, makes a big difference. Some recent work from DeepMind shows how to significantly reduce
Google’s energy footprint by using machine learning to predict the power usage effectiveness from sensor
measurements such as pump speeds, and using that to drive recommendations.
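As a rough, hypothetical sketch of the general idea (not DeepMind’s actual system), one could fit a regression model that predicts power usage effectiveness (PUE) from sensor measurements and then compare candidate operating settings; all data and numbers below are made up for illustration.

```python
import numpy as np

# Hypothetical sensor readings (e.g., pump speeds, temperatures) and PUE values.
rng = np.random.default_rng(0)
sensors = rng.uniform(0.0, 1.0, size=(500, 3))
pue = 1.1 + 0.3 * sensors[:, 0] - 0.2 * sensors[:, 1] + rng.normal(0, 0.01, 500)

# Fit a linear model PUE ~ sensors (bias folded in as a column of ones).
X = np.hstack([sensors, np.ones((500, 1))])
w, *_ = np.linalg.lstsq(X, pue, rcond=None)

def predict_pue(setting):
    # Predict PUE for a candidate operating setting.
    return np.append(setting, 1.0) @ w

# Compare two candidate settings; recommend the one with lower predicted PUE.
print(predict_pue([0.9, 0.5, 0.5]), predict_pue([0.3, 0.8, 0.5]))
```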
• Other applications, such as self-driving cars and authentication, are high-stakes: errors
could be much more damaging than getting the wrong movie recommendation. These
applications present a set of security concerns.
• The main conceptually magical part of learning is that if done properly, the trained
model will be able to produce good predictions beyond the set of training examples. This
leap of faith is called generalization, and is, explicitly or implicitly, at the heart of any
machine learning algorithm. This can even be formalized using tools from probability and
statistical learning theory.
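To make generalization concrete, here is a minimal sketch (on made-up synthetic data) that fits a linear classifier on a training set and then measures accuracy on held-out examples; the held-out accuracy is what generalization is about.

```python
import numpy as np

# Two synthetic Gaussian clusters with labels +1 / -1 (made up for illustration).
rng = np.random.default_rng(0)
n = 200
X = np.vstack([rng.normal(+1.0, 1.0, size=(n, 2)),
               rng.normal(-1.0, 1.0, size=(n, 2))])
y = np.concatenate([np.ones(n), -np.ones(n)])

# Split into training and held-out (test) examples.
perm = rng.permutation(len(y))
train, test = perm[:300], perm[300:]

# Fit a linear predictor w by least squares on the training set only.
w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

def accuracy(idx):
    return np.mean(np.sign(X[idx] @ w) == y[idx])

# Generalization: good predictions on unseen points, not just training points.
print("train accuracy:", accuracy(train))
print("test accuracy: ", accuracy(test))
```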
• A reflex-based model simply performs a fixed sequence of computations on a given
input. Examples include most models found in machine learning, from simple linear
classifiers to deep neural networks. The main characteristic of reflex-based models is
that their computations are feed-forward; one doesn’t backtrack and consider
alternative computations. Inference is trivial in these models because it is just running
the fixed computations, which makes these models appealing.
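As a toy illustration of a reflex-based model, here is a linear classifier with made-up weights: prediction is a single fixed feed-forward computation, with no search over alternatives.

```python
import numpy as np

# Made-up weights for illustration; in practice they would be learned from data.
w = np.array([0.8, -1.5, 0.3])

def predict(x):
    # One fixed, feed-forward computation: score the input, take the sign.
    # No search, no backtracking, no consideration of alternatives.
    score = np.dot(w, x)
    return +1 if score >= 0 else -1

print(predict(np.array([1.0, 0.2, -0.5])))  # a single forward pass
```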
• Reflex-based models are too simple for tasks that require more forethought
(e.g., in playing chess or planning a big trip). State-based models overcome this
limitation.
• The key idea is, at a high-level, to model the state of a world and transitions
between states which are triggered by actions. Concretely, one can think of
states as nodes in a graph and transitions as edges. This reduction is useful
because we understand graphs well and have a lot of efficient algorithms for
operating on graphs.
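To make the graph view concrete, here is a minimal sketch of a state-based model: a small made-up graph of states and action costs, with uniform cost search finding a minimum-cost path.

```python
import heapq

# States are nodes, actions are edges with costs (graph is made up for illustration).
graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}

def uniform_cost_search(start, goal):
    # Explore states in order of increasing path cost.
    frontier = [(0, start, [start])]
    explored = set()
    while frontier:
        cost, state, path = heapq.heappop(frontier)
        if state == goal:
            return cost, path
        if state in explored:
            continue
        explored.add(state)
        for next_state, step_cost in graph[state]:
            heapq.heappush(frontier, (cost + step_cost, next_state, path + [next_state]))
    return None

print(uniform_cost_search("A", "D"))  # (4, ['A', 'B', 'C', 'D'])
```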
• Search problems are adequate models when you are operating in an
environment that has no uncertainty. However, in many realistic settings,
there are other forces at play.
• Bayesian networks are variable-based models in which the variables are random
variables that depend on each other. For example, the true location of an airplane H_t
and its radar reading E_t are related, as are the location H_t and the location at the
previous time step H_{t-1}. The exact dependency structure is given by the graph
structure and formally defines a joint probability distribution over all the variables.
This topic is studied thoroughly in probabilistic graphical models.
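Here is a minimal sketch of the airplane example as a tiny Bayesian network over two time steps, with made-up toy probabilities over locations {0, 1}: the graph structure factors the joint distribution into local conditional distributions, and queries can be answered (inefficiently) by enumeration.

```python
import itertools

# P(H1), P(H_t = h | H_{t-1} = h_prev) keyed by (h, h_prev),
# and P(E_t = e | H_t = h) keyed by (e, h). All numbers are made up.
p_h1      = {0: 0.5, 1: 0.5}
p_h_given = {(0, 0): 0.8, (1, 0): 0.2,
             (0, 1): 0.2, (1, 1): 0.8}
p_e_given = {(0, 0): 0.9, (1, 0): 0.1,
             (0, 1): 0.1, (1, 1): 0.9}

def joint(h1, h2, e1, e2):
    # The graph structure defines the joint as a product of local factors.
    return (p_h1[h1]
            * p_e_given[(e1, h1)]
            * p_h_given[(h2, h1)]
            * p_e_given[(e2, h2)])

# Query by brute-force enumeration: P(H2 = 1 | E1 = 1, E2 = 1).
numer = sum(joint(h1, 1, 1, 1) for h1 in (0, 1))
denom = sum(joint(h1, h2, 1, 1)
            for h1, h2 in itertools.product((0, 1), repeat=2))
print(numer / denom)
```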
• Our last stop on the tour is logic. Even more so than variable-based models, logic
provides a compact language for modeling, which gives us more expressivity.
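As a small illustration of logic as a modeling language, here is a sketch of propositional inference by brute-force model checking on a made-up knowledge base: the knowledge base entails a query exactly when the query holds in every model that satisfies the knowledge base.

```python
import itertools

# Made-up knowledge base: (Rain -> Wet) and Rain.  Query: Wet.
symbols = ["Rain", "Wet"]

def kb(model):
    rain, wet = model["Rain"], model["Wet"]
    return ((not rain) or wet) and rain   # (Rain -> Wet) and Rain

def query(model):
    return model["Wet"]

# Entailment: the query must hold in every model where the KB holds.
entails = all(query(m) for m in
              (dict(zip(symbols, values))
               for values in itertools.product([True, False], repeat=len(symbols)))
              if kb(m))
print(entails)  # True: from Rain -> Wet and Rain, we can derive Wet
```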
• It is interesting that historically, logic was one of the first things that AI
researchers started with in the 1950s. While logical approaches were in a way
quite sophisticated, they did not work well on complex real-world tasks with noise
and uncertainty. On the other hand, methods based on probability and machine
learning naturally handle noise and uncertainty, which is why they presently
dominate the AI landscape. However, they have yet to be applied successfully to
tasks that require really sophisticated reasoning.
• In this course, we will see the two not as contradictory, but as simply
tackling different aspects of AI — in fact, in our schema, logic is a class of models
which can be supported by machine learning. An active area of research is to
combine the richness of logic with the robustness and agility of machine learning.