Artificial Intelligence (AI)
Artificial intelligence (AI) is the intelligence of machines and the branch of computer science that aims to create it. AI textbooks define the field as "the study and design of intelligent agents," where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success. John McCarthy, who coined the term in 1956, defines it as "the science and engineering of making intelligent machines." The field was founded on the claim that a central property of humans, intelligence (the sapience of Homo sapiens), can be so precisely described that it can be simulated by a machine. This raises philosophical issues about the nature of the mind and the ethics of creating artificial beings, issues which have been addressed by myth, fiction and philosophy since antiquity. Artificial intelligence has been the subject of optimism, but has also suffered setbacks and, today, has become an essential part of the technology industry, providing the heavy lifting for many of the most difficult problems in computer science.

AI research is highly technical and specialized, deeply divided into subfields that often fail to communicate with each other. Subfields have grown up around particular institutions, the work of individual researchers, the solution of specific problems, long-standing differences of opinion about how AI should be done, and the application of widely differing tools. The central problems of AI include such traits as reasoning, knowledge, planning, learning, communication, perception and the ability to move and manipulate objects. General intelligence (or "strong AI") is still among the field's long-term goals. Currently, no computers exhibit full artificial intelligence (that is, are able to simulate human behavior). The greatest advances have occurred in the field of game playing. The best computer chess programs are now capable of beating humans. In May 1997, an IBM supercomputer called Deep Blue defeated world chess champion Garry Kasparov in a chess match.
In the area of robotics, computers are now widely used in assembly plants, but they are capable only of very limited tasks. Robots have great difficulty identifying objects based on appearance or feel, and they still move and handle objects clumsily. Natural-language processing offers the greatest potential rewards because it would allow people to interact with computers without needing any specialized knowledge. You could simply walk up to
a computer and talk to it. Unfortunately, programming computers to understand natural languages has proved to be more difficult than originally thought. Some rudimentary systems that translate from one human language to another exist, but they are not nearly as good as human translators. There are also voice recognition systems that can convert spoken sounds into written words, but they do not understand what they are writing; they simply take dictation. Even these systems are quite limited -- you must speak slowly and distinctly. In the early 1980s, expert systems were believed to represent the future of artificial intelligence and of computers in general. To date, however, they have not lived up to expectations. Many expert systems help human experts in such fields as medicine and engineering, but they are very expensive to produce and are helpful only in special situations.

There are two goals in AI. The bigger one is to produce an artificial system that is about as good as or better than a human being at dealing with the real world. The second goal is more modest: simply produce small programs that are more or less as good as human beings at doing small specialized tasks that require intelligence. To many AI researchers, simply doing these tasks that in human beings require intelligence counts as artificial intelligence, even if the program gets its results by some means that shows no intelligence; thus much of AI can be regarded as "advanced programming techniques." The characteristics of intelligence given here would, I think, seem quite reasonable to most normal people; however, AI researchers and AI critics take various unusual positions on things, and in the end everything gets quite murky. Some critics believe that intelligence requires thinking while others say it requires consciousness. Some AI researchers take the position that thinking equals computing while others don't.
A much more meaningful method of determining whether or not a computer is thinking would be to find out exactly what people are doing; if the artificial system is doing the same thing, or close to the same thing, as a human being, then it becomes fair to equate the two.

One of the positions on intelligence that I mention in this section is that it requires consciousness and that consciousness is produced by quantum mechanics. For those of you who have been denied a basic education in science by our schools, quantum mechanics goes like this. By the beginning of the 20th century physicists had discovered that electrons, protons and various other very small particles were not obeying the known laws of physics. After a while it became clear that particles also behave like waves. The most well-known formula they found to describe how the particles move around is the Schrödinger wave equation, a second-order partial differential equation that uses complex numbers. Since then quantum formulas have been developed for electricity and magnetism as well. Although as far as anyone can tell the formulas get the right answers, the interpretation of what's really going on is in dispute. The formulas and some experiments apparently show that at times information is moved around not just at the (very slow) speed of light but INSTANTLY. Results like this are very unsettling. One of the developers of QM, Niels Bohr, once said: "Anyone who isn't confused by quantum mechanics doesn't really understand it." But, relax, there is virtually no QM in this book, certainly not the formulas and nothing to get confused about. On the other hand, research into applying quantum mechanics to human thought may soon make it necessary to include QM in psychology and AI books.

Bottom-lining it, then: there are terrible disagreements between various camps in and out of AI as to what really constitutes intelligence, and it will be a long time before it is sorted out.
Symbol Processing

Symbol processing has been the dominant theory of how intelligence is produced, and its principles can be used to do certain useful tasks. Using this method you write programs that work with symbols rather than numbers. Symbols can be equal or not equal, and that is the only relation defined between symbols, so you can't even ask if one is less than another, much less do arithmetic with them. Of course, in symbol processing programs the symbols do get represented by integers. These researchers have never been interested in how to implement symbol processing using neural networks. Besides the use of symbols, the other key principle involved is that knowledge in the mind consists of a large number of rules. Symbol processing adherents make the bold claim that symbol processing has the necessary and sufficient properties to display intelligence and to account for human thought processes. Even though they say that symbols can only be equal or not equal and no other relations are defined for them, quite often "symbolic" programs end up using integers or reals as part of the program, and they are called symbolic anyway, even though by the strictest standard doing so no longer makes the program wholly symbolic, only mostly symbolic. Relatively little has been released on the CYC project. One FAQ on CYC has been produced by David Whitten. The official CYC WWW site is Cycorp, Inc. My opinion is that CYC may be quite useful for certain applications, but it still won't be operating like a human being, or anywhere near as well as a human being, so in those most important senses it will be a failure.

Association

Perhaps the most important principle in all of AI is found here: when one idea has been associated with another, then when one comes up again the other will too. This was put very nicely by William James in his works on psychology, first published in 1890 (yes, eighteen ninety!). There are several examples given. James' books are well worth reading if you ever have the time to spare. His other ideas come up in cognitive science fairly often.

Neural Networking

When people first discovered that nerve cells pass around pulses of electricity, many people ASSUMED that this activity was used to produce thought. Heck, what else was there, if you were a materialist? (The only alternative was the idea that there is a human soul, and the materialists didn't want anything to do with that!) Now there is a new theory floating about because of recent discoveries about cells. Almost every cell has a cytoskeleton built out of hollow tubes of protein. The tubes are called microtubules, and there is now a proposal that these structures are involved in quantum mechanical computing. If thinking is done this way then the human mind is a much more powerful computing device than anyone has suspected. I've seen estimates that the brain processes from 10^23 to 10^28 bits per second with this architecture, well beyond the 10^16 bits per second estimate with the neuron-as-a-simple-switch architecture. Another suggestion is that the microtubules could be set up to allow optical computing. For more on the microtubules see The Quantum Basis of Natural Intelligence? page and/or the article by Dimitri Nanopoulos available from Los Alamos National Laboratory. For the sake of trying to produce intelligent behaviour, however, really all that's being done is work with artificial neural networks, where each cell is a very simple processor and the goal is to try to make them work together to solve some problem. That's all that gets covered in this book. Many people are skeptical that artificial neural networks can produce human levels of performance because they are so much simpler than the biological neural networks.
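The "very simple processor" view of a network cell can be sketched as a unit that weights its inputs, sums them, and fires if the sum reaches a threshold. The weights below are picked by hand for illustration (they happen to compute logical AND); a real network would learn them from data.

```python
def unit(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of the inputs reaches the threshold."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Hand-picked weights that make this one unit compute AND of two inputs.
AND_WEIGHTS = [1.0, 1.0]
AND_THRESHOLD = 1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, unit([a, b], AND_WEIGHTS, AND_THRESHOLD))
```

Whole networks are just many such units wired together, with the interesting behavior coming from the learned weights rather than from any one cell.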
Heuristic Search
Another of the principles used in AI is heuristic search. In heuristic search (as opposed to blind search) you use some information about the problem to try to solve it quickly.
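As a sketch of the idea, here is a greedy best-first search on a made-up graph: the heuristic values are invented estimates of how close each node is to the goal, and the search always expands the node that currently looks closest, instead of blindly trying everything.

```python
import heapq

GRAPH = {            # node -> neighbors (a hypothetical map, for illustration only)
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["G"],
    "E": ["G"],
    "G": [],
}
H = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 1, "G": 0}  # guessed distance to goal G

def best_first(start, goal):
    # frontier holds (heuristic estimate, node, path so far); lowest estimate pops first
    frontier = [(H[start], start, [start])]
    seen = set()
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in GRAPH[node]:
            heapq.heappush(frontier, (H[nxt], nxt, path + [nxt]))
    return None

print(best_first("A", "G"))
```

A blind search would have to consider the B branch too; the heuristic lets this version head straight through C and D.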
Pattern Recognition

The idea in this chapter is to do a little pattern recognition with concrete, down-to-earth problems like recognizing letters, digits and words. The more abstract version of pattern recognition comes along in chapter 3. You could argue that chapter 3 should therefore come before chapter 2, except I think abstraction should come second, not first. Also in this chapter it is shown that "simple" pattern recognition like recognizing letters is not so simple after all: it is tied up with everything we know, all the way up to the highest levels.

Recognizing Words

Recognizing letters and symbols is not done in a vacuum; it is normally done as part of some task where your thinking biases you to see certain patterns. In this section I show how a more sophisticated type of network, an interactive activation network, can be used to capture this bias in a realistic way. The example comes from a famous work by Rumelhart and McClelland. I did a not-at-all-fancy version of the interactive activation network, and the C source, DOS binaries and elementary instructions are available. It can be used for some of the exercises.
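A drastically simplified sketch of the interactive activation idea: evidence for letters excites the words that contain them, and active words feed activation back down to their letters, so word-level knowledge biases letter perception. The two-word vocabulary, the rate, and the evidence values below are all invented, and the real Rumelhart-McClelland model is far richer (inhibition, decay, position-specific letters).

```python
WORDS = {"CAT": ["C", "A", "T"], "CAR": ["C", "A", "R"]}

def run(letter_evidence, steps=10, rate=0.2):
    letters = dict(letter_evidence)
    words = {w: 0.0 for w in WORDS}
    for _ in range(steps):
        # bottom-up: each letter excites the words that contain it
        for w, ls in WORDS.items():
            words[w] += rate * sum(letters.get(l, 0.0) for l in ls)
        # top-down: each active word excites its own letters
        for w, ls in WORDS.items():
            for l in ls:
                letters[l] = letters.get(l, 0.0) + rate * words[w]
    return words

# Ambiguous final letter, with slightly more evidence for T than for R:
result = run({"C": 1.0, "A": 1.0, "T": 0.6, "R": 0.4})
print(max(result, key=result.get))
```

The small initial edge for T compounds over the interaction steps, so the network settles on CAT: the word layer has amplified the bias.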
methods must be used. The simplest such algorithm is the nearest neighbor classifier. You simply take your unknown (as a vector) and compute the distance from it to all the other patterns you have the answer for. The answer you guess for the unknown pattern is the class of the closest pattern. In the nearest neighbor algorithm you have to keep a large inventory of patterns and their classifications so searching through this inventory for the closest match may take quite a long time. Lately some interesting variations on the nearest neighbor method have been developed that are much faster because you store fewer known patterns. The idea is to scatter a few prototype points, that is representative patterns, around the space for each class. There is a whole series of algorithms to do this that are called learning vector quantization (LVQ) algorithms. Perhaps the simplest of these is one called decision surface mapping (DSM) and this one is covered in the text. At one time I had the LVQ1 algorithm in the text as well in another chapter called Other Neural Networking Methods but in the end I thought it was best to cut this chapter from the book and make it available on the net. I've done a nearest neighbor program in C with DOS binaries that implements the k-nearest neighbor algorithm, DSM and LVQ1. More LVQ software is available from the group that started it all, the LVQ/SOM Programming Team of the Helsinki University of Technology, Laboratory of Computer and Information Science, Rakentajanaukio 2 C, SF-02150 Espoo, Finland. It comes as UNIX source and as a DOS self-extracting file. There are also other programs that can be used with this LVQ software.
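The nearest neighbor classifier described above is only a few lines of code. This is a minimal sketch with a tiny invented data set of labeled 2-D points: the unknown vector gets the label of the closest stored pattern.

```python
import math

def nearest_neighbor(unknown, labeled_patterns):
    """labeled_patterns is a list of (vector, label) pairs."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    best_vec, best_label = min(labeled_patterns,
                               key=lambda p: dist(unknown, p[0]))
    return best_label

# Two made-up classes of 2-D points:
data = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"),
        ((1.0, 1.0), "b"), ((0.9, 1.1), "b")]

print(nearest_neighbor((0.2, 0.1), data))
print(nearest_neighbor((0.8, 1.0), data))
```

Notice that every stored pattern is examined for every query, which is exactly the cost that the LVQ and DSM prototype methods are designed to cut down.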
Additional Perspective
I thought it best to do pattern recognition from the standpoint of identifying letters of the alphabet, because it is easy to relate to and it shows how many levels of knowledge are involved in the process, but there is a lot more to the subject. First, the methods given here are not completely realistic, in that human pattern recognition is much more complex than these algorithms. My guess is that the human algorithms will prove to be better than any of the simple man-made ones. Second, it's also important to be able to interpret pictures of arbitrary scenes, not just letters; however, this important topic was not covered here because you just can't do everything and because the principles involved are quite similar.
Hopfield Networks
Besides the Hopfield network, this also contains the Boltzmann machine relaxation algorithm. The Boltzmann machine idea is really very intriguing because of the way it looks up memories. It's radically different from conventional computer architectures. While it's very interesting theoretically, there have been very few applications of this method. In the highly recommended Nanopoulos article he says (in effect) that the microtubules can form a Hopfield network. (I checked with physicist Jack Sarfatti on this to make sure I was interpreting Nanopoulos correctly.) Each tubulin molecule would represent a 1 or 0 depending on which state it's in. I can't imagine how the weights get represented in this system, and especially how a molecule at one end of the MT fiber can influence a molecule at the other end, so I asked Jack the following: if the MTs work this way, how does one unit on the far left manage to influence a unit on the far right? (I can imagine one unit affecting a neighboring unit, but that is all.) What are the weights between the MT units and how are they created? Is there quantum "magic" at work here? His reply was: Yes. We have to look at the quantum version in which the quantum pilot waves provide long-range nonlocal links between the classical switches. Exactly how to do this mathematically I don't know right this minute. So APPARENTLY some quantum "magic" allows a tubulin molecule to connect to every other tubulin molecule, giving a Hopfield-type network, a pretty neat way to implement the algorithm. Nanopoulos also notes that the physical temperature of the tubulin fibers and the strength of the electric fields surrounding them change the characteristics of the fiber, making possible (*I* think) the Boltzmann machine relaxation algorithm. I'd like to see a supercomputer simulate this numerically; it's a good project for someone, I think. The Hopfield and Boltzmann programs are available as C source and DOS binaries.
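Setting the microtubule speculation aside, the ordinary Hopfield algorithm itself is easy to sketch: store a pattern of +1/-1 units with Hebbian weights, then recover it from a corrupted copy by repeatedly updating each unit to agree with its weighted inputs. The pattern below is made up for illustration.

```python
def train(patterns):
    """Hebbian weights: w[i][j] accumulates p[i]*p[j] over the stored patterns."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, state, steps=5):
    """Sweep over the units, setting each to the sign of its weighted input."""
    n = len(state)
    state = list(state)
    for _ in range(steps):
        for i in range(n):
            total = sum(w[i][j] * state[j] for j in range(n))
            state[i] = 1 if total >= 0 else -1
    return state

stored = [1, -1, 1, -1, 1, -1]
w = train([stored])
noisy = [1, -1, -1, -1, 1, -1]       # one unit flipped
print(recall(w, noisy))
```

The flipped unit is outvoted by the weighted signals from the correct units, so the network settles back into the stored memory: recall by content, not by address.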
Back-Propagation

Again, backprop is really important because of the large number of problems it can be applied to. Notice how many times it was discovered before it finally caught on. If you're familiar with regression, notice how backprop is really just a version of non-linear regression. There are loads of ways to speed up the plain algorithm, and generally speaking you should use them rather than the plain algorithm; however, sometimes the plain version will give the best results. At present there is no way to know ahead of time whether an improved version or the plain version will be best. Since backprop is so useful, I've spent a lot of time working on a good version of it, and it is available online. If you want a tutorial on backprop that is much the same as the one in the text, get my postscript file or see the newer HTML file. The Rprop and Quickprop papers are available online; for these and more material on backprop see my Backpropagator's Review.
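The plain algorithm fits in a page. This is a bare-bones sketch (not the book's C program) of a 2-2-1 sigmoid network trained by plain gradient descent on the OR function; the learning rate, epoch count, and data are chosen only to show the error dropping.

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]  # the OR function

random.seed(0)                      # fixed seed so the run is repeatable
w_ih = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
b_h = [0.0, 0.0]
w_ho = [random.uniform(-1, 1) for _ in range(2)]
b_o = 0.0

def forward(x):
    h = [sigmoid(sum(w_ih[j][i] * x[i] for i in range(2)) + b_h[j])
         for j in range(2)]
    o = sigmoid(sum(w_ho[j] * h[j] for j in range(2)) + b_o)
    return h, o

def loss():
    return sum((forward(x)[1] - t) ** 2 for x, t in DATA)

def train_step(lr=0.5):
    global b_o
    for x, t in DATA:
        h, o = forward(x)
        d_o = (o - t) * o * (1 - o)                  # output-layer delta
        for j in range(2):
            d_h = d_o * w_ho[j] * h[j] * (1 - h[j])  # hidden delta, before update
            w_ho[j] -= lr * d_o * h[j]
            for i in range(2):
                w_ih[j][i] -= lr * d_h * x[i]
            b_h[j] -= lr * d_h
        b_o -= lr * d_o

before = loss()
for _ in range(200):
    train_step()
print(round(before, 3), "->", round(loss(), 3))
```

All the speed-up schemes (momentum, Rprop, Quickprop) are refinements of exactly this weight-update loop.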
Rules
This chapter is an abrupt shift to symbol processing techniques; note, however, that to a considerable extent it is a symbol processing analog to the material in the two pattern recognition chapters. Rules are simply another way to do pattern recognition, and they have advantages over networks in many types of problems. They look a lot like small neural networks. If you want to put more emphasis on symbol processing in your course, then you could actually start with section 4.2 and go on to the online PROLOG chapter or the online LISP chapter.
Introduction
Even though I think it's quite reasonable to say that expert systems do the job of a human expert and can be made using either connectionist or symbolic techniques, the term originated with the symbolic systems, and therefore some people tend to think of expert systems as symbolic and not connectionist. I needed a notation to describe symbol processing algorithms, and rather than make up my own I thought it was better to use an established symbolic language, PROLOG, to do this. As I've already said, PROLOG comes with some pattern matching capabilities built in, and this makes it more convenient to use for this purpose than LISP. Sometimes people take to one of these two languages better than the other. Personally, if I had to write large programs I'd rather use LISP, because it's easier to trace what it's doing when a bug comes up. I've had a lot of trouble trying to follow debugging output from the PROLOG interpreters I've used.
Conflict Resolution

The only comment I can make here is something that I will really make a point of in Chapter 7. There are many ways to do pattern recognition with cases, rules and networks. Rules are in effect a compressed form of knowledge; that is, you can take many cases and compress them (think of the UNIX compress program or the JPEG compression algorithm for pictures) down to a few rules. Unfortunately, the compression of facts about the world down to rules leaves you with problems like conflict resolution.
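When several rules match the current facts at once, something has to pick a winner. One common conflict-resolution strategy is specificity ordering: prefer the rule with the most conditions. The rules and facts below are invented for illustration.

```python
# Each rule is (set of required facts, conclusion).
RULES = [
    ({"has_fur"}, "mammal"),
    ({"has_fur", "purrs"}, "cat"),
    ({"has_feathers"}, "bird"),
]

def fire(facts):
    # Find every rule whose conditions are all present in the facts.
    matching = [(conds, concl) for conds, concl in RULES if conds <= facts]
    if not matching:
        return None
    # Specificity ordering: the rule with the most conditions wins.
    conds, concl = max(matching, key=lambda r: len(r[0]))
    return concl

print(fire({"has_fur", "purrs"}))   # both fur rules match; the more specific one wins
```

Other strategies (recency of the matched facts, fixed rule priorities) plug into the same spot where `max` is used here.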
Logic
This chapter basically deals with Resolution-based theorem proving, a more general version of reasoning than the "if A then B" type rules you find in PROLOG. The chapter uses a notation from a public domain Resolution-based theorem proving program called Otter from Argonne National Laboratory. I have tried it a little and it is fairly nice. They have C code, a 32-bit DOS binary, a Macintosh version, user manuals in various formats and some sample problems. This is available by http and by ftp. A better source of sample problems is the Thousands of Problems for Theorem Provers collection by Geoff Sutcliffe and Christian Suttner. This collection is rather large: a gzipped file of 1M which, when gunzipped, comes to 18M. In all likelihood the only problems you will want are the ones in the puzzles directory. This package is available from the University of Munich by ftp and by http, or from James Cook University, Australia, by ftp and by http.
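The core of Resolution can be sketched at the propositional level (Otter itself works with full first-order clauses and far better search control). Clauses are sets of literals such as "p" or "-p"; two clauses with a complementary pair resolve into a new clause, and deriving the empty clause proves the set contradictory.

```python
def negate(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def resolvents(c1, c2):
    """All clauses obtainable by resolving c1 with c2 on one literal."""
    out = []
    for lit in c1:
        if negate(lit) in c2:
            out.append(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return out

def unsatisfiable(clauses):
    """Saturate the clause set; True if the empty clause is derived."""
    clauses = set(map(frozenset, clauses))
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                if a != b:
                    for r in resolvents(a, b):
                        if not r:
                            return True      # empty clause: contradiction found
                        new.add(r)
        if new <= clauses:
            return False                     # nothing new: no refutation
        clauses |= new

# p, "p implies q" (-p or q), and -q together are contradictory:
print(unsatisfiable([{"p"}, {"-p", "q"}, {"-q"}]))
```

To prove a statement you add its negation to the axioms and look for a contradiction, which is the proof-by-contradiction strategy discussed below.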
Other Logics
Real human logic is not discrete; it is analog, and conclusions come with a confidence factor. For an example of how continuous-valued logic can be formalized, see "Beyond Associative Memories: Logics and Variables in Connectionist Models", by Ron Sun, from the Ohio State Neuroprose Archive. (The book only mentions this in one VERY short paragraph.) There is just the merest mention of the work of Johnson-Laird and Byrne, whose research on people indicates that people reason by constructing models of the situation and looking for counter-examples. I've never had time to read more than a little of their book, Deduction, yet it seems to fit in rather well with my bias toward image processing. In a previous work called Mental Models, Philip Johnson-Laird proposed the idea of mental models based on his research. In the introduction to Deduction they write that "Mental Models was well-received by cognitive scientists".

Controlling Search

Just blindly using the possible rules to solve a problem is not a good way to quickly find an answer. This section gives several heuristics that can speed the search (proof by contradiction, set of support, weighting and PROLOG's strategy). These help, but there is a lot left to be desired.
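The continuous-valued logic mentioned under Other Logics can be sketched with the classic fuzzy connectives, where a truth value is a number between 0 and 1 rather than true/false. The min/max/complement operators are standard; the example truth values are made up.

```python
def f_and(a, b):
    return min(a, b)       # a conjunction is only as strong as its weaker part

def f_or(a, b):
    return max(a, b)       # a disjunction is as strong as its stronger part

def f_not(a):
    return 1.0 - a

# "The patient probably has a fever (0.8) and possibly a rash (0.3)."
fever, rash = 0.8, 0.3
print(f_and(fever, rash))  # conjunction limited by the weaker claim
print(f_or(fever, rash))
print(f_not(fever))        # about 0.2
```

A rule in such a system passes a confidence factor to its conclusion instead of a hard yes or no, which is much closer to how the analog reasoning described above behaves.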
Complex Architectures
The book starts with some of the simplest pattern recognition algorithms and moves on to higher levels of thought and reasoning. For the most part (not completely!) chapters 2 through 5 deal with one-step pattern recognition problems. But that is not good enough. Most problems involve doing many steps of pattern recognition, and we really need to find an architecture that will do many steps of pattern recognition to solve one problem or to cope with the real world. Unfortunately, not a lot can be done with this subject. What is done is to introduce the idea of a short-term memory that works in conjunction with a long-term memory. Besides the architecture problem there is also the problem of how to represent thoughts within such a system. Symbol processing advocates simply propose structures of symbols. Neural networking advocates have yet to establish good methods for storing structures of thoughts (unless you consider pictures as structured objects, but hardly anyone has worked on this).
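The short-term/long-term memory arrangement can be sketched as a tiny production system: long-term memory holds productions that watch short-term (working) memory, and each firing adds a new fact, so a chain of recognitions does many steps to solve one problem. The productions and facts here are invented for illustration.

```python
# Long-term memory: productions of the form (condition fact, conclusion fact).
LONG_TERM = [
    ("saw smoke", "fire nearby"),
    ("fire nearby", "call for help"),
]

def run(short_term):
    """Cycle over the productions until no production adds anything new."""
    short_term = set(short_term)
    changed = True
    while changed:
        changed = False
        for condition, conclusion in LONG_TERM:
            if condition in short_term and conclusion not in short_term:
                short_term.add(conclusion)     # one more step of recognition
                changed = True
    return short_term

print(sorted(run({"saw smoke"})))
```

Each cycle is one step of pattern recognition against working memory; the multi-step behavior comes from the results of one step triggering the next.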
the use of symbols. I believe that Harnad is right in saying that symbols must be defined in terms of pictures, the idea of symbol grounding. The paper by Stevan Harnad, "The Symbol Grounding Problem", is a compressed text file and is available from the Ohio State neuroprose archive.
Flow of Control
The point here is that the human architecture is interrupt-driven, analog and fallible.
A variation on RAAM called SRAAM for Sequential RAAM turns trees into a linear list and then stores the list in a RAAM-like procedure. See the article "Tail-Recursive Distributed Representations and Simple Recurrent Networks" by Stan C. Kwasny and Barry L. Kalman available by http from: Washington University in St. Louis. This is not covered in the book. A newer scheme that may be more realistic and which I don't cover in this section is in the article "Holographic Reduced Representations", by Tony Plate, published in the May 1995 IEEE Transactions on Neural Networks and also online from the University of Toronto. Actually if you FTP to Tony Plate's directory at the University of Toronto you'll find this topic comes in documents of various sizes including a whole thesis.
Of course the psychology of using rules and networks to compress data is interesting in itself. First there is the traditional scientific goal of trying to describe how the world operates using as few equations (their version of rules) as necessary. I'm sure this is a major motivation for the symbolic AI community to find rules to express all that is known about the world. The other bit of psychology involved in compressing knowledge is that it's really convenient for von Neumann style computers to work that way. All the computation has to go through a single CPU, and scanning a large database, as you do in a nearest neighbor method, takes a lot of time. Thus a few rules can deal with a problem much faster than an exhaustive search of memory. But what if you have a machine architecture that can search for the nearest match more easily than it can deal with rules? For now there are systems like the Connection Machine that can search in parallel. Then there is the scheme used in the Boltzmann machine, where given part of a pattern you can come up with the rest of the pattern using one cooling session. So it strikes me that between the scientific tradition of forming rules and the constraints of the von Neumann architecture, AI researchers have been mentally trapped into the condensed/compressed knowledge scenario. But in this chapter try to free yourself from such thinking. Consider the virtues of a different architecture, one where you don't even have to bother to come up with rules! Quite a liberating concept if you ask me! But if people operate mostly with cases, it poses quite a problem for AI researchers, because to come close to duplicating human capabilities will require vast amounts of storage space as well as an appropriate parallel processing architecture. I just ran into a WWW site that promotes Case-Based Reasoning: Welcome to AI-CBR.
being and describing one by saying that a person is 95% (or whatever percent) water, x% nitrogen, y% calcium, z% carbon, etc. I chose the word condensed because of its use for cans of condensed soup that you buy at the store. In brief this section gives arguments for and against the use of condensed knowledge.
case out there in the future he had the solution in his mind. Now, in the present, all he has to do is "remember" it. If information can be passed along at faster than the speed of light, this is quite possible. If that is what is going on, then it gives people a heuristic that no ordinary computer can duplicate. If QM allows you to remember the future, you have to wonder why people don't remember the future all the time, especially useful things like the winning lottery numbers. My answer to this is that if anyone could remember the future with any degree of accuracy, we would not have lotteries, because everyone would win and having a lottery would be pointless. If you're going to have a lottery, the "universe" has to find a compromise between everyone remembering the winning numbers and no one winning the lottery. It has to be rare for memories from the future to appear in the present. Mathematicians and other scientists are lucky in that when they have a flash of insight they have the means to prove it correct here in the present.