Artificial Intelligence as a Precise Art
Artificial General Intelligence Research Institute AGI Workshop
November 1, 2006
Eliezer S. Yudkowsky

[Intro.] Good morning. My name is Eliezer Yudkowsky and I am a Bayesian. The title of this talk is "AI as a Precise Art", and as a Bayesian, I obviously don't think that intelligence consists of carrying out logical deductions on perfect knowledge of the exact state of the environment. But just because you're uncertain about the environment doesn't mean you should be afraid to use precise methods to deal with your own uncertainty. I'd like to begin by presenting an object lesson in how not to deal with uncertainty.

[Cards.] First, an experiment by Amos Tversky and Ward Edwards in 1966. The subjects were shown a succession of cards, each card either red or blue. 70% of the cards were blue, and 30% red, and the experimenters had randomized the sequence. What Tversky and Edwards discovered was that most subjects, asked to guess the color of each succeeding card, would guess blue around 70% of the time, and red about 30% - as if the subjects thought they could predict the sequence, which was in fact randomized. Even when the subjects were paid a nickel for each correct guess, they still guessed blue an average of only 76% of the time.

A better strategy would have been to guess blue every time. If you guess blue all the time, you'll get a nickel 70% of the time, and over a thousand trials you'll earn seven hundred nickels. If you guess at the same frequency as the cards, then 70% of the time you'll guess blue and get a nickel 70% of those times, and 30% of the time you'll guess red and get a nickel 30% of those times, in which case your expected payoff over a thousand trials is five hundred and eighty nickels (the arithmetic is spelled out at the end of this passage).

Robyn Dawes, commenting on this experiment, said: "Despite feedback through a thousand trials, subjects cannot bring themselves to believe that the situation is one in which they cannot predict." I would reply that the subjects didn't know that the card sequence was randomized; it was rational for them to keep looking for patterns. That's what science is all about, looking for order. On the other hand, even if the subjects thought they noticed a pattern, they didn't need to pay to test their hypotheses. They could have thought, "If pattern XYZ is true, the next card will be red," and then gone on betting on blue until some hypothesis had actually been confirmed.

I would suggest that subjects simply didn't think of the strategy of betting on blue every time. It's a counterintuitive betting strategy because it doesn't resemble a typical sequence of cards. If you had perfect knowledge of the cards, then your optimal betting strategy would exactly resemble the cards. But under conditions of uncertainty, your optimal betting pattern may look startlingly unlike a typical sequence of cards. A noisy environment does not imply a noisy strategy for dealing with the environment. Or to put it another way, a random key does not fit a random lock just because they are "both random".

[Vinge/chess.] Vernor Vinge once suggested that we would be unable to predict the actions of any sufficiently smart Artificial Intelligence, or any entity smarter than human, on the grounds that if you can predict the exact action of an entity smarter than you are, you are necessarily at least that smart yourself, because you could just do whatever the superintelligence would do in your shoes. Let's try applying this logic to the chess-playing program Deep Blue.
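For reference, here is the payoff arithmetic behind the card experiment above, written out under the talk's own assumptions (a thousand trials, one nickel per correct guess, 70% blue cards; payoffs in nickels):

    \mathrm{E}[\text{always guess blue}]  = 1000 \times 0.70 = 700
    \mathrm{E}[\text{frequency matching}] = 1000 \times (0.70 \times 0.70 + 0.30 \times 0.30) = 1000 \times 0.58 = 580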
You cannot build Deep Blue by programming in a good chess move for every possible chess position; first of all, there are too many chess positions, and second, the resulting program would play no better than you do. If Deep Blue's programmers had known exactly where Deep Blue would move in any particular position, they would necessarily have been at least that good at chess themselves; they could have just moved wherever Deep Blue would move.

Now imagine that the programmers had said to themselves: "We want an AI that can transcend our own chess-playing abilities. If we knew what Deep Blue's move would be, it couldn't possibly play any better than us. So we'll use a random move generator, and then we won't know where Deep Blue will move. Problem solved!" And yet instead of using a random move generator, the programmers felt obligated to spend a huge amount of time and effort crafting a complex program whose moves they couldn't predict, but which they could predictably expect to be better than their own. As Marcello Herreshoff put it, "We never run any computer program unless we know something about the output, and we don't know the actual output."

[Gilovich.] Here the point is that an unknown key does not fit an unknown lock. If you don't know a superior move, and you don't understand your own AI program, these two ignorances do not cancel out. This is an important point to keep in mind because, as Thomas Gilovich observes, conclusions people don't want to believe are held to higher standards than conclusions people want to believe. If someone doesn't want to believe something, they ask whether the evidence compels them to believe; if someone wants to believe something, they ask whether the evidence allows them to accept the conclusion. If Deep Blue's programmers hadn't understood their own program, and hadn't known which chess move would beat Kasparov, they might have convinced themselves that somehow things would magically work out, and no one could have proven to them that they were wrong. There are times when people can be very reluctant to relinquish their ignorance for exactly that reason - think of any creationist trying to claim that evolution is just a theory.

[Jaynes.] Neither the uncertainty of the environment, nor the programmers' own uncertainty as to the best or satisficing action, implies that an AI's internal workings need to be mysterious. E. T. Jaynes, the father of maximum entropy methods, once observed that if we are ignorant about a phenomenon, this is a fact about us, not a fact about the phenomenon itself. A blank map does not correspond to a blank territory. Confusion exists in the mind, not in reality. Jaynes labeled this the "Mind Projection Fallacy" - the error of treating our own cognitive states as if they were properties of external objects. The mysteriousness of an AI method cannot be responsible for the AI working well, because our uncertainty about the AI is a property of us, not a property of the AI.

Why is this important to emphasize? Consider this quote from the eminent nineteenth-century physicist Lord Kelvin. The big mystery in the nineteenth century was what we would call biology, that is, the extraordinary, inexplicable difference between things that are alive and things that aren't. Lord Kelvin said:
[Kelvin.] "The influence of animal or vegetable life on matter is infinitely beyond the range of any scientific inquiry hitherto entered on. Its power of directing the motions of moving particles, in the demonstrated daily miracle of our human free-will, and in the growth of generation after generation of plants from a single seed, are infinitely different from any possible result of the fortuitous concurrence of atoms... Modern biologists were coming once more to the acceptance of something and that was a vital principle."

If you consider this quote, it looks like Lord Kelvin got a tremendous emotional kick out of the mysteriousness of biology. Infinitely beyond the range of any scientific inquiry! Not just a little beyond the range of science, but infinitely beyond! When you get that much satisfaction from mysteriousness, you aren't likely to look kindly on any darned answers that come along. Something that seems sufficiently mysterious creates an aesthetic experience, and this aesthetic experience is very dangerous because you can become attached to it. Lord Kelvin saw a mysterious question and thought that it had a mysterious answer. But there are no phenomena that are mysterious in themselves. A blank map does not correspond to a blank territory.

Lord Kelvin's mistake is closely related to the mistake of guessing a mixed sequence of blue and red cards. The observed nature of life seemed very mysterious, so the vitalists hypothesized a cause, the elan vital, which was equally sacred and mysterious. A mysterious effect with a mundane cause is just as counterintuitive as a mixed pattern of cards giving rise to an optimal betting strategy of all-blue. When things are unknown, we think they are more beautiful and more powerful because of that, and this is dangerous. The only reason to become excited about not knowing something is that you intend to solve the puzzle in the immediate future. The glory of a glorious mystery is to be destroyed, after which it ceases to be a mystery.

So I find it disturbing when people, for example, praise the power of genetic programming because they don't know how the resulting programs work, or praise the power of neural networks because they don't know how the network analyzes the data. Or when someone says that the unpredictability of how intelligence emerges in the brain is the key to human creativity. Your ignorance of a phenomenon is a fact about you, not a fact about the phenomenon, so your inability to explain how a process works cannot make it work better. I have an ulterior motive for saying all this, which is that I need precise understanding of a class of problems where neural nets and genetic programming will both automatically fail, precisely because they are not certain to succeed.

[Good.] In 1965, I. J. Good hypothesized an effect he named the "intelligence explosion": if an AI is sufficiently smart, said Good, it can rewrite its own source code to make itself smarter; and then, having become smarter, rewrite its own source code again, becoming smarter still, and so on ad infinitum. I. J. Good only talked about AI, but in principle the concept of an intelligence explosion generalizes further: for example, humans augmented by direct brain-computer interfaces, using their improved intelligence to design better brain-computer interfaces. In any case, Good predicted that once an AI became sufficiently smart, a positive feedback cycle would rapidly take it to what he called "ultraintelligence". To me, Good's intelligence explosion scenario has a great deal of intuitive plausibility. And I definitely think that reflectivity and self-modification are among the great keys to AI.
Thus, I am very interested in the question of how a mind can remain stable while rewriting its own source code.

[Trans1] An analogy: Take a single transistor in your computer's CPU, and ask, given that the transistor is operating today, what is the chance that it will still be functioning tomorrow? Many faults could destroy the transistor; the heatsink could fail and fry the chip, lightning could strike the power line, someone might throw the computer out the window. That doesn't happen every day, but we can loosely state that it happens more often than once in every three thousand years. So if you look at one lone transistor, nothing else, and ask the probability that it will still function tomorrow, the chance of failure is clearly greater than one in a million.

[Trans2] But there are millions of transistors in the chip - say 155 million for a top-end CPU in 2006. So, we conclude, if each lone transistor has a failure probability greater than one in a million, then the chance of the entire chip working throughout the day is infinitesimal.

[Trans3] The flaw in this reasoning is that the probability of failure is not conditionally independent between transistors. A heatsink failure will destroy many transistors at the same time, so if I'm told that one transistor is operating correctly, my probability that the neighboring transistors operate correctly goes way up. The potential causes of failure that are conditionally independent for each transistor operation must have probabilities very close to zero. A computer chip needs to perform a mole of transistor operations, error-free, every two weeks or so.

[Trans4] For an AI to remain stable while rewriting its own source code, the cumulative probability of catastrophic error has to be bounded over millions of sequential self-modifications. Imagine that, by the time the millionth version of the source code is designing the million-and-first version, nothing has gone wrong on any of the previous million code changes. This doesn't tell you that the probability of the whole AI failing was zero, but it tells you that the independent component in the probability of failure on each round of self-modification was effectively zero.

[Chip1] How can we possibly achieve this? Well, how do modern chip engineers design a machine that has 155 million interdependent parts, and that can't be fixed once it leaves the factory?

[Chip2] The glorious thing about formal mathematical proof is that a formal proof of ten billion steps is as reliable as a proof of ten steps. The proof is just as strong as its axioms, no more, no less. This doesn't mean the conclusion of a formal proof is perfectly reliable. Your axioms could be wrong. But it is possible, even in the real world, for a formal proof of ten billion steps to be right.

[Chip3] Human beings can't check a purported proof of ten billion steps because we're too easily bored, too slow, and too likely to miss something.

[Chip4] And present-day theorem-proving techniques aren't powerful enough to design a chip and prove it correct on their own; they run into an exponential explosion of the search space.

[Chip5] Human mathematicians can prove theorems much more complex than automated theorem-provers can handle.

[Chip6] But human mathematics is informal and unreliable, and sometimes we find a flaw in a previously accepted proof.

[Solution.] So the humans choose the lemmas, a complex theorem-prover generates a formal proof, and a simple verifier checks the steps.
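To make that division of labor concrete, here is a minimal sketch of the "simple verifier" half of the arrangement, written in Python. Everything in it - the verify function, the tuple encoding of formulas, the single modus-ponens rule - is an illustrative assumption for this transcript, not part of the talk or of any real chip-verification toolchain.

    # Illustrative sketch only: a tiny proof checker in the spirit of
    # "a simple verifier checks the steps". Formulas are encoded as nested
    # tuples, e.g. ("->", "p", "q") stands for "p implies q".

    def verify(proof, axioms):
        """Accept a purported proof only if every step is an axiom or follows
        from two earlier accepted steps by modus ponens (from A and A->B, infer B)."""
        accepted = []                                   # steps verified so far
        for step in proof:
            ok = step in axioms
            if not ok:
                # modus ponens: some earlier step A and some earlier step ("->", A, step)
                ok = any(("->", a, step) in accepted for a in accepted)
            if not ok:
                return False                            # one bad step sinks the whole proof
            accepted.append(step)
        return True

    # Toy usage: from p, p->q, and q->r, a (possibly very complex, opaque) prover
    # proposes the chain q, then r; the checker only verifies each local step.
    axioms = ["p", ("->", "p", "q"), ("->", "q", "r")]
    proof  = ["p", ("->", "p", "q"), "q", ("->", "q", "r"), "r"]
    print(verify(proof, axioms))          # True
    print(verify(proof + ["s"], axioms))  # False: "s" is not justified by anything

The design point is the one made above: however long the list of steps and however cleverly it was generated, the conclusion earns trust only because every step passed a check simple enough to trust completely, and the checker's reliability does not degrade with the length of the proof.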
That combination - humans choosing the lemmas, a complex prover generating the formal proof, a simple verifier checking the steps - is how modern engineers build reliable machinery with 155 million interdependent parts. It requires a synergy of human intelligence and computer algorithms, because currently neither suffices on its own. If an AI had a similar combination of abilities - both the ability to avoid an exponential explosion, and the ability to verify a proof with extreme reliability - then it might be able to execute a mole of sequential self-modifications with a bounded cumulative failure probability.

But if you require this, it rules out vast swathes of potential AI techniques that are opaque or stochastic. You can't rely on genetic programming. You can't rely on trial and error. Success on a thousand trials may argue probabilistically for success on the thousand-and-first trial if the trials are independent and identically distributed, but success on a thousand trials does not imply success on a million trials. Modern theorem-proving techniques do not begin to address the kind of thought that needs to happen, but whatever method we do use will need to work deterministically rather than stochastically. Usually when I say this, someone replies that we need chaotic minds to handle a chaotic world - hence the first half of this talk.

[Det1] The inside of a chip is a nearly deterministic environment; we ought to be able to succeed with nearly deterministic probability in how the AI rewrites itself, even if we can only succeed probabilistically in the external environment.

[Det2] The AI wouldn't prove that its next version would succeed in the real world, but it would prove something about how its code tried to succeed in the real world. If you imagine that the AI currently wants to help little old ladies across the street, the AI wouldn't try to prove that its future self would succeed in helping the lady across the street with probability 1, but the AI would prove that its future self would do its best to help little old ladies across the street rather than trying to steer them into oncoming trucks.

[Det3] The current AI wouldn't know the exact move its future self makes, because the AI's future self is intended to carry out a more effective search for good moves, but the current AI would understand the criterion that generated the move.

[Formal] That's an informal argument. I'd like to formalize it, but I've been having trouble doing so. Current decision theory revolves around expected utility maximization, and that formalism can't handle self-reference. You can use expected utility to evaluate an action, or to evaluate source code that chooses between actions. But the current formalism has no way to wrap all the way around - no way to modify the source code that does the modifying. The algorithm breaks down on an infinite recursion; the problem is not representable within the system (a schematic sketch of where it breaks appears at the end of this passage).

[Formal2] And yet humans think about themselves all the time; we have no difficulty using the word "I" in a sentence. I can think about what it would be like to modify my own neural circuitry, and my brain doesn't crash. How are humans handling the self-embedding? The significance of this problem is not that the classical algorithm is too inefficient for the real world, as is often the case with theoretical Bayesian methods. Rather, we don't know how to solve this problem even given infinite computing power. We don't have a deep understanding of the structure of the problem; we don't know what kind of work needs to be performed.
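The schematic sketch promised above: a standard expected-utility setup, with the notation (P for the probability model, U for the utility function, o for outcomes, a for actions, c for candidate source code) chosen here for illustration rather than taken from the talk.

    EU(a) = \sum_{o} P(o \mid a) \, U(o), \qquad a^* = \arg\max_{a} EU(a)

    EU(c) = \sum_{o} P(o \mid \mathrm{execute}(c)) \, U(o), \qquad c^* = \arg\max_{c} EU(c)

The first line scores actions; the second scores a program that will go on to choose actions, and nothing stops you from iterating that one level down. The trouble comes when the candidate code c is the very program that computes P, U, and the argmax, and is about to rewrite it: the quantity on the right-hand side then depends on an evaluation performed by the thing being evaluated, and unrolling that dependence never terminates. That is the sense in which the formalism cannot wrap all the way around.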
[Reflective] I expect that anyone here can think of at least three ad-hoc solutions for getting an AI to modify its own source code. But is an ad-hoc solution going to last through a million sequential self-modifications? More importantly, there is something here we don't understand. We have a very deep understanding of how the Bayesian rules for probability theory arise; we know literally dozens of different constraint sets that will yield the same rules of probability theory. Similarly for the axioms of expected utility maximization. Jaynes, master of maximum entropy, thought very deeply about the nature of ignorance. Judea Pearl's book on Causality represents an extraordinarily deep understanding of conditional independence, its relation to human perceptions of causality, and why that makes Bayesian networks work well on some kinds of problems. What we need is this kind of deep understanding for reflectivity and self-modification. Designing AI will only become a precise art when we understand how to make an AI design itself.

That is my current project: to develop a reflective decision theory. I have not solved this problem, but I am working on it, and I am very interested in any suggestions you might have - especially any references you think I should read. Thank you.