IN JANUARY, THE British-American computer scientist Stuart Russell drafted and became the first signatory of an open letter calling for researchers to look beyond the goal of merely making artificial intelligence more powerful. “We recommend expanded research aimed at ensuring that increasingly capable AI systems are robust and beneficial,” the letter states. “Our AI systems must do what we want them to do.” Thousands of people have since signed the letter, including leading artificial intelligence researchers at Google, Facebook, Microsoft and other industry hubs, along with top computer scientists, physicists and philosophers around the world. By the end of March, about 300 research groups had applied to pursue new research into “keeping artificial intelligence beneficial” with funds contributed by the letter’s 37th signatory, the inventor-entrepreneur Elon Musk.
Russell, 53, a professor of computer science and founder of the Center for Intelligent Systems at the University of California, Berkeley, has long been contemplating the power and perils of thinking machines. He is the author of more than 200 papers as well as the field’s standard textbook, Artificial Intelligence: A Modern Approach (with Peter Norvig, head of research at Google). But increasingly rapid advances in artificial intelligence have given Russell’s longstanding concerns heightened urgency.
Recently, he says, artificial intelligence has made major strides, partly on the strength of neuro-inspired learning algorithms. These are used in Facebook’s face-recognition software, smartphone personal assistants and Google’s self-driving cars.
In a bombshell result reported recently in Nature, a simulated network of artificial neurons learned to play Atari video games better than humans in a matter of hours, given only data representing the screen and the goal of increasing the score at the top—but no preprogrammed knowledge of aliens, bullets, left, right, up or down. “If your newborn baby did that you would think it was possessed,” Russell said.
QUANTA MAGAZINE: You think the goal of your field should be developing artificial intelligence that is “provably aligned” with human values. What does that mean?
STUART RUSSELL: It’s a deliberately provocative statement, because it’s putting together two things—“provably” and “human values”—that seem incompatible. It might be that human values will forever remain somewhat mysterious. But to the extent that our values are revealed in our behavior, you would hope to be able to prove that the machine will be able to “get” most of it. There might be some bits and pieces left in the corners that the machine doesn’t understand or that we disagree on among ourselves. But as long as the machine has got the basics right, you should be able to show that it cannot be very harmful.
How do you go about doing that?
That’s the question I’m working on right now: Where does a machine get hold of some approximation of the values that humans would like it to have? I think one answer is a technique called “inverse reinforcement learning.” Ordinary reinforcement learning is a process where you are given rewards and punishments as you behave, and your goal is to figure out the behavior that will get you the most rewards. That’s what the [Atari-playing] DQN system is doing; it is given the score of the game, and its goal is to make that score bigger. Inverse reinforcement learning is the other way around. You see the behavior, and you’re trying to figure out what score that behavior is trying to maximize. For example, your domestic robot sees you crawl out of bed in the morning and grind up some brown round things in a very noisy machine and do some complicated thing with steam and hot water and milk and so on, and then you seem to be happy. It should learn that part of the human value function in the morning is having some coffee.
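[To make the contrast concrete, here is a deliberately tiny sketch in Python. The states, actions and candidate reward functions are invented to mirror the coffee example; real inverse-reinforcement-learning systems work over far richer models of behavior, but the logic is the same: ordinary reinforcement learning picks actions given a known reward, inverse reinforcement learning picks the reward that best explains the observed actions.]

    # Toy "morning routine" domain: actions move deterministically between states.
    TRANSITIONS = {
        ("in_bed", "get_up"): "kitchen",
        ("in_bed", "stay_in_bed"): "in_bed",
        ("kitchen", "grind_beans"): "brewing",
        ("kitchen", "do_nothing"): "kitchen",
        ("brewing", "pour_coffee"): "has_coffee",
    }

    def best_action(state, reward):
        """Ordinary (greedy, one-step) reinforcement learning: with the reward
        known, pick whichever action reaches the highest-reward state."""
        options = [(action, reward.get(next_state, 0.0))
                   for (s, action), next_state in TRANSITIONS.items() if s == state]
        return max(options, key=lambda pair: pair[1])[0] if options else None

    def inverse_rl(observed, candidates):
        """Inverse reinforcement learning, crudely: find the candidate reward
        function under which the observed behaviour looks most nearly optimal."""
        def agreement(reward):
            return sum(best_action(state, reward) == action for state, action in observed)
        return max(candidates, key=lambda name: agreement(candidates[name]))

    # The robot watches its owner get up, grind beans and pour coffee.
    observed = [("in_bed", "get_up"), ("kitchen", "grind_beans"), ("brewing", "pour_coffee")]

    candidates = {
        "values_coffee":   {"kitchen": 0.1, "brewing": 0.5, "has_coffee": 1.0},
        "values_lying_in": {"in_bed": 1.0},
    }

    print(inverse_rl(observed, candidates))   # -> values_coffee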
There’s an enormous amount of information out there in books, movies and on the web about human actions and attitudes to the actions. So that’s an incredible resource for machines to learn what human values are—who wins medals, who goes to jail, and why.
You’ve spent much of your career trying to understand what intelligence is as a prerequisite for understanding how machines might achieve it. What have you learned?
During my thesis research in the ’80s, I started thinking about rational decision-making and the problem that it’s actually impossible. If you were rational you would think: Here’s my current state, here are the actions I could do right now, and after that I can do those actions and then those actions and then those actions; which path is guaranteed to lead to my goal? The definition of rational behavior requires you to optimize over the entire future of the universe. It’s just completely infeasible computationally.
It didn’t make much sense that we should define what we’re trying to do in AI as something that’s impossible, so I tried to figure out: How do we really make decisions?
So, how do we do it?
One trick is to think about a short horizon and then guess what the rest of the future is going to look like. So chess programs, for example—if they were rational they would only play moves that guarantee checkmate, but they don’t do that. Instead they look ahead a dozen moves into the future and make a guess about how useful those states are, and then they choose a move that they hope leads to one of the good states.
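[The look-ahead-and-guess idea fits in a few lines. The sketch below is a rough illustration rather than anything from a real engine: it plays a toy take-away game instead of chess and uses an invented evaluation heuristic at the horizon. A real chess program would add alpha-beta pruning and a far better evaluation function.]

    def legal_moves(pile):
        # Take-away game: remove one, two or three stones; taking the last stone wins.
        return [n for n in (1, 2, 3) if n <= pile]

    def evaluate(pile, maximizing):
        """Horizon guess, scored from the first player's point of view. The heuristic
        (piles divisible by four are bad for the player to move) is invented."""
        if pile == 0:
            # The player who just moved took the last stone and won.
            return -1.0 if maximizing else 1.0
        score = -0.5 if pile % 4 == 0 else 0.5
        return score if maximizing else -score

    def minimax(pile, depth, maximizing):
        """Look ahead a fixed number of moves, then trust evaluate()'s guess."""
        if pile == 0 or depth == 0:
            return evaluate(pile, maximizing), None
        best_val, best_move = (float("-inf"), None) if maximizing else (float("inf"), None)
        for move in legal_moves(pile):
            val, _ = minimax(pile - move, depth - 1, not maximizing)
            if (maximizing and val > best_val) or (not maximizing and val < best_val):
                best_val, best_move = val, move
        return best_val, best_move

    value, move = minimax(pile=10, depth=4, maximizing=True)
    print(move)   # -> 2: leave the opponent facing a pile of 8, a losing position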
Another thing that’s really essential is to think about the decision problem at multiple levels of abstraction, so “hierarchical decision making.” A person does roughly 20 trillion physical actions in their lifetime. Coming to this conference to give a talk works out to 1.3 billion or something. If you were rational you’d be trying to look ahead 1.3 billion steps—completely, absurdly impossible. So the way humans manage this is by having this very rich store of abstract, high-level actions. You don’t think, “First I can either move my left foot or my right foot, and then after that I can either…” You think, “I’ll go on Expedia and book a flight. When I land, I’ll take a taxi.” And that’s it. I don’t think about it anymore until I actually get off the plane at the airport and look for the sign that says “taxi”—then I get down into more detail. This is how we live our lives, basically. The future is spread out, with a lot of detail very close to us in time, but these big chunks where we’ve made commitments to very abstract actions, like, “get a Ph.D.,” “have children.”
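[One minimal way to picture hierarchical decision making is a planner that refines abstract tasks into sub-tasks only as deep as it needs to. The task names and refinements below are invented to echo the travel example.]

    # Abstract tasks and their refinements; anything not listed is treated as primitive.
    REFINEMENTS = {
        "give_conference_talk": ["travel_to_venue", "present_slides"],
        "travel_to_venue": ["book_flight_online", "fly", "take_taxi_from_airport"],
        "take_taxi_from_airport": ["walk_to_taxi_sign", "ride_taxi", "pay_driver"],
    }

    def expand(task, depth):
        """Refine an abstract task into sub-tasks, but only `depth` levels deep.
        Limiting the depth is the point: commit to abstract chunks now and leave
        the fine motor detail (left foot, right foot, ...) for later."""
        if depth == 0 or task not in REFINEMENTS:
            return [task]
        plan = []
        for subtask in REFINEMENTS[task]:
            plan.extend(expand(subtask, depth - 1))
        return plan

    print(expand("give_conference_talk", depth=1))
    # ['travel_to_venue', 'present_slides']
    print(expand("give_conference_talk", depth=2))
    # ['book_flight_online', 'fly', 'take_taxi_from_airport', 'present_slides']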
Are computers currently capable of hierarchical decision making?
So that’s one of the missing pieces right now: Where do all these high-level actions come from? We don’t think programs like the DQN network are figuring out abstract representations of actions. There are some games where DQN just doesn’t get it, and the games that are difficult are the ones that require thinking many, many steps ahead in the primitive representations of actions—ones where a person would think, “Oh, what I need to do now is unlock the door,” and unlocking the door involves fetching the key, etcetera. If the machine doesn’t have the representation “unlock the door” then it can’t really ever make progress on that task.
But if that problem is solved (and it’s certainly not impossible), then we would see another big increase in machine capabilities. There are two or three problems like that where if all of those were solved, then it’s not clear to me that there would be any major obstacle between there and human-level AI.
What concerns you about the possibility of human-level AI?
In the first [1994] edition of my book there’s a section called, “What if we do succeed?” Because it seemed to me that people in AI weren’t really thinking about that very much. Probably it was just too far away. But it’s pretty clear that success would be an enormous thing. “The biggest event in human history” might be a good way to describe it. And if that’s true, then we need to put a lot more thought than we are doing into what the precise shape of that event might be.
The basic idea of the intelligence explosion is that once machines reach a certain level of intelligence, they’ll be able to work on AI just like we do and improve their own capabilities—redesign their own hardware and so on—and their intelligence will zoom off the charts. Over the last few years, the community has gradually refined its arguments as to why there might be a problem. The most convincing argument has to do with value alignment: You build a system that’s extremely good at optimizing some utility function, but the utility function isn’t quite right. In [Oxford philosopher] Nick Bostrom’s book [Superintelligence], he has this example of paperclips. You say, “Make some paperclips.” And it turns the entire planet into a vast junkyard of paperclips. You build a super-optimizer; what utility function do you give it? Because it’s going to do it.
What about differences in human values?
That’s an intrinsic problem. You could say machines should err on the side of doing nothing in areas where there’s a conflict of values. That might be difficult. I think we will have to build in these value functions. If you want to have a domestic robot in your house, it has to share a pretty good cross-section of human values; otherwise it’s going to do pretty stupid things, like put the cat in the oven for dinner because there’s no food in the fridge and the kids are hungry. Real life is full of these tradeoffs. If the machine makes these tradeoffs in ways that reveal that it just doesn’t get it, that it’s just missing some chunk of what’s obvious to humans, then you’re not going to want that thing in your house.
I don’t see any real way around the fact that there’s going to be, in some sense, a values industry. And I also think there’s a huge economic incentive to get it right. It only takes one or two things like a domestic robot putting the cat in the oven for dinner for people to lose confidence and not buy them.
Then there’s the question, if we get it right such that some intelligent systems behave themselves, as you make the transition to more and more intelligent systems, does that mean you have to get better and better value functions that clean up all the loose ends, or do they still continue behaving themselves? I don’t know the answer yet.
You’ve argued that we need to be able to mathematically verify the behavior of AI under all possible circumstances. How would that work?
One of the difficulties people point to is that a system can arbitrarily produce a new version of itself that has different goals. That’s one of the scenarios that science fiction writers always talk about; somehow, the machine spontaneously gets this goal of defeating the human race. So the question is: Could you prove that your systems can’t ever, no matter how smart they are, overwrite their original goals as set by the humans?
Automating air traffic control systems may require airtight proofs about real-world possibilities. FLIGHTRADAR24
It would be relatively easy to prove that the DQN system, as it’s written, could never change its goal of optimizing that score. Now, there is a hack that people talk about called “wire-heading” where you could actually go into the console of the Atari game and physically change the thing that produces the score on the screen. At the moment that’s not feasible for DQN, because its scope of action is entirely within the game itself; it doesn’t have a robot arm. But that’s a serious problem if the machine has a scope of action in the real world. So, could you prove that your system is designed in such a way that it could never change the mechanism by which the score is presented to it, even though it’s within its scope of action? That’s a more difficult proof.
Are there any advances in this direction that you think hold promise?
There’s an area emerging called “cyber-physical systems” about systems that couple computers to the real world. With a cyber-physical system, you’ve got a bunch of bits representing an air traffic control program, and then you’ve got some real airplanes, and what you care about is that no airplanes collide. You’re trying to prove a theorem about the combination of the bits and the physical world. What you would do is write a very conservative mathematical description of the physical world (airplanes can accelerate within such-and-such an envelope), and your theorems would still be true in the real world as long as the real world is somewhere inside the envelope of behaviors.
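[A toy, one-dimensional version of that idea can be sketched in code: instead of proving a theorem, the check below verifies that two aircraft stay separated under every behavior the assumed envelope allows. The acceleration bound, the separation minimum and the flight parameters are all invented for illustration.]

    A_MAX = 5.0      # assumed envelope: |acceleration| never exceeds 5 m/s^2
    MIN_SEP = 9_000  # required separation in metres (roughly 5 nautical miles)

    def reachable_interval(position, velocity, t):
        """Worst-case interval (1-D) an aircraft could occupy after t seconds,
        under any acceleration profile that stays inside the envelope."""
        lo = position + velocity * t - 0.5 * A_MAX * t**2
        hi = position + velocity * t + 0.5 * A_MAX * t**2
        return lo, hi

    def separation_guaranteed(plane_a, plane_b, horizon=120, step=1):
        """True if, at every checked instant over the horizon, the two worst-case
        intervals stay at least MIN_SEP apart, no matter what the aircraft do."""
        for t in range(0, horizon + 1, step):
            lo_a, hi_a = reachable_interval(*plane_a, t)
            lo_b, hi_b = reachable_interval(*plane_b, t)
            gap = max(lo_b - hi_a, lo_a - hi_b)  # signed distance between intervals
            if gap < MIN_SEP:
                return False
        return True

    # Two aircraft 60 km apart, flying away from each other at 200 m/s:
    print(separation_guaranteed((0.0, -200.0), (60_000.0, 200.0)))    # True
    # Two aircraft 20 km apart, closing at 250 m/s each:
    print(separation_guaranteed((0.0, 250.0), (20_000.0, -250.0)))    # False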
Yet you’ve pointed out that it might not be mathematically possible to formally verify AI systems.
There’s a general problem of “undecidability” in a lot of questions you can ask about computer programs. Alan Turing showed that no computer program can decide whether any other possible program will eventually terminate and output an answer or get stuck in an infinite loop. So if you start out with one program, but it could rewrite itself to be any other program, then you have a problem, because you can’t prove that all possible other programs would satisfy some property. So the question would be: Is it necessary to worry about undecidability for AI systems that rewrite themselves? They will rewrite themselves to a new program based on the existing program plus the experience they have in the world. What’s the possible scope of effect of interaction with the real world on how the next program gets designed? That’s where we don’t have much knowledge as yet.
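[Turing’s diagonal argument compresses into a few lines. The halts oracle below is hypothetical by construction; the whole point is that no correct version of it can exist.]

    def halts(f):
        """Hypothetical halting oracle: decide whether calling f() ever terminates.
        No total, correct implementation can exist; this stub only makes the
        contradiction below concrete."""
        raise NotImplementedError("provably impossible in general")

    def troublemaker():
        # Do the opposite of whatever the oracle predicts about this very function.
        if halts(troublemaker):
            while True:
                pass          # oracle said "halts", so loop forever
        return "halted"       # oracle said "loops forever", so halt at once

    # Whatever answer halts(troublemaker) gives, troublemaker() contradicts it, so no
    # correct halts() can exist. The open question Russell raises is how much a
    # program's real-world experience constrains what it can rewrite itself into.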
Artificial Stupidity
In October, Elon Musk called artificial intelligence “our greatest existential threat,” and equated making machines that think with “summoning the demon.” In December, Stephen Hawking said “full artificial intelligence could spell the end of the human race.” And this year, Bill Gates said he was “concerned about super intelligence,” which he appeared to think was just a few decades away.
But if the human race is at peril from killer robots, the problem is probably not artificial intelligence. It is more likely to be artificial stupidity. The difference between those two ideas says much about how we think about computers.
In the kind of artificial intelligence, or A.I., that most people seem to worry about, computers decide people are a bad idea, so they kill them. That is undeniably bad for the human race, but it is a potentially smart move by the computers.
But the real worry, specialists in the field say, is a computer program rapidly overdoing a single task, with no context. A machine that makes paper clips proceeds unfettered, one example goes, and becomes so proficient that overnight we are drowning in paper clips.
In other words, something really dumb happens, at a global scale. As for those “Terminator” robots you tend to see in scary news stories about an A.I. apocalypse, forget it.
“What you should fear is a computer that is competent in one very narrow area, to a bad degree,” said Max Tegmark, a professor of physics at the Massachusetts Institute of Technology and the president of the Future of Life Institute, a group dedicated to limiting the risks from A.I.
In late June, when a worker in Germany was killed by an assembly line robot, Mr. Tegmark said, “it was an example of a machine being stupid, not doing something mean but treating a person like a piece of metal.”
His institute recently disbursed much of the $10 million that Mr. Musk, the founder of Tesla and SpaceX, gave it to think of ways to prevent autonomous programs from going rogue. Yet even Mr. Musk, along with other luminaries in science and tech, like Mr. Hawking and Mr. Gates, seems to be focused on the wrong potential threat.
There is little sense among practitioners in the field of artificial intelligence that machines are anywhere close to acquiring the kind of consciousness where they could form lethal opinions about their makers.
“These doomsday scenarios confuse the science with remote philosophical problems about the mind and consciousness,” said Oren Etzioni, chief executive of the Allen Institute for Artificial Intelligence, a nonprofit that explores artificial intelligence. “If more people learned how to write software, they’d see how literal-minded these overgrown pencils we call computers actually are.”
What accounts for the confusion? One big reason is the way computer scientists work. “The term ‘A.I.’ came about in the 1950s, when people thought machines that think were around the corner,” Mr. Etzioni said. “Now we’re stuck with it.”
It is still a hallmark of the business. Google’s advanced A.I. work is at a company it acquired called DeepMind. A pioneering company in the field was called Thinking Machines. Researchers are pursuing something called Deep Learning, another suggestion that we are birthing intelligence.
Deep Learning relies on hierarchical reasoning structures called neural networks, a name that suggests the neurons of a brain. Comparing a node in a neural network to a neuron, though, is at best like comparing a toaster to the space shuttle.
In fairness, the kind of work DeepMind is doing, along with much other work in the burgeoning field of machine learning, does involve spotting patterns, suggesting actions and making predictions. That is akin to the mental stuff people do.
It is among the most exciting fields in tech. There is a pattern-finding race among Amazon, Facebook and Google. Companies including Uber and General Electric are staking much of their future on machine learning.
But machine learning is automation, a better version of what computers have always done. The “learning” is not stored and generalized in the ways that make people smart.
DeepMind made a program that mastered simple video games, but it never took the learning from one game into another. The 22 rungs of a neural net it climbs to figure out what is in a picture do not operate much like human image recognition and are still easily defeated.
Moving out of that stupidity to a broader humanlike capability is called “transfer learning.” It is at best in the research phase.
“People in A.I. know that a chess-playing computer still doesn’t yearn to capture a queen,” said Stuart Russell, a professor of computer science at the University of California, Berkeley. He is also on the Future of Life Institute’s board and a recipient of some of Mr. Musk’s grant money. He seeks mathematical ways to ensure that dumb programs don’t conflict with our complex human values.
“What the paper clip program lacks is a background value structure,” he said. “The misunderstanding is thinking that there is only a threat if there is consciousness.”
Powerful computers will reshape humanity’s future. How to ensure the promise outweighs the perils
“THE development of full artificial intelligence could spell the end of the human race,” Stephen Hawking warns. Elon Musk fears that the development of artificial intelligence, or AI, may be the biggest existential threat humanity faces. Bill Gates urges people to beware of it.
Dread that the abominations people create will become their masters, or their executioners, is hardly new. But voiced by a renowned cosmologist, a Silicon Valley entrepreneur and the founder of Microsoft—hardly Luddites—and set against the vast investment in AI by big firms like Google and Microsoft, such fears have taken on new weight. With supercomputers in every pocket and robots looking down on every battlefield, just dismissing them as science fiction seems like self-deception. The question is how to worry wisely.
You taught me language and...
The first step is to understand what computers can now do and what they are likely to be able to do in the future. Thanks to the rise in processing power and the growing abundance of digitally available data, AI is enjoying a boom in its capabilities. Today’s “deep learning” systems, by mimicking the layers of neurons in a human brain and crunching vast amounts of data, can teach themselves to perform some tasks, from pattern recognition to translation, almost as well as humans can. As a result, things that once called for a mind—from interpreting pictures to playing the video game “Frogger”—are now within the scope of computer programs. DeepFace, an algorithm unveiled by Facebook in 2014, can recognise individual human faces in images 97% of the time.
Crucially, this capacity is narrow and specific. Today’s AI produces the semblance of intelligence through brute number-crunching force, without any great interest in approximating how minds equip humans with autonomy, interests and desires. Computers do not yet have anything approaching the wide, fluid ability to infer, judge and decide that is associated with intelligence in the conventional human sense.
Yet AI is already powerful enough to make a dramatic difference to human life. It can already enhance human endeavour by complementing what people can do. Think of chess, which computers now play better than any person. The best players in the world are not machines, however, but what Garry Kasparov, a grandmaster, calls “centaurs”: amalgamated teams of humans and algorithms. Such collectives will become the norm in all sorts of pursuits: supported by AI, doctors will have a vastly augmented ability to spot cancers in medical images; speech-recognition algorithms running on smartphones will bring the internet to many millions of illiterate people in developing countries; digital assistants will suggest promising hypotheses for academic research; image-classification algorithms will allow wearable computers to layer useful information onto people’s views of the real world.
Even in the short run, not all the consequences will be positive. Consider, for instance, the power that AI brings to the apparatus of state security, in both autocracies and democracies. The capacity to monitor billions of conversations and to pick out every citizen from the crowd by his voice or her face poses grave threats to liberty.
And even when there are broad gains for society, many individuals will lose out from AI. The original “computers” were drudges, often women, who performed endless calculations for their higher-ups. Just as transistors took their place, so AI will probably turf out whole regiments of white-collar workers. Education and training will help, and the wealth produced with the aid of AI will be spent on new pursuits that generate new jobs. But some workers are bound to suffer dislocation.
Surveillance and dislocations are not, though, what worries Messrs Hawking, Musk and Gates, or what inspires a phalanx of futuristic AI films that Hollywood has recently unleashed onto cinema screens. Their concern is altogether more distant and more apocalyptic: the threat of autonomous machines with superhuman cognitive capacity and interests that conflict with those of Homo sapiens.
Such artificially intelligent beings are still a very long way off; indeed, it may never be possible to create them. Despite a century of poking and prodding at the brain, psychologists, neurologists, sociologists and philosophers are still a long way from an understanding of how a mind might be made—or what one is. And the business case for even limited intelligence of the general sort—the sort that has interests and autonomy—is far from clear. A car that drives itself better than its owner sounds like a boon; a car with its own ideas about where to go, less so.
...I know how to curse
But even if the prospect of what Mr Hawking calls “full” AI is still distant, it is prudent for societies to plan for how to cope. That is easier than it seems, not least because humans have been creating autonomous entities with superhuman capacities and unaligned interests for some time. Government bureaucracies, markets and armies: all can do things which unaided, unorganised humans cannot. All need autonomy to function, all can take on a life of their own and all can do great harm if not set up in a just manner and governed by laws and regulations.
These parallels should comfort the fearful; they also suggest concrete ways for societies to develop AI safely. Just as armies need civilian oversight, markets are regulated and bureaucracies must be transparent and accountable, so AI systems must be open to scrutiny. Because systems designers cannot foresee every set of circumstances, there must also be an off-switch. These constraints can be put in place without compromising progress. From the nuclear bomb to traffic rules, mankind has used technical ingenuity and legal strictures to constrain other powerful innovations.
The spectre of eventually creating an autonomous non-human intelligence is so extraordinary that it risks overshadowing the debate. Yes, there are perils. But they should not obscure the huge benefits from the dawn of AI.