AGI Analogies

February 10, 2018

An increasing number of people are sounding the alarm about the risks posed by artificial general intelligences (AGIs), but I’ve found it hard to viscerally understand the magnitude of the problem. Here are some analogies that I’ve found useful when thinking about the risks of creating an entity massively smarter than we are.

Imagine two cats making an agreement that they won’t run back and forth across a room. No matter what happens, the cats agree to just sit in one place. You can thwart this easily with a laser pointer. Not only are the cats helpless against this attack, they won’t even see it as an attack. A low-level part of their brain will chase the pointer around the carpet, breaking their vow, and to an observing cat it will seem like pure magic.

Note that this type of manipulation is completely unlike how cats would manipulate each other. A cat might try to break the other cat’s vow by tempting it, or convincing it to go over there, but no cat would ever think to develop laser technology and flash a dot from across the room.

An AGI might be as much smarter than us as we are than cats, and it may develop a laser pointer.

Imagine that there’s a wealthy person and you want to convince them to protect you and your family. You want them to feed you and house you and keep you safe. What would you try? You’d come up with a convincing argument, maybe. Appeal to the person’s emotions or ego. You might devise various ideas for the “attack”, and a wealthy person might devise various defenses against them.

This is not how cows solved the problem. They solved the problem by being delicious. This is not a solution you, as a human, would have thought of. Again, it doesn’t even feel like an attack. You want to protect and house and feed the cows. No human says, “I need to defend myself against the attack of delicious hamburgers.” Yet here we are willingly doing all this work for cows, not to mention damaging the environment and perhaps our health. Corn did an even better job.

When people imagine an AGI trying to “get out of the box”, they imagine a very smart human attacking us the way normal humans do, just smarter, with better arguments. But a laser pointer is not a smarter version of how cats manipulate one another. It is a tactic entirely outside the cats’ reality, and they’re 100% helpless against it.

And we have little defense against the deliciousness of beef. We could defend ourselves against a human-like cow that tried to argue that we should take it in from the wild and feed it and house it. It might have all sorts of reasonable arguments, and we could argue back or at worst just force ourselves to ignore it. But we don’t even want to ignore hamburgers. We even feel bad for the way that we house and feed cows!

So when you’re trying to imagine an AGI trying to get out of the box, don’t imagine something like, “If you let me out, I’ll cure cancer!” which is tempting if you believe it, but you can perhaps see through the lie and resist the temptation. The attack will either be like a laser pointer, perhaps a long sequence of vowels that causes you to let the AGI out of the box, or it’ll be beef-like, where you find yourself wanting to let it out of the box without it even asking.

Some people encourage us to think about how to “defend” ourselves against an AGI, but I think using that term limits our imagination to things that we’d consider an attack. We can imagine a defense against other humans. We can even imagine defenses against non-human things like viruses and space aliens, but only because we anthropomorphize them as microscopic humans or green ones. By using the word defend we implicitly restrict our thinking to the kinds of things that would feel like an attack, in other words, attacks that human-like creatures would use, and methods that humans know about.

We must instead imagine a method that’s as alien to us as a laser pointer is to a cat, or as imaginative as the juicy steak, neither of which is even recognized as an attack by the victim. And when I say “imagine” I don’t mean a specific approach, since that’s by definition impossible. I mean imagine that something would happen and we’d be helpless. Internalize that it would be as easy for an AGI to manipulate us as it is for us to manipulate cats. Imagine literal magic, because that’s more or less what it would seem like.

The final analogy addresses the problem of alignment, which is different from the problem of control. If you don’t think that AGIs need to be aligned with us, then consider that some humans are amoral. They don’t necessarily want to kill, but they’ll readily kill five people if it’ll help them in some way. We don’t worry too much about these people because we know the law will keep them from doing too much damage.

Also consider that some people are very good at arguing their point, even when they’re wrong. You know people like this.

Combine these two, but crank up the good-argument skill to the point where this person can talk their way out of any problem. They can commit murder, and in court convince the judge and jury that they shouldn’t be punished. Every time.

You should be worried about this person being in our society. They’re a bad combination: immune to laws, and not aligned with what the rest of us want.

An AGI, by default, would be like this. It would be immune to laws for the reasons I argued above (it has laser pointers), and it would be amoral (not aligned with us). I don’t think we can solve the first problem, but the second one is, in principle, solvable. And we need to solve it before the solution is needed, because by the time we have an amoral super-arguer, it’ll be too late: it’ll talk us out of solving the problem.