Artificial intelligence algorithms are designed to learn in fits and starts. Instead of continuously updating their knowledge base with new information over time as humans do, algorithms can learn only during the training phase. After that, their knowledge remains frozen; they perform the task they were trained for without being able to keep learning as they do it. To learn even one new thing, algorithms must be trained again from scratch. It’s as if every time you met a new person, the only way you could learn her name would be to reboot your brain.
Training from scratch can lead to a behavior known as catastrophic forgetting, where a machine incorporates new knowledge at the cost of forgetting nearly everything it’s already learned. This situation arises because of the way that today’s most powerful AI algorithms, called neural networks, learn new things.
These algorithms are based loosely on our brains, where learning involves changing the strength of connections between neurons. But this process gets tricky. Neural connections also represent past knowledge, so changing them too much will cause forgetting.
Biological neural networks have evolved strategies over hundreds of millions of years to ensure that important information remains stable. But today’s artificial neural networks struggle to strike a good balance between new and old knowledge. Their connections are too easily overwritten when the network sees new data, which can result in a sudden and severe failure to recognize past information.
To help counter this, Christopher Kanan, a 41-year-old computer scientist at the University of Rochester, has helped establish a new field of AI research known as continual learning. His goal is for AI to keep learning new things from continuous streams of data, and to do so without forgetting everything that came before.
Kanan has been toying with machine intelligence nearly all his life. As a kid in rural Oklahoma who just wanted to have fun with machines, he taught bots to play early multi-player computer games. That got him wondering about the possibility of artificial general intelligence — a machine with the ability to think like a human in every way. This made him interested in how minds work, and he majored in philosophy and computer science at Oklahoma State University before his graduate studies took him to the University of California, San Diego.
Now Kanan finds inspiration not just in video games, but also in watching his nearly 2-year-old daughter learn about the world, with each new learning experience building on the last. Because of his and others’ work, catastrophic forgetting is no longer quite as catastrophic.
Quanta spoke with Kanan about machine memories, breaking the rules of training neural networks, and whether AI will ever achieve human-level learning. The interview has been condensed and edited for clarity.
How does your training in philosophy impact the way you think about your work?
It has served me very well as an academic. Philosophy teaches you how to make reasoned arguments and how to analyze the arguments of others. That’s a lot of what you do in science. I still have essays from way back then on the failings of the Turing test, and things like that. And so those things I still think about a lot.
My lab has been inspired by asking the question: Well, if we can’t do X, how are we going to be able to do Y? We learn over time, but neural networks, in general, don’t. You train them once. It’s a fixed entity after that. And that’s a fundamental thing that you’d have to solve if you want to make artificial general intelligence one day. If it can’t learn without scrambling its brain and restarting from scratch, you’re not really going to get there, right? That’s a prerequisite capability to me.
How have researchers dealt with catastrophic forgetting so far?
The most successful method, called replay, stores past experiences and then replays them during training with new examples, so they are not lost. It’s inspired by memory consolidation in our brain, where during sleep the high-level encodings of the day’s activities are “replayed” as the neurons reactivate.
In other words, for the algorithms, new learning can’t completely eradicate past learning since we are mixing in stored past experiences.
There are three styles for doing this. The most common style is “veridical replay,” where researchers store a subset of the raw inputs — for example, the original images for an object recognition task — and then mix those stored images from the past in with new images to be learned. The second approach replays compressed representations of the images. A third, far less common method is “generative replay.” Here, an artificial neural network actually generates a synthetic version of a past experience and then mixes that synthetic example with new examples. My lab has focused on the last two methods.
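The core mechanics of veridical replay can be sketched in a few lines of Python. This is a toy illustration, not Kanan’s implementation: a hypothetical `ReplayBuffer` class keeps a uniform sample of past examples (via reservoir sampling, one common choice) and mixes some of them into every new batch so that old knowledge keeps being rehearsed.

```python
import random

class ReplayBuffer:
    """Toy veridical-replay buffer: keeps a bounded sample of past examples."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0  # total number of examples ever added

    def add(self, example):
        # Reservoir sampling: after any number of adds, the buffer holds
        # a uniform random sample of everything seen so far.
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

    def mixed_batch(self, new_examples, n_replay=32):
        # Training batch = new data + a sample of stored past data,
        # so learning the new examples can't silently overwrite the old.
        replayed = random.sample(self.buffer, min(n_replay, len(self.buffer)))
        return list(new_examples) + replayed
```

Each training step would then call `mixed_batch` and update the network on the combined batch; the replayed examples are what keep past connections from being overwritten wholesale.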
Unfortunately, though, replay isn’t a very satisfying solution.
To learn something new, the neural network has to store at least some information about every concept that it learned in the past. And from a neuroscientific perspective, the hypothesis is that you and I replay relatively recent experiences — not something that happened in our childhoods — to prevent forgetting those recent experiences. In deep neural networks, that’s not how it works. A network doesn’t necessarily have to store everything it has seen, but it has to store something about every task it learned in the past in order to use replay. And it’s unclear what it should store. So replay as it’s done today still seems like it’s not all the way there.
If we could completely solve catastrophic forgetting, would that mean AI could learn new things continuously over time?
Not exactly. I think the big, big, big open questions in the field of continual learning are not in catastrophic forgetting. What I’m really interested in is: How does past learning make future learning more efficient? And how does learning something in the future correct the learnings of the past? Those are things that not very many people are measuring, and I think doing so is a critical part of pushing the field forward because really, it’s not about just forgetting stuff. It’s about becoming a better learner.
That’s where I think the field is kind of missing the forest for the trees. Much of the community is setting up the problem in ways that don’t match either interesting biological questions or interesting engineering applications. We can’t just have everybody do the same toy problem forever. You’ve got to say: What’s our gauntlet task? How do we push things forward?
Then why do you think most people are focusing on those simple problems?
I can only speculate. Most work is done by students who are following past work. They are copying the setup of what others have done and showing some minor gains in performance with the same measurements. Making new algorithms is more likely to lead to a publication, even if those algorithms aren’t really enabling us to make significant progress in learning continually. What surprises me is that the same sort of work is produced by large companies who don’t have the same incentives, except for intern-driven work.
Also, this work is nontrivial. We need to establish the correct experiment and algorithmic setup to measure whether past learning helps future learning. The big issue is we don’t have good data sets for studying continual learning right now. I mean, we’re basically taking existing data sets that are used in traditional machine learning and repurposing them.
Essentially, in the dogma of machine learning (or at least whenever I start teaching machine learning), we have a training set, we have a test set — we train on the training set, we test on the test set. Continual learning breaks those rules. Your training set then becomes something that evolves as the learner learns. But we’re still limited to existing data sets. We need to work on this. We need a really good continual learning environment in which we can really push ourselves.
What would the ideal continual learning environment look like?
It’s easier to tell you what it’s not than what it is. I was on a panel where we identified this as a critical problem, but it’s not one where I think anybody immediately has the answer.
I can tell you the properties it might have. So for now, let’s assume the AI algorithms are not embodied agents in simulations. Then at the very least, ideally, we’re learning from videos, or something like that, like multimodal video streams, and hopefully doing more than just classification [of static images].
There are a lot of open questions about this. I was in a continual learning workshop a few years ago and some people like me were saying, “We’ve got to stop using a data set called MNIST, it’s too simple.” And then someone said, “OK, well, let’s do incremental learning of [the strategy-based video game] StarCraft.” And I’m doing that too now for various reasons, but I don’t think that really gets at it either. Life is a much richer thing than learning to play StarCraft.
How has your lab tried to design algorithms that can keep learning over time?
With my former student Tyler Hayes, I pioneered a continual learning task for analogical reasoning. We thought that would be a good area to study the idea of transfer learning, where you acquire skills and then need to use more complex skills to solve more complex problems. Specifically, we measured forward and backward transfer — how well learning something in the past helps you learn in the future, and vice versa. And we found good evidence for transfer, much more significant than for a simple task like object recognition.
Your lab also focuses on training algorithms to learn continuously from one example at a time, or from very small sets of examples. How does that help?
A lot of continual learning setups still use very large batches of examples. So they will essentially say to the algorithm, “Here’s 100,000 things; learn them. Here’s the next 100,000 things; learn them.” That doesn’t really match what I would say is the real-world application, which is, “Here’s one new thing; learn it. Here’s another new thing; learn it.”
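The contrast Kanan draws — huge batches versus one new thing at a time — is essentially the difference between batch and streaming (online) updates. As a minimal sketch of the streaming side, here is a hypothetical `StreamingMean` learner whose internal state is updated incrementally from each example, never storing or revisiting past data (a stand-in for the learner, not any specific continual learning algorithm):

```python
class StreamingMean:
    """Toy streaming learner: maintains a running mean, one value at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def learn_one(self, x):
        # Incremental update: the estimate is refined in place from a single
        # new example, with no access to previously seen data.
        self.n += 1
        self.mean += (x - self.mean) / self.n
```

A continual learner in Kanan’s sense would expose the same kind of `learn_one` interface, but with a neural network’s weights as the state being updated.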
If we want AI to learn more like us, should we also aim to replicate how humans learn different things at different ages, always refining our knowledge?
I think that’s a very fruitful avenue for making progress in this field. People tell me that I’m just obsessed with development now that I have a child, but I can see that my daughter is capable of one-shot learning, where she sees me do something once and she can copy it immediately. And machine learning algorithms can’t do anything like that today.
It really opened my eyes. There’s got to be a lot more going on in our heads than in our modern neural networks. That’s why I think the field needs to go toward this idea of learning over time, where algorithms become better learners by building upon past experience.
Do you think AI will ever really learn the same way humans do?
I think they will. Definitely. It’s a lot more promising today because there are just so many people working in the field. But we still need more creativity. So much of the culture in the machine learning community is a follow-the-leader approach.
I think of us as just biochemical machines, and eventually we’ll figure out the right algorithms and architectures, which I think will have more of our capabilities than they have today. There’s no convincing argument for me that says it’s impossible.