Machine learning models are incredibly powerful tools. They extract deeply hidden patterns in large data sets that our limited human brains can’t parse. These complex algorithms, then, need to be incomprehensible “black boxes,” because a model that we could crack open and understand would be useless. Right?
That’s all wrong, at least according to Cynthia Rudin, who studies interpretable machine learning at Duke University. She’s spent much of her career pushing for transparent but still accurate models to replace the black boxes favored by her field.
The stakes are high. These opaque models are becoming more common in situations where their decisions have real consequences, like the decision to biopsy a potential tumor, grant bail or approve a loan application. Today, at least 581 AI models involved in medical decisions have received authorization from the Food and Drug Administration. Nearly 400 of them are aimed at helping radiologists detect abnormalities in medical imaging, like malignant tumors or signs of a stroke.
Many of these algorithms are black boxes, either because they’re proprietary or because they’re too complicated for a human to understand. “It makes me very nervous,” Rudin said. “The whole framework of machine learning just needs to be changed when you’re working with something higher-stakes.”
But changed to what? Recently, Rudin and her team set out to prove that even the most complex machine learning models, neural networks doing computer vision tasks, can be transformed into interpretable glass boxes that show their work to doctors.
Rudin, who grew up outside Buffalo, New York, came to share her father’s love of physics and math (he’s a medical physicist who helped calibrate X-ray machines), but she realized she preferred to solve problems with computers. Now she leads Duke’s Interpretable Machine Learning lab, where she and her colleagues scrutinize neural networks, the most complex puzzle boxes in machine learning, to create accurate models that show their work.
Quanta spoke with Rudin about these efforts, ethical obligations in machine learning and weird computer poetry. The interviews have been condensed and edited for clarity.
Did you always dream of being a computer scientist?
No, definitely not. As a kid, I wanted to be an orchestra conductor, or something like it. And I wanted to be a composer and write music.
What kind of music?
That’s the problem. I write French music from the turn of the previous century, like Ravel and Debussy. And then I realized that few people cared about that kind of music, so I decided not to pursue it as a career. As an undergraduate, I wanted to be an applied mathematician — but I went in the opposite direction, which was machine learning.
When did you begin thinking about interpretability?
After I graduated, I ended up working at Columbia with the New York City power company, Con Edison. And they were doing real-world work. We were supposed to predict which manholes were going to have a fire or an explosion — at the time, it was about 1% of the manholes in Manhattan every year. I joked that I was always trying to take a picture of myself on the “most likely to explode” manhole — though I never actually did.
I found out very quickly that this was not a problem that machine learning was helping with, because the data was so messy. They had accounting records dating back to the 1890s. So we processed all the data and turned it into these tiny models that the company could understand and work with. It was interpretable machine learning, though I didn’t know that at the time.
What did you know about interpretability back then?
I didn’t really know anything about interpretability because they didn’t teach it to anyone. Machine learning was designed to be a black box: predictive models that are either too complicated for any human to understand or proprietary, somebody’s secret sauce. The whole idea was that you didn’t need to deal with the data; the algorithm would handle all that under the hood. It was so elegant, but that just made it very difficult to figure out what was going on.
But why does knowing what’s going on under the hood matter?
If you want to trust a prediction, you need to understand how all the computations work. For example, in health care, you need to know if the model even applies to your patient. And it’s really hard to troubleshoot models if you don’t know what’s in them. Sometimes models depend on variables in ways that you might not like if you knew what they were doing. For example, with the power company in New York, we gave them a model that depended on the number of neutral cables. They looked at it and said, “Neutral cables? That should not be in your model. There’s something wrong.” And of course there was a flaw in the database, and if we hadn’t been able to pinpoint it, we would have had a serious problem. So it’s really useful to be able to see into the model so you can troubleshoot it.
When did you first get concerned about non-transparent AI models in medicine?
My dad is a medical physicist. Several years ago, he was going to medical physics and radiology conferences. I remember calling him on my way to work, and he was saying, “You’re not going to believe this, but all the AI sessions are full. AI is taking over radiology.” Then my student Alina [Barnett] roped us into studying [AI models that examine] mammograms. Then I realized, OK, hold on. They’re not using interpretable models. They’re using just these black boxes; then they’re trying to explain their results. Maybe we should do something about this.
So we decided we would try to prove that you could construct interpretable models for mammography that did not lose accuracy over their black box counterparts. We just wanted to prove that it could be done.
How do you make a radiology AI that shows its work?
We decided to use case-based reasoning. That’s where you say, “Well, I think this thing looks like this other thing that I’ve seen before.” It’s like what Dr. House does with his patients in the TV show. Like: “This patient has a heart condition, and I’ve seen her condition before in a patient 20 years ago. This patient is a young woman, and that patient was an old man, but the heart condition is similar.” And so I can reason about this case in terms of that other case.
We decided to do that with computer vision: “Well, this part of the image looks like that part of that image that I’ve seen before.” This would explain the reasoning process in a way that is similar to how a human might explain their reasoning about an image to another human.
These are high-complexity models. They’re neural networks. But as long as they’re reasoning about a current case in terms of its relationship to past cases, that’s a constraint that forces the model to be interpretable. And we haven’t lost any accuracy compared to the benchmarks in computer vision.
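The case-based idea can be illustrated with a small toy. The following is a minimal, hypothetical sketch in Python, not the Duke team’s actual model: it assumes each image has already been split into patches represented as feature vectors (in a real system a neural network would produce these), that the remembered “prototypes” are patches from past cases, and that similarity is a simple function of squared distance. The names `Prototype`, `similarity` and `classify`, and the case IDs, are invented for illustration. What it shows is that the prediction arrives with its own evidence: which part of the new image resembled which past case.

```python
from dataclasses import dataclass


@dataclass
class Prototype:
    """A remembered patch from a past case (hypothetical structure)."""
    case_id: str         # illustrative identifier of the past case
    label: str           # e.g. "malignant" or "benign"
    vector: list[float]  # feature vector of the remembered patch


def similarity(a: list[float], b: list[float]) -> float:
    # Map squared Euclidean distance into (0, 1]:
    # identical vectors score 1.0, distant ones approach 0.
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + d2)


def classify(patches: list[list[float]], prototypes: list[Prototype]):
    """Score each label by how strongly some patch of the new image
    resembles that label's prototypes, and keep the evidence."""
    scores: dict[str, float] = {}
    evidence = []
    for proto in prototypes:
        # Find the patch in the new image that best matches this prototype.
        best_idx, best_sim = max(
            ((i, similarity(p, proto.vector)) for i, p in enumerate(patches)),
            key=lambda pair: pair[1],
        )
        scores[proto.label] = scores.get(proto.label, 0.0) + best_sim
        evidence.append((best_sim, best_idx, proto.case_id, proto.label))
    evidence.sort(reverse=True)  # strongest resemblance first
    label = max(scores, key=scores.get)
    return label, scores, evidence


# Toy data: two-dimensional "features" stand in for real image embeddings.
prototypes = [
    Prototype("case_017", "malignant", [1.0, 0.0]),
    Prototype("case_203", "benign", [0.0, 1.0]),
]
patches = [[0.9, 0.1], [0.2, 0.3]]

label, scores, evidence = classify(patches, prototypes)
print("prediction:", label)
for sim, idx, case_id, proto_label in evidence:
    print(f"patch {idx} looks like part of {case_id} ({proto_label}), similarity {sim:.2f}")
```

Crediting each prototype with only its best-matching patch is one design choice among several; the property that matters is that every score traces back to a specific region of the new image and a specific past case a doctor could inspect.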
Would this ‘Dr. House’ technique work for other areas of health care?
You could use case-based reasoning for anything. Once we had the mammography project established, my students Alina Barnett and Stark Guo, and a physician collaborator named Brandon Westover, transferred their knowledge directly to EEG scans for critically ill patients. It’s a similar neural architecture, and they trained it within a couple of months, very quick.
If this approach is just as accurate as black boxes, why not use it for everything?
Well, first of all, it’s much harder to train an interpretable model, because you have to think about the reasoning process and make sure that’s correct. For low-stakes decisions, it’s not really worth it. Like for advertising, if the ad gets to the right people and makes money, then people tend to be happy. But for high-stakes decisions, I think it’s worth that extra effort.
Are there other ways to figure out what a neural network is doing?
Around 2017, people started working on “explainability,” which was explaining the predictions of a black box. So you have some complicated function — like a neural network. You can think about these explanation methods as trying to approximate these functions. Or they might try to pick out which variables are important for a specific prediction.
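A minimal sketch of one simple kind of explanation method: approximate a black box near a single input with a linear function, and read its weights as per-variable “importance” scores. It assumes a made-up two-feature model (the features are labeled income and debt purely for illustration) and estimates the weights with finite differences. Widely used tools such as LIME and SHAP work differently, and nothing here reflects any particular deployed system.

```python
import math


def black_box(x: list[float]) -> float:
    # Hypothetical stand-in for an opaque model: a nonlinear score in (0, 1)
    # over two features, labeled income and debt purely for illustration.
    income, debt = x
    z = 2.0 * income - 3.0 * debt + income * debt
    return 1.0 / (1.0 + math.exp(-z))


def local_explanation(f, x0: list[float], h: float = 1e-5):
    """Approximate f near x0 by a linear function using central finite
    differences. Returns the baseline f(x0) and one weight per feature."""
    base = f(x0)
    weights = []
    for i in range(len(x0)):
        up = list(x0)
        down = list(x0)
        up[i] += h
        down[i] -= h
        weights.append((f(up) - f(down)) / (2 * h))
    return base, weights


def surrogate(base: float, weights: list[float], x0: list[float], x: list[float]) -> float:
    # The explanation treated as a model in its own right.
    return base + sum(w * (xi - x0i) for w, xi, x0i in zip(weights, x, x0))


x0 = [0.5, 0.5]
base, weights = local_explanation(black_box, x0)
print("per-feature weights:", [round(w, 3) for w in weights])

near = [0.52, 0.49]
far = [3.0, 0.0]
for name, x in [("near", near), ("far", far)]:
    print(f"{name}: black box {black_box(x):.3f}, explanation {surrogate(base, weights, x0, x):.3f}")
```

At the nearby input the linear stand-in agrees with the black box to within a fraction of a percent; at the distant input it predicts a value above 1, which this model can never output.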
And that work has some serious problems with it. The explanations have to be wrong, because if their explanations were always right, you could just replace the black box with the explanations. And so the fact that the explainability people casually claim the same kinds of guarantees that the interpretability people are actually providing made me very uncomfortable, especially when it came to high-stakes decisions. Even with an explanation, you could have your freedom denied if you were a prisoner and truly not understand why. Or you could be denied a loan that would give you a house, and again, you wouldn’t be able to know why. They could give you some crappy explanation, and there’s nothing you could do about it, really.
Are people taking interpretability more seriously now?
I think so. It used to be that I would give a talk and some people would come up and yell at me after. And they’d be like, “We don’t need interpretable models; we just test it really carefully and it’s fine.” Now people are coming up afterward and saying, “Yeah, I agree with you, and I’m working on this too.” I think you still have the explainability people ruling the land at the moment — again, it’s easier to poke at a black box than it is to replace it. Those guys I haven’t managed to convince, and I view that as somewhat of a personal failure, but I’m working on it. [Laughs.] I’m hoping that this next generation will help me out.
Would any low-stakes applications of machine learning benefit from more interpretability?
People are working on interpretable models for natural language processing. These large language-generation models like ChatGPT are very difficult to understand. We’ve realized now that when they say something offensive, it would be useful to know why they did that. It’s really hard to troubleshoot these black box models. Before ChatGPT, I used to run our computer-generated poetry team at Duke. We were working with GPT-2, a predecessor to ChatGPT, and I often felt like we were trying to convince it to do something it really didn’t want to do. It just wasn’t good at figuring out which words generally make sense together.
Why did you make computer-generated poetry?
Well, I was hoping to do something meta-creative. The team started with sonnets, then went on to limericks. They wrote this paper called “There Once Was a Really Bad Poet, It Was Automated but You Didn’t Know It.” We forced the model to follow a certain template — like Mad Libs on steroids. There were a whole bunch of poems that were just a riot. It’s so fun when you get some weird piece of poetry that the computer wrote and you’re like, “Wow, that’s pretty funky.”
But all of this was before ChatGPT, which has no trouble with text generation, even with very difficult constraints like rhyming and iambic pentameter. But ChatGPT taught me something important. If we don’t have interpretability on large-scale language and image generation models, they are harder to control, which means they are likely to assist in propagating dangerous misinformation more quickly. So they changed my mind on the value of interpretability: even for low-stakes decisions, it seems we need it.
Do you ever use machine learning to compose music?
We published a beautiful computer generation algorithm for four-part harmony that is fully interpretable, written by one of my students, Stephen Hahn. All of the co-authors were musicians, and we incorporated music theory into the algorithm. It isn’t a neural network, and it produces beautiful music.
I mean, when we find a tiny little model for predicting whether someone will have a seizure, I think that’s beautiful, because it’s a very small pattern that someone can appreciate and use. And music is all about patterns. Poetry is all about patterns. They’re all beautiful patterns.