On nights and weekends, Justin Gilmer attacked an old question in pure math using the tools of information theory.

Introduction

In mid-October, Justin Gilmer flew from California to New York to attend a friend’s wedding. While on the East Coast he visited his former adviser, Michael Saks, a mathematician at Rutgers University, where Gilmer had received his doctorate seven years earlier.

Saks and Gilmer caught up over lunch, but they didn’t talk about math. In fact, Gilmer had not thought seriously about math since finishing at Rutgers in 2015. That was when he’d decided he didn’t want a career in academia and instead started to teach himself to program. As he and Saks ate, Gilmer told his old mentor about his job at Google, where he works on machine learning and artificial intelligence.

It was sunny the day Gilmer visited Rutgers. As he walked around, he recalled how in 2013 he’d spent the better part of a year walking those same campus paths, thinking about a problem called the union-closed conjecture. It had been a fixation, though a fruitless one: For all his effort, Gilmer had only succeeded in teaching himself why the simple-seeming problem about sets of numbers was so difficult to solve.

“I think a lot of people think about the problem until they become satisfied that they understand why it’s hard. I probably spent more time on it than most people,” Gilmer said.

Following his October visit, something unexpected happened: He got a new idea. Gilmer began to think about ways to apply techniques from information theory to solve the union-closed conjecture. He pursued the idea for a month, at every turn expecting it to fail. But instead, the path to a proof kept opening up. Finally, on November 16 he posted a first-of-its-kind result that gets mathematicians much of the way toward proving the full conjecture.

The paper set off a flurry of follow-up work. Mathematicians at the University of Oxford, the Massachusetts Institute of Technology and the Institute for Advanced Study, among other institutions, quickly built on Gilmer’s novel methods. But before they did, they asked a question of their own: Just who is this guy?

Half Full

The union-closed conjecture is about collections of numbers called sets, such as {1, 2} and {2, 3, 4}. You can perform operations on sets, including taking their union, which means combining them. For example, the union of {1, 2} and {2, 3, 4} is {1, 2, 3, 4}.

A collection, or family, of sets is considered “union-closed” if the union of any two sets in the family equals any existing set in the family. For example, consider this family of four sets:

{1}, {1, 2}, {2, 3, 4}, {1, 2, 3, 4}.

Combine any pair and you get a set that’s already in the family, making the family union-closed.

Mathematicians chatted about versions of the union-closed conjecture as far back as the 1960s, but it received its first formal statement in 1979, in a paper by Péter Frankl, a Hungarian mathematician who emigrated to Japan in the 1980s and who counts street performing among his pursuits.

Frankl conjectured that if a family of sets is union-closed, it must have at least one element (or number) that appears in at least half the sets. It was a natural threshold for two reasons.

First, there are readily available examples of union-closed families in which all elements appear in exactly 50% of the sets. Like all the different sets you can make from the numbers 1 to 10, for instance. There are 1,024 such sets, which form a union-closed family, and each of the 10 elements appears in 512 of them. And second, at the time Frankl made the conjecture no one had ever produced an example of a union-closed family in which the conjecture didn’t hold.

So 50% seemed like the right prediction.

That didn’t mean it was easy to prove. In the years since Frankl’s paper, there have been few results. Prior to Gilmer’s work, those papers only managed to establish thresholds that varied with the number of sets in the family (as opposed to being the same 50% threshold for set families of all sizes).

“It feels like it should be easy, and it’s similar to a lot of problems that are easy, but it has resisted attacks,” said Will Sawin of Columbia University.

The lack of progress reflected both the tricky nature of the problem and the fact that many mathematicians preferred not to think about it; they worried they’d lose years of their careers chasing a beguiling problem that was impossible to solve. Gilmer remembers a day in 2013 when he went to Saks’ office and brought up the union-closed conjecture. His adviser — who in the past had wrestled with the problem himself — nearly threw him out of the room.

“Mike said, ‘Justin, you’re going to get me thinking about this problem again and I don’t want to do that,’” said Gilmer.

An Insight of Uncertainty

Following his visit to Rutgers, Gilmer rolled the problem around in his mind, trying to understand why it was so hard. He prompted himself with a basic fact: If you have a family of 100 sets, there are 4,950 different ways of choosing two and taking their union. Then he asked himself: How is it possible that 4,950 different unions map back onto just 100 sets if no element appears in those unions with at least some frequency?

Even at that point he was on his way to a proof, though he didn’t know it yet. Techniques from information theory, which provides a rigorous way of thinking about what to expect when you pull a pair of objects at random, would take him there.

Information theory developed in the first half of the 20th century, most famously with Claude Shannon’s 1948 paper, “A Mathematical Theory of Communication.” The paper provided a precise way of calculating the amount of information needed to send a message, based on the amount of uncertainty around what exactly the message would say. This link — between information and uncertainty — was Shannon’s remarkable, fundamental insight.

To take a toy example, imagine I flip a coin five times and send the resulting sequence to you. If it’s a normal coin, it takes five bits of information to transmit. But if it’s a loaded coin — say, 99% likely to land on heads — it takes a lot less. For example, we could agree ahead of time that I’ll send you a 1 (a single bit of information) if the loaded coin lands heads all five times, which it’s very likely to do. There’s more surprise in the outcome of a fair coin flip than there is with a biased one, and therefore more information.

The same thinking applies to the information contained in sets of numbers. If I have a family of union-closed sets — say the 1,024 sets made from the numbers 1 to 10 — I could pick two sets at random. Then I could communicate the elements of each set to you. The amount of information it takes to send that message reflects the amount of uncertainty around what those elements are: There’s a 50% chance, for example, that the first element in the first set is a 1 (because 1 appears in half the sets in the family), just as there’s a 50% chance the first result in a sequence of fair coin flips is heads.

Information theory appears often in combinatorics, an area of mathematics concerned with counting objects, which is what Gilmer had studied as a graduate student. But as he flew back home to California, he worried that the way he thought to connect information theory to the union-closed conjecture was the naïve insight of an amateur: Surely working mathematicians had come across this shiny object before and recognized it as fool’s gold.

“To be honest, I’m a little surprised no one thought of this before,” said Gilmer. “But maybe I shouldn’t be surprised, because I myself had thought about it for a year, and I knew information theory.”

More Likely Than Not

Gilmer worked on the problem at night, after finishing his work at Google, and on weekends throughout the second half of October and early November. He was encouraged by ideas that a group of mathematicians had explored years earlier in an open collaboration on the blog of a prominent mathematician named Tim Gowers. He also worked with a textbook by his side so he could look up formulas he’d forgotten.

“You’d think someone who comes up with a great result shouldn’t have to consult Chapter 2 of Elements of Information Theory, but I did,” Gilmer said.

Gilmer’s strategy was to imagine a union-closed family in which no element appeared in even 1% of all the sets — a counterexample that, if it really existed, would falsify Frankl’s conjecture.

Let’s say you choose two sets, A and B, from this family at random and consider the elements that could be in those sets, one at a time. Now ask: What are the odds that set A contains the number 1? And set B? Since every element has a little less than a 1% chance of appearing in any given set, you wouldn’t expect either A or B to contain 1. Which means there’s little surprise — and little information gained — if you learn that neither in fact does.

Next, think about the chance that the union of A and B contains 1. It’s still unlikely, but it’s more likely than the odds that it appears in either of the individual sets. It’s the sum of the likelihood it appears in A and the likelihood it appears in B minus the likelihood it appears in both. So, maybe just under 2%.

This is still low, but it’s closer to a 50-50 proposition. That means it takes more information to share the result. In other words, if there’s a union-closed family in which no element appears in at least 1% of all the sets, there’s more information in the union of two sets than in either of the sets themselves.

“The idea of revealing things element by element and looking at the amount of information you learn is extremely clever. That’s the main idea of the proof,” said Ryan Alweiss of Princeton University.

At this point Gilmer was starting to close in on Frankl’s conjecture. That’s because it’s easy to demonstrate that in a union-closed family, the union of two sets necessarily contains less information than the sets themselves — not more.

To see why, think about that union-closed family containing the 1,024 different sets you can make from the numbers 1 to 10. If you pick two of those sets at random, on average you’ll end up with sets containing five elements. (Of those 1,024 sets, 252 contain five elements, making that the most common set size.) You’re also likely to end up with a union containing about seven elements. But there are only 120 different ways of making sets containing seven elements.

The point is, there’s more uncertainty about the contents of two randomly chosen sets than there is about their union. The union skews to larger sets with more elements, for which there are fewer possibilities. When you take the union of two sets in a union-closed family, you kind of know what you’re going to get — like when you flip a biased coin — which means the union contains less information than the sets it’s composed of.

With that, Gilmer had a proof. He knew if no element appears in even 1% of the sets, the union is forced to contain more information. But the union must contain less information. Therefore there must be at least one element that appears in at least 1% of the sets.

The Push to 50

When Gilmer posted his proof on November 16, he included a note that he thought it was possible to use his method to get even closer to a proof of the full conjecture, potentially raising the threshold to 38%.

Five days later, three different groups of mathematicians posted papers within hours of each other that built on Gilmer’s work to do just that. Additional papers followed, but the initial burst seems to have taken Gilmer’s methods as far as they will go; getting to 50% will likely take additional new ideas.

Still, for some of the authors of the follow-up papers, getting to 38% was relatively straightforward, and they wondered why Gilmer didn’t just do it himself. The simplest explanation turned out to be the correct one: After more than a half-decade out of math, Gilmer just didn’t know how to do some of the technical analytic work required to pull it off.

“I was a bit rusty, and to be honest, I was stuck,” Gilmer said. “But I was eager to see where the community would take it.”

Yet Gilmer thinks the same circumstances that left him out of practice probably made his proof possible in the first place.

“It’s the only way I can explain why I thought about the problem for a year in graduate school and made no progress, I left math for six years, then returned to the problem and made this breakthrough,” he said. “I don’t know how to explain it other than being in machine learning biased my thinking.”

Correction: January 3, 2023
The original headline referred to Gilmer as a “Google engineer.” In fact, he is a researcher.