# Grad Students Find Inevitable Patterns in Big Sets of Numbers

## Introduction

In late 2017, Ashwin Sah and Mehtaab Sawhney met as undergraduates at the Massachusetts Institute of Technology. Since then, the pair have written a mind-boggling 57 math proofs together, many of them profound advances in various fields.

In February, Sah and Sawhney announced yet another joint accomplishment. With James Leng, a graduate student at the University of California, Los Angeles, they obtained a long-sought improvement on an estimate of how big sets of integers can get before they must contain sequences of evenly spaced numbers, like {9, 19, 29, 39, 49} or {30, 60, 90, 120}. The proof joins a long line of work on the mathematical impossibility of complete disorder. It also marks the first progress in decades on one of the biggest unsolved problems in the field of combinatorics.

“It’s phenomenally impressive that they managed to do this,” said Ben Green, a mathematician at the University of Oxford. At the time the work was released, the trio were all still in graduate school.

Sequences of regularly spaced numbers are called arithmetic progressions. Though they’re simple patterns, they hide astounding mathematical complexity. And they’re difficult, often impossible, to avoid, no matter how hard you might try.

In 1936, the mathematicians Paul Erdős and Pál Turán conjectured that if a set consists of a nonzero fraction of the whole numbers — even if it’s just 0.00000001% — then it must contain arbitrarily long arithmetic progressions. The only sets that can avoid arithmetic progressions are those that comprise a “negligible” portion of the whole numbers. For example, the set {2, 4, 8, 16, …}, in which each number doubles the one before it, is so spread out along the number line that it’s said to make up 0% of the whole numbers. This set has no progressions.

Forty years later, in 1975, a mathematician named Endre Szemerédi proved the conjecture. His work spawned multiple lines of research that mathematicians are still exploring today. “Many of the ideas from his proof grew into worlds of their own,” said Yufei Zhao, Sah and Sawhney’s doctoral adviser at MIT.

Mathematicians have built on Szemerédi’s result in the context of finite sets of numbers. In this case, you start with a limited pool — every integer between 1 and some number *N*. What’s the largest fraction of the starting pool you can use in your set before you inevitably include a forbidden progression? And how does that fraction change as *N *changes?

For example, let *N *be 20. How many of these 20 numbers can you write down while still avoiding progressions that are, say, five or more numbers long? The answer, it turns out, is 16 — 80% of the starting pool.

Now let *N *be 1,000,000. If you use 80% of this new pool, you’re looking at sets that contain 800,000 numbers. It’s impossible for such large sets to avoid five-term progressions. You’ll have to use a smaller fraction of the pool.

Szemerédi was the first to prove that this fraction must shrink to zero as *N *grows. Since then, mathematicians have tried to quantify exactly how quickly that happens. Last year, breakthrough work by two computer scientists nearly solved this question for three-term progressions, like {6, 11, 16}.

But when you’re instead trying to avoid arithmetic progressions with four or more terms, the problem becomes tougher. “The thing I love about this problem is it just sounds so innocent, and it’s not. It really bites,” Sawhney said.

That’s because longer progressions reflect an underlying structure that is difficult for classical mathematical techniques to uncover. The numbers *x*, *y* and *z* in a three-term arithmetic progression always satisfy the simple equation *x* – 2*y* + *z* = 0. (Take the progression {10, 20, 30}, for instance: 10 – 2(20) + 30 = 0.) It’s relatively easy to prove whether or not a set contains numbers that satisfy this kind of condition. But the numbers in a four-term progression have to additionally satisfy the more complicated equation *x*^{2} – 3*y*^{2} + 3*z*^{2} – *w*^{2} = 0. Progressions with five or more terms must satisfy equations that are even more elaborate. This means that sets containing such progressions exhibit subtler patterns. It’s harder for mathematicians to show whether such patterns exist.

In the late 1990s, Timothy Gowers, a mathematician now at the Collège de France, developed a theory to overcome this obstacle. He was later awarded the Fields Medal, math’s highest honor, in part for that work. In 2001, he applied his techniques to Szemerédi’s theorem, proving a better bound on the size of the largest sets that avoid arithmetic progressions of any given length. While mathematicians used Gowers’ framework to tackle other problems over the next two decades, his 2001 record remained steadfast.

In 2022, Leng — then in his second year of graduate school at UCLA — set out to understand Gowers’ theory. He didn’t have Szemerédi’s theorem in mind; rather, he hoped to answer a technical question related to the techniques Gowers had developed. Other mathematicians, fearing that the effort needed to solve the problem would eclipse the result, tried to dissuade him. “For good reason,” Leng later said.

For more than a year, he didn’t get anywhere. But eventually, he started making progress. Sah and Sawhney, who had been thinking about related questions, learned about his work. They were intrigued. “I was amazed it’s even possible to think like this,” Sawhney said.

They realized that Leng’s research might help them make further progress on Szemerédi’s theorem. Within a few months, the three young mathematicians figured out how to get a better upper bound on the size of sets with no five-term progressions. They then extended their work to progressions of any length, marking the first advance on the problem in the 23 years since Gowers’ proof. Gowers had shown that, as your starting pool of numbers gets bigger, the progression-avoiding sets you can make get relatively smaller at a certain rate. Leng, Sah and Sawhney proved that this happens at a rate that’s exponentially faster.

“It’s a huge achievement,” Zhao said. “This is the kind of problem that I really would not suggest to any student because it is so incredibly hard.”

Mathematicians are even more excited by the method the trio used to get their new bound. For everything to work, they first had to strengthen an older, more technical result by Green, Terence Tao of UCLA and Tamar Ziegler of Hebrew University. Mathematicians feel that this result — a sort of elaboration of Gowers’ theory — can be improved even further. “It feels like we have an imperfect understanding of the theory,” Green said. “We’re just seeing a few shadows of it.”

Since completing the proof in February, Sah and Sawhney have both graduated. But the pair’s collaboration has not yet slowed down. “Their incredible strength is taking something that is extremely technically demanding and understanding it and improving upon it,” said Zhao. “It’s difficult to overstate the level of their overall accomplishments.”

*Correction:** August 8, 2024*

*An earlier version of this article said that Sah was still completing his graduate studies. In fact, he recently graduated.*