# Machine Learning Becomes a Mathematical Collaborator

## Introduction

Mathematicians often work together when they’re searching for insight into a hard problem. It’s a kind of freewheeling collaborative process that seems to require a uniquely human touch.

But in two new results, the role of human collaborator has been replaced in part by a machine. The papers were completed at the end of November and summarized in a recent *Nature* article.

“The things that I love about mathematics are its intuitive and creative aspects,” said Geordie Williamson, a mathematician at the University of Sydney and co-author of one of the papers. “The [machine learning] models were supporting that in a way that I hadn’t felt from computers before.”

Two separate groups of mathematicians worked alongside DeepMind, a branch of Alphabet, Google’s parent company, dedicated to the development of advanced artificial intelligence systems.

András Juhász and Marc Lackenby of the University of Oxford taught DeepMind’s machine learning models to look for patterns in geometric objects called knots. The models detected connections that Juhász and Lackenby elaborated to bridge two areas of knot theory that mathematicians had long speculated should be related. In separate work, Williamson used machine learning to refine an old conjecture that connects graphs and polynomials.

Computers have aided in mathematical research for years, as proof assistants that make sure the logical steps in a proof really work and as brute force tools that can chew through huge amounts of data to search for counterexamples to conjectures.

The new work represents a different form of human-machine collaboration. It demonstrates that by selectively incorporating machine learning into the generative phase of research, mathematicians can uncover leads that might have been hard to find without machine assistance.

“The most amazing thing about this work — and it really is a big breakthrough — is the fact that all the pieces came together and that these people worked as a team,” said Radmila Sazdanovic of North Carolina State University. “It’s a truly transdisciplinary collaboration.”

Some observers, however, view the collaboration as less of a sea change in the way mathematical research is conducted. While the computers pointed the mathematicians toward a range of possible relationships, the mathematicians themselves needed to identify the ones worth exploring.

“All the hard work was done by the human mathematicians,” wrote Ernest Davis, a computer scientist at New York University, in an email.

## Patterns in Data

Machine learning predicts outputs from inputs: Feed a model health data and it will output a diagnosis; show it an image of an animal and it will reply with the name of the species.

This is often done using a machine learning approach called supervised learning, in which researchers essentially teach the computer to make predictions by giving it many examples.

For instance, imagine you want to teach a model to identify whether an image contains a cat or a dog. Researchers start by feeding the model many examples of each animal. Based on that training data, the computer constructs an extremely complicated mathematical function, which is essentially a machine for making predictions. Once the predictive function is established, researchers show the model a new image, and it will respond with the probability that the image is a cat or a dog.
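As a rough illustration of that training-then-predicting loop, here is a minimal sketch in Python. It is not the kind of model DeepMind used: it fits a simple logistic regression to made-up pairs of numbers standing in for image features, and every name and value in it (`train`, `predict_proba`, the two clusters) is an assumption chosen for illustration.

```python
import math
import random

def sigmoid(z):
    # Clamp z so math.exp never overflows on extreme inputs.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, labels, epochs=200, lr=0.1):
    """Fit a logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # positive when the model leans too far toward "dog"
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def predict_proba(model, x):
    """Return the model's estimated probability that x is a dog (label 1)."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Made-up training data: two numeric "features" per image, not real pixels.
rng = random.Random(0)
cats = [[rng.gauss(-2, 1), rng.gauss(-2, 1)] for _ in range(50)]
dogs = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(50)]
examples = cats + dogs
labels = [0] * 50 + [1] * 50

model = train(examples, labels)
p_new = predict_proba(model, [1.5, 2.5])  # a new, unseen "image"
```

The learned `weights` and `bias` play the role of the predictive function: once they are fixed, any new input can be turned into a probability. Real image models have millions of such parameters, which is part of why their reasoning is hard to read off.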

To make supervised learning useful as a research tool, mathematicians had to find the right questions for DeepMind to tackle. They needed problems that involved mathematical objects for which a lot of training data was available — a criterion that many mathematical investigations don’t meet.

They also needed to find a way to take advantage of DeepMind’s powerful ability to perceive hidden connections, while also navigating its significant limitations as a collaborator. Often, machine learning works as a black box, producing outputs from inputs according to rules that human beings can’t decipher.

“[The computer] could see really unusual things, but also struggled to explain very effectively,” said Alex Davies, a researcher at DeepMind.

The mathematicians weren’t looking for DeepMind to merely output correct answers. To really advance the field, they also needed to know why the connections held, a step that the computer couldn’t take.

## Bridging Invariants

In 2018, Williamson and Demis Hassabis, the CEO and co-founder of DeepMind, were both elected as fellows of the Royal Society, a British organization of distinguished scientists. During a coffee break at the admissions ceremony, they discovered a mutual interest.

“I’d thought a little bit about how machine learning could help mathematics, and he’d thought a lot about it,” said Williamson. “We just kind of bounced ideas off each other.”

They decided that a branch of mathematics known as knot theory would be the ideal testing ground for a human-computer collaboration. It involves mathematical objects called knots, which you can think of as tangled loops of string. Knot theory fits the requirements for machine learning because it has abundant data — there are many millions of relatively simple knots — and because many properties of knots can be easily computed using existing software.

Williamson suggested that DeepMind contact Lackenby, an established knot theorist, to find a specific problem to work on.

Juhász and Lackenby understood the strengths and weaknesses of machine learning. Given those, they hoped to use it to find novel connections between different types of invariants, which are properties used to distinguish knots from each other.

Two knots are considered different when it’s impossible to untangle them (without cutting them) so that they look like each other. Invariants are inherent properties of the knot that do not change during the untangling process (hence the name “invariant”). So if two knots have different values for an invariant, they can never be manipulated into one another.
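The logic of telling knots apart with invariants can be sketched in a few lines of Python. The table below lists two standard invariants for three well-known knots; the helper names are hypothetical, and the sign of the trefoil's signature depends on its handedness and on convention, so the `-2` is an assumed choice.

```python
# Invariant values for three well-known knots.
# crossing_number: the fewest crossings in any drawing of the knot.
# signature: the algebraic invariant discussed in this article
#            (sign for the trefoil is convention-dependent; -2 is assumed).
KNOTS = {
    "unknot":       {"crossing_number": 0, "signature": 0},
    "trefoil":      {"crossing_number": 3, "signature": -2},
    "figure_eight": {"crossing_number": 4, "signature": 0},
}

def distinguishing_invariants(a, b):
    """Return the names of the invariants whose values differ for a and b."""
    return [name for name in KNOTS[a] if KNOTS[a][name] != KNOTS[b][name]]

def provably_different(a, b):
    """True if at least one invariant tells the two knots apart."""
    return bool(distinguishing_invariants(a, b))
```

Note the asymmetry: a differing value proves two knots are different, but matching values prove nothing. The unknot and the figure-eight knot share a signature of 0, so the signature alone cannot separate them, while the crossing number can.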

There are many different types of knot invariants, characterized by how they describe the knot. Some are more geometric, others are algebraic, and some are combinatorial. However, mathematicians have been able to prove very little about the relationships between invariants from different fields. They typically don’t know whether different invariants actually measure the same feature of a knot from multiple perspectives.

Juhász and Lackenby saw an opportunity for machine learning to spot connections between different categories of invariants. From these connections they could gain a deeper insight into the nature of knot invariants.

## Signature Verification

To pursue Juhász and Lackenby’s question, researchers at DeepMind developed a data set with over 2 million knots. For each knot, they computed different invariants. Then they used machine learning to search for patterns that tied invariants together. The computer perceived many, most of which were not especially interesting to the mathematicians.

“We saw quite a few patterns that were either known or were known not to be true,” said Lackenby. “As mathematicians, we weeded out quite a lot of the stuff the machine learning was sending to us.”

Unlike Juhász and Lackenby, the machine learning system did not understand the underlying mathematical theory. The input data was computed from knot invariants, but the computer saw only lists of numbers.

“As far as the machine learning system was concerned, these could have been sales records of various kinds of foods at McDonald’s,” said Davis.

Eventually the two mathematicians settled on trying to teach the computer to output an important algebraic invariant called the “signature” of a knot, based only on information about the knot’s geometric invariants.

After Juhász and Lackenby identified the problem, researchers at DeepMind began to build the specific machine learning algorithm. They trained the computer to take 30 geometric invariants of a knot as an input and to output the knot’s signature. It worked well, and after a few weeks of work, DeepMind could accurately predict the signature of most knots.

Next, the researchers needed to find out how the model was making these predictions. To do this, the team at DeepMind turned to a technique known as saliency analysis, which can be used to tease out which of the many inputs are most responsible for producing the output. They slightly changed the value of each input, one at a time, and examined which change had the most dramatic impact on the output.

If an algorithm is designed to predict whether an image shows a cat, researchers performing saliency analysis will blur tiny sections of the picture and then check whether the computer still recognizes the cat. They might find, for instance, that the pixels in the corner of the image are less important than those that compose the cat’s ear.
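The nudge-one-input-at-a-time idea described above can be sketched as follows. This is an illustrative toy, not DeepMind's code: `toy_model` is a hypothetical stand-in for a trained model, with four inputs instead of 30, and the step size `eps` is an arbitrary assumption.

```python
def saliency(model, inputs, eps=1e-3):
    """Score each input by how much a small nudge to it changes the output."""
    base = model(inputs)
    scores = []
    for i in range(len(inputs)):
        nudged = list(inputs)
        nudged[i] += eps  # change one input, leave the others alone
        scores.append(abs(model(nudged) - base) / eps)
    return scores

def toy_model(x):
    # Hypothetical stand-in for a trained model: it leans heavily on
    # x[0] and x[2], barely on x[1], and ignores x[3] entirely.
    return 3.0 * x[0] + 0.1 * x[1] - 2.0 * x[2] * x[2]

scores = saliency(toy_model, [1.0, 1.0, 1.0, 1.0])

# Indices of the inputs, most influential first.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Here the ranking puts inputs 2 and 0 at the top and input 3, which the model never uses, at the bottom. The scores say which inputs matter, not why they matter, which is the gap the mathematicians had to fill.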

When the researchers applied saliency analysis to the data, they observed that three of the 30 geometric invariants seemed especially important to how the model was making predictions. All three of these invariants measure features of the cusp, which is a hollow tube encasing the knot, like the rubber coating around a cable.

Based on this information, Juhász and Lackenby constructed a formula which relates the signature of a knot to those three geometric invariants. The formula also uses another common invariant, the volume of a sphere with the knot carved out of it. When they tested the formula on specific knots, it seemed to work, but that wasn’t enough to establish a new mathematical theorem. The mathematicians were looking for a precise statement that they could prove was always valid — and that was harder.

“It just wasn’t quite working out,” said Lackenby.

Juhász and Lackenby’s intuition, built up through years of studying similar problems, told them that the formula was still missing something. They realized they needed to introduce another geometric invariant, something called the injectivity radius, which roughly measures the length of certain curves related to the knot. It was a step that used the mathematicians’ trained intuition, but it was enabled by the particular insights they were able to glean from the many unedited connections identified by DeepMind’s model.

“The good thing is that [machine learning models] have completely different strengths and weaknesses than humans do,” said Adam Zsolt Wagner of Tel Aviv University.

The modification was successful. By combining information about the injectivity radius with the three geometric invariants DeepMind had singled out, Juhász and Lackenby created a failproof formula for computing the signature of a knot. The final result had the spirit of a real collaboration.

“It was definitely an iterative process involving both the machine learning experts from DeepMind and us,” said Lackenby.

## Converting Graphs Into Polynomials

Building on the momentum of the knot theory project, in early 2020 DeepMind turned back to Williamson to see if he wanted to test a similar process in his field, representation theory. Representation theory is a branch of math that looks for ways of combining basic elements of mathematics like symmetries to make more sophisticated objects.

Within this field, Kazhdan-Lusztig polynomials are particularly important. They are based on permutations, which are ways of rearranging objects, such as by swapping the order of two objects in a list. Each Kazhdan-Lusztig polynomial is built from a pair of permutations and encodes information about their relationship. They’re also very mysterious, and it is often difficult to compute their coefficients.

Given this, mathematicians try to understand Kazhdan-Lusztig polynomials in terms of objects that are easier to work with, called Bruhat graphs. Each vertex on a Bruhat graph represents a permutation of a specific number of objects. Edges connect vertices whose permutations differ by swapping just two elements.
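To make that description concrete, here is a small Python sketch that builds such a graph for all permutations of `n` objects. The function names are hypothetical. Pointing each edge from the permutation with fewer out-of-order pairs to the one with more follows the usual convention for Bruhat graphs, but treat that detail as an assumption beyond what is described here.

```python
from itertools import combinations, permutations

def inversions(perm):
    """Count the pairs of entries that appear out of order."""
    return sum(1 for i, j in combinations(range(len(perm)), 2)
               if perm[i] > perm[j])

def swap_graph(n):
    """Vertices: permutations of 1..n. Edges: pairs that differ by one swap."""
    vertices = list(permutations(range(1, n + 1)))
    edges = set()
    for p in vertices:
        for i, j in combinations(range(n), 2):
            q = list(p)
            q[i], q[j] = q[j], q[i]  # swap two elements
            q = tuple(q)
            # Store each edge once, pointing from fewer inversions to more.
            low, high = sorted((p, q), key=inversions)
            edges.add((low, high))
    return vertices, edges

vertices, edges = swap_graph(3)
```

For three objects the graph has 6 vertices and 9 edges; for four objects, 24 vertices and 72 edges. The count grows quickly, which hints at why the graphs in the study were described as huge.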

In the 1980s, George Lusztig and Matthew Dyer independently predicted that there should be a relationship between a Bruhat graph and a Kazhdan-Lusztig polynomial. The relationship would be useful because the polynomial is more fundamental, while the graph is simpler to compute.

And, just like the problem of predicting one knot invariant by using another, this problem was well suited to DeepMind’s abilities. The DeepMind team started by training the model on nearly 20,000 paired Bruhat graphs and Kazhdan-Lusztig polynomials.

Soon it was able to frequently predict the right Kazhdan-Lusztig polynomial from a Bruhat graph. But to write down a recipe for getting from one to the other, Williamson needed to know how the computer was making its predictions.

## A Formula, if You Can Prove It

Here, again, the DeepMind researchers turned to saliency techniques. Bruhat graphs are huge, but the computer’s predictions were based mostly on a small number of edges. Edges that represented exchanging faraway numbers (like 1 and 9) were more important for the predictions than edges connecting permutations that flipped nearby numbers (like 4 and 5). It was a lead that Williamson then had to develop.
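One way to make the distinction between faraway and nearby swaps concrete is to measure, for a single edge, how far apart the two exchanged numbers are. The helper below is a hypothetical illustration, not part of the DeepMind analysis.

```python
def swap_distance(p, q):
    """Return the gap between the two numbers exchanged along an edge.

    p and q are permutations (tuples) that differ by swapping two entries.
    """
    changed = [i for i in range(len(p)) if p[i] != q[i]]
    if len(changed) != 2 or p[changed[0]] != q[changed[1]] or p[changed[1]] != q[changed[0]]:
        raise ValueError("permutations do not differ by a single swap")
    return abs(p[changed[0]] - p[changed[1]])

identity = (1, 2, 3, 4, 5, 6, 7, 8, 9)
far = swap_distance(identity, (9, 2, 3, 4, 5, 6, 7, 8, 1))   # exchanges 1 and 9
near = swap_distance(identity, (1, 2, 3, 5, 4, 6, 7, 8, 9))  # exchanges 4 and 5
```

In this toy measure, the edge exchanging 1 and 9 gets a distance of 8 and the edge exchanging 4 and 5 gets a distance of 1.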

“Alex [Davies] is telling me these edges, for whatever reason, are way more important than others,” said Williamson. “The ball was back in my court, and I kind of stared at these for a few months.”

Williamson ultimately devised 10 or so formulas for converting Bruhat graphs into Kazhdan-Lusztig polynomials. The DeepMind team checked them against millions of examples of Bruhat graphs. For Williamson’s first several formulas, the DeepMind team quickly found examples that didn’t work — places the recipes failed.
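The checking step is ordinary counterexample search. Since the candidate formulas themselves are not given here, the sketch below uses a classic toy claim that has nothing to do with Kazhdan-Lusztig polynomials: that `n*n + n + 41` is always prime. All function names are hypothetical.

```python
def is_prime(n):
    """Trial-division primality test, fine for small numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def first_counterexample(claim, candidates):
    """Return the first candidate for which the claim fails, or None."""
    for c in candidates:
        if not claim(c):
            return c
    return None

def toy_claim(n):
    # Toy conjecture: n*n + n + 41 is prime for every n >= 0.
    return is_prime(n * n + n + 41)

failure = first_counterexample(toy_claim, range(1000))
```

The claim survives its first 40 tests and then fails at `n = 40`, where the value is 41 squared. Passing many checks is encouraging but is not a proof, which is why a formula that holds on millions of examples still needs one.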

But eventually Williamson found a formula that seems likely to stick. It involves breaking the Bruhat graph into pieces that resemble cubes and using that information to compute the associated polynomial. DeepMind researchers have since verified the formula on millions of examples. Now it’s up to Williamson and other mathematicians to prove the recipe always works.

Using computers to check for counterexamples is a standard part of mathematical research. But the recent collaborations make computers useful in a new way. For data-heavy problems, machine learning can help guide mathematicians in novel directions, much like a colleague making a casual suggestion.