# Latest Neural Nets Solve World’s Hardest Equations Faster Than Ever Before

## Introduction

In high school physics, we learn about Newton’s second law of motion — force equals mass times acceleration — through simple examples of a single force (say, gravity) acting on an object of some mass. In an idealized scenario where the only independent variable is time, the second law is effectively an “ordinary differential equation,” which one can solve to calculate the position or velocity of the object at any moment in time.
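As a worked example of that idealized case, consider a mass falling under gravity alone. The symbols here are introduced for illustration and are not from the article: *x* is height measured upward, *x*₀ and *v*₀ are the initial position and velocity, and *g* is the gravitational acceleration.

$$
F = ma
\quad\Longrightarrow\quad
m\,\frac{d^2 x}{dt^2} = -mg
\quad\Longrightarrow\quad
v(t) = v_0 - g t,
\qquad
x(t) = x_0 + v_0 t - \tfrac{1}{2} g t^2
$$

Because time is the only independent variable, integrating twice is enough to recover the position at any moment.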

But in more involved situations, multiple forces act on the many moving parts of an intricate system over time. To model a passenger jet scything through the air, a seismic wave rippling through Earth or the spread of a disease through a population — to say nothing of the interactions of fundamental forces and particles — engineers, scientists and mathematicians resort to “partial differential equations” (PDEs) that can describe complex phenomena involving many independent variables.

The problem is that partial differential equations — as essential and ubiquitous as they are in science and engineering — are notoriously difficult to solve, if they can be solved at all. Approximate methods can be used to solve them, but even then, it can take millions of CPU hours to sort out complicated PDEs. As the problems we tackle become increasingly complex, from designing better rocket engines to modeling climate change, we’ll need better, more efficient ways to solve these equations.

Now researchers have built new kinds of artificial neural networks that can approximate solutions to partial differential equations orders of magnitude faster than traditional PDE solvers. And once trained, the new neural nets can solve not just a single PDE but an entire family of them without retraining.

To achieve these results, the scientists are taking deep neural networks — the modern face of artificial intelligence — into new territory. Normally, neural nets map, or convert data, from one finite-dimensional space (say, the pixel values of images) to another finite-dimensional space (say, the numbers that classify the images, like 1 for cat and 2 for dog). But the new deep nets do something dramatically different. They “map between an infinite-dimensional space and an infinite-dimensional space,” said the mathematician Siddhartha Mishra of the Swiss Federal Institute of Technology Zurich, who didn’t design the deep nets but has been analyzing them mathematically.

Such techniques could speed up many models that involve PDEs. “Ultimately, our goal [is] to replace the very expensive traditional solvers that are very slow,” said the computer scientist Anima Anandkumar of the California Institute of Technology, a member of one of the teams that developed the new methods.



But the new approaches do more than just speed up the process. For some phenomena, researchers only have data and little idea of how to even come up with the relevant PDEs to model them. “There are many, many problems where the physics is sort of flaky. It’s not well defined,” said Mishra. “So in those problems you’re sort of driving blind.” In such cases, the new neural networks, once trained on the data, may be the only way to model the system.

## Pretty Dramatic Equations

What makes PDEs useful, and extremely difficult to solve, is their complexity, which allows them to model all kinds of phenomena. Take, for example, the two-dimensional perspective of a fluid flowing around some object, such as air moving around an airplane wing. Modelers want to know the velocity and pressure of the fluid at any point in space (also called the flow field) and at different times. Specific PDEs, known as the Navier-Stokes equations, model such fluid flows, taking into account the laws of conservation of energy, mass and momentum. Solve the PDE and you get a formula that describes something about the system. In this case, the solution may be a formula that lets you calculate the flow field at different times.

Some PDEs can be solved analytically, using the tools of math, if you have enough knowledge about the initial and boundary conditions, such as the value of the flow field at time *t* = 0, and at the edges of the region being studied. But often PDEs are so complex that universal analytic solutions are impossible. This is particularly true of the most general form of the Navier-Stokes equations: Mathematicians have yet to prove whether unique solutions even exist, let alone actually find them analytically.

In these cases, modelers turn instead to numerical methods. This involves converting the PDE into a set of tractable algebraic equations that are assumed to hold over tiny increments of space and time. For our example of 2D fluid flow, the computations start with some initial and boundary conditions and proceed step by step, inching their way along the *x*- and *y*-axes, calculating the fluid’s velocity and pressure at various points. The outcome is a 2D map of the flow field, say, second by second, rather than a formula.
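To make the step-by-step idea concrete, here is a minimal Python sketch. It swaps the Navier-Stokes equations for a much simpler PDE, the 1D heat equation u_t = alpha * u_xx, and marches it forward with an explicit finite-difference scheme. The grid size, time step, diffusivity and initial hot spot are assumptions chosen for illustration, not values from the work described here.

```python
def step_heat(u, alpha, dx, dt):
    """Advance the 1D heat equation u_t = alpha * u_xx by one time step.

    Uses an explicit finite-difference update. The two end values are held
    fixed, which plays the role of the boundary condition.
    """
    r = alpha * dt / dx**2  # must stay <= 0.5 for this scheme to be stable
    new_u = u[:]  # copy so the boundary values carry over unchanged
    for i in range(1, len(u) - 1):
        new_u[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new_u


def solve_heat(u0, alpha=1.0, dx=0.1, dt=0.004, steps=50):
    """Return the list of snapshots u(x, t_k) for k = 0..steps."""
    snapshots = [u0]
    for _ in range(steps):
        snapshots.append(step_heat(snapshots[-1], alpha, dx, dt))
    return snapshots


# Initial condition: a single hot spot in the middle of a cold rod.
u0 = [0.0] * 11
u0[5] = 1.0
history = solve_heat(u0)
final = history[-1]
```

The result, `history`, is a table of values on a grid at successive times, which is exactly the "map, not a formula" outcome described above. Changing `u0` means running the whole loop again.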

Solving complex PDEs numerically can take months on supercomputers. And if you change the initial or boundary conditions or the geometry of the system being studied (such as the wing design), you’ll have to start over. Also, the smaller the increments you use — or the finer the mesh, as the researchers say — the higher the resolution of the model, and the longer it takes to solve numerically.

Despite the costs, “for every scientific field, the trend is towards higher resolution … and this endless drive to compute things over larger domains,” said Zachary Ross, a seismologist at Caltech who was not involved with the new work. “It’s always a race to do the next biggest thing.”

## Neural Nets Join the Fray

Recently, deep neural networks have been changing the nature of that race, offering ways to solve PDEs without using analytic or numerical methods. The basic element of a deep net is an artificial neuron, which takes in a set of inputs, multiplies each one by a weight and then sums up the results. The neuron then determines an output based on that total — say, zero if the sum is below some threshold, and the sum itself otherwise. Modern neural networks have one input layer, one output layer and at least one “hidden” layer sandwiched in between. Networks with only one hidden layer are colloquially called “shallow” networks; otherwise, they are called deep neural networks.
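The neuron described above can be sketched in a few lines of Python. The weights and inputs below are hypothetical, picked only so that one neuron lands above the threshold and one below it.

```python
def neuron(inputs, weights, threshold=0.0):
    """Weighted sum of the inputs, passed through a thresholded activation."""
    total = sum(x * w for x, w in zip(inputs, weights))
    # Output zero below the threshold, the sum itself otherwise.
    return total if total >= threshold else 0.0


def layer(inputs, weight_rows):
    """A layer is several neurons reading the same inputs."""
    return [neuron(inputs, row) for row in weight_rows]


# Hypothetical weights, chosen only to show both branches of the activation.
hidden = layer([1.0, 2.0], [[0.5, 0.25], [-1.0, 0.25]])  # one hidden layer
output = neuron(hidden, [2.0, 3.0])                      # output neuron
```

With a single hidden layer this is a "shallow" network in the sense used above; stacking more calls to `layer` would make it deep.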

Mathematically, the input to such a neural net is a vector — a set of numbers — and the output is another vector. If a function exists that maps a set of input vectors to a set of output vectors, the network can be trained to learn that mapping. “Neural networks are universal in that space,” said Mishra. “Any function between two finite-dimensional spaces can be approximated by a neural network.”

In 2016, researchers studied how deep neural networks normally used for image recognition could be co-opted for solving PDEs. First, the researchers generated the data to train the deep net: A numerical solver calculated the velocity field for a fluid flowing over simple objects with different basic shapes (triangles, quadrilaterals, and so on) of different sizes and orientations, scattered in the *xy*-plane. That meant the training data set consisted of a number of images: 2D images encoding information about the geometry of objects and the fluid’s initial conditions serving as inputs, and 2D snapshots of the corresponding velocity fields as outputs.

Armed with the data, the researchers trained their neural network to learn the correlation between those inputs and outputs. Training involves feeding the network an input and letting it produce some output, which it then compares to the expected output. An algorithm then adjusts the weights of the neurons to minimize the difference between the generated and expected outputs. This process is repeated until the network gets it reliably right, within some acceptable error limit. Once trained, the network can be shown a new input and, in all likelihood, will produce the correct output.
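The training loop can be illustrated on a deliberately tiny model. The sketch below assumes a single weight and bias fitted by plain gradient descent on the mean squared error. The toy data, learning rate and epoch count are made up for illustration, and real networks have far more weights and use more elaborate optimizers.

```python
def train(samples, lr=0.5, epochs=300):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in samples:
            error = (w * x + b) - y  # generated output minus expected output
            grad_w += 2.0 * error * x / n
            grad_b += 2.0 * error / n
        # Nudge each weight against its gradient to shrink the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 3x + 1 (a stand-in for numerical-solver output).
data = [(x / 4.0, 3.0 * (x / 4.0) + 1.0) for x in range(5)]
w, b = train(data)
```

After training, `w` and `b` end up close to 3 and 1, the values that generated the data.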

For instance, when shown new 2D images of previously unseen shapes (jeeps, vans and sports cars), the deep net predicted the velocity fields around the automobiles. The predictions differed only slightly (about 10%) from those calculated independently by the numerical solver, but the net was orders of magnitude faster.

Teaching neural networks how to solve PDEs was exciting, but past efforts weren’t very flexible. Once trained on a certain mesh size, the neural net is “very specific to that resolution,” said Anandkumar. The deep net had learned to approximate a function that mapped data from one finite-dimensional space to another. But often you need to solve the PDE at a different resolution, because you want a finer-grained look at the flow field, or with a different set of initial and boundary conditions. In either case you’d have to start over and retrain, because the deep net would need to learn to approximate a new function.

For the researchers who deal with PDEs every day, that wasn’t enough.

## From Infinity to Infinity

That’s why the new work is a leap forward — we now have deep neural networks that can learn how to approximate not just functions, but “operators” that map functions to functions. And they seem to do so without suffering from the “curse of dimensionality,” a problem that can plague neural networks and other computer algorithms that learn from data. For example, if you want a neural net’s error rate to go down from 10% to 1%, the amount of training data or the size of the network needed to do so can explode exponentially, making the task impossible.
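One common way to see where the exponential blowup comes from is to count the points in a regular grid as the number of dimensions grows. The Python below is only that textbook illustration, with grid sizes chosen arbitrarily; it is not the analysis the researchers carried out.

```python
def grid_points(points_per_axis, dimensions):
    """Number of samples needed to cover a cube with a regular grid."""
    return points_per_axis ** dimensions


# Halving the spacing (10 -> 20 points per axis) in ever more dimensions.
for d in (1, 2, 3, 10):
    coarse = grid_points(10, d)
    fine = grid_points(20, d)
    print(f"d={d:2d}: {coarse:>14,} -> {fine:>18,} points ({fine // coarse}x)")
```

Doubling the resolution costs a factor of 2 in one dimension but a factor of 2^d in d dimensions, which is why the same refinement that is cheap for a line becomes prohibitive for a high-dimensional input.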

But before worrying about the curse, the researchers had to figure out how to make neural networks learn operators to solve PDEs. “In operator [learning], you go from an infinite-dimensional space to an infinite-dimensional space,” said George Karniadakis of Brown University, who helped develop one of the new methods. Mathematically, an operator acts on one function and turns it into another function. As an example, consider an operator that transforms a function into its derivative (turning the sine of *x* into the cosine of *x*, for example, or *x*³ into 3*x*², and so on). The input and output sides are infinite-dimensional since, for example, *x* can be any value, and the function can be any transformation acting on *x*.
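In code, an operator is simply a function that takes a function and returns a function. The sketch below approximates the derivative operator with a central difference; the step size `h` is an assumed value, not a tuned one.

```python
import math


def derivative(f, h=1e-5):
    """Operator: takes a function f, returns a function approximating f'."""
    def df(x):
        # Central difference; h is an assumed step size.
        return (f(x + h) - f(x - h)) / (2.0 * h)
    return df


d_sin = derivative(math.sin)           # behaves like cos
d_cube = derivative(lambda x: x ** 3)  # behaves like 3 * x**2
```

Note that `derivative` never sees a fixed-length vector of numbers. Its input is a whole function, which is the sense in which operator learning goes from one infinite-dimensional space to another.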

Deep nets that learn to approximate operators can be used to solve for a whole family of PDEs at once, modeling the same phenomena for a range of initial and boundary conditions and physical parameters. Such a family of PDEs could be a set of functions on the input side, with the corresponding solutions to the PDEs (formulas) represented by the functions on the output side.

Samuel Velasco/Quanta Magazine; Source: arXiv:1910.03193


In October 2019, Karniadakis and his colleagues came up with what they call DeepONet: a deep neural network architecture that can learn such an operator. It’s based on work from 1995, when researchers showed that a shallow network can approximate an operator. Because a neural network is involved, such operators are called neural operators, approximations of the actual operators.

“We extended the theorem to deep neural networks,” Karniadakis said.

What makes DeepONet special is its bifurcated architecture, which processes data in two parallel networks, a “branch” and a “trunk.” The former learns to approximate a number of functions on the input side, and the latter does the same for functions on the output side. DeepONet then combines the outputs of the two networks to learn a PDE’s desired operator. Training DeepONet involves repeatedly showing it the input-output data for a family of PDEs, generated using a numerical solver, and adjusting the weights in the branch and trunk networks in each iteration, until the entire network is making acceptably few errors.
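A rough sketch of the branch-and-trunk idea, in plain Python, is shown below. In the DeepONet paper the two outputs are combined with a dot product, which is what this code does. Everything else is an assumption made for illustration: the layer sizes, the tanh activation, the sensor locations and the random, untrained weights. It shows the shape of the computation, not the authors' implementation.

```python
import math
import random


def dense(x, weights, biases):
    """One fully connected tanh layer; weights holds one row per output."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def random_layer(rng, n_in, n_out):
    """Untrained random weights, standing in for learned ones."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def deeponet(u_samples, y, branch, trunk):
    """Approximate G(u)(y) as the dot product of branch(u) and trunk(y)."""
    b = u_samples            # branch input: the function u at fixed sensors
    for weights, biases in branch:
        b = dense(b, weights, biases)
    t = [y]                  # trunk input: the location where output is wanted
    for weights, biases in trunk:
        t = dense(t, weights, biases)
    return sum(bk * tk for bk, tk in zip(b, t))


rng = random.Random(0)
m, p = 8, 4  # hypothetical sizes: m sensor points, p latent features
branch = [random_layer(rng, m, 16), random_layer(rng, 16, p)]
trunk = [random_layer(rng, 1, 16), random_layer(rng, 16, p)]

# Input function u(x) = sin(pi * x), sampled at m fixed sensor locations.
u = [math.sin(math.pi * i / (m - 1)) for i in range(m)]
values = [deeponet(u, y, branch, trunk) for y in (0.0, 0.5, 1.0)]
```

Because the query location `y` is an ordinary input to the trunk, the same network can be evaluated at any point, not only at the grid points seen during training.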

So DeepONet, once trained, learns to approximate an operator. It can take data representing a PDE on the input side (which belongs to the same family of PDEs on which the network was trained) and transform it into the data representing the solution to the PDE on the output side. If you give it, say, 100 samples representing initial/boundary conditions and physical parameters that weren’t in the training data, and the locations where you want the flow field, DeepONet can give you the flow field in fractions of a second.

But even though DeepONet is blazingly fast next to numerical solvers, it still has to perform intensive computations during training. This can become an issue when the deep net has to be trained with enormous amounts of data to make the neural operator more and more precise. Could neural operators be sped up even more?

## Changing Perspective

Last year, Anandkumar and her colleagues at Caltech and Purdue University built a deep neural network, called the Fourier neural operator (FNO), with a different architecture that they claim is faster. Their network also maps functions to functions, from infinite-dimensional space to infinite-dimensional space, and they tested their neural net on PDEs. “We chose PDEs because PDEs are immediate examples where you go from functions to functions,” said Kamyar Azizzadenesheli of Purdue.

At the heart of their solution is something called a Fourier layer. Basically, before they push their training data through a single layer of a neural network, they subject it to a Fourier transform; then when the layer has processed that data via a linear operation, they perform an inverse Fourier transform, converting it back to the original format. (This transform is a well-known mathematical operation that decomposes a continuous function into multiple sinusoidal functions.) The entire neural network is made of a handful of such Fourier layers.
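The decomposition into sinusoids can be seen with a small discrete example. The Python below uses a naive discrete Fourier transform, written out by hand for clarity (practical code would use a fast Fourier transform library), on a made-up signal built from two sine waves.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (O(n^2)), for illustration only."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


n = 32
# A signal built from two sinusoids: 3 and 5 cycles per window.
signal = [math.sin(2 * math.pi * 3 * t / n)
          + 0.5 * math.sin(2 * math.pi * 5 * t / n)
          for t in range(n)]

# Amplitude of each sinusoid, recovered from the transform's magnitudes.
amplitudes = [2 * abs(c) / n for c in dft(signal)[: n // 2]]
dominant = [k for k, a in enumerate(amplitudes) if a > 0.1]
```

The transform picks out exactly the two frequencies that went into the signal, with amplitudes 1 and 0.5.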

This process turns out to be much more computationally straightforward than DeepONet’s and is akin to solving a PDE by performing a hairy mathematical operation called a convolution between the PDE and some other function. But in the Fourier domain, a convolution involves a simple multiplication, which is equivalent to passing the Fourier-transformed data through one layer of artificial neurons (with the exact weights learned during training) and then doing the inverse Fourier transform. So, again, the end result is that the FNO learns the operator for an entire family of PDEs, mapping functions to functions.
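The claim that a convolution becomes a simple multiplication in the Fourier domain can be checked numerically. The sketch below assumes a periodic 1D grid and a hand-picked smoothing kernel, and the function name `fourier_layer` is a label chosen here. In the actual FNO the per-mode weights are learned during training rather than derived from a known kernel, so this shows only the underlying identity.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform and its inverse."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
               for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def circular_convolution(x, kernel):
    """Direct convolution on a periodic grid: the physical-space route."""
    n = len(x)
    return [sum(x[j] * kernel[(i - j) % n] for j in range(n)) for i in range(n)]


def fourier_layer(x, mode_weights):
    """Transform, multiply each mode by a weight, transform back."""
    modes = dft(x)
    scaled = [m * w for m, w in zip(modes, mode_weights)]
    return [v.real for v in dft(scaled, inverse=True)]


x = [0.0, 1.0, 2.0, 3.0, 0.0, -1.0, 0.0, 1.0]
kernel = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]  # assumed smoothing kernel

direct = circular_convolution(x, kernel)
via_fourier = fourier_layer(x, dft(kernel))  # weights = transform of the kernel
```

The two routes agree to within rounding error. The Fourier route replaces the nested sum in `circular_convolution` with one multiplication per mode.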

Samuel Velasco/Quanta Magazine; Source: arXiv:2010.08895


“It’s a very neat architecture,” said Mishra.

It also provides solutions at dramatically improved speeds. In one relatively simple example that required 30,000 simulations, involving solutions of the infamous Navier-Stokes equation, the FNO took fractions of a second for each simulation (comparable to DeepONet’s speed, had it been tested on this problem), for a total of 2.5 seconds; the traditional solver in this case would have taken 18 hours.
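Taking the quoted figures at face value, the implied speedup and average time per simulation work out as follows:

$$
\frac{18\ \text{h} \times 3600\ \text{s/h}}{2.5\ \text{s}} = \frac{64{,}800\ \text{s}}{2.5\ \text{s}} \approx 2.6 \times 10^{4},
\qquad
\frac{2.5\ \text{s}}{30{,}000} \approx 8.3 \times 10^{-5}\ \text{s per simulation}
$$

That is a speedup of roughly four orders of magnitude on this example.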

## Making Mathematical Sense

Both teams’ approaches have proved successful, but as with neural nets broadly, it’s not clear exactly why they work so well and if they’ll do so in all situations. Mishra and his colleagues are now working on a full mathematical understanding of both methods.

After a year of effort, in February Mishra’s team, with input from Karniadakis, provided a 112-page mathematical analysis of the DeepONet architecture. They proved that the approach is truly universal, in that it can map any set of functions on the input side to any set of functions on the output side, not just PDEs, without having to make certain assumptions that went into Karniadakis’ theorem for deep nets and its 1995 predecessor. The team hasn’t yet completed their paper analyzing the FNO, but Mishra said that while the method will likely be universal for PDEs — and could, at first glance, solve some of them more efficiently than DeepONet — it may not work as well for learning certain other types of operators.

His team is working on a detailed analysis of FNO that includes a close comparison with DeepONet. “In a few months, we’ll know,” he said.

What’s clear, though, is that both methods will blow past traditional solvers. And for phenomena where there are no established PDEs, learning neural operators may be the only way to model such systems. Consider the problem of traffic flow: Writing a PDE that accurately captures the dynamics of traffic is near impossible. But there’s plenty of data to learn from. “Instead of writing the PDEs, given data, you can use this neural operator to just learn the mapping,” said Azizzadenesheli.

Of course, these are just the first steps toward a new approach to solving PDEs. “This is interesting and impressive work,” said Gavin Schmidt, who works on large-scale climate models as director of the NASA Goddard Institute for Space Studies in New York City. But he has concerns about how easily it can be adopted for more chaotic systems, like climate models. For example, he said the FNO has only been demonstrated on “nice” equations, not on equations as difficult and complicated as those used in climate modeling.

From a computational perspective, however, there’s more good news. Mishra’s team has shown that the new techniques don’t suffer from the curse of dimensionality. When they analyzed DeepONet for a number of cases, he said, “we actually prove that these will break the curse of dimensionality, which is very nice.” Preliminary findings indicate that the Fourier neural operator isn’t cursed either. “The theory is coming soon.”

Breaking the curse is crucial if neural operators are to replace traditional PDE solvers, Karniadakis said. “[It’s] the future for scientific machine learning.”