Inside a soundproofed crate sits one of the world’s worst neural networks. After being presented with an image of the number 6, it pauses for a moment before identifying the digit: zero. Peter McMahon, the physicist-engineer at Cornell University who led the development of the network, defends it with a sheepish smile, pointing out that the handwritten number looks sloppy. Logan Wright, a postdoc visiting McMahon’s lab from NTT Research, assures me that the device usually gets the answer right, but acknowledges that mistakes are common. “It’s just this bad,” he said.
Despite the underwhelming performance, this neural network is a groundbreaker. The researchers tip the crate over, revealing not a computer chip but a microphone angled toward a titanium plate that’s bolted to a speaker. Other neural networks operate in the digital world of 0s and 1s, but this device runs on sound. When Wright cues up a new image of a digit, its pixels get converted into audio and a faint chattering fills the lab as the speaker shakes the plate. Metallic reverberations do the “reading” rather than software running on silicon. That the device often succeeds beggars belief, even for its designers.
“Whatever the function of the shaking metal is, it shouldn’t have anything to do with classifying a handwritten digit,” said McMahon.
The primitive reading ability of the device, which the Cornell group unveiled in a paper in Nature in January, gives McMahon and others hope that its distant descendants could revolutionize computing.
When it comes to conventional machine learning, computer scientists have discovered that bigger is better. Stuffing a neural network with more artificial neurons — nodes that store numerical values — improves its ability to tell a dachshund from a Dalmatian, or to succeed at myriad other pattern recognition tasks. Truly tremendous neural networks can pull off unnervingly human undertakings like composing essays and creating illustrations. With more computational muscle, even grander feats may become possible. This potential has motivated a multitude of efforts to develop more powerful and efficient methods of computation.
McMahon and a band of like-minded physicists champion an unorthodox approach: Get the universe to crunch the numbers for us. “Many physical systems can naturally do some computation way more efficiently or faster than a computer can,” McMahon said. He cites wind tunnels: When engineers design a plane, they might digitize the blueprints and spend hours on a supercomputer simulating how air flows around the wings. Or they can stick the vehicle in a wind tunnel and see if it flies. From a computational perspective, the wind tunnel instantly “calculates” how wings interact with air.
A wind tunnel is a single-minded machine; it simulates aerodynamics. Researchers like McMahon are after an apparatus that can learn to do anything — a system that can adapt its behavior through trial and error to acquire any new ability, such as classifying handwritten digits or distinguishing one spoken vowel from another. Recent work has shown that physical systems like waves of light, networks of superconductors and branching streams of electrons can all learn.
“We are reinventing not just the hardware,” said Benjamin Scellier, a mathematician at the Swiss Federal Institute of Technology Zurich in Switzerland who helped design a new physical learning algorithm, but “also the whole computing paradigm.”
Learning to Think
Learning is an exotic process; until about a decade ago, brains were the only systems that did it well. It was the structure of the brain that loosely inspired computer scientists to design deep neural networks, now the most popular artificial learning models.
A deep neural network is a computer program that learns through practice. The network can be thought of as a grid: Layers of nodes called neurons, which store values, are connected to neurons in adjacent layers by lines, or “synapses.” Initially, these synapses are just random numbers known as “weights.”
When you want the network to read a digit — 4, say — you make the first layer of neurons represent a raw image of the 4, perhaps storing the shade of each pixel as a value in a corresponding neuron. Then the network “thinks,” moving layer by layer, multiplying the neuron values by the synaptic weights to populate the next layer of neurons. The neuron with the highest value in the final layer indicates the network’s answer. If it’s the second neuron, for instance, the network guesses that it saw a 2.
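The layer-by-layer arithmetic is simple enough to sketch in a few lines of Python. This is a toy illustration with made-up layer sizes and random weights, not the code behind any system described here; with untrained weights, the “guess” is meaningless:

```python
import random

def forward(pixels, weight_matrices):
    """Propagate neuron values layer by layer: each neuron in the next
    layer is a weighted sum of the previous layer's neurons."""
    values = pixels
    for W in weight_matrices:
        values = [sum(w * v for w, v in zip(row, values)) for row in W]
        values = [max(0.0, v) for v in values]  # a simple nonlinearity
    return values

# A 4-pixel "image" fed through two layers of random weights; the index
# of the largest output neuron is the network's answer.
random.seed(0)
pixels = [0.0, 0.9, 0.9, 0.1]
weights = [[[random.uniform(-1, 1) for _ in range(4)] for _ in range(5)],
           [[random.uniform(-1, 1) for _ in range(5)] for _ in range(3)]]
out = forward(pixels, weights)
guess = out.index(max(out))
```

Until training adjusts the weights, the final-layer values are arbitrary; the whole art of deep learning lies in tuning them.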
To teach the network to make smarter guesses, a learning algorithm works backward. After each trial, it calculates the difference between the guess and the correct answer (which, in our example, would be represented by a high value for the fourth neuron in the final layer and low values elsewhere). Then an algorithm steps back through the network layer by layer, calculating how to tweak the weights in order to get the values of the final neurons to rise or fall as needed. This procedure, known as backpropagation, lies at the heart of deep learning.
Through many guess-and-tweak repetitions, backpropagation guides the weights to a configuration of numbers that will, through a cascade of multiplications initiated by an image, spit out the digit written there.
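In miniature, the guess-and-tweak cycle looks like the sketch below: a single output neuron fed by three “pixels,” with the tweak direction given by the derivative of the error. (Real backpropagation chains this same derivative rule backward through every layer; the numbers here are invented for illustration.)

```python
# One output neuron fed by three "pixels"; the loss is squared error.
inputs = [0.5, 0.8, 0.2]
target = 1.0
weights = [0.1, -0.3, 0.4]   # random-ish starting synaptic weights
rate = 0.1                   # how far to move the weights each trial

losses = []
for trial in range(50):
    guess = sum(w * x for w, x in zip(weights, inputs))   # the "thinking"
    error = guess - target
    losses.append(error ** 2)
    # Chain rule: d(loss)/d(w_i) = 2 * error * x_i. Step each weight downhill.
    weights = [w - rate * 2 * error * x for w, x in zip(weights, inputs)]
```

Over the 50 trials the squared error shrinks toward zero, which is the whole point: each pass nudges the weights so the final neuron's value moves where it needs to go.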
But compared to whatever goes on in the brain, the digitized version of learning that happens in artificial neural networks looks dramatically inefficient. On less than 2,000 calories a day, a human child learns to talk, read, play games, and much more in a few years. On such a restricted energy diet, the groundbreaking GPT-3, a neural network capable of fluent conversation, would have taken a millennium to learn to chat.
From a physicist’s perspective, a large digital neural network is simply trying to do too much math. Today’s biggest behemoths must record and manipulate more than half a trillion numbers. The universe, meanwhile, constantly pulls off tasks far beyond the limits of computers’ meager bookkeeping abilities. A room might have trillions of trillions of air molecules bouncing around; that’s an impossible number of moving pieces for a computer to track in a full-fledged simulation of collisions, but the air itself has no trouble deciding how to behave from moment to moment.
The challenge is to build physical systems that can naturally pull off both of the processes needed for AI — the “thinking” involved in (say) classifying an image, and the “learning” needed to classify such images correctly. A system that mastered both tasks would leverage the universe’s ability to act mathematically without actually doing math.
“We never compute 3.532 times 1.567 or something,” Scellier said. “It’s done, but implicitly, just by the laws of physics directly.”
The Thinking Part
McMahon and his collaborators have made progress on the “thinking” half of the puzzle.
While setting up his lab at Cornell in the final months of the pre-pandemic world, McMahon mulled over a curious finding. For years, the top-performing image-recognition neural networks had been getting ever deeper. That is, networks with more layers were better able to take in a bunch of pixels and put out a label, such as “poodle.” The trend inspired mathematicians to study the transformation (from pixels to “poodle”) that the networks were achieving, and in 2017 several groups proposed that the networks were acting like approximate versions of a smooth mathematical function. In math, a function turns an input (often a position along the x-axis) into an output (the y-value, or height, of a curve at that position). In a specific type of neural network, more layers do better because the function is less jagged, moving closer to some ideal curve.
The research got McMahon thinking. Perhaps with a smoothly changing physical system, one could sidestep the blockiness inherent in the digital approach.
The trick was to find a way to domesticate a complicated system — to adapt its behavior with training. McMahon and collaborators chose the titanium plate as one such system because its many patterns of vibrations blend incoming sound in convoluted ways. To make the plate act like a neural network, they fed in one sound that encoded the input image (a handwritten 6, for example), and another representing the synaptic weights; peaks and troughs needed to hit the titanium plate at precisely the right moments for the device to merge the sounds and give the answer — such as a new sound that’s loudest in the sixth millisecond, representing the classification “6.”
The group also implemented their scheme in an optical system — where the input image and weights are encoded in two beams of light that get jumbled together by a crystal — and in an electronic circuit capable of similarly shuffling inputs. In principle, any system with Byzantine behavior will do, though the researchers believe the optical system holds particular promise. Not only can a crystal blend light extremely quickly, but light also contains abundant data about the world. McMahon imagines miniaturized versions of his optical neural network someday serving as the eyes of self-driving cars, identifying stop signs and pedestrians before feeding that information to the vehicle’s computer chip, much as our retinas perform some basic visual processing on incoming light.
The Achilles heel of these systems, however, is that training them requires a return to the digital world. Backpropagation involves running a neural network in reverse, but plates and crystals don’t readily unmix sounds and light. So the group constructed a digital model of each physical system. Reversing these models on a laptop, they could use the backpropagation algorithm to calculate how to adjust the weights to give accurate answers.
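The hybrid scheme can be caricatured in code: the “physical” forward pass is a noisy black box, while a differentiable digital model of the same physics supplies the gradients used for learning. Everything below, including the toy nonlinearity and the noise level, is invented for illustration and is not the group’s actual training code:

```python
import random

random.seed(1)

def physical_system(x, w):
    """Stand-in for the plate or crystal: in the lab this would be a
    measurement of real physics, noise and all -- not a formula."""
    u = x * w
    return u + 0.1 * u ** 2 + random.gauss(0, 0.001)

def model_gradient(x, w):
    """Derivative d(output)/dw of the *digital model* of the physics,
    consulted only during training."""
    return x + 0.2 * x ** 2 * w

x, target = 0.7, 0.5
w, rate = 0.0, 0.2
for _ in range(300):
    out = physical_system(x, w)                    # "thinking" happens in physics
    error = out - target
    w -= rate * 2 * error * model_gradient(x, w)   # learning happens digitally
```

The key design choice is the asymmetry: the physical system only ever runs forward, and the laptop’s model is trusted just enough to say which way each weight should move.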
With this training, the plate learned to classify handwritten digits correctly 87% of the time. The circuit and laser reached 93% and 97% accuracy, respectively. The results showed “that not only standard neural networks can be trained through backpropagation,” said Julie Grollier, a physicist at the French National Center for Scientific Research (CNRS). “That’s beautiful.”
The group’s quivering metal plate has not yet brought computing closer to the shocking efficiency of the brain. It doesn’t even approach the speed of digital neural networks. But McMahon views his devices as striking, if modest, proof that you don’t need a brain or computer chip to think. “Any physical system can be a neural network,” he said.
The Learning Part
Ideas abound for the other half of the puzzle — getting a system to learn all by itself.
Florian Marquardt, a physicist at the Max Planck Institute for the Science of Light in Germany, believes one option is to build a machine that runs backward. Last year, he and a collaborator proposed a physical analogue of the backpropagation algorithm that could run on such a system.
To show that it works, they digitally simulated a laser setup somewhat like McMahon’s, with the adjustable weights encoded in a light wave that mixes with another input wave (encoding, say, an image). They nudge the output to be closer to the right answer and use optical components to unmix the waves, reversing the process. “The magic,” Marquardt said, is that “when you try the device once more with the same input, [the output] now has a tendency to be closer to where you want it to be.” Next, they are collaborating with experimentalists to build such a system.
But focusing on systems that run in reverse limits the options, so other researchers are leaving backpropagation behind entirely. They take encouragement from knowing that the brain learns in some other way than standard backpropagation. “The brain doesn’t work like this,” said Scellier. Neuron A communicates with neuron B, “but it’s only one-way.”
In 2017, Scellier and Yoshua Bengio, a computer scientist at the University of Montreal, developed a unidirectional learning method called equilibrium propagation. To get a sense of how it works, imagine a network of arrows that act like neurons, their direction indicating a 0 or 1, connected in a grid by springs that act as synaptic weights. The looser a spring, the less the linked arrows tend to snap into alignment.
First, you twist arrows in the leftmost row to reflect the pixels of your handwritten digit and hold them fixed while the disturbance ripples out through the springs, flipping other arrows. When the flipping stops, the rightmost arrows give the answer.
Crucially, you don’t have to train this system by un-flipping the arrows. Instead, you connect another set of arrows showing the correct answer along the bottom of the network; these flip arrows in the upper set, and the whole grid settles into a new equilibrium. Finally, you compare the new orientations of the arrows with the old orientations and tighten or loosen each spring accordingly. Over many trials, the springs acquire smarter tensions in a way that Scellier and Bengio have shown is equivalent to backpropagation.
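A one-spring version of the recipe fits in a few lines: settle freely to get a guess, settle again with a gentle nudge toward the answer, then adjust the spring from the difference between the two equilibria. The energy function, nudge strength and numbers below are a toy chosen for illustration, not Scellier and Bengio’s formulation:

```python
# One input neuron x (clamped) coupled to one free neuron s by a
# "spring" of weight w. Settling means sliding downhill in energy.

def settle(x, w, beta=0.0, target=0.0, steps=200, dt=0.1):
    """Relax the free neuron to an energy minimum.
    Energy: E = s**2/2 - w*x*s + (beta/2)*(s - target)**2."""
    s = 0.0
    for _ in range(steps):
        dE_ds = s - w * x + beta * (s - target)
        s -= dt * dE_ds
    return s

x, target = 0.8, 0.6
w, rate, beta = 0.0, 0.5, 0.1
for _ in range(100):
    s_free = settle(x, w)                   # free phase: the network's guess
    s_nudged = settle(x, w, beta, target)   # nudged phase: the answer leaks in
    # Contrast the two equilibria to update the spring -- no reverse pass.
    w += rate * (x * s_nudged - x * s_free) / beta
```

After training, the freely settled state lands on the target: the spring has absorbed the lesson without any signal ever running backward through the network.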
“It was thought that there was no possible link between physical neural networks and backpropagation,” said Grollier. “Very recently that’s what changed, and that’s very exciting.”
Initial work on equilibrium propagation was all theoretical. But in an upcoming publication, Grollier and Jérémie Laydevant, a physicist at CNRS, describe an execution of the algorithm on a quantum annealer, a machine built by the company D-Wave. The apparatus has a network of thousands of interacting superconductors that can act like arrows linked by springs, and it naturally calculates how those “springs” should be updated. The system cannot apply the weight updates automatically, though.
Closing the Circle
At least one team has gathered the pieces to build an electronic circuit that does all the heavy lifting — thinking, learning and updating weights — with physics. “We’ve been able to close the loop for a small system,” said Sam Dillavou, a physicist at the University of Pennsylvania.
The goal for Dillavou and his collaborators is to emulate the brain, a literal smart substance: a relatively uniform system that learns without any single structure calling the shots. “Every neuron is doing its own thing,” he said.
To this end, they built a self-learning circuit, in which variable resistors act as the synaptic weights and neurons are the voltages measured between the resistors. To classify a given input, it translates the data into voltages that are applied to a few nodes. Electric current courses through the circuit, seeking the paths that dissipate the least energy and changing the voltages as it stabilizes. The answer is the voltage at specified output nodes.
Their major innovation came in the ever-challenging learning step, for which they devised a scheme similar to equilibrium propagation called coupled learning. As one circuit takes in data and “thinks up” a guess, an identical second circuit starts with the correct answer and incorporates it into its behavior. Finally, electronics connecting each pair of resistors automatically compare their values and adjust them to achieve a “smarter” configuration.
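The local flavor of the rule can be sketched with the smallest possible circuit: a two-resistor voltage divider that learns to output a target voltage. Each conductance adjusts using only the voltage drop across itself in the free versus the nudged copy. This is a textbook-style toy, not the Penn group’s circuit:

```python
# Toy "coupled learning" on a two-resistor voltage divider. A free
# circuit produces a guess; a copy gently nudged toward the right
# answer runs alongside it; each conductance updates from the local
# difference in its own voltage drops -- no global controller.

def divider(k1, k2, v_in=1.0):
    """Output-node voltage for conductances k1 (to the source) and k2 (to ground)."""
    return v_in * k1 / (k1 + k2)

k1, k2 = 1.0, 1.0
target, eta, rate = 0.7, 0.1, 0.5
for _ in range(500):
    v_free = divider(k1, k2)
    v_clamp = v_free + eta * (target - v_free)   # the nudged copy's output
    # Each edge compares its clamped vs. free voltage drop, squared.
    d1 = (1.0 - v_clamp) ** 2 - (1.0 - v_free) ** 2
    d2 = v_clamp ** 2 - v_free ** 2
    k1 -= (rate / eta) * d1
    k2 -= (rate / eta) * d2
```

Nothing in the update for `k1` refers to `k2`, or vice versa; that locality is what lets a real circuit scale without any single structure calling the shots.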
The group described their rudimentary circuit in a preprint last summer, showing that it could learn to distinguish three types of flowers with 95% accuracy. Now they’re working on a faster, more capable device.
Even that upgrade won’t come close to beating a state-of-the-art silicon chip. But the physicists building these systems suspect that digital neural networks — as mighty as they seem today — will eventually appear slow and inadequate next to their analog cousins. Digital neural networks can only scale up so much before getting bogged down by excessive computation, but bigger physical networks need not do anything but be themselves.
“It’s such a big, fast-moving and varied field that I find it hard to believe that there won’t be some pretty powerful computers made with these principles,” Dillavou said.