How our brain, a three-pound mass of tissue encased within a bony skull, creates perceptions from sensations is a long-standing mystery. Abundant evidence and decades of sustained research suggest that the brain cannot simply be assembling sensory information, as though it were putting together a jigsaw puzzle, to perceive its surroundings. This is borne out by the fact that the brain can construct a scene based on the light entering our eyes, even when the incoming information is noisy and ambiguous.
Consequently, many neuroscientists are pivoting to a view of the brain as a “prediction machine.” Through predictive processing, the brain uses its prior knowledge of the world to make inferences or generate hypotheses about the causes of incoming sensory information. Those hypotheses — and not the sensory inputs themselves — give rise to perceptions in our mind’s eye. The more ambiguous the input, the greater the reliance on prior knowledge.
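The last claim has a standard mathematical reading. Consider a textbook Bayesian treatment, which is a simplification and not a model attributed to any researcher quoted here. It combines a Gaussian prior belief with a Gaussian sensory measurement, weighting each by its precision (inverse variance). The minimal sketch below assumes one-dimensional quantities, and the function name `posterior_estimate` is hypothetical.

```python
def posterior_estimate(prior_mean, prior_var, observation, sensory_var):
    """Combine a Gaussian prior with a Gaussian sensory measurement.

    Each source is weighted by its precision (1 / variance).
    Returns the posterior mean and the weight given to the prior.
    """
    prior_precision = 1.0 / prior_var
    sensory_precision = 1.0 / sensory_var
    prior_weight = prior_precision / (prior_precision + sensory_precision)
    mean = prior_weight * prior_mean + (1.0 - prior_weight) * observation
    return mean, prior_weight


if __name__ == "__main__":
    # Same prior (mean 0, variance 1) and same observation (5.0);
    # only the ambiguity of the sensory input changes.
    for noise in (0.1, 1.0, 10.0):
        mean, w = posterior_estimate(0.0, 1.0, 5.0, noise)
        print(f"sensory variance {noise:>5}: estimate {mean:.2f}, prior weight {w:.2f}")
```

The prior and observation stay fixed. Raising the sensory variance from 0.1 to 10 moves the weight on the prior from about 0.09 to about 0.91, which pulls the estimate from about 4.55 down to about 0.45.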
“The beauty of the predictive processing framework [is] that it has a really large — sometimes critics might say too large — capacity to explain a lot of different phenomena in many different systems,” said Floris de Lange, a neuroscientist at the Predictive Brain Lab of Radboud University in the Netherlands.
However, the growing neuroscientific evidence for this idea has been mainly circumstantial and is open to alternative explanations. “If you look into cognitive neuroscience and neuro-imaging in humans, [there’s] a lot of evidence — but super-implicit, indirect evidence,” said Tim Kietzmann of Radboud University, whose research lies in the interdisciplinary area of machine learning and neuroscience.
So researchers are turning to computational models to understand and test the idea of the predictive brain. Computational neuroscientists have built artificial neural networks, with designs inspired by the behavior of biological neurons, that learn to make predictions about incoming information. These models show some uncanny abilities that seem to mimic those of real brains. Some experiments with these models even hint that brains had to evolve as prediction machines to satisfy energy constraints.
And as computational models proliferate, neuroscientists studying live animals are also becoming more convinced that brains learn to infer the causes of sensory inputs. While the exact details of how the brain does this remain hazy, the broad brushstrokes are becoming clearer.
Unconscious Inferences in Perception
Predictive processing may seem at first like a counterintuitively complex mechanism for perception, but there is a long history of scientists turning to it because other explanations seemed wanting. Even a thousand years ago, the Muslim Arab astronomer and mathematician Hasan Ibn Al-Haytham highlighted a form of it in his Book of Optics to explain various aspects of vision. The idea gathered force in the 1860s, when the German physicist and physician Hermann von Helmholtz argued that the brain infers the external causes of its incoming sensory inputs rather than constructing its perceptions “bottom up” from those inputs.
Helmholtz expounded this concept of “unconscious inference” to explain bistable or multistable perception, in which an image can be perceived in more than one way. This occurs, for example, with the well-known ambiguous image that we can perceive as a duck or a rabbit: Our perception keeps flipping between the two animals. In such cases, Helmholtz asserted, the perception must be the outcome of an unconscious process of top-down inference about the causes of sensory data, since the image that forms on the retina doesn’t change.
During the 20th century, cognitive psychologists continued to build the case that perception was a process of active construction that drew on both bottom-up sensory and top-down conceptual inputs. The effort culminated in an influential 1980 paper, “Perceptions as Hypotheses,” by the late Richard Langton Gregory, which argued that perceptual illusions are essentially the brain’s erroneous guesses about the causes of sensory impressions. Meanwhile, computer vision scientists stumbled in their efforts to use bottom-up reconstruction to enable computers to see without an internal “generative” model for reference.
“Trying to make sense of data without a generative model is doomed to failure — all one can do is make statements about patterns in data,” said Karl Friston, a computational neuroscientist at University College London.
But while acceptance of predictive processing grew, questions remained about how it might be implemented in the brain. One popular model, called predictive coding, posits a hierarchy of information-processing layers in the brain. The highest layer represents the most abstract, high-level knowledge (for instance, the perception of a snake in the shadows ahead). This layer sends signals downward that predict the neural activity of the layer below. The lower layer compares its actual activity against the prediction from above. If there’s a mismatch, the lower layer generates an error signal that flows upward, so that the higher layer can update its internal representations.
This process happens simultaneously for each pair of consecutive layers, all the way down to the bottommost layer, which receives actual sensory input. Any discrepancy between what’s received from the world and what’s being anticipated results in an error signal that ripples back up the hierarchy. The highest layer eventually updates its hypothesis (that it wasn’t a snake after all, just a coiled rope on the ground).
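The loop of downward predictions and upward errors can be written as gradient descent on the sum of squared prediction errors. The minimal sketch below makes several assumptions. It uses a linear two-level hierarchy with fixed, hand-picked weights, and it covers only inference, not learning of the weights. The names `W1`, `W2` and `predictive_coding_inference` are hypothetical. This is a textbook-style illustration, not any specific published model.

```python
def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matvec_t(m, v):
    """Multiply the transpose of m by vector v."""
    return [sum(m[i][j] * v[i] for i in range(len(m))) for j in range(len(m[0]))]


def predictive_coding_inference(x, W1, W2, steps=200, lr=0.1):
    """Settle a two-level linear predictive coding hierarchy on input x.

    Level 1 (r1) predicts the sensory input as W1 @ r1;
    level 2 (r2) predicts level 1 as W2 @ r2.
    Each step lowers the summed squared prediction errors.
    """
    r1 = [0.0] * len(W1[0])
    r2 = [0.0] * len(W2[0])
    energies = []
    for _ in range(steps):
        # Top-down predictions are compared with actual activity; mismatches are errors.
        e0 = [a - p for a, p in zip(x, matvec(W1, r1))]   # sensory-level error
        e1 = [a - p for a, p in zip(r1, matvec(W2, r2))]  # level-1 error
        energies.append(0.5 * (sum(e * e for e in e0) + sum(e * e for e in e1)))
        # Errors flow upward: each level moves to explain the error below it,
        # while level 1 is also pulled toward the prediction from level 2.
        up0 = matvec_t(W1, e0)
        up1 = matvec_t(W2, e1)
        r1 = [r + lr * (u - e) for r, u, e in zip(r1, up0, e1)]
        r2 = [r + lr * u for r, u in zip(r2, up1)]
    return r1, r2, energies


if __name__ == "__main__":
    W1 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # level 1 -> 3 sensory units
    W2 = [[1.0], [1.0]]                         # level 2 -> 2 level-1 units
    r1, r2, energies = predictive_coding_inference([1.0, 0.8, 0.9], W1, W2)
    print("error energy: start", round(energies[0], 3), "end", round(energies[-1], 3))
```

In this sketch the `r` lists play the role of prediction units and the `e` lists the role of error units. With a step size this small, the total error never increases from one step to the next.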
“In general, the idea of predictive coding, especially when it’s applied to the cortex, is that the brain has basically two populations of neurons,” de Lange said: one that encodes the current best prediction about what is being perceived and another that signals errors in that prediction.
In 1999, the computer scientists Rajesh Rao and Dana Ballard (then at the Salk Institute for Biological Studies and the University of Rochester, respectively) built a formidable computational model of predictive coding with neurons explicitly dedicated to prediction and to error correction. They modeled parts of a pathway in the visual processing system of primate brains that consists of hierarchically organized regions responsible for recognizing faces and objects. They showed that the model could recapitulate some unusual behaviors of the primate visual system.
This work, however, was done before the advent of modern deep neural networks, which have one input layer, one output layer and multiple hidden layers sandwiched between the two. By 2012, neuroscientists were using deep neural networks to model the primate ventral visual stream. But almost all these models were feedforward networks, in which information flows only from the input to the output. “The brain is clearly not a purely feedforward machine,” de Lange said. “There’s lots of feedback in the brain, about as much as there is feedforward [signaling].”
So neuroscientists turned to another type of model, called a recurrent neural network (RNN). These have features that make them “an ideal substrate” for modeling the brain, according to Kanaka Rajan, a computational neuroscientist and assistant professor at the Icahn School of Medicine at Mount Sinai in New York, whose lab uses RNNs to understand brain function. RNNs have both feedforward and feedback connections between their neurons, and they have constant ongoing activity that is independent of inputs. “The ability to produce these dynamics over a very long period of time, essentially forever, is what gives these networks the ability to then be trained,” said Rajan.
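Here is a toy illustration of the "ongoing activity" point. It assumes a two-unit rate RNN with the update h(t+1) = tanh(W_rec h(t) + W_in x(t)). The hand-picked weights are assumptions and do not come from Rajan's lab: W_rec is a 90-degree rotation scaled by a gain, and W_in is the identity. When the recurrent gain exceeds 1, activity keeps circulating after the input is switched off. Below 1, it fades away. The function names are hypothetical.

```python
import math


def rnn_step(h, x, gain):
    """One update of a two-unit rate RNN: h' = tanh(W_rec @ h + W_in @ x).

    W_rec (the recurrent, or feedback, connections) is a 90-degree rotation
    scaled by `gain`; W_in (the feedforward input weights) is the identity.
    """
    rec = (-gain * h[1], gain * h[0])
    return [math.tanh(rec[0] + x[0]), math.tanh(rec[1] + x[1])]


def run(gain, pulse_steps=5, free_steps=50):
    """Drive the network briefly, switch the input off, and report peak activity."""
    h = [0.0, 0.0]
    for _ in range(pulse_steps):
        h = rnn_step(h, [0.5, 0.0], gain)  # brief external input
    for _ in range(free_steps):
        h = rnn_step(h, [0.0, 0.0], gain)  # no input at all
    return max(abs(v) for v in h)


if __name__ == "__main__":
    for gain in (0.5, 1.2):
        print(f"gain {gain}: peak activity after input stops = {run(gain):.4f}")
```

For this particular rotation matrix, the largest absolute activity evolves as m ← tanh(gain · m) once the input is off. It therefore settles near 0.66 for gain 1.2 and decays toward zero for gain 0.5.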
Prediction Is Energy-Efficient
RNNs caught the attention of William Lotter and his doctoral thesis advisers David Cox and Gabriel Kreiman at Harvard University. In 2016, the team showed off an RNN that learned to predict the next frame in a video sequence. They called it PredNet (“I’ll take blame for not having enough creativity to come up with something better,” said Lotter). Following the principles of predictive coding, the team designed the RNN as a hierarchy of four layers, each one predicting the input it anticipates from the layer below and sending an error signal upward if there’s a mismatch.
They then trained the network on videos of city streets shot from a camera mounted on a car. PredNet learned to continuously predict the next frame in a video. “We didn’t know if it would actually work,” said Lotter. “We tried it and saw it was actually making predictions. And that was pretty cool.”
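In the team's paper, PredNet's error units split the mismatch into two rectified populations. One responds where the input exceeds the prediction, the other where the prediction exceeds the input. The sketch below shows only that error computation. It runs on a toy one-dimensional "video" of a moving dot, with two hand-written predictors standing in for the learned network. All names are hypothetical, and PredNet itself learns its predictions with convolutional LSTM units over real image frames.

```python
def prednet_style_error(actual, predicted):
    """Return rectified positive errors followed by rectified negative errors."""
    above = [max(a - p, 0.0) for a, p in zip(actual, predicted)]  # input > prediction
    below = [max(p - a, 0.0) for a, p in zip(actual, predicted)]  # prediction > input
    return above + below


def moving_dot(width, steps):
    """Frames of a 1-D 'video': one bright pixel stepping right each frame."""
    frames = []
    for t in range(steps):
        frame = [0.0] * width
        frame[t % width] = 1.0
        frames.append(frame)
    return frames


def copy_last(frame):
    """Naive predictor: the next frame will look like this one."""
    return list(frame)


def shift_right(frame):
    """Predictor that has 'learned' the motion: move every pixel one step right."""
    return [frame[-1]] + frame[:-1]


def total_error(frames, predictor):
    """Sum of all error-unit activity over a frame sequence."""
    return sum(
        sum(prednet_style_error(nxt, predictor(prev)))
        for prev, nxt in zip(frames, frames[1:])
    )


if __name__ == "__main__":
    frames = moving_dot(width=8, steps=6)
    print("copy last frame:", total_error(frames, copy_last))
    print("shift right:    ", total_error(frames, shift_right))
```

The naive predictor leaves two error units active on every frame, 10.0 in total over five predictions. The motion-aware predictor silences them all (0.0).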
The next step was to connect PredNet to neuroscience. Last year in Nature Machine Intelligence, Lotter and colleagues reported that PredNet demonstrates behaviors seen in monkey brains in response to unexpected stimuli, including some that are hard to replicate in simple feedforward networks.
“That’s fantastic work,” Kietzmann said of PredNet. But he, Marcel van Gerven and their colleagues at Radboud were after something more basic: Both the Rao and Ballard model and PredNet explicitly incorporated artificial neurons for prediction and error correction, along with mechanisms that caused correct top-down predictions to inhibit the error neurons. But what if those weren’t explicitly specified? “We wondered whether all of this ‘baking in’ architectural constraints is really needed or whether we would get away with an even simpler approach,” said Kietzmann.
What occurred to Kietzmann and van Gerven was that neural communication is energetically costly (the brain is the most energy-intensive organ in the body). A need to conserve energy might therefore constrain the behavior of any evolving neural network in organisms.
The researchers decided to see whether any of the computational mechanisms for predictive coding might emerge in RNNs that had to accomplish their tasks using as little energy as possible. They figured that the strengths of the connections, also known as weights, between the artificial neurons in their networks could serve as a proxy for synaptic transmission, which is what accounts for much of the energy usage in biological neurons. “If you reduce weights between artificial units, that means that you communicate with less energy,” said Kietzmann. “We take this as minimizing synaptic transmission.”
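A common way to turn "use smaller weights" into a training objective is to add a penalty on weight magnitude to the task loss. The sketch below assumes an L1 penalty with strength `lam` on a single weight fitted to the toy target y = 2x. It illustrates the general idea only; the Radboud team's actual cost function may differ. All names are hypothetical.

```python
def penalized_loss(w, data, lam):
    """Mean squared task error plus an L1 'energy' penalty on the weight."""
    mse = sum((w * x - y) ** 2 for x, y in data) / len(data)
    return mse + lam * abs(w)


def train(data, lam, lr=0.05, steps=2000):
    """Subgradient descent on penalized_loss, starting from w = 0."""
    w = 0.0
    for _ in range(steps):
        grad_task = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        grad_energy = lam * ((w > 0) - (w < 0))  # subgradient of lam * |w|, 0 at w == 0
        w -= lr * (grad_task + grad_energy)
    return w


if __name__ == "__main__":
    data = [(1.0, 2.0), (-1.0, -2.0)]  # samples of y = 2x
    for lam in (0.0, 1.0):
        w = train(data, lam)
        print(f"lam={lam}: w={w:.3f}, loss={penalized_loss(w, data, lam):.3f}")
```

Without the penalty the weight converges to 2.0. With `lam=1.0` it settles at 1.5, where the task gradient 2(w - 2) exactly balances the penalty gradient of 1. The model accepts a little task error in exchange for a smaller weight.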
The team then trained an RNN on numerous sequences of consecutive digits in ascending, wraparound order: 1234567890, 3456789012, 6789012345 and so on. Each digit was shown to the network in the form of a 28-by-28-pixel image. The RNN learned an internal model that could predict what the next digit would be, starting from any random place in the sequence. But the network was forced to do this with the smallest possible weights between units, analogous to low levels of neural activity in a biological nervous system.
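The training sequences themselves are easy to generate. The minimal sketch below assumes sequences of ten digits that start at a random digit and count upward modulo 10. It omits rendering each digit as a 28-by-28-pixel image, and the function names are hypothetical. The rule the network has to discover is simply next = (current + 1) mod 10.

```python
import random


def wraparound_sequence(start, length=10):
    """Digits counting upward from `start`, wrapping from 9 back to 0."""
    return [(start + i) % 10 for i in range(length)]


def sample_sequences(n, length=10, seed=0):
    """Draw n training sequences, each beginning at a random digit."""
    rng = random.Random(seed)
    return [wraparound_sequence(rng.randrange(10), length) for _ in range(n)]


def next_digit(d):
    """The hidden rule the network must learn to predict."""
    return (d + 1) % 10


if __name__ == "__main__":
    for seq in sample_sequences(3):
        print("".join(str(d) for d in seq))
```

For example, `wraparound_sequence(3)` yields the digits of 3456789012, one of the sequences listed above.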
Under these conditions, the RNN learned to predict the next number in the sequence. Some of its artificial neurons acted as “prediction units” representing a model of the expected inputs. Other neurons acted as “error units” that were most active when the prediction units hadn’t yet learned to correctly anticipate the next number. These error units became subdued when the prediction units started getting it right. Crucially, the network arrived at this architecture because it was compelled to minimize energy usage. “It just learns to do the sort of inhibition that people have typically been building into the system explicitly,” said Kietzmann. “Our system does it out of the box, as an emergent thing to do, to be energy-efficient.”
The takeaway is that a neural network that minimizes energy usage will end up implementing some sort of predictive processing — making a case that biological brains are probably doing the same.
Rajan called Kietzmann’s work a “very neat example of how top-down constraints like energy minimization can indirectly lead to a specific function like predictive coding.” It prompted her to wonder whether the emergence of specific error and prediction units in the RNN could be an unintended consequence of the fact that only neurons at the edge of the network were receiving inputs. If the inputs were distributed throughout the network, “my knee-jerk guess is you won’t find the separation between error units and predictive units, but you’ll still find predictive activity,” she said.
A Unifying Framework for Brain Behaviors
Persuasive as these insights from computational studies may seem, in the end, only evidence from live brains can convince neuroscientists of predictive processing in the brain. To this end, Blake Richards, a neuroscientist and computer scientist at McGill University and Mila, the Quebec Artificial Intelligence Institute, and his colleagues formulated some clear hypotheses about what they should see in brains learning to make predictions about unexpected events.
To test their hypotheses, they turned to researchers at the Allen Institute for Brain Science in Seattle, who carried out experiments on mice while monitoring the neural activity in their brains. Of particular interest were certain pyramidal neurons in the brain’s neocortex, which are thought to be anatomically suited to predictive processing. They can receive both local bottom-up sensory signals from nearby neurons (through inputs to their cell body) and top-down prediction signals from more distant neurons (through their apical dendrites).
The mice were shown many sequences of Gabor patches, which consist of stripes of light and dark. All four patches in each sequence had roughly the same orientation, and the mice came to expect that. (“Must have been boring as hell, just watching these sequences,” said Richards.) Then the researchers inserted an unexpected event: a fourth Gabor patch randomly rotated to a different orientation. The animals were initially surprised, but over time, they came to expect the element of surprise too. All the while, the researchers observed the activity in the mice’s brains.
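A Gabor patch is a sinusoidal grating multiplied by a Gaussian envelope, and rotating the coordinate frame sets its orientation. The minimal sketch below uses arbitrary values for size, wavelength and envelope width, not the Allen Institute's stimulus parameters. It also makes orientations exactly equal within an expected sequence, whereas the experiment used roughly equal ones. The function names are hypothetical.

```python
import math
import random


def gabor_patch(size, theta, wavelength=8.0, sigma=5.0, phase=0.0):
    """Return a size-by-size Gabor patch with its grating rotated by theta radians."""
    half = (size - 1) / 2.0
    patch = []
    for row in range(size):
        y = row - half
        line = []
        for col in range(size):
            x = col - half
            # Rotate coordinates; the light and dark stripes alternate along xr.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
            line.append(envelope * math.cos(2 * math.pi * xr / wavelength + phase))
        patch.append(line)
    return patch


def gabor_sequence(base_theta, surprise=False, rng=None):
    """Orientations for a four-patch sequence; optionally rotate the fourth at random."""
    rng = rng or random.Random(0)
    thetas = [base_theta] * 4
    if surprise:
        thetas[3] = rng.uniform(0.0, math.pi)  # the unexpected event
    return thetas


if __name__ == "__main__":
    expected = gabor_sequence(math.pi / 4)
    unexpected = gabor_sequence(math.pi / 4, surprise=True)
    frames = [gabor_patch(21, t) for t in unexpected]
    print("expected orientations:  ", [round(t, 2) for t in expected])
    print("unexpected orientations:", [round(t, 2) for t in unexpected])
    print(len(frames), "patches of", len(frames[0]), "x", len(frames[0][0]), "pixels")
```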
What they saw was that lots of neurons responded differently to expected and unexpected stimuli. Crucially, this difference was strong in the local, bottom-up signals on the first day of testing, but it waned on the second and third days. In the context of predictive processing, this suggested that newly formed top-down expectations began inhibiting the responses to incoming sensory information as the stimuli became less surprising.
Meanwhile, the opposite was happening in the apical dendrites: The difference in their response to unexpected stimuli increased over time. The neural circuits appeared to be learning to represent properties of the surprising events better, to make better predictions the next time around.
“This study provides further support for the idea that something like predictive learning or predictive coding is happening in the neocortex,” said Richards.
It’s true that individual observations of neuronal activity or an animal’s behavior can at times be explained by some other model of the brain. For example, the waning responses in neurons to the same input, instead of being interpreted as the inhibition of error units, might simply be due to a process of adaptation. But then “you get this whole phone book of explanations for different phenomena,” said de Lange.
Predictive processing, on the other hand, provides a unifying framework to explain many phenomena in one go, hence its allure as a theory of how the brain works. “I think the evidence at this point is pretty compelling,” said Richards. “I’m willing to put a lot of money on that claim, actually.”