Despite the wild success of ChatGPT and other large language models, the artificial neural networks (ANNs) that underpin these systems might be on the wrong track.
For one, ANNs are “super power-hungry,” said Cornelia Fermüller, a computer scientist at the University of Maryland. “And the other issue is [their] lack of transparency.” Such systems are so complicated that no one truly understands what they’re doing, or why they work so well. This, in turn, makes it almost impossible to get them to reason by analogy, which is what humans do — using symbols for objects, ideas and the relationships between them.
Such shortcomings likely stem from the current structure of ANNs and their building blocks: individual artificial neurons. Each neuron receives inputs, performs computations and produces outputs. Modern ANNs are elaborate networks of these computational units, trained to do specific tasks.
Yet the limitations of ANNs have long been obvious. Consider, for example, an ANN that tells circles and squares apart. One way to do it is to have two neurons in its output layer, one that indicates a circle and one that indicates a square. If you want your ANN to also discern the shape’s color — blue or red — you’ll need four output neurons: one each for blue circle, blue square, red circle and red square. More features mean even more neurons.
This can’t be how our brains perceive the natural world, with all its variations. “You have to propose that, well, you have a neuron for all combinations,” said Bruno Olshausen, a neuroscientist at the University of California, Berkeley. “So, you’d have in your brain, [say,] a purple Volkswagen detector.”
Instead, Olshausen and others argue that information in the brain is represented by the activity of numerous neurons. So the perception of a purple Volkswagen is not encoded as a single neuron’s actions, but as those of thousands of neurons. The same set of neurons, firing differently, could represent an entirely different concept (a pink Cadillac, perhaps).
This is the starting point for a radically different approach to computation known as hyperdimensional computing. The key is that each piece of information, such as the notion of a car, or its make, model or color, or all of it together, is represented as a single entity: a hyperdimensional vector.
A vector is simply an ordered array of numbers. A 3D vector, for example, comprises three numbers: the x, y and z coordinates of a point in 3D space. A hyperdimensional vector, or hypervector, could be an array of 10,000 numbers, say, representing a point in 10,000-dimensional space. These mathematical objects and the algebra to manipulate them are flexible and powerful enough to take modern computing beyond some of its current limitations and foster a new approach to artificial intelligence.
“This is the thing that I’ve been most excited about, practically in my entire career,” Olshausen said. To him and many others, hyperdimensional computing promises a new world in which computing is efficient and robust, and machine-made decisions are entirely transparent.
Enter High-Dimensional Spaces
To understand how hypervectors make computing possible, let’s return to images with red circles and blue squares. First we need vectors to represent the variables SHAPE and COLOR. Then we also need vectors for the values that can be assigned to the variables: CIRCLE, SQUARE, BLUE and RED.
The vectors must be distinct. This distinctness can be quantified by a property called orthogonality, which means to be at right angles. In 3D space, at most three vectors can be orthogonal to each other: one in the x direction, another in the y and a third in the z. In 10,000-dimensional space, there are 10,000 such mutually orthogonal vectors.
But if we allow vectors to be nearly orthogonal, the number of such distinct vectors in a high-dimensional space explodes. In a 10,000-dimensional space there are millions of nearly orthogonal vectors.
Now let’s create distinct vectors to represent SHAPE, COLOR, CIRCLE, SQUARE, BLUE and RED. Because there are so many possible nearly orthogonal vectors in a high-dimensional space, you can just assign six random vectors to represent the six items; they’re almost guaranteed to be nearly orthogonal. “The ease of making nearly orthogonal vectors is a major reason for using hyperdimensional representation,” wrote Pentti Kanerva, a researcher at the Redwood Center for Theoretical Neuroscience at the University of California, Berkeley, in an influential 2009 paper.
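To see why random assignment works, here is a minimal Python sketch. It assumes bipolar hypervectors (every entry is +1 or -1), which is one common choice rather than the only one, and the helper names `random_hv` and `cosine` are made up for illustration. It draws six random 10,000-dimensional vectors and measures how far any pair strays from a right angle.

```python
import random

D = 10_000  # dimensionality, matching the article's example
rng = random.Random(0)  # fixed seed so the sketch is repeatable

def random_hv():
    """Return a random bipolar hypervector: D entries, each +1 or -1."""
    return [rng.choice((-1, 1)) for _ in range(D)]

def cosine(a, b):
    """Cosine similarity. Bipolar vectors all have length sqrt(D),
    so this is just the dot product divided by D."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

names = ["SHAPE", "COLOR", "CIRCLE", "SQUARE", "BLUE", "RED"]
items = {name: random_hv() for name in names}

# Largest similarity (in absolute value) between any two different items.
# A value of 0 would mean exactly orthogonal; 1 would mean identical.
max_cross = max(
    abs(cosine(items[a], items[b]))
    for i, a in enumerate(names)
    for b in names[i + 1:]
)
print(f"largest |cosine| between distinct items: {max_cross:.3f}")
```

With 10,000 dimensions, the similarity between two random vectors typically lands within a few hundredths of zero, so no care is needed in choosing them.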
The paper built upon work done in the mid-1990s by Kanerva and Tony Plate, at the time a doctoral student with Geoff Hinton at the University of Toronto. The two independently developed the algebra for manipulating hypervectors and hinted at its usefulness for high-dimensional computing.
Given our hypervectors for shapes and colors, the system developed by Kanerva and Plate shows us how to manipulate them using certain mathematical operations. Those actions correspond to ways of symbolically manipulating concepts.
The first operation is multiplication. This is a way of combining ideas. For example, multiplying the vector SHAPE with the vector CIRCLE binds the two into a representation of the idea “SHAPE is CIRCLE.” This new “bound” vector is nearly orthogonal to both SHAPE and CIRCLE. And the individual components are recoverable — an important feature if you want to extract information from bound vectors. Given a bound vector that represents your Volkswagen, you can unbind and retrieve the vector for its color: PURPLE.
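A rough sketch of binding follows, under the assumption that "multiplication" means multiplying two bipolar (+1/-1) hypervectors entry by entry. That is one common scheme, and other binding operations exist. The function names are hypothetical. Because every entry squared equals 1, multiplying the bound vector by SHAPE a second time undoes the binding.

```python
import random

D = 10_000
rng = random.Random(1)  # fixed seed so the sketch is repeatable

def random_hv():
    """Random bipolar hypervector with entries +1 or -1."""
    return [rng.choice((-1, 1)) for _ in range(D)]

def bind(a, b):
    """Bind two hypervectors by elementwise multiplication (an assumed scheme)."""
    return [x * y for x, y in zip(a, b)]

def cosine(a, b):
    """Cosine similarity for bipolar vectors (dot product divided by D)."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

SHAPE, CIRCLE = random_hv(), random_hv()
shape_is_circle = bind(SHAPE, CIRCLE)

# The bound vector resembles neither of its ingredients.
print(f"similarity to SHAPE:  {cosine(shape_is_circle, SHAPE):+.3f}")
print(f"similarity to CIRCLE: {cosine(shape_is_circle, CIRCLE):+.3f}")

# Unbinding: multiply by SHAPE again. SHAPE * SHAPE is all ones,
# so what remains is exactly CIRCLE.
recovered = bind(shape_is_circle, SHAPE)
print("recovered CIRCLE exactly:", recovered == CIRCLE)
```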
The second operation, addition, creates a new vector that represents what’s called a superposition of concepts. For example, you can take two bound vectors, “SHAPE is CIRCLE” and “COLOR is RED,” and add them together to create a vector that represents a circular shape that is red in color. Again, the superposed vector can be decomposed into its constituents.
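The sketch below illustrates superposition and decomposition. It again assumes bipolar hypervectors bound by elementwise multiplication, and the names `superpose` and `query` are invented for this example. Unbinding the sum with a variable such as COLOR leaves the matching value plus noise from the other term, so the answer is found by checking which known value the noisy result resembles most.

```python
import random

D = 10_000
rng = random.Random(2)  # fixed seed so the sketch is repeatable

def random_hv():
    return [rng.choice((-1, 1)) for _ in range(D)]

def bind(a, b):
    """Elementwise multiplication (assumed binding scheme)."""
    return [x * y for x, y in zip(a, b)]

def superpose(*vectors):
    """Add hypervectors entry by entry."""
    return [sum(column) for column in zip(*vectors)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

names = ["SHAPE", "COLOR", "CIRCLE", "SQUARE", "BLUE", "RED"]
hv = {name: random_hv() for name in names}

# "SHAPE is CIRCLE" plus "COLOR is RED": a red circle in one vector.
red_circle = superpose(
    bind(hv["SHAPE"], hv["CIRCLE"]),
    bind(hv["COLOR"], hv["RED"]),
)

def query(record, variable):
    """Unbind a variable, then return the best-matching known value."""
    noisy = bind(record, hv[variable])
    values = ["CIRCLE", "SQUARE", "BLUE", "RED"]
    return max(values, key=lambda name: dot(noisy, hv[name]))

print("SHAPE is", query(red_circle, "SHAPE"))
print("COLOR is", query(red_circle, "COLOR"))
```

The final lookup against a list of known vectors is a design choice of this sketch: the sum is only approximately decomposable, and the comparison step cleans up the noise.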
The third operation is permutation; it involves rearranging the individual elements of the vectors. For example, if you have a three-dimensional vector with values labeled x, y and z, permutation might move the value of x to y, y to z, and z to x. “Permutation allows you to build structure,” Kanerva said. “It allows you to deal with sequences, things that happen one after another.” Consider two events, represented by the hypervectors A and B. We can superpose them into one vector, but that would destroy information about the order of events. Combining addition with permutation preserves the order; the events can be retrieved in order by reversing the operations.
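One way to picture this, assuming the permutation is a simple rotation of the entries by one position (an illustrative choice, with hypothetical function names): rotate A before adding B, and the sum now distinguishes "A then B" from "B then A." Undoing the rotation brings the first event back into view.

```python
import random

D = 10_000
rng = random.Random(3)  # fixed seed so the sketch is repeatable

def random_hv():
    return [rng.choice((-1, 1)) for _ in range(D)]

def permute(v, k=1):
    """Rotate entries k positions to the right; a negative k rotates left."""
    k %= len(v)
    return v[len(v) - k:] + v[:len(v) - k]

def superpose(*vectors):
    return [sum(column) for column in zip(*vectors)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

events = {"A": random_hv(), "B": random_hv()}
A, B = events["A"], events["B"]

# Plain addition forgets the order: A + B is the same vector as B + A.
unordered_ab = superpose(A, B)
unordered_ba = superpose(B, A)

# Rotating the earlier event before adding keeps the order.
a_then_b = superpose(permute(A), B)
b_then_a = superpose(permute(B), A)

def nearest(v):
    """Name of the known event most similar to v."""
    return max(events, key=lambda name: dot(v, events[name]))

# Reverse the rotation to read the first event; read the second directly.
decoded = [nearest(permute(a_then_b, -1)), nearest(a_then_b)]
print("order lost by addition alone:", unordered_ab == unordered_ba)
print("decoded sequence:", decoded)
```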
Together, these three operations proved enough to create a formal algebra of hypervectors that allowed for symbolic reasoning. But many researchers were slow to grasp the potential of hyperdimensional computing, including Olshausen. “It just didn’t sink in,” he said.
Harnessing the Power
In 2015, a student of Olshausen’s named Eric Weiss demonstrated one aspect of hyperdimensional computing’s unique abilities. Weiss figured out how to represent a complex image as a single hyperdimensional vector that contains information about all the objects in the image, including their properties, such as colors, positions and sizes.
“I practically fell out of my chair,” Olshausen said. “All of a sudden the lightbulb went on.”
Soon more teams began developing hyperdimensional algorithms to replicate simple tasks that deep neural networks had begun tackling about two decades before, such as classifying images.
Consider an annotated data set that consists of images of handwritten digits. An algorithm analyzes the features of each image using some predetermined scheme. It then creates a hypervector for each image. Next, the algorithm adds the hypervectors for all images of zero to create a hypervector for the idea of zero. It then does the same for all digits, creating 10 “class” hypervectors, one for each digit.
Now the algorithm is given an unlabeled image. It creates a hypervector for this new image, then compares the hypervector against the stored class hypervectors. This comparison determines the digit that the new image is most similar to.
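The training and comparison steps can be sketched in a few lines of Python. The article does not specify how an image becomes a hypervector, so the `encode_image` function here is a purely hypothetical stand-in: it pretends each digit has a hidden prototype vector and that every image of that digit is a noisy copy with about 30% of its entries flipped. Only the adding and comparing steps reflect the procedure described above.

```python
import math
import random

D = 10_000
rng = random.Random(4)  # fixed seed so the sketch is repeatable

def random_hv():
    return [rng.choice((-1, 1)) for _ in range(D)]

def cosine(a, b):
    """Cosine similarity for vectors of any length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

# Hypothetical stand-in for the feature-encoding scheme (an assumption).
prototypes = {digit: random_hv() for digit in range(10)}

def encode_image(digit, flip_rate=0.3):
    """Pretend image: the digit's prototype with random entries flipped."""
    return [-x if rng.random() < flip_rate else x for x in prototypes[digit]]

# Training: add up the hypervectors of all examples of each digit.
class_hv = {}
for digit in range(10):
    examples = [encode_image(digit) for _ in range(5)]
    class_hv[digit] = [sum(column) for column in zip(*examples)]

def classify(image_hv):
    """Return the digit whose class hypervector is most similar."""
    return max(class_hv, key=lambda d: cosine(image_hv, class_hv[d]))

# Inference on ten fresh, unlabeled "images," one per digit.
predictions = [classify(encode_image(d)) for d in range(10)]
print(predictions)
```

Real handwritten digits are far messier than these toy vectors, so the sketch says nothing about accuracy on actual data; it only shows the mechanics.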
Yet this is just the beginning. The strengths of hyperdimensional computing lie in the ability to compose and decompose hypervectors for reasoning. The latest demonstration of this came in March, when Abbas Rahimi and colleagues at IBM Research in Zurich used hyperdimensional computing with neural networks to solve a classic problem in abstract visual reasoning — a significant challenge for typical ANNs, and even some humans. Known as Raven’s progressive matrices, the problem presents images of geometric objects in, say, a 3-by-3 grid. One position in the grid is blank. The subject must choose, from a set of candidate images, the image that best fits the blank.
“We said, ‘This is really … the killer example for visual abstract reasoning, let’s jump in,’” Rahimi said.
To solve the problem using hyperdimensional computing, the team first created a dictionary of hypervectors to represent the objects in each image; each hypervector in the dictionary represents an object and some combination of its attributes. The team then trained a neural network to examine an image and generate a bipolar hypervector — an element can be +1 or −1 — that’s as close as possible to some superposition of hypervectors in the dictionary; the generated hypervector thus contains information about all the objects and their attributes in the image. “You guide the neural network to a meaningful conceptual space,” Rahimi said.
Once the network has generated hypervectors for each of the context images and for each candidate for the blank slot, another algorithm analyzes the hypervectors to create probability distributions for the number of objects in each image, their size, and other characteristics. These probability distributions, which speak to the likely characteristics of both the context and candidate images, can be transformed into hypervectors, allowing the use of algebra to predict the most likely candidate image to fill the vacant slot.
Their approach was nearly 88% accurate on one set of problems, whereas neural network–only solutions were less than 61% accurate. The team also showed that, for 3-by-3 grids, their system was almost 250 times faster than a traditional method that uses rules of symbolic logic to reason, since that method must search through an enormous rulebook to determine the correct next step.
A Promising Start
Not only does hyperdimensional computing give us the power to solve problems symbolically, it also addresses some niggling issues of traditional computing. The performance of today’s computers degrades rapidly if errors caused by, say, a random bit flip (a 0 becomes 1 or vice versa) cannot be corrected by built-in error-correcting mechanisms. Moreover, these error-correcting mechanisms can impose a penalty on performance of up to 25%, said Xun Jiao, a computer scientist at Villanova University.
Hyperdimensional computing tolerates errors better, because even if a hypervector suffers significant numbers of random bit flips, it is still close to the original vector. This implies that reasoning with these vectors is not meaningfully affected by such errors. Jiao’s team has shown that these systems are at least 10 times more tolerant of hardware faults than traditional ANNs, which themselves are orders of magnitude more resilient than traditional computing architectures. “We can leverage all [that] resilience to design some efficient hardware,” Jiao said.
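A small experiment makes the point concrete. This sketch models a bit flip as a sign flip in a bipolar (+1/-1) hypervector and corrupts exactly 20% of the entries. Both the model and the 20% figure are illustrative assumptions, not numbers from Jiao's work. The damaged vector is then compared against 100 stored vectors.

```python
import random

D = 10_000
rng = random.Random(5)  # fixed seed so the sketch is repeatable

def random_hv():
    return [rng.choice((-1, 1)) for _ in range(D)]

def cosine(a, b):
    """Cosine similarity for bipolar vectors (dot product divided by D)."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

# A memory of 100 stored hypervectors with made-up names.
codebook = {f"item{i}": random_hv() for i in range(100)}
original = codebook["item0"]

# Flip the sign of exactly one entry in five, chosen at random.
flipped = set(rng.sample(range(D), D // 5))
corrupted = [-x if i in flipped else x for i, x in enumerate(original)]

# 8,000 entries still agree and 2,000 disagree, so the similarity is
# (8000 - 2000) / 10000 = 0.6, far above the near-zero similarity
# expected between unrelated vectors.
similarity = cosine(corrupted, original)
best = max(codebook, key=lambda name: cosine(corrupted, codebook[name]))
print(f"similarity to original after flips: {similarity:.2f}")
print("closest stored vector:", best)
```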
Another advantage of hyperdimensional computing is transparency: The algebra clearly tells you why the system chose the answer it did. The same is not true for traditional neural networks. Olshausen, Rahimi and others are developing hybrid systems in which neural networks map things in the physical world to hypervectors, and then hyperdimensional algebra takes over. “Things like analogical reasoning just fall in your lap,” Olshausen said. “This is what we should expect of any AI system. We should be able to understand it just like we understand an airplane or a television set.”
All of these benefits over traditional computing suggest that hyperdimensional computing is well suited for a new generation of extremely sturdy, low-power hardware. It’s also compatible with “in-memory computing systems,” which perform the computing on the same hardware that stores data (unlike existing von Neumann computers that inefficiently shuttle data between memory and the central processing unit). Some of these new devices can be analog, operating at very low voltages, making them energy-efficient but also prone to random noise. For von Neumann computing, this randomness is “the wall that you can’t go beyond,” Olshausen said. But with hyperdimensional computing, “you can just punch through it.”
Despite such advantages, hyperdimensional computing is still in its infancy. “There’s real potential here,” Fermüller said. But she points out that it still needs to be tested against real-world problems and at bigger scales, closer to the size of modern neural networks.
“For problems at scale, this needs very efficient hardware,” Rahimi said. “For example, how [do you] efficiently search over 1 billion items?”
All of this should come with time, Kanerva said. “There are other secrets [that] high-dimensional spaces hold,” he said. “I see this as the very beginning of time for computing with vectors.”