More than a quarter billion people today are infected with the hepatitis B virus (HBV), the World Health Organization estimates, and more than 850,000 of them die every year as a result. Although an effective and inexpensive vaccine can prevent infections, the virus, a major culprit in liver disease, is still easily passed from infected mothers to their newborns at birth, and the medical community remains strongly interested in finding better ways to combat HBV and its chronic effects. It was therefore notable last month when Reidun Twarock, a mathematician at the University of York in England, together with Peter Stockley, a professor of biological chemistry at the University of Leeds, and their respective colleagues, published their insights into how HBV assembles itself. That knowledge, they hoped, might eventually be turned against the virus.
Their accomplishment has gained further attention because only this past February the teams also announced a similar discovery about the self-assembly of a virus related to those that cause the common cold. In fact, in recent years, Twarock, Stockley and other mathematicians have helped reveal the assembly secrets of a variety of viruses, even though that problem had seemed forbiddingly difficult not long before.
Their success represents a triumph in applying mathematical principles to the understanding of biological entities. It may also eventually help to revolutionize the prevention and treatment of viral diseases in general by opening up a new, potentially safer way to develop vaccines and antivirals.
A Geodesic Insight
In 1962, the biologist-chemist duo Donald Caspar and Aaron Klug published a seminal paper on the structural organization of viruses. Among a series of sketches, models and X-ray diffraction patterns that the paper featured was a photograph of a building designed by Richard Buckminster Fuller, the inventor and architect: It was a geodesic dome, the design for which Fuller would become famous. And it was, in part, the lattice structure of the geodesic dome, a convex polyhedron assembled from hexagons and pentagons, themselves divided into triangles, that would inspire Caspar and Klug’s theory.
At the same time that Fuller was promoting the advantages of his domes (namely, that their structure made them more stable and efficient than other shapes), Caspar and Klug were trying to solve a structural problem in virology that had already attracted some of the field’s greats, not least among them James Watson, Francis Crick and Rosalind Franklin. Viruses consist of a short string of DNA or RNA packaged in a protein shell called a capsid, which protects the genomic material and facilitates its insertion into a host cell. Of course, the genomic material has to encode the formation of such a capsid, and longer strands of DNA or RNA require larger capsids to shield them. It didn’t seem possible that strands as short as those found in viruses could carry enough information to do this.
Then, in 1956, three years after their work on DNA’s double helix, Watson and Crick came up with a plausible explanation. A viral genome could include instructions for only a limited number of distinct capsid proteins, which meant that in all likelihood viral capsids were symmetric: The genomic material needed to describe only some small subsection of the capsid and then give orders for it to be repeated in a symmetric pattern. Experiments using X-ray diffraction and electron microscopes revealed that this was indeed the case, making it apparent that viruses were predominantly either helical or icosahedral in shape. The former were rod-shaped structures that resembled an ear of corn, the latter polyhedra that approximated the sphere, consisting of 20 triangular faces glued together.
This 20-sided shape, one of the Platonic solids, can be rotated in 60 different ways without seeming to change in appearance. It also allows for the placement of 60 identical subunits, three on each triangular face, that are equally related to the symmetry axes — a setup that works perfectly for smaller viruses with capsids that consist of 60 proteins.
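The count of 60 rotations can be checked numerically. What follows is a minimal sketch, not taken from the researchers' work: it assumes a standard icosahedron with vertices at the cyclic permutations of (0, ±1, ±φ), builds one five-fold and one three-fold rotation, and keeps multiplying until no new rotations appear. The function names are illustrative.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def rotation(axis, angle):
    """Rotation matrix about `axis` by `angle` radians (Rodrigues' formula)."""
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    c, s = math.cos(angle), math.sin(angle)
    t = 1 - c
    return (
        (t * x * x + c, t * x * y - s * z, t * x * z + s * y),
        (t * x * y + s * z, t * y * y + c, t * y * z - s * x),
        (t * x * z - s * y, t * y * z + s * x, t * z * z + c),
    )


def multiply(a, b):
    """Product of two 3x3 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def key(matrix):
    """Round entries so floating-point copies of one rotation compare equal."""
    return tuple(round(v, 6) for row in matrix for v in row)


def icosahedral_rotations():
    """Generate every rotation that maps the icosahedron onto itself."""
    five_fold = rotation((0, 1, PHI), 2 * math.pi / 5)   # axis through a vertex
    three_fold = rotation((1, 1, 1), 2 * math.pi / 3)    # axis through a face centre
    identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    seen = {key(identity): identity}
    frontier = [identity]
    while frontier:
        current = frontier.pop()
        for generator in (five_fold, three_fold):
            product = multiply(current, generator)
            k = key(product)
            if k not in seen:
                seen[k] = product
                frontier.append(product)
    return list(seen.values())
```

Calling `len(icosahedral_rotations())` should give 60, the same figure obtained by counting 20 faces times 3 rotations that keep each face in place.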
But most icosahedral viral capsids comprise a much larger number of subunits, and placing the proteins in this way never allows for more than 60. Clearly, a new theory was necessary to model larger viral capsids. That’s where Caspar and Klug entered the picture. Having recently read about Buckminster Fuller’s architectural creations, the pair realized that his designs might be relevant to the structures of the viruses they were studying, which in turn sparked an idea. Dividing the icosahedron further into triangles (or, more formally, applying a hexagonal lattice to the icosahedron and then replacing each hexagon with six triangles) and positioning proteins in the corners of those triangles provided a more general and accurate picture of what these kinds of viruses looked like. This partitioning allowed for “quasi-equivalence,” in which subunits differ minimally in how they bond with their neighbors, forming either five-fold or six-fold positions on the lattice.
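In Caspar and Klug's scheme, the allowed subdivisions are indexed by a triangulation number, T = h² + hk + k², for non-negative integers h and k, and a capsid with triangulation number T contains 60T subunits grouped around 12 five-fold positions and 10(T − 1) six-fold positions. That formula is standard in the Caspar-Klug literature rather than quoted in this article; the sketch below, with illustrative function names, simply tabulates it.

```python
def triangulation_number(h, k):
    """Caspar-Klug triangulation number T = h^2 + hk + k^2."""
    return h * h + h * k + k * k


def capsid_counts(h, k):
    """Subunit bookkeeping for the lattice indexed by (h, k)."""
    t = triangulation_number(h, k)
    return {
        "T": t,
        "subunits": 60 * t,                 # 60 copies of T quasi-equivalent positions
        "five_fold_positions": 12,          # one per icosahedral vertex
        "six_fold_positions": 10 * (t - 1),
    }


def allowed_t_numbers(limit):
    """All triangulation numbers from 1 up to `limit`, in ascending order."""
    values = set()
    for h in range(limit + 1):
        for k in range(limit + 1):
            t = triangulation_number(h, k)
            if 1 <= t <= limit:
                values.add(t)
    return sorted(values)
```

Here `capsid_counts(1, 0)` reproduces the 60-subunit case, while `capsid_counts(1, 1)` gives T = 3 with 180 subunits. Values such as T = 2 or T = 5 never appear in `allowed_t_numbers`, which is why only certain capsid sizes fit the theory.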
Such microscopic geodesic domes quickly became the standard way to represent icosahedral viruses, and, for a while, it seemed that Caspar and Klug had solved the problem. A handful of experiments conducted in the 1980s and ’90s, however, revealed some exceptions to the rule, most notably among groups of cancer-causing viruses called Polyomaviridae and Papillomaviridae.
It became necessary once more for an outside approach — made possible by theories in pure mathematics — to provide insights into the biology of viruses.
Following in Caspar and Klug’s Footsteps
About 15 years ago, Twarock came across a lecture about the different ways in which viruses realize their symmetrical structures. She thought she might be able to extend to these viruses some of the symmetry techniques she had been working on with spheres. “That snowballed,” Twarock said. She and her colleagues realized that with knowledge of structures, “we could make an impact on understanding how viruses function, how they assemble, how they infect, how they evolve.” She didn’t look back: She has spent her time since then working as a mathematical biologist, using tools from group theory and discrete math to continue where Caspar and Klug left off. “We really developed this integrative, interdisciplinary approach,” she said, “where the math drives the biology and the biology drives the math.”
Twarock first wanted to generalize the lattices that could be used so she could identify the positions of capsid subunits that Caspar and Klug’s work failed to explain. The proteins of the human papillomaviruses, for instance, were arranged in five-fold pentagonal structures, rather than hexagonal ones. Unlike hexagons, however, regular pentagons cannot be built from equilateral triangles, nor can they tessellate a plane: When regular pentagons are slid next to each other to tile a surface, gaps and overlaps inevitably arise.
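The gap can be seen with a little arithmetic: copies of a regular polygon can close up around a shared corner only if the polygon's interior angle divides 360 degrees evenly. A minimal sketch of that check follows; the function names are illustrative.

```python
from fractions import Fraction


def interior_angle(sides):
    """Interior angle of a regular polygon, in degrees, as an exact fraction."""
    return Fraction(180 * (sides - 2), sides)


def fits_around_point(sides):
    """True if whole copies of the polygon fill 360 degrees around a corner."""
    return (Fraction(360) / interior_angle(sides)).denominator == 1


def gap_after_packing(sides):
    """Degrees left over after fitting as many copies as possible around a corner."""
    angle = interior_angle(sides)
    copies = Fraction(360) // angle  # floor division gives a whole number of copies
    return Fraction(360) - copies * angle
```

For pentagons the interior angle is 108 degrees, so three fit around a corner and `gap_after_packing(5)` leaves 36 degrees unfilled, while a fourth would overlap. Triangles, squares and hexagons leave no gap.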
So Twarock turned to Penrose tilings, a mathematical technique developed in the 1970s to tile a plane with five-fold symmetry by fitting together four-sided figures called kites and darts. The patterns generated by Penrose tilings do not repeat periodically, making it possible to piece together their two component shapes without leaving any gaps. Twarock applied this concept by importing symmetry from a higher-dimensional space (in this case, from a lattice in six dimensions) into a three-dimensional subspace. This projection does not retain the periodicity of the lattice, but it does produce long-range order, like a Penrose tiling. It also encompasses the surface lattices used by Caspar and Klug. Twarock’s tilings therefore applied to a wider range of viruses, including the polyomaviruses and papillomaviruses that had evaded Caspar and Klug’s classification.
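The projection idea has a simpler one-dimensional analogue, offered here only as an illustration and not as Twarock's six-dimensional construction. Projecting part of a two-dimensional square lattice onto a line with a golden-ratio slope yields the Fibonacci chain, a sequence of long (L) and short (S) tiles that never settles into a repeating period yet is highly ordered. The sketch below uses an equivalent floor-function description of that chain; the function name is an assumption.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def fibonacci_chain(length):
    """Return `length` tiles of the Fibonacci chain as a string of 'L' and 'S'."""
    tiles = []
    for n in range(1, length + 1):
        # Each step along the line either crosses a lattice row (L) or does not (S).
        step = math.floor((n + 1) / PHI) - math.floor(n / PHI)
        tiles.append("L" if step == 1 else "S")
    return "".join(tiles)
```

The first eight tiles are `LSLLSLSL`. Two short tiles never sit side by side, three long tiles never occur in a row, and the ratio of long to short tiles approaches the golden ratio: local rules and long-range order without periodicity.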
Moreover, Twarock’s constructions not only informed the locations and orientations of the capsid’s protein subunits, but they also provided a framework for how the subunits interacted with each other and with the genomic material inside. “I think this is where we made a very big contribution,” Twarock said. “By knowing about the symmetry of the container, you can understand better determinants of the asymmetric organization of the genomic material [and] constraints on how it must be organized. We were the first to actually float the idea that there should be order, or remnants of that order, in the genome.”
Twarock has been pursuing that line of research ever since.
The Role of Viral Genomes in Capsid Formation
Caspar and Klug’s theory applied only to the surfaces of capsids, not to their interiors. To know what was happening there, researchers had to turn to cryo-electron microscopy and other imaging techniques. Not so for Twarock’s tiling model, she said. She and her team set out hunting for combinatorial constraints on viral assembly pathways, this time using graph theory. In the process, they showed that in RNA viruses, the genomic material played a much more active role in the formation of the capsid than previously thought.
Specific positions along the RNA strand, called packaging signals, make contact with the capsid from inside its walls and help it form. Locating these signals with bioinformatics alone proves an incredibly difficult task, but Twarock realized she could simplify it by applying a classification based on a concept from graph theory called a Hamiltonian path. Imagine the packaging signals as sticky pieces along the RNA string. One of them is stickier than the others; a protein will adhere to it first. From there, new proteins come into contact with other sticky pieces, forming an ordered pathway that visits every binding site exactly once and never doubles back on itself. In other words, a Hamiltonian path.
Combining this classification with the geometry of the capsid, which places certain constraints on the local configurations in which the RNA can contact neighboring RNA-capsid binding sites, Twarock and her team mapped subsets of Hamiltonian paths to describe potential positions of the packaging signals. Weeding out the unpromising ones, Twarock said, was “a matter of taking care of dead ends.” Placements that would be both plausible and efficient, enabling effective and rapid assembly, were more limited than expected. The researchers concluded that a number of RNA-capsid binding sites must occur in every viral particle and are probably conserved features of genome organization. If so, the sites might be good novel targets for antiviral therapies.
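The search described above can be sketched as a small backtracking enumeration. This is a toy illustration, not the researchers' model: the six-site contact graph is hypothetical (a ring of binding sites with one extra contact across it), and the starting site plays the role of the stickiest packaging signal. A branch is abandoned as a dead end when no unvisited neighbor remains before every site has been reached.

```python
def hamiltonian_paths(graph, start):
    """Yield every path from `start` that visits each vertex exactly once."""
    total = len(graph)

    def extend(path, visited):
        if len(path) == total:
            yield list(path)  # every site reached: a Hamiltonian path
            return
        for neighbor in graph[path[-1]]:
            if neighbor not in visited:
                visited.add(neighbor)
                path.append(neighbor)
                yield from extend(path, visited)
                path.pop()            # backtrack and try the next neighbor
                visited.remove(neighbor)
        # If no neighbor was available, this branch was a dead end.

    yield from extend([start], {start})


# Hypothetical contact graph: sites 0..5 in a ring, plus a contact between 0 and 3.
CONTACTS = {
    0: [1, 3, 5],
    1: [0, 2],
    2: [1, 3],
    3: [0, 2, 4],
    4: [3, 5],
    5: [0, 4],
}

paths = list(hamiltonian_paths(CONTACTS, 0))
```

For this graph only two orderings survive, `[0, 1, 2, 3, 4, 5]` and `[0, 5, 4, 3, 2, 1]`. Routes that take the shortcut from site 0 to site 3 strand part of the ring and are discarded, which is the kind of pruning that narrows the candidate placements.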
Twarock and her colleagues, in collaboration with Stockley’s team in Leeds, have employed this model to delineate the packaging mechanism for several different viruses, starting with the bacteriophage MS2 and the satellite tobacco mosaic virus. They predicted the presence of packaging signals in MS2 in 2013 using Twarock’s mathematical tools, then provided experimental evidence to back up those claims in 2015. This past February, the researchers identified sequence-specific packaging signals in the human parechovirus, part of the picornavirus family, which includes viruses that cause the common cold. And last month, they published their insights into the assembly of the hepatitis B virus. They plan on doing similar work on several other types of viruses, including alphaviruses, and hope to apply their findings to gain a better understanding of how such viruses evolve.
Going Beyond the Geometry
When Twarock’s team announced their finding on the parechovirus in February, headlines claimed they were closing in on a cure for the common cold. That’s not quite right, but it is a goal they’ve kept in mind in their partnership with Stockley.
The most immediate application would be to find a way to disrupt these packaging signals, creating antivirals that interfere with capsid formation and leave the virus vulnerable. But Stockley hopes to go a different route, focusing on prevention before treatment. Vaccine development has come a long way, he acknowledged, but the number of available vaccines pales in comparison to the number of infections that pose threats. “We’d like to vaccinate people against several hundred infections,” Stockley said, whereas only dozens of vaccines have been approved. Creating a stable, noninfectious immunogen to prepare the immune system for the real thing has its limitations. Right now, approved strategies for vaccines rely on either chemically inactivated viruses (killed viruses that the immune system can still recognize) or attenuated live viruses (live viruses that have been made to lose much of their potency). The former often provide only short-lived immunity, while the latter carry the risk of being converted from attenuated viruses to virulent forms. Stockley wants to open up a third route. “Why not make something that can sort of replicate but doesn’t have pathological features to it?” he asked.
In a poster presented at the Microbiology Society Annual Conference in April, Stockley, Twarock and other researchers describe one of their current areas of focus: using the research on packaging signals and self-assembly to probe a world of synthetic viruses. By understanding capsid formation, it may be possible to engineer viruslike particles (VLPs) with synthetic RNA. These particles would not be able to replicate, but they would allow the immune system to recognize viral protein structures. Theoretically, VLPs could be safer than attenuated live viruses and might provide greater protection for longer periods than do chemically inactivated viruses.
Twarock’s mathematical work also has applications beyond viruses. Govind Menon, a mathematician at Brown University, is exploring self-assembling micro- and nanotechnologies. “The mathematical literature on synthetic self-assembly is quite thin,” Menon said. “However, there were many models to study the self-assembly of viruses. I began to study these models to see if they were flexible enough to model synthetic self-assembly. I soon found that models rooted in discrete geometry were better suited to [our research]. Reidun’s work is in this vein.”
Miranda Holmes-Cerfon, a mathematician at the Courant Institute of Mathematical Sciences at New York University, sees connections between Twarock’s virus studies and her own research into how tiny particles floating in solutions can self-organize. That relevance speaks to what she regards as one of the valuable aspects of Twarock’s investigations: the mathematician’s ability to apply her expertise to problems in biology.
“If you talk to biologists,” Holmes-Cerfon said, “the language they use is so different than the language they use in physics and math. The questions are different, too.” The challenge for mathematicians is tied to their willingness to seek out questions with answers that inform the biology. One of Twarock’s real talents, she said, “is doing that interdisciplinary work.”