New Letters Added to the Genetic Alphabet

Scientists hope that new genetic letters, created in the lab, will endow DNA with new powers.

The two new letters are named P and Z, and fit seamlessly into existing DNA.

Olena Shmahalo/Quanta Magazine

DNA stores our genetic code in an elegant double helix. But some argue that this elegance is overrated. “DNA as a molecule has many things wrong with it,” said Steven Benner, an organic chemist at the Foundation for Applied Molecular Evolution in Florida.

Nearly 30 years ago, Benner sketched out better versions of both DNA and its chemical cousin RNA, adding new letters and other modifications that would expand their repertoire of chemical feats. He wondered why these improvements hadn’t occurred in living creatures. Nature has written the entire language of life using just four chemical letters: G, C, A and T. Did our genetic code settle on these four nucleotides for a reason? Or was this system one of many possibilities, selected by simple chance? Perhaps expanding the code could make it better.

Benner’s early attempts at synthesizing new chemical letters failed. But with each false start, his team learned more about what makes a good nucleotide and gained a better understanding of the precise molecular details that make DNA and RNA work. The researchers’ efforts progressed slowly, as they had to design new tools to manipulate the extended alphabet they were building. “We have had to re-create, for our artificially designed DNA, all of the molecular biology that evolution took 4 billion years to create for natural DNA,” Benner said.

Now, after decades of work, Benner’s team has synthesized artificially enhanced DNA that functions much like ordinary DNA, if not better. In two papers published in the Journal of the American Chemical Society last month, the researchers have shown that two synthetic nucleotides called P and Z fit seamlessly into DNA’s helical structure, maintaining the natural shape of DNA. Moreover, DNA sequences incorporating these letters can evolve just like traditional DNA, a first for an expanded genetic alphabet.

The new nucleotides even outperform their natural counterparts. When challenged to evolve a segment that selectively binds to cancer cells, DNA sequences using P and Z did better than those without.

“When you compare the four-nucleotide and six-nucleotide alphabet, the six-nucleotide version seems to have won out,” said Andrew Ellington, a biochemist at the University of Texas, Austin, who was not involved in the study.

Benner has lofty goals for his synthetic molecules. He wants to create an alternative genetic system in which proteins — intricately folded molecules that perform essential biological functions — are unnecessary. Perhaps, Benner proposes, instead of our standard three-component system of DNA, RNA and proteins, life on other planets evolved with just two.

Better Blueprints for Life

The primary job of DNA is to store information. Its sequence of letters contains the blueprints for building proteins. Our current four-letter alphabet, read in three-letter “words” called codons, spells out 64 possible codons that encode 20 amino acids, which are strung together to create millions of different proteins. But a six-letter alphabet would yield 216 possible codons, and so could in theory encode as many as 216 amino acids and many, many more possible proteins.
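The arithmetic behind these figures can be checked with a short sketch. It assumes codons stay three letters long, as in the natural genetic code; the function name `count_codons` and the alphabet strings are illustrative choices, not anything taken from Benner's papers. Note that the counts are upper bounds on the number of distinct codons, not predictions of how many amino acids a six-letter code would actually specify.

```python
from itertools import product

def count_codons(alphabet: str, codon_length: int = 3) -> int:
    """Count every distinct codon of the given length over the alphabet."""
    # product() enumerates all ordered letter combinations, e.g. GGG, GGC, ...
    return sum(1 for _ in product(alphabet, repeat=codon_length))

# Natural alphabet: 4 letters, so 4**3 codons.
print(count_codons("GCAT"))    # 64
# Expanded alphabet with P and Z: 6 letters, so 6**3 codons.
print(count_codons("GCATPZ"))  # 216
```

In the natural code the 64 codons map onto only 20 amino acids plus stop signals, because several codons share a meaning; a six-letter code would presumably carry similar redundancy.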

Olena Shmahalo/Quanta Magazine

Expanding the genetic alphabet dramatically expands the number of possible amino acids and proteins that cells can build, at least in theory. The existing four-letter alphabet produces 20 amino acids (small circle) while a six-letter alphabet could produce 216 possible amino acids (large circle).

Why nature stuck with four letters is one of biology’s fundamental questions. Computers, after all, use a binary system with just two “letters” — 0s and 1s. Yet two letters probably aren’t enough to create the array of biological molecules that make up life. “If you have a two-letter code, you limit the number of combinations you get,” said Ramanarayanan Krishnamurthy, a chemist at the Scripps Research Institute in La Jolla, Calif.

On the other hand, additional letters could make the system more error prone. DNA bases come in pairs — G pairs with C and A pairs with T. It’s this pairing that endows DNA with the ability to pass along genetic information. With a larger alphabet, each letter has a greater chance of pairing with the wrong partner, and new copies of DNA might harbor more mistakes. “If you go past four, it becomes too unwieldy,” Krishnamurthy said.
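To make the mispairing argument concrete, the sketch below counts, for each letter, how many letters in the alphabet are not its correct partner. The natural pairs come from the text above; the P-Z pair is the one Benner's team designed. The dictionaries and the function `wrong_partner_count` are hypothetical names for illustration, and the count says nothing about real error rates, which depend on the chemistry of each base.

```python
from typing import Dict

# Correct partners in the natural alphabet.
NATURAL = {"G": "C", "C": "G", "A": "T", "T": "A"}
# The same alphabet extended with the synthetic P-Z pair.
EXPANDED = {**NATURAL, "P": "Z", "Z": "P"}

def wrong_partner_count(pairs: Dict[str, str]) -> Dict[str, int]:
    """For each letter, count the letters that are NOT its correct partner."""
    letters = set(pairs)
    # Everything except the one correct partner counts as a possible mispairing,
    # including the letter itself.
    return {base: len(letters - {pairs[base]}) for base in sorted(letters)}

print(wrong_partner_count(NATURAL))   # {'A': 3, 'C': 3, 'G': 3, 'T': 3}
print(wrong_partner_count(EXPANDED))  # {'A': 5, 'C': 5, 'G': 5, 'P': 5, 'T': 5, 'Z': 5}
```

Going from four letters to six raises the number of candidate wrong partners per letter from three to five, which is the sense in which a larger alphabet gives copying more ways to go wrong.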

But perhaps the advantages of a larger alphabet can outweigh the potential drawbacks. Six-letter DNA could densely pack in genetic information. And perhaps six-letter RNA could take over some of the jobs now handled by proteins, which perform most of the work in the cell.

Proteins have a much more flexible structure than DNA and RNA and are capable of folding into an array of complex shapes. A properly folded protein can act as a molecular lock, opening a chamber only for the right key. Or it can act as a catalyst, capturing and bringing together different molecules for chemical reactions.

Adding new letters to RNA could give it some of these abilities. “Six letters can potentially fold into more, different structures than four letters,” Ellington said.

Back when Benner was sketching out ideas for alternative DNA and RNA, it was this potential that he had in mind. According to the most widely held theory of life’s origins, RNA once performed both the information-storage job of DNA and the catalytic job of proteins. Benner realized that there are many ways to make RNA a better catalyst.

“With just these little insights, I was able to write down the structures that are in my notebook as alternatives that would make DNA and RNA better,” Benner said. “So the question is: Why did life not make these alternatives? One way to find out was to make them ourselves, in the laboratory, and see how they work.”

Courtesy of Steven Benner

Steven Benner’s lab notebook from 1985 outlining plans to synthesize “better” DNA and RNA by adding new chemical letters.

It’s one thing to design new codes on paper, and quite another to make them work in real biological systems. Other researchers have created their own additions to the genetic code, in one case even incorporating new letters into living bacteria. But these other bases fit together a bit differently from natural ones, stacking on top of each other rather than linking side by side. This can distort the shape of DNA, particularly when a number of these bases cluster together. Benner’s P-Z pair, however, is designed to mimic natural bases.

One of the new papers by Benner’s team shows that Z and P are yoked together by the same chemical bond that ties A to T and C to G. (This bond is known as Watson-Crick pairing, after the scientists who discovered DNA’s structure.) Millie Georgiadis, a chemist at Indiana University-Purdue University Indianapolis, along with Benner and other collaborators, showed that DNA strands that incorporate Z and P retain their proper helical shape whether the new letters are strung together or interspersed with natural letters.
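Because P and Z pair with each other the way A pairs with T and C pairs with G, the usual rule for deriving one strand of a double helix from the other extends naturally to six letters. The following is a minimal sketch of that rule; the `PAIRING` table and `reverse_complement` function are illustrative names, and the example sequence is made up rather than drawn from the team's experiments.

```python
# Pairing rules: the four natural letters plus the synthetic P-Z pair.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G", "P": "Z", "Z": "P"}

def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own direction.

    The two strands of a double helix run in opposite directions, so the
    partner is built by reversing the input and swapping each letter for
    the letter it pairs with.
    """
    return "".join(PAIRING[base] for base in reversed(strand))

print(reverse_complement("GATPZC"))  # GPZATC
```

Applying the function twice returns the original strand, which is a quick sanity check that the pairing table is symmetric.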

“This is very impressive work,” said Jack Szostak, a chemist at Harvard University who studies the origin of life, and who was not involved in the study. “Finding a novel base pair that does not grossly disrupt the double-helical structure of DNA has been quite difficult.”

The team’s second paper demonstrates how well the expanded alphabet works. Researchers started with a random library of DNA strands constructed from the expanded alphabet and then selected the strands that were able to bind to liver cancer cells but not to other cells. Of the 12 successful binders, the best had Zs and Ps in their sequences, while the weakest did not.

“More functionality in the nucleobases has led to greater functionality in nucleic acids themselves,” Ellington said. In other words, the new additions appear to improve the alphabet, at least under these conditions.

Courtesy of Steven Benner

Steven Benner, an organic chemist at the Foundation for Applied Molecular Evolution in Florida, is expanding the genetic alphabet.

But additional experiments are needed to determine how broadly that’s true. “I think it will take more work, and more direct comparisons, to be sure that a six-letter version generally results in ‘better’ aptamers [short DNA strands] than four-letter DNA,” Szostak said. For example, it’s unclear whether the six-letter alphabet triumphed because it provided more sequence options or because one of the new letters is simply better at binding, Szostak said.

Benner wants to expand his genetic alphabet even further, which could enhance its functional repertoire. He’s working on creating a 10- or 12-letter system and plans to move the new alphabet into living cells. Benner’s and others’ synthetic molecules have already proved useful in medical and biotech applications, such as diagnostic tests for HIV and other diseases. Indeed, Benner’s work helped to found the burgeoning field of synthetic biology, which seeks to build new life, in addition to forming useful tools from molecular parts.

Why Life’s Code Is Limited

Benner’s work and that of other researchers suggest that a larger alphabet has the capacity to enhance DNA’s function. So why didn’t nature expand its alphabet in the 4 billion years it has had to work on it? It could be because a larger repertoire has potential disadvantages. Some of the structures made possible by a larger alphabet might be of poor quality, with a greater risk of misfolding, Ellington said.

Nature was also effectively locked into the system at hand when life began. “Once [nature] has made a decision about which molecular structures to place at the core of its molecular biology, it has relatively little opportunity to change those decisions,” Benner said. “By constructing unnatural systems, we are learning not only about the constraints at the time that life first emerged, but also about constraints that prevent life from searching broadly within the imagination of chemistry.”

Olena Shmahalo/Quanta Magazine

The genetic code — made up of the four letters, A, T, G and C — stores the blueprint for proteins. DNA is first transcribed into RNA and then translated into proteins, which fold into specific shapes.

Benner aims to make a thorough search of that chemical space, using his discoveries to make new and improved versions of both DNA and RNA. He wants to make DNA better at storing information and RNA better at catalyzing reactions. He hasn’t shown directly that the P-Z base pairs do that. But both bases have the potential to help RNA fold into more complex structures, which in turn could make RNA a better catalyst. P has a place to add a “functional group,” a molecular structure that helps folding and is typically found in proteins. And Z has a nitro group, which could aid in molecular binding.

In modern cells, RNA acts as an intermediary between DNA and proteins. But Benner ultimately hopes to show that the three-biopolymer system — DNA, RNA and proteins — that exists throughout life on Earth isn’t essential. With better-engineered DNA and RNA, he says, perhaps proteins are unnecessary.

Indeed, the three-biopolymer system may have drawbacks, since information flows only one way, from DNA to RNA to proteins. If a DNA mutation produces a more efficient protein, that mutation will spread slowly, as organisms without it eventually die off.

What if the more efficient protein could spread some other way, by directly creating new DNA? DNA and RNA can transmit information in both directions. So a helpful RNA mutation could theoretically be transformed into beneficial DNA. Adaptations could thus lead directly to changes in the genetic code.

Benner predicts that a two-biopolymer system would evolve faster than our own three-biopolymer system. If so, this could have implications for life on distant planets. “If we find life elsewhere,” he said, “it would likely have the two-biopolymer system.”

Reader Comments

  • This is a great insight into possible exobiology, but I would question the validity of two-biopolymers due to the availability of the core components being the same. In an environment that is barely habitable, the need for fast adaptation may limit the usefulness of DNA in its 4 base system, but I would argue that it would hold at least some to keep the core of the information that is essential to life in those conditions. Evolution would probably insist on some long term memory, if climes were particularly erratic.
    Other bases probably exist, due to the P nucleotide’s functional reservoir, and may give rise to epigenetics that are much stronger than in our format of DNA. Our relative stability has probably evolved out the need for such devices, but maybe they were essential in Earth’s early history.

  • An increased alphabet may find a solution more rapidly, but I can easily see it producing robust solutions more rarely. Proteins are very robust to mutations on 20 letters, but what about mutations on 216 letters? The robustness constraint is a critical component of the evolutionary optimization problem.

    Nevertheless this is fascinating research.

  • Always enjoy Emily’s articles; a pleasure to read, thank you.

    Sounds like interesting research to explore an early life, RNA-world hypothesis. And the possibility of Lamarckian evolution, to boot!

    Maybe it’s Earth-centric to say — or simply lacking in the “imagination of chemistry” — but it seems reasonable to assume that the DNA-RNA-protein biopolymer system has evolved with its advantages. Compartmentalization of tasks certainly has its strengths. Go, eukaryotes!

    Having a stable information molecule (DNA) sequestered more or less safely in the nucleus; having a stable messenger molecule (RNA) to carry this information to the cytoplasm; and having a special class of molecules (proteins) that can fold into seemingly limitless conformations to facilitate the reactive and structural functions of the cell. This seems pretty well-defined and streamlined already, without overlapping or conflating roles.

    Sometimes less is more, but not always. It behooves life to explore the “possibility space,” but also to conserve what works. Maintaining the fidelity of a hard-won genetic knowledge base, and facilitating the directional flow of its information, seems like a good idea. Certainly for biology on this planet, and most likely elsewhere.

  • I question if the additional pair really maintains the macro structure of the helix. I believe that stability is what limited the base pair combination.

    Great work trying this…good working history of the evolutionary development!

  • Very very very interesting article. Well written even for those of us with little knowledge on this. Why are there only 4 letters and not 6? Good question. God knows why. 4 have gone a long way. Maybe 6 would make our brains better to understand math and the universe. But maybe 6 letters would generate too many mistakes and we would be retarded mentally. Another puzzle as big as the universe. Thanks

  • “Did our genetic code settle on these four nucleotides for a reason?”

    Yes – only those four letters are found in GATTACA.

  • While I applaud the efforts of Dr. Benner and his research colleagues, the already natural nucleotide families (hypoxanthine, inosine, IMP) and (xanthine, xanthosine, XMP) already play significant roles in purine synthesis de novo (IMP), synthesis by salvage (HGPRT) and purine catabolic degradation to uric acid (Xanthine Oxidase). These three enzymes are essential for nucleic acids RNA and DNA to even exist. The genetic + epigenetic code must be based on the metabolic pathways which produce the end product. Since proteins only account for at most 4% of the 3.2 billion nucleotide base pairs, what about the remaining 96%? Benner only talks about DNA = ATGC, but what about uracil, which substitutes for thymine in the RNA genetic code, i.e. AUGC?
    Essentially there are 5 nucleotides in the DNA and RNA genetic codes. ATGCU = 5. An odd prime number. Novagon’s 15 years of research has developed the ATGCUIX = 7 natural nucleotides, also called a heptad. The amino acid counterpart heptad repeats, i.e. leucine zipper, is the most common motif for alpha helical coiled coil repeats, the first protein structure discovered by scientific man. We are examining how the heptad nucleotide (ATGCUIX) code meshes with the hydrophobic core of the various heptad amino acids. It is interesting to note that Inosine at I34 in tRNA wobble codes for every one of the hydrophobic amino acids through Crick’s wobble theory (i.e. leucine, alanine, serine, threonine, valine, isoleucine, proline and arginine). Further evidence is found in the most important post transcriptional modification of mRNA, i.e. Adenosine to Inosine RNA editing, which provides alternative splicing transcripts which enable one gene to produce multiple new and varied proteins.
    I believe current genomics should look to explain the 96% of the human genome which epigenetically control gene expression at the gene not the single base pair nucleotide level, thus enabling real time system wide adaptations when critical external and internal stressors and tensors threaten the stability of the 3′ CTD and 5′ NTD platforms for transcription and replication.
    I am amazed at the hubris of life scientists who believe they can improve on nature’s 3.6 billion year old evolution paradigm. Synthetic biology has no idea of the number and kinds of subtle to profound changes their artificial tinkering will cause. Remember the butterfly to tsunami metaphor in proving all carbon base life forms are connected even though our technology is incapable of detecting the higher order genetic and epigenetic networks which cascade with the tiniest perturbation. The number 7 or heptad has special meaning throughout mankind’s history; we should not ignore universal archetypes in number theory even though few understand higher order hyperbolic molecular structures, i.e. Felix Klein’s Quartic Curve which tiles the hyperbolic plane with 168 symmetry operations, 336 if mirror imaging is allowed.
    I believe we should focus on the natural elements and metabolic processes which have brought homo sapiens to this point in time, and not waste valuable time and resources on research which will inevitably make us but a footnote in evolutionary history. Synthetic chemistry will lead to extinction, and mankind’s hubris will be the determining factor.
    John Berger, Ph.D.
    Founder Novagon DNA home of the 7 nucleotide integrated DNA, RNA and Epigenetic rDNA code.

  • I’m curious why the author identifies the chemical bonds in DNA by any name other than hydrogen bonding. In this specific instance the chosen moniker mostly serves as an opportunity to mention the scientists who first published the complete structure of the DNA molecule. I love chemistry, and am thankful for the work done by Watson & Crick, but why not stick with H-bonding? It works just like a magnet, with the (-) electrons attracted to the (+) nucleus of the adjacent atoms, so it’s pretty easy to understand intuitively.

    P.S. Apparently the word ‘hydrogen’ is not even used in the text of the article.

  • Am I reading this correctly that Benner is synthesizing the RNA incorporating P/Z and that he expects the RNA to perform biological functions itself? Or has he added proteins to the genetic code and created transfer RNAs appropriate to implement his expanded genetic code? If he has not done the latter, then how can P/Z-containing RNAs be capable of self-reproduction?

  • Fascinating work, more complex, more evolved organisms …. but wouldn’t it enhance the chance of mutation too ???

  • Four digit DNA code is binary (with mixed-logic). Binary code is advantaged with preservation and error avoidance in replication. DNA is a type of nanotechnology. By definition, technology has a creator. DNA nanotechnology characteristics are consistent with good nanotechnology principles: self-evolving, self-developing, self-replicating, self-provisioning, self-healing (within limits), self-rejuvenating (but subject to aging), self-regulating, self-policing, self-terminating, etc. Essentially humans are biologic “Mind-Body” nanotech “wet” machines or, if you prefer, androids, which are generally (but not always) possessed by doppelgänger spirit.

  • I’m confused how this article doesn’t mention polymerase at any point. There’s no use making a new “letter” if you don’t have anything to read it with.

    Also annoyed at the confusion that it is 20 vs 216. It is 4^3 vs 6^3, hence 64 vs 216. The fact that there are 20 amino acids that are represented by the 64 is because there is duplication of codes (for example for error correction where specific amino acids are more important, and where mutation from one DNA code to another would cause more problems). And you have your stop and start codons etc.

    Argh! Interesting. But argh!

  • @Tarwin thanks for your comment. It’s true that only a subset of the 216 possible triplet sequences will code for amino acids. The scientist I consulted with said it’s impossible to predict the total number of amino acids for a new code (because of codon blocks, wobble positions, etc.) so we went with that higher number. Benner’s team does have polymerases that work with the GATCZP alphabet.

  • This is a nice article but really should give credit to the first people to demonstrate a TRUE functional unnatural base pair. It’s weird that the Scripps scientist in this article didn’t even acknowledge the work of his colleague Prof Floyd Romesberg who was the first to demonstrate the replication of an unnatural base pair inside a living cell.
    Although Benner’s work is interesting, it’s all in vitro with unnatural nucleotides that can be evolved as aptamers, or nucleic acids that Andy Ellington and others have long thought useful novel classes of molecules completely independent of proteins. I think the article confuses things as Benner has not demonstrated the ability to decode his expanded DNA into RNA and protein yet and that’s probably a long long way off and I would suspect unlikely given the properties of his unnatural nucleotides.

  • miRNA, snRP, alteration of DNA sequence by “mis”-repair of DNA methylation, so many things exist that this little article failed to mention in its simplistic misclassification of molecular biology according to a model that is decades behind the times.

  • Considering what we know about the origin and elements of life (as it were) and…In this context; The biodiversity, and the environments each species adapted to and thrives in, I would find it revealing, to learn more, of any information on the DNA-RNA-Protein development, and mutation (even in petri dishes) of any facilitating species. Be it in water-darkness-pressures etc. Especially with the vision and intent of human space exploration. Suffice it that living adapting and surviving "Out there" (our station) is all experimental at best. So…. in the interest of pure bioscience, I wonder if there are any a logical Parallel endeavor, yet not tamper with the "HUMAN LIFE" implications.

  • This smacks of future control failure.

    We're only just scratching the surface of understanding complex adaptive systems; jamming our big fingers into a 3 billion year complex process is dangerous. Doesn't mean we shouldn't be pushing the boundaries, we just need to be careful of thinking our own complex systems in our brains are better than the ones that have existed for far longer.
