Origins of Life

How Structure Arose in the Primordial Soup

Life’s first epoch saw incredible advances — cells, metabolism and DNA, to name a few. Researchers are resurrecting ancient proteins to illuminate the biological dark ages.

Olena Shmahalo/Quanta Magazine

About 4 billion years ago, molecules began to make copies of themselves, an event that marked the beginning of life on Earth. A few hundred million years later, primitive organisms began to split into the different branches that make up the tree of life. In between those two seminal events, some of the greatest innovations in existence emerged: the cell, the genetic code and an energy system to fuel it all. All three of these are essential to life as we know it, yet scientists know disappointingly little about how any of these remarkable biological innovations came about.

“It’s very hard to infer even the relative ordering of evolutionary events before the last common ancestor,” said Greg Fournier, a geobiologist at the Massachusetts Institute of Technology. Cells may have appeared before energy metabolism, or perhaps it was the other way around. Without fossils or DNA preserved from organisms living during this period, scientists have had little data to work from.

Fournier is leading an attempt to reconstruct the history of life in those evolutionary dark ages — the hundreds of millions of years between the time when life first emerged and when it split into what would become the endless tangle of existence.

He is using genomic data from living organisms to infer the DNA sequence of ancient genes as part of a growing field known as paleogenomics. In research published online in March in the Journal of Molecular Evolution, Fournier showed that the last chemical letter added to the code was a molecule called tryptophan — an amino acid most famous for its presence in turkey dinners. The work supports the idea that the genetic code evolved gradually.

Using similar methods, he hopes to decipher the temporal order of more of the code — determining when each letter was added to the genetic alphabet — and to date key events in the origins of life, such as the emergence of cells.

Dark Origins

Life emerged so long ago that even the rock formations covering the planet at that time have been destroyed — and with them, most chemical and geological clues to early evolution. “There’s a huge chasm between the origins of life and the last common ancestor,” said Eric Gaucher, a biologist at the Georgia Institute of Technology in Atlanta.

The stretch of time between the origins of life and the last universal common ancestor saw a series of remarkable innovations — the origins of cells, metabolism and the genetic code. But scientists know little about when they happened or the order in which they occurred.

Scientists do know that at some point in that time span, living creatures began using a genetic code, a blueprint for making complex proteins. It is those proteins that carry out the vital functions of the cell. (The structure of DNA and RNA also enables genetic information to be replicated and passed on from generation to generation, but that’s a separate process from the creation of proteins.) The components of the code and the molecular machinery that assembles them “are some of the oldest and most universal aspects of cells, and biologists are very interested in understanding the mechanisms by which they evolved,” said Paul Higgs, a biophysicist at McMaster University in Hamilton, Ontario.

How the code came into being presents a chicken-and-egg problem. The key players in the code — DNA, RNA, amino acids, and proteins — are chemically complicated structures that work together to make proteins. But in modern cells, proteins are used to make the components of the code. So how did a highly structured code emerge?

Most researchers believe that the code began simply with basic proteins made from a limited alphabet of amino acids. It then grew in complexity over time, as these proteins learned to make more sophisticated molecules. Eventually, it developed into a code capable of creating all the diversity we see today. “It’s long been hypothesized that life’s ‘standard alphabet’ of 20 amino acids evolved from a simpler, earlier alphabet, much as the English alphabet has accumulated extra letters over its history,” said Stephen Freeland, a biologist at the University of Maryland, Baltimore County.

The earliest amino acid letters in the code were likely the simplest in structure, those that can be made from purely chemical means, without the assistance of a protein helper. (For example, the amino acids glycine, alanine and glutamic acid have been found on meteorites, suggesting they can form spontaneously in a variety of environments.) These are like the letters A, E and S — primordial units that served as the foundation for what came later.

Tryptophan, by contrast, has a complex structure and is relatively rare in proteins, like a Y or Z, leading scientists to theorize that it was one of the last additions to the code.

That chemical evidence is compelling, but circumstantial. Enter Fournier. He suspected that by extending his work on paleogenomics, he would be able to prove tryptophan’s status as the last letter added to the code.

The Last Letter

Scientists have been reconstructing ancient proteins for more than a decade, primarily to figure out how they differed from modern ones: what they looked like and how they functioned. But these efforts have focused on the period of evolution after the last universal common ancestor (or LUCA, as researchers call it). Fournier’s work delves further back than any previous effort. To do so, he had to move beyond the standard application of comparative genomics, which analyzes the differences between branches on the tree of life. “By definition, anything pre-LUCA lies beyond the deepest split in the tree,” he said.

Fournier started with two related proteins, TrpRS (tryptophanyl tRNA synthetase) and TyrRS (tyrosyl tRNA synthetase), which help decode RNA letters into the amino acids tryptophan and tyrosine. TrpRS and TyrRS are more closely related to each other than to any other protein, indicating that they evolved from the same ancestor protein. Sometime before LUCA, that parent protein mutated slightly to produce these two new proteins with distinct functions. Fournier used computational techniques to infer what that ancestral protein must have looked like.
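
One simple way to see how an ancestral sequence can be inferred from living descendants is Fitch parsimony, applied to one alignment column at a time. The article does not say which computational techniques Fournier used, so the sketch below is a stand-in illustration rather than his method; the tree shape, the leaf names and the residues are all hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name or a pair of subtrees (hypothetical toy format).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_root_states(tree: Tree, leaf_states: Dict[str, str]) -> Set[str]:
    """Bottom-up pass of Fitch parsimony for a single alignment column.

    Returns the set of residues that are equally parsimonious at the root.
    """
    if isinstance(tree, str):
        # A leaf contributes exactly the residue observed in that protein.
        return {leaf_states[tree]}
    left, right = tree
    left_set = fitch_root_states(left, leaf_states)
    right_set = fitch_root_states(right, leaf_states)
    shared = left_set & right_set
    # Keep the residues both children agree on; otherwise keep every candidate.
    return shared if shared else left_set | right_set


# Hypothetical tree: two TrpRS-like and two TyrRS-like descendants.
toy_tree: Tree = (("TrpRS_1", "TrpRS_2"), ("TyrRS_1", "TyrRS_2"))
# Hypothetical residues at one aligned position (one-letter amino acid codes).
toy_column = {"TrpRS_1": "F", "TrpRS_2": "F", "TyrRS_1": "F", "TyrRS_2": "Y"}

print(fitch_root_states(toy_tree, toy_column))  # prints {'F'}
```

Repeating this for every column of an alignment yields a candidate ancestral sequence. Published reconstructions generally rely on statistical models rather than this bare counting rule, so treat the sketch only as intuition for the idea.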

Greg Fournier, a geobiologist at MIT, is searching for the origins of the genetic code.

He found that the ancestral protein contains every amino acid except tryptophan, suggesting that its addition was the finishing touch to the genetic code. “It shows convincingly that tryptophan was the last amino acid added, as has been speculated before but not really nailed as has been done here,” said Nigel Goldenfeld, a physicist at the University of Illinois, Urbana-Champaign, who was not involved in the study.

Fournier now plans to use tryptophan as a marker to date other major pre-LUCA events such as the evolution of metabolism, cells and cell division, and the mechanisms of inheritance. These three processes form a sort of biological triumvirate that laid the foundation for life as we know it today. But we know little about how they came into existence. “If we understand the order of those basic steps, it creates an arrow pointing to possible scenarios for the origins of life,” Fournier said.

For example, if the ancestral proteins involved in metabolism lack tryptophan, some form of metabolism probably evolved early. If proteins that direct cell division are studded with tryptophan, it suggests those proteins evolved comparatively late.
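
The dating logic above comes down to asking whether a reconstructed ancestral sequence contains any tryptophan (one-letter code W). A minimal sketch of that check follows; the two sequences and their names are invented for illustration and are not real reconstructions from the study.

```python
def tryptophan_fraction(sequence: str) -> float:
    """Return the fraction of residues that are tryptophan (one-letter code W)."""
    seq = sequence.upper()
    return seq.count("W") / len(seq) if seq else 0.0


# Hypothetical, made-up ancestral sequences (not real data).
ancestral_sequences = {
    "metabolic_enzyme_ancestor": "MKALGEDSAVLKGE",
    "division_protein_ancestor": "MWKAWGEDWSAVLW",
}

for name, seq in ancestral_sequences.items():
    fraction = tryptophan_fraction(seq)
    if fraction == 0:
        verdict = "no tryptophan, consistent with an early origin"
    else:
        verdict = "contains tryptophan, consistent with a later origin"
    print(f"{name}: {fraction:.2f} ({verdict})")
```

A sequence with no tryptophan is only suggestive evidence, since short proteins can lack a rare amino acid by chance.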

Different models for the origins of life make different predictions for which of these three processes came first. Fournier hopes his approach will provide a way to rule out some of these models. However, he cautions that it won’t definitively sort out the timing of these events.

Fournier plans to use the same techniques to figure out the order in which other amino acids were added to the code. “It really reinforces the idea that evolution of the code itself was a progressive process,” said Paul Schimmel, a professor of molecular and cell biology at the Scripps Research Institute, who was not involved in the study. “It speaks to the refinement and subtlety that nature was using to perfect these proteins and the diversity it needed to form this vast tree of life.”

Reader Comments

  • I found the article very interesting, but the author should be more careful with her use of language. Specifically, she should avoid colorful language that implies an intelligence or goal in evolution. I would cite the sentence: “It then grew in complexity over time, as these proteins learned to make more sophisticated molecules.” Do proteins learn? Are the newer structures “more sophisticated” or just more complex?

  • It is not obvious how the history of protein synthesis could determine whether membranes, metabolism, or reproduction (let alone replication) came first. Well-established plausible arguments exist for all three, and all three currently have research programs advancing their claims.

  • Good points, Rik. I would add that the title is too flamboyant: The last synthetases were hardly part of the primordial soup. In fact they were close to the end of prebiotic evolution, on the threshold of the first prokaryotes.

  • I think the author should have mentioned that right-handed and left-handed amino acids would have been present in the early primordial soup in a 50/50 mix. Since life is made of only left-handed amino acids, that implies that right-handed amino acids would be toxic to life. And since they would have been in a 50/50 mix with the left-handed amino acids, dissolved in water (which they don’t call the universal SOLVENT for nothing), it makes the idea of life arising in that early “Soup” impossible.

  • I had the impression that mutations are not beneficial.
    I also find the idea of a mutation to be the fudge factor.
    In other words we cannot get here from there so we proclaim a mutation occurred and that solves everything.

  • @charles, thanks for your comment. Genetic mutations are generally random and can have a positive, negative or neutral effect.

  • @Jason, you’re correct that the mix of left- and right-handed amino acids in the primordial soup has posed a challenge for scientists studying the origins of life. However, researchers have come up with a number of possible solutions, including one described here: an RNA enzyme that works in a 50:50 mix.

  • @Hans Thanks for your question, which I passed along to Greg Fournier.
    He says: Yes, many groups have tried to add new nucleotide pairs to the code (you have to add a pair at a time). See:
    Hirao, I. et al. (2006). “An unnatural hydrophobic base pair system: site-specific incorporation of nucleotide analogs into DNA and RNA”. Nat. Methods 3: 729–735.

    Malyshev, Denis A.; Dhami, Kirandeep; Lavergne, Thomas; Chen, Tingjian; Dai, Nan; Foster, Jeremy M.; Corrêa, Ivan R.; Romesberg, Floyd E. (May 7, 2014). “A semi-synthetic organism with an expanded genetic alphabet”. Nature. doi:10.1038/nature13314.

    Apparently it can be done!


  • Regarding the paleogeology, these are no doubt feats of biochemistry performed by Greg Fournier’s group. However, on the very first point, “molecules replicate themselves,” I believe there is a much larger problem that seems to be entirely overlooked: energy and entropy, which are intricately linked. We don’t have evidence to support the notion that chemical systems can decrease their entropy, nor can they pump out entropy. The consequence is that they undergo a “heat death.” Since this article deals with primordial groups of molecules, I’m not sure that “mutation” is really a term that applies chemically, and it’s not clear what sort of mutation (“beneficial, harmful, or neutral”) would allow these molecules to overcome diffusion or heat death. I explore such issues in

  • I would like to know if a chemist thinks that naturally occurring amino acids could have been present in sufficient concentration to make it plausible that such amino acids would be used in the origin of life. What would keep the Maillard reaction from turning everything into tar?

  • Matthew, as long as there is an external heat source or a reservoir where unneeded materials can diffuse out into, the remaining material can decrease in entropy. Consider how a refrigerator works. A mutation in a chemical system would occur when a molecule is able to catalyze the creation of near-copies of itself from surrounding raw materials. If the resulting product is different from the original and functions better, then a beneficial mutation has occurred.

  • Jon, good points; however, that is quite a leap from molecules to refrigerators. I was actually referring to a pre-biotic system, a natural aggregation of molecules (or presumed aggregation?), likely in the ocean or another body of water. It would not have a boundary of any kind. Certainly, based on what is known (bench-wise) about chemical processes, collections of molecules don’t remove heat or excess entropy like refrigerators do. I discuss some issues relating to this problem here (apologies for other link not operating:

  • Matthew, in cold climates seawater partially freezes at night as heat and entropy radiate into space. The next day the sun melts the ice, leaving fresher water near the surface and saltier water lower down. In warmer climates, it is more extreme as seawater evaporates, and then cools over land, radiating heat and entropy into space, and fresh water falls to the ground. In both cases the entropy on Earth is decreased for a time while the entropy in space increases to make up for it. The same can apply to other chemical processes, such as polymerization.

  • Jon, it is not so much the details of “how” in the model you discuss, but the relative differences in entropy or disorder that I’m referring to. The earth is in a state of shedding order and heat; it is “trying” to reach equilibrium, and if we imagine that the sun is not inputting energy, then the earth’s conveyors and salt density gradients would progress towards equilibrium. That, on a global scale, is what occurs in a simple benchtop beaker system with a collection of molecules. But that relative disorder is key, because you’re assuming that clusters of molecules in one region will have an “advantage” over others, when in fact each cluster is “trying” to rob energy from, i.e. increase the entropy of, the other cluster, as in a salt density gradient trying to diffuse towards another region around it. In such unbounded states you have a real theoretical problem of showing the feasibility of advantaged or stable gradients of molecules, especially of ever achieving useful polymers, as we know that useful polymers in beakers are degraded. Storms and ocean density gradients encourage mass-scale mixing and molecular diffusion, working against order. What evidence might there be to say these natural forces and processes you mention do not increase disorder? It is a fascinating question.
