After a string of Insights puzzles related to physics, on the relationship between time and entropy, half-lives, overhang and quantum weirdness, we turn this month to the mysteries of biological evolution. Carrie Arnold’s article, “Evolution Runs Faster on Short Timescales,” explores new research showing that genetic changes that are quite brisk when measured over a few generations seem to slow down considerably when measured over millions of years. One of the researchers who have studied this in genomes is Simon Ho, an evolutionary biologist at the University of Sydney. To quote Arnold:
When [Ho] calculated how quickly DNA mutations accumulated in birds and primates over just a few thousand years, Ho found the genomes chock-full of small mutations. This indicated a briskly ticking evolutionary clock. But when he zoomed out and compared DNA sequences separated by millions of years, he found something very different. The evolutionary clock had slowed to a crawl.
The article goes on to describe how, from a mathematical perspective, evolutionary rates decrease exponentially as the timescale increases.
Can we replicate this in a toy version of a single gene? Let’s find out. But first, let’s go over the basics for those who need to brush up on them: Here’s DNA 101 for puzzle enthusiasts.
A gene is a piece of DNA, which is essentially a linear chain of chemical bases that are abbreviated using the letters A, C, G and T. Each of these four letters (bases) appears in random sequence along a given gene, in about equal amounts. Thus the sequence CATGGTACCGAT represents a piece of DNA that is 12 units long. Each successive three-letter piece of DNA, called a triplet, codes for one of 20 possible units, called amino acids, that make up a protein. Proteins are the body’s workhorses, each one performing a different function thanks to its unique structure and its unique linear sequence of amino acids. Thus in our DNA sequence above, there are four triplets, CAT, GGT, ACC and GAT, each of which codes for a specific amino acid. This piece of DNA, acting through the cell machinery, will produce a fragment of a specific protein that is four amino acids long.
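For readers who like to see this in code, here is a minimal Python sketch of reading a DNA string three letters at a time. The function name `split_into_triplets` is purely illustrative, and the sketch only splits the sequence; it does not look up which amino acid each triplet codes for.

```python
def split_into_triplets(dna: str) -> list[str]:
    """Split a DNA string into consecutive three-letter triplets."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    # Step through the string three letters at a time.
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


# The 12-letter example from the text yields four triplets.
print(split_into_triplets("CATGGTACCGAT"))  # ['CAT', 'GGT', 'ACC', 'GAT']
```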
Now, DNA is generally copied with high fidelity from cell to cell across generations. But, on rare occasions, you can get a “point mutation,” in which one of the letters of the gene sequence is replaced by a different one chosen at random, causing the gene to produce a different protein, which may be more or less efficient at doing what it was supposed to. This is basically how evolutionary change happens. We can define the speed at which DNA mutates over time as the evolutionary rate. To measure it over a given period, count the number of letters that differ between the original DNA sequence and the current sequence, and divide by the number of years that have passed.
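The rate calculation can be sketched in a few lines of Python. The function names and the mutated example sequence below are invented for illustration; only the definition (changed letters divided by elapsed years) comes from the text.

```python
def count_changed_letters(original: str, current: str) -> int:
    """Count positions where the current sequence differs from the original."""
    if len(original) != len(current):
        raise ValueError("sequences must be the same length")
    return sum(a != b for a, b in zip(original, current))


def evolutionary_rate(original: str, current: str, years: float) -> float:
    """Changed letters divided by the number of years that have passed."""
    if years <= 0:
        raise ValueError("years must be positive")
    return count_changed_letters(original, current) / years


# Hypothetical example: two letters differ after four years.
original = "CATGGTACCGAT"
current = "CATGCTACCGAA"
print(count_changed_letters(original, current))   # 2
print(evolutionary_rate(original, current, 4))    # 0.5
```

Note that this measure compares only the start and end sequences, so a letter that mutates and later mutates back to its original value counts as unchanged.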
OK, lesson over. That’s all the biology we need for our puzzle.
Question 1: Imagine a gene that is 108 letters long, with A, T, G and C in random sequence. Assume that every year there is a random change: One of the letters somewhere on this gene mutates and is replaced by one of the other three. After each year, you compare the current copy of the gene with the original and tally how many letters have changed. After a certain time “the evolutionary clock will have slowed to a crawl”; that is, the number of changed letters will have stopped rising. The evolutionary rate from here on is zero. How many letters of the original gene will have changed at that point? How many years will it take to get to this point? Is the curve exponential?
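If you would rather experiment than calculate, the setup can be simulated directly. What follows is a rough Python sketch, not an official solution: the function name `simulate_uniform`, the default of 1,000 years and the `seed` parameter are all assumptions made for illustration. It applies exactly one mutation per year at a uniformly chosen position and records the yearly tally of letters that differ from the original.

```python
import random

BASES = "ACGT"


def simulate_uniform(length=108, years=1000, seed=None):
    """Return a list whose entry i is the number of changed letters after year i + 1."""
    rng = random.Random(seed)
    original = [rng.choice(BASES) for _ in range(length)]
    current = list(original)
    changed = 0
    tally = []
    for _ in range(years):
        site = rng.randrange(length)  # every position is equally likely
        was_changed = current[site] != original[site]
        # Replace the letter with one of the other three, chosen at random.
        current[site] = rng.choice([b for b in BASES if b != current[site]])
        is_changed = current[site] != original[site]
        changed += is_changed - was_changed  # +1, 0 or -1
        tally.append(changed)
    return tally


tally = simulate_uniform(seed=42)
for year in (1, 10, 100, 500, 1000):
    count = tally[year - 1]
    print(f"year {year}: {count} letters changed, rate {count / year:.3f} per year")
```

A single run is noisy, so averaging the tally over many runs with different seeds, or plotting it against the year, gives a clearer picture of the curve's shape.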
Question 2: The above scenario is not very realistic. Every letter in a real-life gene sequence has a different chance of having a mutation that “sticks.” The letters at some locations in the DNA sequence are preserved, because changes in them are catastrophic; others, at inconsequential locations, can change readily. One general rule is that the third letter of every triplet can change easily. This is because the third position in a triplet is often redundant: The first two positions fix the amino acid the triplet codes for.
Assume that the third letter of each triplet is three times as likely to get mutated as are the first and second. Now try to answer the same questions as in Question 1.
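This weighted scenario can also be explored by simulation. The Python sketch below is one possible reading of the setup, with invented names (`simulate_weighted`, `third_position_weight`) and an arbitrary default of 3,000 years: there is still exactly one mutation per year, but each third-of-triplet position is three times as likely to be picked as a first or second position.

```python
import random

BASES = "ACGT"


def simulate_weighted(length=108, years=3000, third_position_weight=3, seed=None):
    """Yearly tally of changed letters when third positions mutate more often."""
    rng = random.Random(seed)
    original = [rng.choice(BASES) for _ in range(length)]
    current = list(original)
    # Positions 2, 5, 8, ... (0-based) are the third letters of their triplets.
    weights = [third_position_weight if i % 3 == 2 else 1 for i in range(length)]
    sites = range(length)
    changed = 0
    tally = []
    for _ in range(years):
        site = rng.choices(sites, weights=weights)[0]
        was_changed = current[site] != original[site]
        # Replace the letter with one of the other three, chosen at random.
        current[site] = rng.choice([b for b in BASES if b != current[site]])
        is_changed = current[site] != original[site]
        changed += is_changed - was_changed
        tally.append(changed)
    return tally


tally = simulate_weighted(seed=7)
for year in (10, 100, 500, 1000, 3000):
    print(f"year {year}: {tally[year - 1]} letters changed")
```

Setting `third_position_weight=1` recovers the uniform scenario, which makes it easy to compare the two curves side by side.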
Question 3: I gave this hypothetical piece of DNA 108 letters. What was my reason for choosing that number? Is it because 108 has mystical significance, as a Google search will indicate?
I hope these simple mathematical models give you a feel for how evolutionary rates work. This is, of course, nothing like the complexity of what actually happens in a single real-world gene. First, most genes have many more than 108 letters. Second, even scenario 2 above was an oversimplification: The letters at every location have a different likelihood of being preserved, depending on how important the amino acid they code for is to the function of the final protein. Welcome to biology! Yes, mathematical models can work and can give us some insight, but we must always remember that they are gross oversimplifications. In biology, analytical mathematics can take us only so far, and any attempt to capture the nuance of the real world requires highly sophisticated computer models.
How different this is from the physics we discussed in the last Insights puzzle. In it, some commenters, legitimately voicing one of the influential schools of thought, insisted that mathematics is all there is at the quantum level: Reality either does not exist or cannot be known apart from the models! Whether that is possible is a point worth pondering by all those who make and use mathematical models. For me, it is reassuring to return to the messiness of biology once in a while. Happy puzzling!
Editor’s note: The reader who submits the most interesting, creative or insightful solution (as judged by the columnist) in the comments section will receive a Quanta Magazine T-shirt. And if you’d like to suggest a favorite puzzle for a future Insights column, submit it as a comment below, clearly marked “NEW PUZZLE SUGGESTION” (it will not appear online, so solutions to the puzzle above should be submitted separately).
Note that we may hold comments for the first day or two to allow for independent contributions by readers.
Update: The solution has been published here.