
Solution: ‘Triumph or Cooperation in Game Theory and Evolution’

How well does the Nash equilibrium concept from game theory map to the real world?

Our November Insights puzzle set out three scenarios exploring how competition and cooperation are modeled in game theory and how they might actually interact in modifying the equilibrium between two genes. Let’s work through them to gain a deeper appreciation for the intricacies in applying game theory to real-world situations.

Problem 1

Morra is a competitive hand-and-finger game played between two opponents. It begins like Rock-Paper-Scissors, with both players concealing their hands. At a prearranged signal, both players simultaneously show their hands, which reveal one to five outstretched fingers. In some advanced variants of Morra, you have to guess how many fingers your opponent will show, but for our puzzle, we will restrict our attention to a simpler version called Odds and Evens.

In Odds and Evens one of the players is designated Odd, and the other Even. In our variant both players may choose to show either a single outstretched index finger, or the entire palm with a folded thumb, thus showing four fingers. The sum of the number of fingers shown by both players decides who wins and the number of points the winner gets. Thus, if the first player shows one finger and the other shows four, the sum is five, so Odd wins and gets 5 points, and so on.

Imagine two new players trying their hand (so to speak) at this game. Even reasons as follows: “The game obviously gives both players even chances. In four rounds, on average, I will win either 2 or 8 points, for a total of 10, while Odd will win 5 points two times, also for a total of 10.” So, leaving all to chance, he went ahead and showed one finger half the time and four fingers half the time, at random. Odd, on the other, ahem, hand, thought: “I think there’s something odd about this game. I’m going to mix it up and randomly play one finger three-fifths of the time, and four fingers two-fifths of the time.”

Who wins in the above game? Why does this happen, even though the game looks symmetric? Does the winner have a better strategy?

Solution 1

With the strategies as stated, Odd wins in the long run. Let’s tabulate all the scenarios that can happen in a series of 20 games that cover all possibilities with the specified frequencies.

No. of games   Odd shows   Even shows   Sum   Odd’s points   Even’s points
6              1           1            2     0              12
6              1           4            5     30             0
4              4           1            5     20             0
4              4           4            8     0              32
Total                                         50             44

So in the long run, Odd is up 6 points in 20 games, giving an average win of 0.3 points per game. The game is symmetric only if both players play each alternative half of the time, but Odd deviates from this, reducing the possibility of Even winning big and increasing the possibility of Even winning small, thus winning more points even though each player still wins half of the games.
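For readers who would rather let a computer do the bookkeeping, here is a minimal sketch that reproduces the tally above. The function name and its parameters are my own illustrative choices; it simply weights each of the four possible outcomes by how often it is expected to occur.

```python
from fractions import Fraction

def expected_points(p_odd_one, q_even_one, games):
    """Return (odd_points, even_points) expected over `games` rounds.

    p_odd_one:  probability that Odd shows one finger (otherwise four).
    q_even_one: probability that Even shows one finger (otherwise four).
    """
    odd_points = Fraction(0)
    even_points = Fraction(0)
    for odd_fingers, p in ((1, p_odd_one), (4, 1 - p_odd_one)):
        for even_fingers, q in ((1, q_even_one), (4, 1 - q_even_one)):
            total = odd_fingers + even_fingers
            expected_rounds = games * p * q
            if total % 2 == 1:
                odd_points += expected_rounds * total   # odd sum: Odd scores
            else:
                even_points += expected_rounds * total  # even sum: Even scores
    return odd_points, even_points

odd, even = expected_points(Fraction(3, 5), Fraction(1, 2), games=20)
print(odd, even, (odd - even) / 20)  # 50 44 3/10
```

Exact fractions are used instead of floating-point numbers so that the 0.3-point edge comes out as exactly 3/10.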

But as Mark Pearson pointed out, Even can observe Odd’s strategy and change his own to get better results: By playing four fingers all the time, Even wins by 0.2 points per game. In turn, Odd can adapt her strategy. Will this cat-and-mouse game ever end? It can, if Odd discovers the best strategy, one that yields the vaunted Nash equilibrium.
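Pearson’s counter-strategy can be checked with a small sketch. The table name and helper below are hypothetical; the code compares Even’s two pure replies against Odd’s original three-fifths mix.

```python
from fractions import Fraction

# Odd's net points for each (Odd shows, Even shows) pair: an odd total pays
# Odd, an even total pays Even.
ODD_NET = {(1, 1): -2, (1, 4): 5, (4, 1): 5, (4, 4): -8}

def odd_net_vs_pure(p_one, even_fingers):
    """Odd's expected net points per round when Even always shows even_fingers."""
    return (p_one * ODD_NET[(1, even_fingers)]
            + (1 - p_one) * ODD_NET[(4, even_fingers)])

p_one = Fraction(3, 5)  # Odd's original mix: one finger three-fifths of the time
for even_fingers in (1, 4):
    print(even_fingers, odd_net_vs_pure(p_one, even_fingers))
# 1 4/5
# 4 -1/5
```

A negative number means Even comes out ahead, so always showing four fingers earns Even 0.2 points per game, while always showing one finger would hand Odd 0.8 points per game.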

This equilibrium strategy, as nightrider pointed out, requires Odd to play, randomly, one finger 13 out of 20 times and four fingers seven out of 20 times. Let’s see what happens when Even tries his previous two strategies against this one.

If Even uses the 50-50 one finger/four fingers strategy, in 40 games, Odd will win 20 games with 5 points each to get 100 points, while Even will win 13 games with 2 points and seven games with 8 points, getting 82 points. Odd accumulates an average of 0.45 extra points per game.

If, on the other hand, Even uses the “four finger always” strategy, Odd wins 26 games at 5 points each to get 130 points, while Even will win 14 games at 8 points each to get 112 points. Odd again defeats Even by an average of 0.45 points per game.

In fact, as you can verify, no matter what strategy Even employs, Odd always does better by an average of 0.45 points per game. That’s the beauty of the Nash equilibrium. Odd has a stranglehold that cannot be broken (and reciprocally, Even has his own strategy that guarantees that he cannot lose by more than this).
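If you would like to do that verification by machine, the following sketch sweeps Even’s probability of showing one finger from 0 to 1 in steps of one-tenth. The function name is an illustrative choice, and the step size is arbitrary.

```python
from fractions import Fraction

def odd_net(p, q):
    """Odd's expected net points per round.

    p: probability Odd shows one finger; q: probability Even shows one finger.
    Each of the four outcomes is weighted by its probability; even sums count
    against Odd.
    """
    return (-2 * p * q
            + 5 * p * (1 - q)
            + 5 * (1 - p) * q
            - 8 * (1 - p) * (1 - q))

p_star = Fraction(13, 20)
edges = {odd_net(p_star, Fraction(k, 10)) for k in range(11)}
print(edges)  # {Fraction(9, 20)}
```

The set collapses to a single value, 9/20 = 0.45, which is another way of saying that nothing Even does changes Odd’s average margin.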

How do we find this wonderful Nash equilibrium strategy? As nightrider pointed out, we have to find the point where the partial derivatives of the payoff with respect to both players’ probabilities of displaying a given number of fingers become zero. For a simplified way to get to this answer, let p = the probability of Odd playing 1, and q = the probability of Even playing 1. The expected winnings for Odd per round are:

–2pq + 5(1 – p)q + 5p(1 – q) – 8(1 – p)(1 – q)

This simplifies to 13p + 13q – 20pq – 8.
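For anyone who wants to check the algebra, here is the expansion written out term by term, using the same p and q as above:

```latex
\begin{aligned}
E(p,q) &= -2pq + 5(1-p)q + 5p(1-q) - 8(1-p)(1-q) \\
       &= -2pq + (5q - 5pq) + (5p - 5pq) - (8 - 8p - 8q + 8pq) \\
       &= 13p + 13q - 20pq - 8 .
\end{aligned}
```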

Now, Odd is looking for a strategy that she can use no matter what Even does. Let’s assume that there is such a strategy that Odd can always use, so p is a constant.

Then the above expression becomes q(13 – 20p) – (8 – 13p). Notice that if we make the first part of the expression equal to zero, then Odd’s expected winnings will become constant, which means that Even will not be able to lower Odd’s winnings by changing q. This happens when 13 – 20p is 0 or p = 13/20, which is the Nash equilibrium, as we verified above. The second part of the expression, 13p – 8, simplifies to 169/20 – 8 = 0.45, which gives Odd’s expected winnings for any value of q. (Mathematically, the above procedure is nothing but a simple equivalent of setting the derivative of our linear expression to zero.)
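The same indifference argument works for any two-by-two zero-sum game, and applying it from Even’s side gives his equilibrium mix as well. The sketch below is my own illustration rather than nightrider’s derivation; the function name is hypothetical, and it assumes the game has no pure-strategy saddle point, so that both players really do need to randomize.

```python
from fractions import Fraction

def solve_2x2_zero_sum(a, b, c, d):
    """Mixed equilibrium of a 2x2 zero-sum game with row payoffs [[a, b], [c, d]].

    Assumes no pure-strategy saddle point, so each player mixes to leave the
    opponent indifferent between their two options.
    Returns (p, q, value): p = P(row player picks the first row),
    q = P(column player picks the first column).
    """
    denom = Fraction(a - b - c + d)
    p = (d - c) / denom              # makes the column player indifferent
    q = (d - b) / denom              # makes the row player indifferent
    value = (a * d - b * c) / denom  # row player's expected payoff
    return p, q, value

# Rows: Odd shows 1 or 4. Columns: Even shows 1 or 4. Entries: Odd's net points.
p, q, value = solve_2x2_zero_sum(-2, 5, 5, -8)
print(p, q, value)  # 13/20 13/20 9/20
```

Because the payoff table is symmetric, Even’s best mix turns out to be the same 13/20, and it caps his average loss at 0.45 points per game.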

Problem 2

Amy and Bob are a pair of young twins who, like siblings everywhere, fight a lot and love cake. Their mother frequently bakes a cake that she distributes to them in the following way. She talks independently to each twin and asks about the other twin’s behavior. If neither of them has any complaints, each of them gets half the cake. If only one of them reports a valid infraction by the other, that person gets three-quarters of the cake, the other gets none, and Mom gets the remaining quarter. If both of them report valid infractions, they each get only one-quarter of the cake and Mom gets the remaining half.

  1. What is the best strategy for Amy and Bob if they do not trust each other?
  2. What is the best strategy for them if, on the other hand, they do trust each other?
  3. If there are 100 such events, and you know the total amount of cake that was consumed by the twins, when can you say that there was more cooperation than betrayal between them and vice versa?
  4. As an aside, the mother’s behavior in this example is interesting. How would you quantify the value she places on various factors like fostering trust, reward and punishment, and her own fondness for cake?

Solution 2

  1. If the twins distrust each other, each knows that the other will rat them out on the slightest pretext. Therefore, each one should complain about the other. Both will get only one-quarter of the cake, but that will avoid the worst-case scenario of not complaining and getting nothing. It becomes a competitive game, and this solution is the Nash equilibrium.
  2. If the twins trust each other, their best policy is to overlook the other’s infractions, if any, and not complain. That way they both get half the cake. This is a Pareto-optimal solution, as mentioned in the original puzzle, and it is also equitable. If we treat the amount of cake that the twins eat as a “common good” from their perspective, then this solution also maximizes the common good.
  3. As Mark Pearson wrote: “If both trust/collaborate and say ‘no complaints,’ they get a whole cake between them. If both twins betray each other, they get half a cake between them. If there were 100 possible cakes available (100 repeated events), then if the number of cakes the twins get to eat is closer to 100 than it is to 50 (half of 100 cakes), I’d say that there was more cooperation than betrayal. In other words, more than 75 cakes is more cooperation than betrayal.”

Right on! In a similar way, based on the rarity of cheating phenomena in genomes in contrast to rule-abiding genes, we can say that cooperation is far more common than selfishness among genes.

Note that this question was asked from a historic perspective. In a competitive, adversarial game that recurs only a limited number of times, trust cannot be fostered, as nightrider remarked. In real life, this is the reason to avoid fly-by-night operations: Without open-ended repeated interactions, cheating goes unpunished.

  4. The mother prizes trust and cooperation infinitely more than her liking for cake, as she is willing to forgo cake entirely to achieve it. If trust and cooperation are breached, then she does want to hear about unilateral infractions, if any. She prefers this twice as much as the reporting of bilateral infractions, based solely on the assumption that twice the helping of cake soothes her twice as much. A lot of wrinkles can be added to this, but that would require more knowledge about what exactly is considered a reportable infraction.

As Mark Pearson commented, the problem’s settings do not explicitly reward honesty. For better parenting, Mark came up with an interesting alternate reward structure that separately rewards honesty and good behavior. As you can see, this simple problem can become highly complex when we take it to the real world.
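The original cake rules are simple enough to check by brute force. The sketch below, with labels and helper names of my own choosing, lists every pair of choices, keeps the pairs where neither twin could get more cake by switching alone, and then computes the midpoint Mark Pearson used for 100 cakes.

```python
from fractions import Fraction
from itertools import product

QUIET, COMPLAIN = "quiet", "complain"
ACTIONS = (QUIET, COMPLAIN)

# (Amy's choice, Bob's choice) -> (Amy's share, Bob's share) of one cake
SHARES = {
    (QUIET, QUIET): (Fraction(1, 2), Fraction(1, 2)),
    (COMPLAIN, QUIET): (Fraction(3, 4), Fraction(0)),
    (QUIET, COMPLAIN): (Fraction(0), Fraction(3, 4)),
    (COMPLAIN, COMPLAIN): (Fraction(1, 4), Fraction(1, 4)),
}

def is_nash(amy, bob):
    """True if neither twin gains cake by unilaterally changing their choice."""
    amy_share, bob_share = SHARES[(amy, bob)]
    amy_ok = all(SHARES[(alt, bob)][0] <= amy_share for alt in ACTIONS)
    bob_ok = all(SHARES[(amy, alt)][1] <= bob_share for alt in ACTIONS)
    return amy_ok and bob_ok

equilibria = [pair for pair in product(ACTIONS, repeat=2) if is_nash(*pair)]
print(equilibria)  # [('complain', 'complain')]

events = 100
all_quiet = events * sum(SHARES[(QUIET, QUIET)])           # 100 cakes eaten
all_complain = events * sum(SHARES[(COMPLAIN, COMPLAIN)])  # 50 cakes eaten
print((all_quiet + all_complain) / 2)  # 75
```

Mutual complaining is the only stable pair even though mutual silence gives both twins twice as much cake.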

So, in general, the Nash equilibrium is the best solution in competitive situations between entities that are entirely motivated by the game’s obvious payoff. In situations where scenarios are repetitive and trust can be fostered, however, other solutions might be more rewarding to real-world participants, and the Nash equilibrium ceases to be optimal. Most human beings are not motivated by a single kind of reward, so real-world situations will always have extraneous motivations such as fairness or group allegiance that do not conform to the game-theory assumption of constant single-factor self-utility. Furthermore, for most people there is a physiological and psychological price to repeated conflict (and a corresponding reward for peace) that may not be taken into account in simple game-theory models. Perhaps this is the reason why, as Robert Karl Stonjek mentioned, referencing a paper on the subject, simple game-theory models do not work as predictors of ordinary human behavior. Gametheoryman made a spirited defense of game theory, citing “indirect reciprocity” games. The results mentioned do seem to fit our intuitions about reputation and honor. But as we saw above, humans have all kinds of motivations, including prizing group above self, and in such complexity, Nash equilibriums may not even be reachable or relevant. Moreover, the complexity of human society is immense. For the relatively simple game of three-finger Morra, there is a way to find the optimal strategy known as the Brown procedure, but it requires many tens of thousands of iterations. The complexity of real-world situations dwarfs this game by many orders of magnitude.
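To give a feel for what such an iterative search looks like, here is a rough sketch applied to our much smaller Odds and Evens game from Problem 1. I am assuming that the “Brown procedure” refers to the scheme usually called fictitious play, in which each player repeatedly picks the best reply to the opponent’s past moves. The opening move, the tie-breaking rule and the number of rounds are arbitrary choices of mine.

```python
# Odd's net points, indexed [odd_action][even_action];
# action 0 = one finger, action 1 = four fingers.
ODD_NET = [[-2, 5], [5, -8]]

def fictitious_play(rounds):
    """Return how often each player showed one finger over `rounds` rounds."""
    odd_counts = [1, 0]   # arbitrary opening move: both show one finger
    even_counts = [1, 0]
    for _ in range(rounds - 1):
        # Odd's total payoff for each action against Even's history so far.
        odd_scores = [sum(ODD_NET[i][j] * even_counts[j] for j in range(2))
                      for i in range(2)]
        # Odd's total payoff for each of Even's actions against Odd's history.
        even_scores = [sum(ODD_NET[i][j] * odd_counts[i] for i in range(2))
                       for j in range(2)]
        odd_move = 0 if odd_scores[0] >= odd_scores[1] else 1     # Odd maximizes
        even_move = 0 if even_scores[0] <= even_scores[1] else 1  # Even minimizes
        odd_counts[odd_move] += 1
        even_counts[even_move] += 1
    return odd_counts[0] / rounds, even_counts[0] / rounds

odd_freq, even_freq = fictitious_play(50_000)
print(round(odd_freq, 3), round(even_freq, 3))
```

The long-run frequencies should drift toward the 13/20 found in Solution 1, but only slowly, which hints at why larger games need so many iterations.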

As I said in a comment originally made in reply to nightrider, game theory may be able to point to general principles, but the subject material it is applied to is far too complex, and game-theory basics are still far from being fully explicated. Take my analogy of the theory of gravitation: If we lived in a system of six suns, as imagined by Isaac Asimov in his classic story “Nightfall,” their motions would be far too difficult for us to predict in practice using analytical methods, even if we were fully proficient in the theory of gravitation. We could only get some idea in understanding short-term trends in simple situations. Now imagine applying game theory to real-world populations with hundreds of players, each competitive with some, cooperative with others, forming shifting groups of variable sizes, and with some engaging in zero-sum games and others in nonzero-sum ones. There is no way that simplistic game-theory principles can be applied to such situations predictively. The only thing that could come close is a supercomputer simulation that is, firstly, based on a fully developed game theory (that does not exist yet); secondly (and far more importantly), it would need to be based on measured real-world data of the strength and polarity of every pairwise interaction, the size and strength of every shifting alliance, whether it was zero-sum or not, and probably many other factors.

Problem 3

Imagine a pair of alleles A and a that exist in equilibrium at a ratio of 0.6 to 0.4 under normal conditions, in a species that lives for a year and reproduces once a year. The allele A is dominant, so both AA and Aa individuals have similar physical characteristics. A constant allele ratio is generally maintained in the long run by “push-pull” mechanisms in nature. There may be some environmental factors that favor individuals carrying the A allele (AA’s and Aa’s) and would, if unchecked, increase its proportion, whereas other factors would tend to favor the a allele and resist A’s increase. For simplicity, let us assume that such factors occur serially. Assume that under normal circumstances, without any segregation distortion, you have three years during which the environment is such that the A allele is favored. Both AA and Aa individuals have a certain survival/reproductive advantage over aa individuals, and this causes the A allele to increase its proportion by 10 percent in the first year, rising to 0.66. The same degree of advantage is present in the second and third years, allowing the proportion of the A allele to rise further. However, in the fourth year the conditions change and the allele ratio falls back again to the equilibrium value. This happens because aa individuals are favored in the fourth year, and extra copies of the a allele survive and find their way to the next generation. The advantage to aa individuals in the fourth year is proportional to the square of the difference in their numbers from the equilibrium value of 0.16. As an example, if the proportion of aa individuals is 0.12 at the start of the fourth year, the advantage they possess will be four times what they would have had if their proportion had been 0.14. Thus the “force” pulling the gene ratio back to equilibrium increases up to a maximum, the more the ratio deviates from it.
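The square law in that example works out as follows, taking the advantage to be proportional to the squared distance of the aa proportion from its equilibrium value of 0.16:

```latex
\frac{(0.16 - 0.12)^2}{(0.16 - 0.14)^2}
  = \frac{0.04^2}{0.02^2}
  = \frac{0.0016}{0.0004}
  = 4
```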

Now let’s say that allele A manages to distort segregation so that 60 percent of the copies of A genes go into the next generation in an Aa individual instead of 50. What would the new equilibrium ratio be? What proportion of A’s cheating will the above mechanism let it get away with?

Solution 3

As nightrider showed, in this situation, the cheating A allele will flood the population and will become “fixed.” This can also be shown using a somewhat complex spreadsheet table that I will not reproduce here. Coincidentally, two subsequent Quanta articles have reported similar phenomena and results: “New Model Warns About CRISPR Gene Drives in the Wild,” by Brooke Borel, which suggests that using CRISPR to “cheat” in editing animal genomes could overpower evolution; and “Choosy Eggs May Pick Sperm for Their Genes, Defying Mendel’s Law,” by Carrie Arnold, which describes similar cheating by egg cells.
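To see why the distortion is so powerful, here is a bare-bones sketch of segregation distortion acting alone. It deliberately leaves out the push-pull selection described in the problem and assumes random mating each generation, so it is not a reconstruction of nightrider’s analysis or of my spreadsheet. The function name and the 100-generation horizon are illustrative choices.

```python
def next_frequency(p, drive=0.6):
    """Frequency of allele A after one generation of random mating.

    AA parents (frequency p*p) transmit only A; Aa parents (frequency
    2*p*(1-p)) transmit A with probability `drive` instead of the Mendelian 0.5.
    """
    return p * p + 2 * p * (1 - p) * drive

p = 0.6  # starting frequency of A
for generation in range(100):
    p = next_frequency(p)
print(round(p, 4))  # 1.0
```

With a fair 50-50 split the frequency would stay at 0.6 forever; with a 60-40 split it gains 0.2p(1 – p) every generation and can only go up, so any restoring force has to be strong enough to cancel that gain at some frequency below 1.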

This kind of selfishness by genes is, nevertheless, quite rare. There are thousands of dominant genes that have survived in the human genome for millennia. Clearly, there are strong mechanisms by which Mendel’s law of segregation (that decrees equal access to gametes by allelic pairs) is enforced. One possibility is that if one gene can cheat, so can its alleles, restoring equilibrium. If segregation does get distorted, the simple “push-pull” mechanism I described is unable to resist it by itself. A stronger push-pull mechanism proportional to the fifth or higher power, rather than the square of the distance from equilibrium, would be necessary in this case. Note that if a cheating gene prevails, the fitness of the entire species is decreased, and in extreme cases this could lead to extinction.

As usual, there were some great comments from readers. I enjoyed the dialogue between Robert Karl Stonjek and gametheoryman, as well as nightrider’s many mathematically accurate contributions. The Quanta T-shirt for this month goes to Mark Pearson. Thanks to all who contributed.

There will be no Insights column this month. Happy holidays to everyone, and see you next year for new insights.
