Solution: ‘Information From Randomness?’
It’s great to see the enthusiastic reader response to our first Insights puzzle. The puzzle has a deeply counterintuitive solution and raises philosophical questions about randomness and information. I cannot guarantee that all your doubts about the validity of the solution will be laid to rest, but I’ll try my best. I have tried to keep the actual math to a minimum so readers who are not professional mathematicians can still follow the story.
The main puzzle is as follows:
I write down two different numbers that are completely unknown to you, and hold one in my left hand and one in my right. You have absolutely no idea how I generated these two numbers. Which is larger? You can point to one of my hands, and I will show you the number in it. Then you can decide to either select the number you have seen or switch to the number you have not seen, held in the other hand, as your final choice. Is there a strategy that will give you a greater than 50 percent chance of choosing the larger number, no matter which two numbers I write down?
A solution to this problem was published by Thomas M. Cover of Stanford University in 1987 (Thomas M. Cover, “Pick the Largest Number,” in Open Problems in Communication and Computation, ed. Thomas M. Cover and B. Gopinath [New York: Springer, 1987], 152). Incidentally, Cover mentored my colleague Joseph Chang, who brought this puzzle to my attention.
Let’s consider the abstract or “pure math” version of this game: It is played just once and is non-adversarial (i.e., the game host is not intentionally trying to get the guesser to fail). The puzzle was first solved in our column’s comments by Jason, and his solution was well explained in detail by Erica Klarreich, g g, Jeremy Magland, Greg Egan, Steven Alexander and Jamie H., among others. The solution is simple: First choose one of the host’s hands at random. Now comes the key step: You generate an arbitrary or random number entirely from your imagination. Following Erica, let’s call this the guide number G, the original numbers being A and B, with B being the larger. Depending on which hand you selected, you will see either A or B. Now compare the number you see to G. If it is smaller than G, switch to the other hand. If it is larger, stay with the number shown.
How does this solution work? There are three cases to consider. I’ve shown them below on the number line with numbers increasing as usual from left to right. Note that only the relative positions of the numbers is relevant, not their actual values. In the following illustrations, cover the numbers A and B in turn to simulate choosing the number in each of the host’s hands. The decision that the comparison procedure forces you to make is given in bold.
Case 1. G is less than both A and B. The number you see is either A or B.
In either instance, the number revealed is larger than G. Stay! Since you are shown A or B with equal probability, your chance of choosing the larger number B as your final choice is 50 percent.
Case 2. G is greater than both A and B. As before, you see either A or B.
In either instance, the number revealed is smaller than G. Switch! Since you will initially be shown A or B with equal probability, after switching your chance of ending up with B is again 50 percent.
Case 3. G lies between A and B. Again, you see either A or B with equal probability.
This is where the magic happens. If you see A, your comparison will show it to be smaller than G. Switch! If you see B, the comparison will show it to be larger than G. Stay! You end up with the larger number B 100 percent of the time!
Your final probability of winning is the weighted sum of the above three cases. It is easy to see that if there is any probability at all of case 3 taking place, no matter how small, it will tip the weighted sum over 50 percent.
Is there any particular procedure that needs to be followed in coming up with G? No. As long as G is an arbitrarily generated number, it is impossible to choose a G that does not work (a fact that was mentioned in a comment by Eric Wong). The reason is that, no matter how large or small the G you have conjured up happens to be, there will always be numbers on both sides of it. Thus, there is always a finite probability that G lies between the two numbers chosen by the host, which is all that is required to push the probability of a correct answer above 50 percent. (To readers who wonder how probabilities can make sense, given that the real numbers extend to infinity in both directions, check out the posts by Greg Egan or Marcelo on mapping the entire real-number line to the open interval (0, 1) using a strictly monotonic function (one whose value goes upward without ever having a downward slope when you move in a specified direction, as in an upward ramp or staircase). Using this idea, we can express the informal argument I made above in more rigorous mathematical terms. Greg and Marcelo have also done a great job of addressing doubts about the solution expressed by other readers. In fact, these intuitive doubts stem largely from the infinities on both sides of the number line — a difficulty that vanishes when we carry out the mapping mentioned above. Thank you, Greg and Marcelo!)
The beauty of the solution is that it works even if the game is played just once. Some readers may wonder how a probability can be determined and the result checked if the game is never repeated. Just imagine that the people who play this game can do so repeatedly but have amnesia between games and carry no information from previous games, or imagine that there is a large number of parallel universes, in each of which there are players who play the game just once. By adding up the results of all these independent games using math or by computer simulation, we can confirm that the solution works.
Now the question arises: If this were a repeated, real-world game, would the above strategy still be the best one to use? To answer this, let’s turn to the bonus questions where “well-posed” ranges and selection procedures are specified.
Bonus Question 1:
Let’s say we play this guessing game repeatedly, and I promise you that my two numbers will always be between 0 and 10 and randomly generated. What’s your best strategy now, and what is the best success rate you can achieve with it?
In this case, the best strategy is to always choose 5 as your guide number G. Then the chances that A and B are on either side of G are 50 percent, and your probability of choosing the larger number correctly is 75 percent.
The probability of A being less than G is G/10.
The probability of B being greater than G is (10 – G)/10.
So the probability of A < G and B > G is the product of the two or G(10 – G)/100.
The probability of the reverse situation, A > G and B < G, is also G(10 – G)/100.
Adding these two gives the probability of a guide number G being between A and B to be 2G(10 – G)/100.
This function has its highest value of 0.5 at at G = 5.
The probability of success is the weighted sum of the same three scenarios as described for the main problem, which is 0.25 x 0.5 + 0.25 x 0.5 + 0.5 x 1, or 0.75.
On the other hand, if you use a random guide number each time, your probability of ending up with the larger number is lower: only 2/3 or 66.67 percent.
We can show this by considering symmetry.
Since A, B and G are all drawn at random, all three scenarios — G being the smallest, middle or largest number of the three — have a 1/3 chance of being true.
The probability of success is therefore (1/3) x 0.5 + (1/3) x 0.5 + (1/3) x 1 = 2/3.
This problem allows us to see the original question in a new light. When you are using a guide number, you are actually estimating a median of the distribution that the numbers are coming from. You are pretty much doing something similar to what people using common sense will do in this situation. They will look at the revealed number, and if it seems “high” they will go with it, and if it seems “low” they will select the number in the other hand. What they are effectively doing is guessing what the median is, and switching if the revealed number is lower. Hold that thought!
Bonus Question 2:
We play a variant of the first bonus game above, but this time my numbers are generated at random from some range of numbers that you do not know. What’s your best strategy in this case?
Remember, the best strategy is to guess the median. Since you do not know the range, for the first round you should generate G arbitrarily. As the game progresses you can generate a dynamic median, by listing all the numbers you have seen and finding their median. Use the updated median as the guide number for the next round. This strategy was suggested for a repeated game by ger groeneveld (g g?) and actually simulated by Geojenks, who found that within about 13 rounds you get pretty close to the theoretical success rate of 75 percent.
Bonus Question 3:
This time I offer to play you for money: I pay you a dollar if you correctly choose the larger number, while you have to pay me $1.10 if you don’t (to compensate me for the fact that in the original game you can succeed more than 50 percent of the time). I agree to use only decimal numbers between 0 and 10, using no more than three decimal places. However, the numbers are not generated at random: I will use whatever numbers I think will help me win the game. Should you accept this offer?
No, never! This is an adversarial situation with a specified least count, which is an artificial restriction designed to favor me, the host. In this case, the host can win by always choosing numbers that differ by exactly 0.001. As reader Michael James pointed out, there are 10,000 such gaps and your chances of finding the correct one is 1/10,000. So your chance of winning is above 50 percent but only slightly: It is 50.005 percent. This success rate is much too close to 50 percent to overcome the given odds, and you will lose approximately 10 cents every two games. (The original post said that the odds were exactly 50 percent, and I am grateful to Michael James for pointing out the error.)
The moral here is that in an adversarial game the host can reduce your odds to very close to 50 percent, so even with even payouts, this is not a get-rich-quick scheme.
Bonus Question 4:
Now let’s get back to something resembling the original problem, where you know nothing about the range or the method used to choose the numbers, but in this case you do know that the numbers were picked by a human who is not trying to fool you or be adversarial. Would you switch to the number in the other hand if the number in the open hand is: a) 0.6; b) –5; c) 10,000?
In the purely theoretical game, we chose an arbitrary guide number. But in a real-world guessing game, as I mentioned, you can use psychology and context to guess the range and the expected median. The answers here are subjective. Here’s what I would do.
In case (a), I would guess that the host is generating numbers with one decimal place from 0 to 1 — an estimate of correlation, perhaps. The median is 0.5 so I would stay with 0.6. In case (b), the range seems to be small integers on either side of 0, which is probably the median. I would switch. Case (c) is tough. I would guess that the host was playing with powers of 10 that have one to six zeros, with a median of 1,000. Stay!
It Pays to Second-Guess!
Armed with the “estimated median” insight from these real-world examples, let’s address the question: Where does the information come from in the original problem?
My answer is “nowhere,” because even when we succeed in the game, we still do not have any specific information about the unknown number in the other hand at all. Hasn’t using our strategy increased our expectation? No! In fact, our lack of information has decreased it, as is to be expected and as is seen in bonus problem 2. The fault lies in our intuitions that make us feel that the probability should be 0.5.
It is certainly the case that if you were allowed only one guess, there is no way that your probability of success could exceed 0.5. But in this problem you are given a second guess, which allows you to modify your original answer after seeing one number. In this case your probability of success rises to somewhere in the half-closed interval (0.5, 0.75] where 0.5 < p ≤ 0.75. When you have full knowledge of the median and the distribution is uniform, you can succeed 75 percent of the time by using the median as a guide. When you have no knowledge of the median, your second-guess probability falls to 67 percent for the uniform distribution case, and even lower if the host is intentionally trying to make you fail. But even with zero knowledge of the median, and despite the worst adversarial tricks of the host, your second guess still allows you to succeed more than 50 percent of the time no matter what number you select as your estimated median.
Regarding the separate question of why the second guess keeps the probability over 50 percent even in the worst-case scenario, the answer given by several commenters is correct: The second guess allows you to institute a systematic procedure that uses the properties of real numbers, specifically their monotonically increasing nature, and the fact that between any two given numbers is an interval in which we can always find additional numbers. This means that any guide number will always have pairs of chosen numbers that it can fall between, and there is always some probability of this, though it might be minuscule in some cases. Thus, using a random guide number does not increase your probability up from 50 percent by giving you information; in fact, your probability of success decreases from the maximal of 75 percent achievable by gaining knowledge of the median, to a lower value precisely because the guide number does not give you any specific knowledge.
So the answer to the title question of the original column, “Can information rise from randomness?” is an emphatic “no,” as g g suspected. Randomness, by definition, can never give you specific directional information. That it appears to do so in this problem is a result of our flawed intuitive expectation that we would be stuck with a 50 percent success rate. With the second guess at our disposal, we’re not!
The Quanta T-shirt for this month’s puzzle goes to Jason, whose proposed solution got the ball rolling in the comments. An honorable mention goes to all the respondents whose contributions I have cited here and others whom I may have overlooked. Thank you all for contributing to this fruitful discussion!