Puzzle

Solution: ‘Information From Randomness?’

It’s great to see the enthusiastic reader response to our first Insights puzzle. The puzzle has a deeply counterintuitive solution and raises philosophical questions about randomness and information. I cannot guarantee that all your doubts about the validity of the solution will be laid to rest, but I’ll try my best. I have tried to keep the actual math to a minimum so readers who are not professional mathematicians can still follow the story.

The main puzzle is as follows:

I write down two different numbers that are completely unknown to you, and hold one in my left hand and one in my right. You have absolutely no idea how I generated these two numbers. Which is larger? You can point to one of my hands, and I will show you the number in it. Then you can decide to either select the number you have seen or switch to the number you have not seen, held in the other hand, as your final choice. Is there a strategy that will give you a greater than 50 percent chance of choosing the larger number, no matter which two numbers I write down?

A monthly puzzle celebrating the sudden insights and unexpected twists of scientific problem solving. Your guide is Pradeep Mutalik, a medical research scientist at the Yale Center for Medical Informatics and a lifelong puzzle enthusiast.

A solution to this problem was published by Thomas M. Cover of Stanford University in 1987 (Thomas M. Cover, “Pick the Largest Number,” in Open Problems in Communication and Computation, ed. Thomas M. Cover and B. Gopinath  [New York: Springer, 1987], 152). Incidentally, Cover mentored my colleague Joseph Chang, who brought this puzzle to my attention.

Let’s consider the abstract or “pure math” version of this game: It is played just once and is non-adversarial (i.e., the game host is not intentionally trying to get the guesser to fail). The puzzle was first solved in our column’s comments by Jason, and his solution was well explained in detail by Erica Klarreich, g g,  Jeremy Magland, Greg Egan, Steven Alexander and Jamie H., among others. The solution is simple: First choose one of the host’s hands at random. Now comes the key step: You generate an arbitrary or random number entirely from your imagination. Following Erica, let’s call this the guide number G, the original numbers being A and B, with B being the larger. Depending on which hand you selected, you will see either A or B. Now compare the number you see to G. If it is smaller than G, switch to the other hand. If it is larger, stay with the number shown.

How does this solution work? There are three cases to consider. I’ve shown them below on the number line with numbers increasing as usual from left to right. Note that only the relative positions of the numbers is relevant, not their actual values. In the following illustrations, cover the numbers A and B in turn to simulate choosing the number in each of the host’s hands. The decision that the comparison procedure forces you to make is given in bold.

Case 1. G is less than both A and B. The number you see is either A or B.

In either instance, the number revealed is larger than G. Stay! Since you are shown A or B with equal probability, your chance of choosing the larger number B as your final choice is 50 percent.

Case 2. G is greater than both A and B. As before, you see either A or B.

In either instance, the number revealed is smaller than G. Switch! Since you will initially be shown A or B with equal probability, after switching your chance of ending up with B is again 50 percent.

Case 3. G lies between A and B. Again, you see either A or B with equal probability.

This is where the magic happens. If you see A, your comparison will show it to be smaller than G. Switch! If you see B, the comparison will show it to be larger than G. Stay! You end up with the larger number B 100 percent of the time!

Your final probability of winning is the weighted sum of the above three cases. It is easy to see that if there is any probability at all of case 3 taking place, no matter how small, it will tip the weighted sum over 50 percent.

Is there any particular procedure that needs to be followed in coming up with G? No. As long as G is an arbitrarily generated number, it is impossible to choose a G that does not work (a fact that was mentioned in a comment by Eric Wong). The reason is that, no matter how large or small the G you have conjured up happens to be, there will always be numbers on both sides of it. Thus, there is always a finite probability that G lies between the two numbers chosen by the host, which is all that is required to push the probability of a correct answer above 50 percent. (To readers who wonder how probabilities can make sense, given that the real numbers extend to infinity in both directions, check out the posts by Greg Egan or Marcelo on mapping the entire real-number line to the open interval (0, 1) using a strictly monotonic function (one whose value goes upward without ever having a downward slope when you move in a specified direction, as in an upward ramp or staircase). Using this idea, we can express the informal argument I made above in more rigorous mathematical terms. Greg and Marcelo have also done a great job of addressing doubts about the solution expressed by other readers. In fact, these intuitive doubts stem largely from the infinities on both sides of the number line — a difficulty that vanishes when we carry out the mapping mentioned above. Thank you, Greg and Marcelo!)

The beauty of the solution is that it works even if the game is played just once. Some readers may wonder how a probability can be determined and the result checked if the game is never repeated. Just imagine that the people who play this game can do so repeatedly but have amnesia between games and carry no information from previous games, or imagine that there is a large number of parallel universes, in each of which there are players who play the game just once. By adding up the results of all these independent games using math or by computer simulation, we can confirm that the solution works.

Now the question arises: If this were a repeated, real-world game, would the above strategy still be the best one to use? To answer this, let’s turn to the bonus questions where “well-posed” ranges and selection procedures are specified.

Bonus Question 1:

Let’s say we play this guessing game repeatedly, and I promise you that my two numbers will always be between 0 and 10 and randomly generated. What’s your best strategy now, and what is the best success rate you can achieve with it?

In this case, the best strategy is to always choose 5 as your guide number G. Then the chances that A and B are on either side of G are 50 percent, and your probability of choosing the larger number correctly is 75 percent.


 Explanation:

The probability of A being less than G is G/10.

The probability of B being greater than G is (10 – G)/10.

So the probability of A < G and B > G is the product of the two or G(10 – G)/100.

The probability of the reverse situation, A > G and B < G, is also G(10 – G)/100.

Adding these two gives the probability of a guide number G being between A and B to be 2G(10 – G)/100.

This function has its highest value of 0.5 at at G = 5.

The probability of success is the weighted sum of the same three scenarios as described for the main problem, which is 0.25 x 0.5 + 0.25 x 0.5 + 0.5 x 1, or 0.75.


On the other hand, if you use a random guide number each time, your probability of ending up with the larger number is lower: only 2/3 or 66.67 percent.


Explanation:

We can show this by considering symmetry.

Since A, B and G are all drawn at random, all three scenarios — G being the smallest, middle or largest number of the three — have a 1/3 chance of being true.

The probability of success is therefore (1/3) x 0.5 + (1/3) x 0.5 + (1/3) x 1 = 2/3.


This problem allows us to see the original question in a new light. When you are using a guide number, you are actually estimating a median of the distribution that the numbers are coming from. You are pretty much doing something similar to what people using common sense will do in this situation. They will look at the revealed number, and if it seems “high” they will go with it, and if it seems “low” they will select the number in the other hand. What they are effectively doing is guessing what the median is, and switching if the revealed number is lower. Hold that thought!

Bonus Question 2:

We play a variant of the first bonus game above, but this time my numbers are generated at random from some range of numbers that you do not know. What’s your best strategy in this case?

Remember, the best strategy is to guess the median. Since you do not know the range, for the first round you should generate G arbitrarily. As the game progresses you can generate a dynamic median, by listing all the numbers you have seen and finding their median. Use the updated median as the guide number for the next round. This strategy was suggested for a repeated game by ger groeneveld (g g?) and actually simulated by Geojenks, who found that within about 13 rounds you get pretty close to the theoretical success rate of 75 percent.

Bonus Question 3:

 This time I offer to play you for money: I pay you a dollar if you correctly choose the larger number, while you have to pay me $1.10 if you don’t (to compensate me for the fact that in the original game you can succeed more than 50 percent of the time). I agree to use only decimal numbers between 0 and 10, using no more than three decimal places. However, the numbers are not generated at random: I will use whatever numbers I think will help me win the game. Should you accept this offer?

No, never! This is an adversarial situation with a specified least count, which is an artificial restriction designed to favor me, the host. In this case, the host can win by always choosing numbers that differ by exactly 0.001. As reader Michael James pointed out, there are 10,000 such gaps and your chances of finding the correct one is 1/10,000. So your chance of winning is above 50 percent but only slightly: It is 50.005 percent. This success rate is much too close to 50 percent to overcome the given odds, and you will lose approximately 10 cents every two games. (The original post said that the odds were exactly 50 percent, and I am grateful to Michael James for pointing out the error.)

The moral here is that in an adversarial game the host can reduce your odds to very close to 50 percent, so even with even payouts, this is not a get-rich-quick scheme.

Bonus Question 4:

Now let’s get back to something resembling the original problem, where you know nothing about the range or the method used to choose the numbers, but in this case you do know that the numbers were picked by a human who is not trying to fool you or be adversarial. Would you switch to the number in the other hand if the number in the open hand is: a) 0.6; b) –5; c) 10,000?

In the purely theoretical game, we chose an arbitrary guide number. But in a real-world guessing game, as I mentioned, you can use psychology and context to guess the range and the expected median. The answers here are subjective. Here’s what I would do.

In case (a), I would guess that the host is generating numbers with one decimal place from 0 to 1 — an estimate of correlation, perhaps. The median is 0.5 so I would stay with 0.6. In case (b), the range seems to be small integers on either side of 0, which is probably the median. I would switch. Case (c) is tough. I would guess that the host was playing with powers of 10 that have one to six zeros, with a median of 1,000. Stay!

It Pays to Second-Guess!

Armed with the “estimated median” insight from these real-world examples, let’s address the question: Where does the information come from in the original problem?

My answer is “nowhere,” because even when we succeed in the game, we still do not have any specific information about the unknown number in the other hand at all. Hasn’t using our strategy increased our expectation? No! In fact, our lack of information has decreased it, as is to be expected and as is seen in bonus problem 2. The fault lies in our intuitions that make us feel that the probability should be 0.5.

It is certainly the case that if you were allowed only one guess, there is no way that your probability of success could exceed 0.5. But in this problem you are given a second guess, which allows you to modify your original answer after seeing one number. In this case your probability of success rises to somewhere in the half-closed interval (0.5, 0.75] where 0.5 < p ≤ 0.75. When you have full knowledge of the median and the distribution is uniform, you can succeed 75 percent of the time by using the median as a guide. When you have no knowledge of the median, your second-guess probability falls to 67 percent for the uniform distribution case, and even lower if the host is intentionally trying to make you fail. But even with zero knowledge of the median, and despite the worst adversarial tricks of the host, your second guess still allows you to succeed more than 50 percent of the time no matter what number you select as your estimated median.

Regarding the separate question of why the second guess keeps the probability over 50 percent even in the worst-case scenario, the answer given by several commenters is correct: The second guess allows you to institute a systematic procedure that uses the properties of real numbers, specifically their monotonically increasing nature, and the fact that between any two given numbers is an interval in which we can always find additional numbers. This means that any guide number will always have pairs of chosen numbers that it can fall between, and there is always some probability of this, though it might be minuscule in some cases. Thus, using a random guide number does not increase your probability up from 50 percent by giving you information; in fact, your probability of success decreases  from the maximal of 75 percent achievable by gaining knowledge of the median, to a lower value precisely because the guide number does not give you any specific knowledge.

So the answer to the title question of the original column, “Can information rise from randomness?” is an emphatic “no,” as g g suspected. Randomness, by definition, can never give you specific directional information. That it appears to do so in this problem is a result of our flawed intuitive expectation that we would be stuck with a 50 percent success rate. With the second guess at our disposal, we’re not!

The Quanta T-shirt for this month’s puzzle goes to Jason, whose proposed solution got the ball rolling in the comments. An honorable mention goes to all the respondents whose contributions I have cited here and others whom I may have overlooked. Thank you all for contributing to this fruitful discussion!

View Reader Comments (97)

Leave a Comment

Reader CommentsLeave a Comment

  • @ anyone who cares to set me straight or Pradeep Mutalik who stated:
    "So the answer to the title question of the original column, “Can information rise from randomness?” is an emphatic “no,” as g g suspected. Randomness, by definition, can never give you specific directional information. That it appears to do so in this problem is a result of our flawed intuitive expectation that we would be stuck with a 50 percent success rate. With the second guess at our disposal, we’re not!"
    respectfully sir, i must say that i must be completely lost. errhhh, i mean is it not true that this fantastic puzzle as it exists, could in fact not exist as it does, if it were not for randomness? i mean heck it requires the use of random numbers. so that being the case (i.e. random numbers being a requirement for the puzzle) then the information of which the puzzle is constructed is in part random numbers, hence without them we wouldn't even have the puzzle and no further information to garner. so at least in part information does come from randomness seeing as how it's a part of the recipe from which the puzzle is constructed.
    truly would like to see where i'm wrong as this puzzle and nature of randomness truly does interest me.
    again great puzzle, i truly enjoyed it.

  • This is a very interesting article, but the discussion after Bonus Question 3 is wrong. The guesser can choose a guide number that is between valid choices of the other player. There are 10,000 gaps and if the guesser chooses uniformly at random he has a 0.01% chance of guessing the correct gap. Thus his chances of winning this game are 50.005%, not 50% as the article says. Of course, he still shouldn't play for the $1/$1.10 payouts.

  • Wrong!!

    "Your final probability of winning is the weighted sum of the above three cases."
    Your assumption that you can generate a weighted sum is incorrect.

    In case 3 you have chosen a number A < G < B. There are B-A ways to choose that number, out of an infinite number of other options. So the probability of 3 happening is (B-A)/Infinity. That number is undefined. You can't do math with it.

  • The solution to the original problem is intuitive, but has been disguised by the wording. Essentially, switch if the number you see is "small" and don't switch if the number you see isn't "small".

    An ideal definition for "small" would be anything less than the median of the distribution from which the two numbers are chosen. But, we don't have that. So, define "small" in any way you see fit – pull a number out of your head, use 10, use a random number. None of these will be ideal measures of "small", but they are good enough to provide an advantage.

  • Ahh. So now it is a bounded problem? The rules have changed. And based on the solution, it also appears that your promise that “you can use this strategy to guess the higher number with greater than 50:50 odds on the first try” fell somewhat short. If played once, I calculate this statement would be false, 67% of the time. That is hardly a sure bet.

    1) I took your initial statement as “you have no idea how I obtain the two numbers..” to mean exactly that, that it wasn’t bounded. (The numbers aren’t drawn from one of those large plexiglass boxes we’ve all seen in lottos). If it is the unbounded case, what is the probability now of choosing “G” between the two?

    2) Jason’s strategy only works for a bounded case. As I already stated, if there is no information about the two draws there is no information. No information requires that we are assuming that the range is not limited. If the range is non-infinite then that property is now known, and sampling now becomes relevant, and we now have information! I am simply trying to help the problem work.

    3) You indicated several times, that the solution could enable one to win “magically” on the very first try. In other words if someone was allowed to play once. And yet, if this is a bounded problem, say 0-10 or 0-(very large number), there are two options, case 1 and case 2, in which I can get 50:50, and one option, case 3 in which I can get it exactly right those odds don’t stack up. When I play exactly once, I now have only a 1/3 chance of getting better than 50:50 odds. How is that a strategy that gives me higher odds in the first play?

    4) You either have information or no information about the set from which the numbers are drawn. But that distinction is important to determine the conclusions regarding, what I would presume is the point of the article “obtaining information from randomness”. Sure, in a case where there is information, it appears that information can be obtained from randomness. But if there was truly no information as the problem stated, then in order for this to actually comply, it would have to be unbounded.

    That point is relevant to your conditional argument here which I quote because I actually have no idea what you mean by “mapping” infinity:
    “To readers who wonder how probabilities can make sense, given that the real numbers extend to infinity in both directions, check out the posts by Greg Egan or Marcelo on mapping the entire real-number line to the open interval (0, 1) using a strictly monotonic function (one whose value goes upward without ever having a downward slope when you move in a specified direction, as in an upward ramp or staircase.”

    That being said, the human player has no way of navigating such an intractable “map” of infinity as you so easily do in theory, but #4 still holds deductively. We can let “unbounded” mean what it needs to for this argument. Thus a real human player would never be able to conceive of such numbers as we live and operate in the real, physical universe, and inifinity to a real physical being still means what it means.

  • Michael James, you are correct. Thank you for pointing this out. The answer given would only be correct if your choice of guide number was somehow restricted, but of course it can't be unless the host is exercising psychic mental control. Duh! I will change the discussion and cite your comment.

  • I think the comment from Matthew Kosak said it all in the first line:
    "Ahh. So now it is a bounded problem? The rules have changed."

    If we know that A and B lie in some FINITE (aka bounded) interval and also that G must be chosen from within the same bounds, the problem isn't hard at all.
    In fact, you could simply go by the midpoint of the interval instead of choosing a G value and you'd do better than 50-50.

    If A B and G are not restricted by any bound and can lie anywhere on the number line, then the probability that G falls between A & B is Zero (for any probability P>0, you can show the probability of G being between A&B is less than that. If you believe in limits, convergence and mathematics post 19th century, that's 0).

    So this problem appears to be flawed in its statement.
    -Side note: Any simulation on a computer will presume a bounded problem since the values will be limited by the # of bits used for the variables.

  • Ooops! I see Pradeep Mutalik answered while I was typing.
    Though I did say the same thing yesterday in the comments to the original problem.

  • @ Mark B, Matthew Kosak, Max Rockbin:
    No, it is not a bounded problem. The bounded examples were given only to clarify certain ideas. The answer to the main problem works even for the non-bounded case. Please re-read the comments of Greg Egan and Marcelo that are linked in the post above to see how you can prove a probability of winning above 50 percent even for the unbounded case by mapping the number line to the interval (0,1) and using inequalities between probabilities. I will post a detailed explanation of this later tonight.
    @Max Rockbin: My previous comment was directed to Michael James and concerned the discussion after Bonus Question 3 only.

  • In the bounded version (Bonus 1), it says that if you choose G randomly every time then the probability should be 2/3. However, I simulated this and it seems like it is still 75%, not 2/3. Thoughts?

    https://gist.github.com/jacquesattack/e4a6580e8cf3543431b9

  • @ Pradeep Mutalik
    There is a step that says "Then generate a uniform random number z between 0 and 1".

    Any interval on the Reals is uncountable. Which means there is no uniform distribution from which to generate a number from.

    http://math.stackexchange.com/questions/14777/why-isnt-there-a-uniform-probability-distribution-over-the-positive-real-number

    Can that step be fixed?

  • I would argue that this does involve bounded ranges, which is implied by the requirement that the numbers are written down. For arbitrary real numbers the delta between the two numbers could be made arbitrarily small, which would imply an expectation of 50%.

  • @ jacquesattack,
    couldn't read and understand your code (due to my lack of knowledge), but i believe if your getting in the 75% range of wins then you are using the medium number of the range of numbers some how in your code. eg. if bounded by 0 thru 10 then using 5 as a guide, if bounded by 0 thru 100 then using 50 as a guide, if bounded by 0 thru 1000 then using 500 as a guide,……. sorta thing.
    but if you were using random numbers that are between the bounds that is when you end up getting around 67% range of wins.
    that's how it's working out for me in excel. and that's how it should work out according to the explanation put forth by Pradeep Mutalik.

  • @sagef
    In R, the runif() function is the way of drawing numbers from a random uniform distribution (in this case from the unit interval), so the code isn't using the middle number anywhere.

    I'm interested to see how my simulation differs from yours. Could you share your spreadsheet so we can compare?

  • @Mark B

    That the real numbers between 0 and 1 are uncountable is not a problem in any practical sense.

    Suppose Player 1 has specified two numbers, x and y, on slips of paper. Player 2 chooses a hand and sees the number x. Player 2 has also chosen a monotonic function, f, that ranges from 0 to 1, such as:

    f(x) = 1/2 + arctan(x)/pi

    or

    f(x) = e^x / (1+e^x)

    In any case, the number f(x) should be computable, in the sense that it's possible to write a computer program that keeps generating successive binary digits of f(x) for as many digits as you ask. The quantity f(x) might require an infinite number of binary digits to describe exactly, but each time you obtain another digit you pin it down to an ever smaller interval. For example, if the first two digits in its binary expansion are 0.1 then it belongs to the interval [1/2,1), if the first three digits are 0.10 then it belongs to the interval [1/2,3/4), etc.

    For the uniform random number z in the interval [0,1], simply keep generating successive binary digits that are equally likely to be either 0 or 1. Toss a coin, say, or use a quantum noise source. As with f(x), after any finite number of bits you will have pinned down z to some particular interval, which becomes smaller for each new bit.

    So to make your choice as to whether to swap hands, just keep generating bits of both f(x) and z until you end up with different intervals, one higher than the other. In theory this process might go on forever: every time you toss a coin and obtain a new bit for z, the result might remain exactly the same as the sequence of bits you have computed for f(x). But the probability of that happening is zero, and the probability of any very long sequence of coin tosses agreeing will be extremely small. So despite the infinite range available to x and the uncountable nature of the real numbers, there is no practical obstacle to implementing the strategy.

  • Pradeep, in response to your response pointing to the Marcello/Greg Egan mapping of all integers to a (0,1) interval:
    I think you have to be a little more careful about the probabilities.
    There seems to be a presumption of a linear mapping of probabilities but a non-linear mapping of values A,B,G to the (0,1) interval. The problem remains that any finite interval (A,B) from positive integers maps to an arbitrarily small probability of G lying between them whether you map them with some function to (0,1) or not. It doesn't change that argument if you're remapping the probabilities accordingly.

    Also, there seems to be a fallacy in the arguments I think is clear from this example:
    1 + 1/x is strictly greater than 1 for positive x, but the limit of 1 + 1/x as x ->infinity is not >1
    For every single positive x you can think of, 1 + 1/x >1. Yet.. not in the limit.

  • @ jacquesattack,
    i'd be happy to share the spreadsheet. do you have somewhere i could upload it to?

  • @ Greg Egen

    Thank you for the explanation. I'm absolutely convinced that we can pick a G (or z).

    The only part which is unclear is "It is easy to see that if there is any probability at all of case 3 taking place, no matter how small, it will tip the weighted sum over 50 percent."

    So it just needs to be shown that P(case 3) > 0.

    It seems that we can generate uncountably many "close pairs" such that generating successive binary digits of x will always put x < f(A) < f(B) or f(A) < f(B) < x. (If we could generate infinitely many digits of x at once that would not be the case.). So why is P(case 3) defined and >0?

  • @jacquesattack,
    you need to actually generate both A and B, rather than A alone then another Bernoulli variable to indicate that A is larger. A and B's joint distribution would not be i.i.d uniform.
    Also I'm not sure your 's' is correct either – there should be two cases you wind up correct: G<A and A is larger; G>A and B is larger.
    In your setup, I would expect the probability should be 1/2, simply due to symmetry (i.e., G's magnitude has nothing to do with whether A is larger or not, as that depends on a separate, independent coin toss)

  • @Pradeep Mutalik,

    Since you cited Cover's 1987 paper, I guess you must be aware of (one of) the generalization of this puzzle, i.e., the googol game / secretary problem (I posted a similar comment below the original post – please forgive the redundancy).

    While I agree with what you wrote about "it pays to second-guess" after you randomly pick one hand and are shown that number, it's still puzzling why the magnitude of this number would result any improvement on the 1/2 probability in your so-called "worst-case scenario". In particular, we know that this is NOT true for the googol game with n>2. Only the relative ranks matter for the worst-case scenario (Gnedin 1994). I have not seen any convincing argument of why n=2 is special?

  • @Mark B

    I think you're trying to assign probabilities to different possible values of A and B, which is not the correct way to approach the problem. The problem asks for a strategy with a greater than 50% chance of success for any fixed choice of A and B; the probability of the strategy succeeding can be a function of the particular choice of A and B, and there is no requirement for any universal minimum amount by which the probability of success exceeds 50%.

    So, we need to start from the fact that A and B are two fixed, unequal numbers, with B greater than A. In my way of describing the solution, where upon seeing the number x you stick with it with probability f(x), the probability of success is:

    P = 0.5 f(B) + 0.5 (1-f(A)) = 0.5 + 0.5 (f(B)-f(A))

    which is well-defined and strictly greater than 0.5, because B is greater than A and f is a monotonically increasing function.

    If you prefer to use Jason's description, you pick a number G using a probability distribution whose cumulative distribution function is what I call f. That is, the probability of G lying between negative infinity and any number x is f(x). So the probability of G lying between A and B is f(B)-f(A), and the probability that it doesn't lie between A and B is 1-(f(B)-f(A)).

    The overall probability of success is then:

    P = f(B)-f(A) + 0.5(1 – (f(B)-f(A))) = 0.5 + 0.5 (f(B)-f(A))

    as before.

    P(case 3) is f(B)-f(A), which is greater than zero because B is greater than A and f is monotonically increasing. It's perfectly true that there are uncountably many choices of real numbers A and B such that f(B)-f(A) will be smaller than any small number you care to name, but that's beside the point. The problem doesn't ask for a probability of success based on some kind of random choice of A and B from the set of all real numbers; it asks for a probability of success that is greater than 50% for any specific choice of A and B.

  • The solution is not correct, well, sort of.

    Given the description of the game, which emphasizes that one "has absolutely no idea" how the numbers are generated, the natural way to model this is to assume that *any* interval [A,B] could be chosen with equal probability. Since there are infinitely many of those, the probability of each is zero. Put formally: there exists no probability measure on the real line which

    a) assigns a non-zero value to a finite, non-empty interval and
    b) assigns the same value to all other same-length intervals.

    Greg Egan's and Marcelo's mapping of the real line to the open interval (0,1) does not help here, because it does not model a uniform distribution. Intervals near zero get assigned larger probabilities than intervals far out on the real line.

    So, if we model "absolutely no idea" with "uniform distribution of intervals [A,B] on the real line", then the probability of G hitting inside [A,B] is zero, no information is gained and insofar the solution is wrong.

    But, well, does it make sense to use this uniform distribution model. It is completely unphysical, because I would argue that *writing down* the two numbers, as is explicitly stated in the puzzle, takes some time. Tedious to write numbers then have a smaller probability. This results in a non-uniform distribution over the real line, favoring intervals near zero. Now the mappings described by Greg Egan and Marcelo go in the right direction.

    But this modeling process may be endless. Shall we take into account shortcuts like 10^200^200, which are quicker to write than a small number like 487539452759345923759843593245? Shall we run experiments to measure how fast numbers can be written down? …

  • @Harald Kirsch

    There is no model required for the distribution of A and B. In fact, in the original post Pradeep went to great lengths to stress that they are not to be thought of as being drawn from any kind of probability distribution.

    A lot of people have ended up confused by this issue. Perhaps the confusion has been compounded by the fact that the main problem, and "Bonus Question 1", ask for fundamentally different things.

    In "Bonus Question 1", A and B are drawn uniformly from [0,10], and you are asked to come up with the best strategy based on that knowledge. I think it's uncontroversial that the best strategy here is to swap if you see a number less than 5, and that this gives you a 75% chance of success. Note, however, that this strategy does not achieve the goal specified for the main problem, which is to choose a strategy that's guaranteed to have a chance of success over 50% for any pair of values of A and B. In "Bonus Question 1", you might happen to have A=8, B=9, in which case the strategy of always switching when you see a number less than 5 will give you a chance of winning of just 50%. This strategy only becomes the best one when you examine its effect for all possible pairs (A,B) drawn independently from [0,10].

    Well, that's fine for "Bonus Question 1", but people seem to have fallen into the trap of imagining that they can or should do something similar for the main problem. But they can't, because A and B no longer come from a probability distribution, and they shouldn't even try, because the question doesn't ask them to do this. The wording is:

    "Is there a strategy that will give you a greater than 50 percent chance of choosing the larger number, no matter which two numbers I write down.

    The phrase "no matter which two numbers I write down" is not the same as "taken over some kind of probability-weighted average for all possible pairs of numbers I could write down". The former means for any specific values of A and B, the probability of success of the strategy must be greater than 50%. The latter is completely meaningless without some method to assign probabilities to each pair (A,B). I don't believe any such method exists, but whether or not you think there's some sensible way to do this, it's simply not what the problem asks for.

  • I agree with Harald Kirsch. If you have absolutely no information about the numbers, then according to the "principle of indifference", you have to assign equal probability to all possible numbers. But it is not possible, because uniform distribution over the whole real number line simply does not exist.

    So I think the problem cannot be purely abstract. We cannot escape from real world considerations. Having said that, I think the solution still works in practice (at least in our universe!). Reason: no matter how the numbers are generated, e.g. by human, computers or aliens, the numbers have to be drawn from a class of countable number of bounded distributions. The distributions have to be bounded because there is limited resource in the universe (this is my assumption). We have little idea about the resulting "grand mixture" of distributions, but we are sure that it has to be bounded as well. So, as long as we choose a guide number whose absolute value is smaller than the largest possible number this universe allows, the solution still gives a strictly greater than 50% chance of winning the game.

  • Agree with Harold.
    The only real chance for getting a more than 50 % success rate is to choose a G which lies between A and B.
    If the range for choosing A and B is truly unbounded and their selection is random, then any computation of the probability of (G lying between A and B) would involve a division by a denominator which approaches infinity in the limit.
    Sure we can map real numbers to (0, 1) but that doesnt enable us to calculate the probability of a number lying between two random real numbers.

    Common sense appears to win this one hands down.

  • @Harald Kirsh, Eric Wong and Munish Ratanpal,

    (Perhaps the inverse way is easier to understand…)

    As Greg has already pointed out, in every trial you are looking at a specific pair of numbers I have generated. There is no stipulation that these numbers come from a uniform distribution (which is impossible anyway, as you yourselves have pointed out). The process by which these numbers are generated is a black box to you.

    All that is required for you to succeed over 50%, no matter what the numbers are, is that you generate a guide number G such that its cumulative distribution function rises strictly monotonically from a value of 0 to 1 across the entire real number line. The inverse of the mappings Greg and Marcelo specified will do this, as for example G=tan(π(x-0.5)) where x is a random number in the interval (0,1).

    By definition, what strictly monotonic means is that for any given interval AB, the cumulative probability of G at the right end of the interval (B) is greater than that at the left end (A). Therefore, there is a finite probability for G to lie between any two numbers A and B you can think of. Now carry out the comparison procedure, and swap if indicated.

    And that’s all you need.

  • Now this starts to make sense.

    I see now that the solution relies not on assuming one specific probability model, but makes an assertion for all distributions which have strictly monotonic cumulative probability over the real line.

  • @Wiley,
    Re. the googol game/secretary problem – I haven’t looked at Gnedin’s paper in detail, but I would suspect that the reason is just increase in complexity and not because n=2 is special in any other way. For n=2 there is just a single comparison to be made. For n=3 there are already three comparisons, and the number of comparisons goes up rapidly. I suspect that there is a complexity threshold that is already crossed for n=3 and after that is reached, only relative ranks matter.

    This has precedent in several other problems in mathematics such as the possibility of analytical solutions for the motions of two bodies but not for three or higher; and the existence of formulas to find roots of polynomials up to quartics but not further.
    Re. the puzzling nature of the worst case scenario for the current puzzle, check out my previous comment.

    Finally, I have run the secretary problem as a puzzle in the New York Times. Check it out for fun at:
    http://tierneylab.blogs.nytimes.com/2009/11/02/monday-puzzle-the-case-of-the-mixed-up-princess/
    The second part of the first question in the above version of the puzzle is especially interesting to me. It’s the more practical case – what if your aim is to find the best chance to get a high rank rather than the very best? It was solved by readers in the comments.

  • Thanks for the explanations by Pradeep and Greg. But I still have one philosophical conundrum that I don't understand. It is about the interpretation of the probability of winning the game for a given strategy. The problem presupposes that such probability is well-defined and can in principle be evaluated. Pradeep has suggested that the probability can be evaluated using a "thought experiment": play the game repeatedly but have amnesia between games and carry no information from previous games. The average rate of winning should converge to the probability when the number of games tends to infinity. I can think of two possible ways the thought experiment can be carried out:

    1) in each repeated game, the host uses the same predefined black box, possibly randomized, process to generate the numbers, and the player uses the same strategy to guess the greater number.

    2) in each repeated game, the host uses a new black box process which is consistent with the problem description and independent of previous games to generate the numbers, and the player uses the same strategy to guess the greater number.

    It seems that Pradeep and Greg are using interpretation #1, but I think this interpretation is not reasonable. Consider the strategy of always choosing the left hand for the greater number. On one hand, we know a priori the probability of winning the game must be 50%. On the other hand, the probability of winning depends on the predefined black box process. For example, if the predefined black box process is to always generate a greater number on the right hand, the probability is 0. If the predefined process is to always generate a greater number on the left hand, the probability is 100%. Using interpretation #1, the probability is not well-defined.

    If we adopt interpretation #2, we run into the problem of choosing what black box process to generate the numbers. In the absence of other information, it seems unjustified a pair of numbers will be generated more frequently than another pair of numbers. This leads us to the uniform distribution over the real line, which does not exist.

  • Eric Wong writes: "Consider the strategy of always choosing the left hand for the greater number. On one hand, we know a priori the probability of winning the game must be 50%. On the other hand, the probability of winning depends on the predefined black box process."

    No, we don't know a priori that the probability of winning the game must be 50% for that strategy, independent of the black box process. The reason you get a contradiction is because you've stated two contradictory things, and the first of them is false. There are *some* black box processes for which that strategy will succeed with 50% success, but as you correctly deduce shortly afterwards, this is not true for all black box processes.

  • @Eric Wong,

    Here is my speculation where the confusion might be:

    If you go back and read the setup of the puzzle, the randomness actually comes from the player who is guessing, rather than the host. The host is allowed to do whatever he feels like, random or deterministic. The only thing matters is that these two numbers he came up with are different. There is NO assumption at all which method the host adopts. The probability comes into play only when the player flips a coin INDEPENDENTLY and picks a hand.

    If we consider, for a moment, that the game ends right there, i.e., the player says the hand he picks contains the larger number, do you agree that the probability of his guess being correct is exactly 50%? or in your frequentist's setup, as long as the host never fails to generate two different numbers each round and has no way to predict the player's coin toss, you can calculate the frequentist's probability which is 0.5. There is NO necessity to make any assumption of what the host does to come up with these two numbers. You can of course argue real human beings would more likely use such such method or whatever limitation they might have – we don't deny that these extra information are helpful, but the key is that we are putting ourselves in the worst possible situation (we simply don't know) and we have achieved 50% of the successful rate.

    If you have no objection to this base scenario and goes back to read the solution, the 'magic' happens when the player is shown the actual number before he guesses. Now he has this additional piece of information: the magnitude of a number, but it's not at all clear how it would help the player. After all, we have NO information of how these two numbers are generated. The fascinating part is that even in this worst scenario, there exists a strategy that always helps and achieves (0.5 + epsilon) probability, where epsilon > 0.

    Note this probability of epsilon is again introduced by the player himself, because he picks a specific random distribution supported on the whole real line, e.g., a Gaussian distribution and INDEPENDENTLY draws his number G. Can we calculate what exactly this epsilon is? No, because we know nothing about what the host does (the worst scenario), but we can prove it's strictly positive.

    Hope this helps.

  • @Greg,

    It is difficult for me to accept that the always-choose-left-hand strategy does not give 50% winning probability.

  • @Wiley,

    For the coin-flipping setup, I agree the probability does not depend on what the host does. But how about the strategy of always choosing the left hand for the greater number? You have to make some assumptions about the number picking process, say independence of the two numbers, before you can deduce the 50% probability.

    My main point is that for the winning probability to be well defined for every strategy, you have to make certain assumptions about the number picking process.

  • @Greg,

    Let me elaborate. Suppose I am playing the game and I am using the strategy of always choosing the left hand for greater number. Before the game begins, how will you describe my probability of winning? According to your arguments, I think you will say something like:

    "Eric may have 0% chance of winning. He may have 100% chance of winning. Or he may have a chance between 0 to 100%. It just depends."

    I think it is quite absurd.

  • @ Eric,

    What Greg said is not absurd.

    Probabilities are never the well-defined, unchanging things that you seem to want them to be.
    In all such games, to clearly define a probability you always have to specify the strategies, attitudes and knowledge of both players.

    In the current problem, when I said that the probability of success is (.5 < p ≤ .75) that assumed that a) you followed the guide number strategy; b) you wanted to maximize your success; c) you did not have knowledge of the host’s generation algorithm; d) the host was either adversarial or indifferent. If any of these conditions were to change, the probabilities would change too.

    Thus, if the host were to take a liking to you, and knew of, or deduced your strategy, or he just had a sloppy number generating algorithm, he could increase your probability close to 100% by always feeding you two numbers at opposite ends of the number line. Similarly, if you for some reason didn’t want to maximize your success, you could reduce it to less than 50% by doing the opposite of what you were supposed to with your guide number.

    It is a triumph of the current solution that it can increase your probability of success to over 50% (assuming you want to) no matter what the host does. That is huge!

    To go further and try and nail down specific probabilities, you will have to give all the conditions under which a specific contest takes place. Once they are nailed down, you can calculate or simulate the probability in this specific narrow case, if you are so inclined. It will have no bearing on the validity of the result we’ve obtained. The idea that there is some universal well-defined probability for some ideal cases where all pairs of numbers are equally considered is a chimera. It doesn’t exist.

    Probabilities are really fluid beasts. Check out Bertrand’s paradox: https://en.wikipedia.org/wiki/Bertrand_paradox_(probability)
    in which the same task done in three different ways gives three different probabilities. Or check out my puzzle called “The Relativity of Probability” and my two summary comments under it. It will really give you respect for how slippery probability calculations can be unless you nail down all the conditions:
    http://wordplay.blogs.nytimes.com/2010/08/16/numberplay-the-relativity-of-probability/

  • @Eric Wong

    Without specifying more about Player 1's action than the fact that they choose two numbers by some unknown method, there is no such thing as "the probability for Player 2 winning", even if you specify Player 2's strategy completely. There is just not enough information to quantify it. You might say informally that, because you know nothing, the odds of winning are 50-50, and in that purely informal sense no one would argue with you. But as a matter of actual probability theory, there is no basis on which you can calculate any value for the probability of Player 2 winning without knowing more about what Player 1 does.

    That informal, intuitive sense that when you know nothing you should treat the odds as 50-50 is fine. But you shouldn't be surprised, let alone find it absurd, that knowing more details about what Player 2 is doing might allow you to calculate an actual probability, and that probability might then turn out to be anything at all: it need not be 50%.

    If you tell me that your strategy in a coin tossing game is to always call "heads", and I say OK, but I'm not going to tell you yet if my coin has two heads, two tails, or one head and one tail, you might want to claim, informally, that your odds of winning are "50-50". But if we play the game 1000 times (with the same coin), you'll soon discover that it was actually 0%, actually 50%, or actually 100%, because you will have won either 0, roughly 500, or 1000 of the tosses. It just depends on what kind of coin I was using.

  • I now agree with Pradeep and Greg. Thanks for your great explanations. I think the following two assertions are correct:
    1) in the original problem, the winning probability for any given strategy of the player is not well-defined, if we know nothing about the black box process of the host.
    2) what the solution more formally means is that, for every black box process used by the host, the winning probability of the Strategy is strictly over 50%.

  • @Pradeep Mutalik

    Re your point, "There is no stipulation that these numbers come from a uniform distribution… The process by which these numbers are generated is a black box to you."

    The point is not how actually the numbers were/are generated by the host, but what is the contestant's state of knowledge of how the numbers were/are generated. Since the contestant's state of knowledge is nil the contestant must assume equal likelihood over all possible numbers.

  • Colin P writes:

    "Since the contestant's state of knowledge is nil the contestant must assume equal likelihood over all possible numbers."

    I'd recommend a less adamant version:

    "Since the contestant's state of knowledge is nil the contestant might want to consider whether it's useful, necessary and feasible to assume equal likelihood over all possible numbers, and if not he or she ought to try other approaches."

    When a problem involves unknown parameters, one tool that can sometimes be brought to bear on it is the assumption that those unknown parameters are drawn from some suitable distribution of maximum ignorance. But like any other tool, sometimes it's useful, sometimes it isn't. Sometimes it's necessary, sometimes it isn't. Sometimes it's feasible, sometimes it isn't. It's certainly not a universal law of mathematics that such an assumption must be invoked, and in problems where it cannot be invoked because no suitable distribution exists, it doesn't follow that the problem is ill-defined, or intractable.

    For this particular problem, with respect to the parameters A and B, it is neither useful, necessary, nor feasible to assume such a distribution for A and B. We can prove that for absolutely any pair of distinct real numbers A and B, the probability of Jason's strategy leading to a win is greater than 50%. But that is precisely what the original question asked for:

    "Is there a strategy that will give you a greater than 50 percent chance of choosing the larger number, no matter which two numbers I write down?"

    The phrase "no matter which two numbers I write down" is a universal quantifier (asking what's true for every pair of numbers), not an appeal to some undefined average over all pairs of numbers.

    We only need to know the average behaviour of Jason's strategy over the variables with well-defined distributions: the initial choice of left or right hand, both with probability 0.5, and the distribution of the guide number G, which has a strictly monotonic cumulative distribution function f. Using probability distributions for those variables does not compel us to do the same thing for A and B.

  • When I read the statement of the problem, my first thought was that Benford's law should be applicable. (see https://en.wikipedia.org/wiki/Benford's_law).
    Quoting Wikipedia:
    "Benford's law, also called the First-Digit Law, is a phenomenological law about the frequency distribution of leading digits in many (but not all) real-life sets of numerical data. That law states that in many naturally occurring collections of numbers the small digits occur disproportionately often as leading significant digits.[1] For example, in sets which obey the law the number 1 would appear as the most significant digit about 30% of the time, while larger digits would occur in that position less frequently: 9 would appear less than 5% of the time. If all digits were distributed uniformly, they would each occur about 11.1% of the time.[2] Benford's law also concerns the expected distribution for digits beyond the first, which approach a uniform distribution.
    It has been shown that this result applies to a wide variety of data sets, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants,[3] and processes described by power laws (which are very common in nature). It tends to be most accurate when values are distributed across multiple orders of magnitude."
    I wonder why nobody considered this.

  • @ Pauli,
    several people did consider Benford's law in the comments of the original puzzle publication here on Quanta magazine.

  • just me maybe, regarding my comment question (1st comment in this list of comments), perhaps my confusion is imbued with semantics, sorta thing, in that i may be missing the mathematical point that Pradeep made regarding information not being able to 'spawn' from randomness. either that or perhaps my confusion stems from the 'apparent' conundrum that is philosophical in nature regarding determinism vs in-determinism, sorta thing.
    in either case i suspect my question regarding my confusion is not within the bounds of the 'scope' of the puzzle.
    bingo? 🙂

  • @Greg

    What then are we to make of Pradeep's statements, "When you have full knowledge of the median and the distribution is uniform, you can succeed 75 percent of the time by using the median as a guide", and, "When you have no knowledge of the median, your second-guess probability falls to 67 percent for the uniform distribution case"?

  • @Colin

    Both those statements are true for uniform distributions within known bounded ranges as specified in the bonus problems. They do not apply for the main problem where the range is not specified and is unbounded, because uniform distributions in this case do not exist.

    I should have made this more clear in my write-up. I apologize if this confused people.

  • @sagef,
    Bingo! 🙂
    And thanks for clarifying questions from other respondents. Do write if anything is still unclear.

  • I read the comments and thought about it some more.

    As the problem was originally stated, A and B are unbounded real numbers. Thus, the probability that one can find a G between A and B is an infinitesimal epsilon, greater than zero but smaller than any positive number. And the chance of success is 50% + epsilon, which converts to 50% as a real number.

    Once you bound the range of A and B, though, then the probability of finding A<G<B becomes becomes greater than zero and non-infinitesimal, and you can then, and only then, achieve >50% chance of success. The extra bit of information is the fact the range is bounded, which allows the ingenious solution to work.

    This problem reminded me of https://en.wikipedia.org/wiki/St._Petersburg_paradox, where, once you bound the casino's bankroll, the expected value of the lottery drops from a paradoxical infinity to a reasonable value. The results are different, yes, but the idea of bounding versus boundless is similar.

  • Let me restate part of my answer as I see the discussion is turning around this :
    It is not possible to get an uniform distribution over an unbounded interval (real line). It is mathematicaly and physically impossible. You either have to drop one of two constrain :
    a) you drop uniformity constrain. It is the random generator proposed by Greg, it will yield a bell shaped curve that will be easy to arbitrate.
    b) you drop the unbounded constrain. Then you can have uniform distribution. So it this case a maximum and minimum bound exists.
    In both case we can apply solution given, but we might use more advance strategy to get the median of the distribution, that necessary exists.

  • Hi Pradeep,

    the mapping of reals to (0,1) doesn't give meaningful probability results because –
    1. There can be other monotonic mapping from (0,1) to the real line. for example – ln(y/(1-y)). And we will get a different probability of G lying between any specific A and B for different mappings.
    2. It would give a higher probability for G lying between A and B if A and B are spaced further apart, and lesser probability if A and B are closer together.
    however, the cardinality of ANY interval on the reals is the same as the cardinality of the reals themselves so the probability being more or less depending on spacing of A and B is not correct. In fact, that is all that these mappings prove – that the cardinality of any interval on reals is same as the reals.
    It is wrong to assume that we can make any meaningful computation of probability using such mappings.
    If at all we want to go down this route, we would need to 'undo' the effect of the mapping to get the correct probabilities but in this particular case that would land us back to square one.

    Is there a probability that we can select a guide G that lies between A and B? Yes, but is is infinitesimally small and can be ignored for all practical purposes.
    it is similar to trying to guess where the pointer of a spinning wheel will stop. Since it has to stop somewhere, we can always say that there is a small probability of making a correct guess, but for all practical purposes the correct value is zero.

  • @Pradeep

    Thank you Pradeep, for my part I think I have one more question for you which I am trying to formulate! But I have another one for Greg first. Also, thank you for a very interesting and informative discussion.

  • @Greg

    In your July 10 post ("Oops, I wrote a little hastily", to the puzzle before solution) you gave a formula for the probability of guessing correctly, namely 0.5 + f(x) – f(y). I do not believe that formula is correct, or you were using the token f(x) in two different ways within the post, reason being that earlier in the post you used f(x) to represent a monotonic and increasing function, but the particular choice of monotonic and increasing function is arbitrary therefore the formula 0.5 + f(x) – f(y) will yield different results depending on the function chosen. I believe the calculation of the probability of guessing correctly can be calculated only with knowledge of A, B and the distribution from which the random variable G is sampled. Am I right or am I wrong?

  • @ jacquesattack,
    below is how i set up my excel spreadsheet for bonus question #1 :
    set up a column starting at cell B6 for random guide #'s using formula =RANDBETWEEN(1,10) label random guide #'s
    and another starting at cell E6 for # of trials (just make it the number 1 (one) on down through however many trials you want while making all other columns drop down that many cells to correspond to the number of trials) we’ll try making 100 trials at first, sorta thing label # of trials
    and another starting at cell G6 using formula =RANDBETWEEN(1,10) this column label hand #1
    and another starting at cell I6 using formula =RANDBETWEEN(1,10) this column label hand #2
    and another starting at cell K6 using formula =IF(G6>5,1,2) this column label actual hand chosen
    and another starting at cell M6 using formula =IF(G6>I6,1,2) this column label actual largest hand
    and another starting at cell O6 using formula =IF(M6=K6,1) this column label win/loss
    click on cell B6 then holding shift key down click on cell O6 (you can now drag all of those cells down the number of cells to correspond to the number of trials you want to create ( in this case we’ll drag down to cell O105 thus creating 100 trials)
    create a cell Q6 using formula =SUM(O6:O105) label this cell total winners note: it would be O105 if you want 100 trials but could be set to whatever number of trials you want, you just want to be able to sum all of the trials, sorta thing.
    create a cell S6 using formula =(Q6/E105)100% and label it percent won
    so far we have set up the spread sheet so that it will simulate bonus puzzle #1 using medium guide numbers and we shall find that the percent won will fall right around 75%
    now we’ll set up further cells that shall simulate bonus puzzle #1 using random guide numbers
    set up a column starting at cell W6 using formula =IF(G6>B6,1,2) this column label actual hand chosen
    and another starting at cell Y6 using formula =IF(G6>I6,1,2) this column label actual largest hand
    and another starting at cell AA6 using formula =IF(Y6=W6,1) this column label win/loss
    click on cell W6 then holding shift key down click on cell AA6 (you can now drag all of those cells down the number of cells to correspond to the number of trials you want to create (100 trials in this case)
    create a cell AC6 using formula =SUM(AA6:AA105) label this cell total winners note: it would be AA105 if you want 100 trials but could be set to whatever number of trials you want, you just want to be able to sum all of the trials, sorta thing.
    create a cell AE6 using formula =(AC6/E105)100% and label it percent won
    so now we have set up the spread sheet so that it will simulate bonus puzzle #1 using medium guide numbers and random guide numbers and we shall find that the percent won will fall right around 75% when using medium guide numbers and 67% when using random guide numbers.

  • @Wiley
    Thanks for your comments. I changed the simulation to explicitly choose a and b and I got the 66% answer. I think this is the part that I was misunderstanding "A and B's joint distribution would not be i.i.d uniform.".

  • @Colin P

    First, to clear up one thing, in my first comment of July 10, written too hastily, I made an elementary error in the algebra and gave the probability as 0.5 + f(x) – f(y), which is why I needed to write a second comment, the one starting "Oops, I wrote a little hastily", where I corrected this to 0.5 + 0.5 (f(x) – f(y)).

    Also, to be clear, my x is what everyone is now calling B and my y is what everyone is now calling A. So with those names for the variables, the probability is:

    P = 0.5 + 0.5 (f(B) – f(A))

    "I believe the calculation of the probability of guessing correctly can be calculated only with knowledge of A, B and the distribution from which the random variable G is sampled. Am I right or am I wrong?"

    You're right, but you've missed the connection between the solution in the form in which I presented it and the solution in the form in which Jason presented it. "The distribution from which the random variable G is sampled" is completely described by the choice of the function I called f: f is just the cumulative distribution function of that probability distribution, i.e. for any x:

    P(G is less than or equal to x) = f(x)

    or equivalently:

    f(x) = the integral from minus infinity to x of P(G) dg

    The cumulative distribution function of any probability distribution is automatically non-decreasing (being the integral of a non-negative function), and if P(G) is non-zero everywhere, as well as non-negative, the cumulative distribution function is monotonically increasing.

    I won't repeat my proof that Jason's solution gives exactly the same probability in terms of f as my solution; you can see this in an earlier comment to this post, dated July 23, 2015 at 10:45 pm:

    https://www.quantamagazine.org/20150722-solution-information-from-randomness/#comment-351837

  • Very disappointing that Pradeep chose to resort to hand waving in his statement of the solution – actually pointing to non-rigorous (and actually misleading) posts about remapping the infinite to a finite interval, rather than directly answer what he clearly knows to be many readers' difficulty with the published solution.

    His answer (and even statement of the problem) seems unnecessarily patronizing to what must be a sophisticated audience reading Quanta. He simply says that anyone concerned about the issue of infinities need only read one of the comments when the whole tricky part of the problem and solution revolves around careful and disciplined understanding of infinities and probabilities. (the probabilities are not mapped to the interval in a way the preserves the flat/random distribution. This is not addressed at all)

    From the comments I'm clearly not the only one who takes issue with (at the very least) the statement of the solution and its utter lack of rigor around the most critical issue. Take away infinity and the problem is trivial.

  • @Greg

    Thanks you've set me right, 0.5 + 0.5 (f(x) – f(y)). I was referring to the fact that a few lines earlier you wrote f(x) = 1/2 + arctan(x)/pi, you've clarified.

  • @Munish Ratanpal, Max Rockbin

    No one is suggesting that any map from the set of all real numbers to the finite open interval (0,1) will map a uniform probability distribution to another uniform probability distribution. That's impossible. But it's also completely unnecessary. All that's actually required of such a map is that it preserves the order relationships. That is, if x is greater than / less than / equal to y, then f(x) is, respectively, greater than / less than / equal to f(y).

    There are an infinite number of different functions that meet these requirements, e.g.

    f(x) = 1/2 + arctan(x)/pi

    f(x) = e^x / (1 + e^x)

    f(x) = (erf(x) + 1)/2

    That there are infinitely many functions like this simply means that there are infinitely many variations on the winning strategy. Different choices of f correspond to the use of different probability distributions when picking G, with:

    P(G) = df(G)/dG

    i.e. the probability distribution for G is the derivative of f. Conversely, f is the cumulative distribution function for the probability distribution for G, i.e.

    P(G is less than x) = f(x) for any x.

    So, different choices for the map f yield different values for the probability that G will lie between A and B. But the probability will, in each case, be:

    P(G between A and B) = P(G is less than B) – P(G is less than A) = f(B) – f(A)

    Because B is greater than A, and because we require f to preserve order relationships, f(B) is greater than f(A). So P(G between A and B) is greater than 0. Since Jason's strategy wins 100% of the time when G is between A and B, and 50% of the time when G is not between A and B, the overall probability of success is:

    P(win) = P(G between A and B) + 0.5 (1 – P(G between A and B))

    = f(B)-f(A) + 0.5(1 – (f(B)-f(A)))

    = 0.5 + 0.5 (f(B)-f(A))

    which is greater than 50% for any choice of A, B where B is greater than A, and any choice of f that preserves order relationships.

    Note that nothing we have done here requires f to map a uniform probability distribution on one set into a uniform distribution on another.

    Now, the quantity f(B) – f(A) cannot be bounded below by any positive number: for a given choice of f, there will be choices of A and B that make f(B) – f(A) as close to zero as you like. But that's beside the point, because the problem has not asked for a strategy that yields any bound on the amount by which the probability of success exceeds 50%. The crucial sentence is this:

    "Is there a strategy that will give you a greater than 50 percent chance of choosing the larger number, no matter which two numbers I write down?"

    "Greater than 50 percent" does not mean "greater than 50 percent plus P, for some fixed positive number P". It just means "greater than 50 percent". We've proved that Jason's solution, for any choice of probability distribution whose cumulative distribution function f preserves order relations (or in other words, is strictly monotonically increasing) has:

    P(win) = 0.5 + 0.5 (f(B)-f(A))

    and no matter what values A and B take, so long as B is greater than A, this probability is greater than 50%.

    Max Rockbin writes:

    "1 + 1/x is strictly greater than 1 for positive x, but the limit of 1 + 1/x as x ->infinity is not >1
    For every single positive x you can think of, 1 + 1/x >1. Yet.. not in the limit."

    Yes, but what the problem, as stated, actually demands of the winning strategy is: for every pair of distinct real numbers, the probability of success is greater than 50%. It does not demand that:

    limit A->infinity of P(win) is greater than 50%

    or

    limit A->B of P(win) is greater than 50%

    That these limits are not greater than 50% does not invalidate the winning strategy, or imply any finite bound on the range of choices for the two numbers. If the aim of the puzzle had been to find a strategy that met these additional requirements, they would have needed to be stated as additional conditions.

  • @Pradeep (and @Max, @Jason, @Anyone really)

    I may have gotten half-way there with the puzzle! What I mean is I now realize that there is a definite non-zero probability (not just a theoretical one) that the contestant's random number will fall between the host's left and right hand numbers. To me this is one of the counter-intuitive things about this puzzle and I think it is because it turns out (counter-intuitively) that the infinities do not work for the host and against the contestant, but in fact they work against the host and for the contestant. The difficulty I had was that I was thinking about the finite difference between the hosts two numbers and how infinitesimally small that difference is compared to the whole real line, and how the contestant's random number could be anywhere on the real line so what possible likelihood was there that it could fall between the host's two numbers? I now realize that this picture springs from an intuitive but incorrect feel about how the random number should fall with equal likelihood anywhere on the real line. But in fact it turns out that not only is that picture not helpful, it is also not possible. What is actually going on is that a random number is generated according to some non-uniform spread of probabilities over the real line, a non-uniform distribution, and for example one reader suggested using a Gaussian distribution (normal distribution) for this task, and Jason generalized the choice to the technical term of a distribution with non-compact support.

    So when the puzzle is played the host makes a specific choice of a pair of numbers, a and b. With the host's choice of a and b, the contestant has the benefit of the whole real number line G (the infinities work for the contestant) from which to retrieve a random number g using some random number generator that obeys the rules of the chosen distribution, and given that it obeys the rules then there is a specific non-zero probability that g will fall between a and b. In fact, if you know the numbers a and b, and if you know the distribution from which g is selected, then you can work out this probability absolutely – it will be an actual number.

    Pradeep, is that a half-Bingo? The other half I cannot claim yet. The other half being how this non-zero probability impacts on the guess of which hand holds the larger number.

  • @Max

    Does your frustration arise because you understand the issues completely and you do not feel they have been presented well, or because (like my starting point) you do not understand the issues (especially around the infinities) and you feel the discussion is missing the mark in explaining them?

  • @Colin P

    "… how this non-zero probability impacts on the guess of which hand holds the larger number."

    Now that you've seen that there's a non-zero probability P(G between A and B), you just need to note that in those cases where G is between A and B, if your strategy is to swap hands whenever you see a number less than G, and stick to the original choice whenever you see a number greater than G, you are guaranteed to end up with the larger number.

    In detail: if A is the smaller number and B the larger, then if you chose A initially, it will be less than G, so you will swap and end up with B, which is the larger number. If you chose B initially, it will be greater than G, so you will stick with it and again end up with the larger number.

    So P(winning when G between A and B) = 1.0

    In those cases where G is NOT between A and B, the same strategy will lead you either to swap whether you see A or B, because both are less than G, or to stick with the original choice whether you see A or B, because both are greater than G. Since these "always swap" or "always stick" results happen independently of which hand you chose, the final choice of hand is every bit as random as the original choice, and so you have an unchanged, 50% chance of ending up with the larger number.

    So P(winning when G NOT between A and B) = 0.5

    Now you just need to combine the two possibilities: either G is between A and B, or G is not between A and B. We combine these to get the overall probability of winning as follows:

    P(win) = P(G between A and B) P(winning when G between A and B) + P(G NOT between A and B) P(winning when G NOT between A and B)

    = P(G between A and B) * 1.0 + P(G NOT between A and B) * 0.5

    = P(G between A and B) + (1 – P(G between A and B)) * 0.5

    = 0.5 + 0.5 * P(G between A and B)

    Since P(G between A and B) is greater than zero, P(win) is greater than 0.5.

  • @Colin,

    I would call that a 99% Bingo 🙂

    The rest is easy. For all cases (100%-the “small number”) where G does not lie between A and B, your chances of a win are 50%. For the “small number” when G does lie between A and B, your chances are 100%. Therefore your total probability summed over both cases is more than 50%. Simple!

  • @Max Rockbin and Munish Ratnapal,

    There is no hand waving and no (intentional) misleading.
    I would like to reiterate that this problem has a beautiful and subtle solution that works for the unbounded case. If I still have not been able to convey it to you, the fault is not in in the solution, but in my explanation, and I will attempt to explain it to you for as long as it takes. Unfortunately, this is likely to be a finite time – alas, in this respect, life imitates art.

    Later today, I will attempt once more to convince you with a detailed post. But please remember that this is a two-way street. Please consider that several "sophisticated readers of Quanta" such as Jason, Erica Klarreich, g g, Jeremy Magland, Greg Egan, Steven Alexander and Jamie H. appreciated and understood this solution even without me having to explain it. Others like Harald Kirsch and Eric Wong had much the same difficulties as you, but in the end were convinced by the argument. So do consider that there may be something about the explanation that you are not getting, and try to figure out what it is. I assure you that the effort will be worth it.

    In this respect, I would like to commend the explanations of Greg Egan enthusiastically and whole-heartedly. Greg is an absolute superstar – completely rigorous mathematically, brilliantly articulate and prolific in his comments. Frankly he dazzles me, and we are fortunate to have him here. Thank you, Greg, I feel at peace knowing that you are there when I cannot respond to someone right away. Anyone with doubts will do well to try and follow Greg's explanations.

    I will weigh in later too, as I promised above. For now, I just want to say one thing:
    The differences between the bounded and unbounded case are not as much as most people imagine, in theory and even (gasp!) in practice.
    Remember that within a bounded interval there are an infinity of numbers and it is an infinity having the same cardinality as the entire real number line. So for any pair of numbers A and B on the real number line, we can find a pair of numbers A1 and B1 within any bounded interval such that the probability of a guide number lying between A and B under a given distribution on the real line is exactly equal to the probability of a guide number lying between A1 and B1 under a uniform distribution. Yet most people have no problem accepting that the solution works in the bounded case!

    Ponder that for a minute. It's an awe-inspiring thought!

  • @Anyone…

    …who has difficulty thinking about the finite difference between the hosts two numbers and how infinitesimally small that difference is compared to the whole real line, and the seeming improbability of the contestant's random number falling between the host's two chosen number, this is the approach that shed light on this for me…

    Whilst reading around the subject and I was reminded that the total probability of an outcome of some sort (regardless of what it is), i.e. the probability associated with a set of exhaustive and mutually exclusive events, is defined to be 1. Applying this to the context of the puzzle, we know the numbers A and B are different, A < B, and the entire real line can be covered with the infinite set of terms A + n * (B – A) for n = -∞ to +∞. One can think of this intuitively as getting hold of an infinite number of line segments which individually can span from A to B and setting them end to end to cover the entire real line. It follows that if we allow the probability p that G falls in a particular A to B line segment to fall to 0 then we are faced with the conclusion that the probability of G falling somewhere in the real line is also 0, because (number line segments) * p = ∞ * 0 = 0. But in that case we have broken the definition of probability mentioned earlier, that the probability that G falls somewhere on the real line is 1.

    Further reading will reveal that this (in non-mathematical terms) is why it is not possible to have a uniform distribution over the real line, or over the integers either for that matter. Because in like manner the uniform distribution is exactly that – uniform – by definition it is a distribution in which all intervals of the same length have an equal probability pu. Add to this the fact that the total area under the distibution must be 1 (as for all distributions) and we find that a uniform distribution cannot span the entire real line (or the integers, etc) because if it did then either pu = 0 in which case the total area under the uniform distribution would be 0, because ∞ * pu = ∞ * 0 = 0, or pu > 0 in which case the area under the distibution would be infinite, because ∞ * pu = ∞ if pu > 0. In either case the total probability is not equal to 1. Conclusion: a uniform distribution over the entire real line is not possible.

    In fact the way I came at this information was slightly different and I thought I had found a reason why the puzzle was at fault; but then some synapse triggered and I realized that a non-uniform distribution with associated cumulative (probability) distribution function, such as a Gaussian distribution, is the way through this particular impasse.

  • @ non-believers :

    I think that the trick of this game is that people first impression is that, given a random number, chance that another random number to be higher is 50%.
    This is the trick : in fact this would imply a physically and mathematically impossible feature : the ability to generate random number uniformly over the real line.
    This is simply not possible you can have either :
    a) if you want to generate over the real line then you distribution have to be non-uniform.
    b) If you want your distribution to be uniform, then it will be bounded.

    The corollary of this is that in both case, the probability of a random number being in an interval is not necessarily null, (but it might be in the second case if we are outside the bounds).
    The solution of Pradeep and Jason is relying on this corollary.

  • @Pradeep, @Greg

    I got there. Greg, I had to draw a couple of graphs, the result came out exactly as your formulation (of course) I suppose sometimes one just has to do these things for oneself.

    Pradeep have you any comments on what the following means? If the contestant (unwisely) changes methodology and re-programs his/her Gaussian random number generator so that the distribution it draws from is centred on whatever number he/she sees first then his/her probability of success drops to exactly 50%.

  • @Colin,

    Congratulations! Your tenacity and careful, systematic approach are admirable, and an object example to others whose intuition is getting in the way.

    Re. your statement:
    If the contestant (unwisely) changes methodology and re-programs his/her Gaussian random number generator so that the distribution it draws from is centred on whatever number he/she sees first then his/her probability of success drops to exactly 50%.

    This is correct. In fact I was going to add this as an extra question, but decided to drop it.
    What it means is the following:
    Suppose the contestant chooses one of the two hands at random, and the number revealed is 3. Now he reasons fallaciously as follows "My chances of achieving over 50% success are only realized if my guide number G lies between the two numbers A and B. I should therefore choose a guide number as close to 3 as possible, so that my guide number doesn't overshoot the other number and land on the other side of it." So he chooses a very small number, say .0000001, flips a coin and either adds it to 3 or subtracts it from 3 based on the coin toss to arrive at his guide number, say in this case, 3.0000001. This is greater than 3, so he switches to the other number for his final choice.

    I'll leave it to you to figure out why this is not a good idea.

  • @Max, Munish and others,

    I'm afraid my promised explanatory comment is delayed because the editor has asked me to prepare the next puzzle which is coming out earlier than scheduled. I've asked that these comments be kept open for awhile longer so we can thrash things out, even after the new puzzle is published. I hope to come back to this soon.

    In the meantime, I again refer you to Greg Egan's comment submitted on 2015/07/28 at 9:34 am.
    Greg has said everything that needs to be said, in an extremely rigorous and clear manner.

  • @Colin P

    I think it's fun and instructive to consider what goes wrong if you switch the random number generator for G to a Gaussian centred on the number first seen.

    On one level it's obvious why this will bring the chance of success down to 50%: if you pick G from a Gaussian centred on the number you've seen, you are guaranteed a 50-50 split between G being above it or below it, and so you always have a 50% chance of swapping hands. A random choice followed by a 50% chance of swapping is just another unbiased random choice between the left and right hands.

    But it becomes a bit trickier to see exactly what's going on when we note that, as with the original strategy, G will fall into one of three cases: (1) less than A, (2) greater than B, or (3) between A and B. If it lies between A and B, you still get 100% chance of success. So what goes wrong in Cases 1 and 2 to drag things back down to 50% again?

    I believe this comes about because the probabilities for Case 1 and Case 2 are no longer independent of which number you saw first. If you see A first, so the Gaussian is centred on A, Case 1 will have a probability of 50%. But if you see B first, so the Gaussian is centred on B, Case 1 will have a probability slightly less than 50% (in fact, less by the probability of G lying between A and B). Similarly for Case 2, with the effects of seeing A and B reversed. Because these probabilities are different, you need to split the calculations for Case 1 and Case 2 into subcases.

    If we write P(3) for the probability of Case 3, the normal calculation is:

    P(win) = 0.5 (1-P(3)) + 1.0 P(3) = 0.5 + 0.5 P(3)

    But if we use a Gaussian centred on the first number seen, the probabilities of the four sub-cases are:

    P(1A) = 0.25

    P(1B) = 0.25 – 0.5 P(3)

    P(2A) = 0.25 – 0.5 P(3)

    P(2B) = 0.25

    e.g. the probability of seeing A first is 0.5, then the probability of G being less than A is 0.5, so P(1A) = 0.25.

    We can't just add these all up and attribute a 50% chance of winning to Cases 1 and 2 anymore, because now each case has a different chance of winning:

    P(win | 1A) = 0

    P(win | 1B) = 1.0

    P(win | 2A) = 1.0

    P(win | 2B) = 0

    P(win | 3) = 1.0

    So the overall probability of winning is:

    P(win) = P(1B) + P(2A) + P(3) = 0.25 – 0.5 P(3) + 0.25 – 0.5 P(3) + P(3) = 0.5

    as it must be.

    I suppose the moral of the story is that we can't be too cavalier about exactly where G comes from. Just ensuring that G has a non-zero probability of lying between A and B is not enough; it must also be chosen independently of whatever number we saw.

    This means we can't fiddle around with the probability distribution for G in any way after we've seen A or B, in the hope of trying to make it more likely that we end up with Case 3.

  • This is a fascinating puzzle, thank you Pradeep for taking the time to patiently explain the various points. The main lesson I take away from it, is that our brain is easily deceived by statistical concept.

  • The problem is that this problem has been set-up to fail. Like other paradoxes it has multiple issues that allow it to be undefined and, I believe unsolvable. I thought there was a paradox lurking somewhere within it, and I believe it’s been located.

    If you hold (as many do here) that it has a real solution, then you’ve already made certain assumptions that must be held to make that true. Let’s explore those. One of these is that it’s arbitrarily limited to a range (the problem as a whole, the player and host), and is finite. But what is also assumed, and this is not subtle, is that there is a connection (or a connectivity) between the player and the host. Those two factors contradict the implied premises of the game as it was stated.

    The flip side is that 1) it is not necessary that they are connected by their ranges, as I mention “collusion” as a problem before, and 2) it is not necessary or required that the player’s G number be within the range of the host. So many of the realists, are conditionally implying that G is somehow connected with numbers A and B. It doesn’t need to be that situation, even in the real world, but we see that the problem in no way limits that condition since, “we have no information about how they are generated” pretty much establishes no connection. No limit on G as far as its relative range is concerned, will always mean that the probability is not calculable as you do not know especially if the A B range coming from black box #1 is the same range chosen by the player’s black box #2. If the player’s range for G is significantly greater, i.e. outside of the AB range, then probability is quite different, getting closer to .5. There is also no need for overlap of the ranges, again, that’s another assumption in the finite case. But this is a strange sub-case in which it (the G range or black box 1) appears to be known but isn’t, it could be outside the range of black box 2 ,continuously. The even stranger case is that we cannot actually say there is 0% chance of a player guessing within the A-B range, however there is no way to claim what that percentage is, hence the duality of its contradiction. So in terms of any solution, (an argument that there is a solution or method that’s giving >.5), G can define its own range as much as the host can, but each can independently do so of the other.

    I don’t know how it is possible to generate random numbers without setting a range for the black box’s, the player’s and the host’s. If random numbers are being generated without stipulation about range, then I believe you have the infinity problem, (no range) which also leads to no solution, meaning no method for obtaining greater than p= .5. The bottom line is that you don’t know what the probability of overlap is and the ranges for generating the numbers must be set. I don’t believe it’s possible someone can input an undefinable, not knowable number into an equation and get a finite probability of G lying between A and B.

  • @Matthew Kosak

    There are lots of probability distributions that can be used for G, because they are non-zero over the entire set of real numbers. An example that will be familiar to most people is the Gaussian distribution, or bell curve. Gaussian distributions can have different means and standard deviations, but whatever you choose for those parameters, for any Gaussian random variable G, and any distinct real numbers A and B, it is always the case that:

    P(G lies between A and B) is greater than zero.

    So it doesn't matter what range the host has chosen the numbers from, and there is no need for collusion in order for the player to have a non-zero chance of choosing a G that lies between them.

  • Pradeep, Greg – thanks for your patient explanations.
    I now understand that there does exist a non-zero probability of choosing a guide between A and B, using one of the many mappings from reals to (0,1).

    looking forward to more Insights puzzles!

  • One starting point for this puzzle is to apply the fact that random numbers can be generated in a non-uniform distribution. In this situation although each individual number is random and unpredictable, there is a preponderance or distribution in the numbers generated in relation to various points on the real number line. An example would be a random number generator which generates numbers that are random and unpredictable but which, by dint of how it operates, generates half of its numbers between -3 and +3 and the other half between either – infinity and -3 or +3 and + infinity.

  • @Greg Egan
    So if x games are played, in how many games do you expect that the two bounds of G and A-B to overlap? You say it's a finite probability, then what is it? I think that explanation of a standard Guassian distribution sounds like a punt to me. Who or where rather, does it say such a distribution will occur? You assume the playing strategy of the player.

  • I'd like to further elaborate that the sort of distribution, Gaussian or otherwise, is irrelevant. This is independent of the boundary limits set for the individual black boxes, whatever they might be. They could be anything, and they are not required to overlap. But the important point is that they are set and are actual limits.

    I scribbled down the following test example which might help to further explain my point.

    Alright. So you, the game host, and I, the player set out into the vast world of "real numbers", and even in this vast space you may believe there is a chance that my number G will fall within, your two choices, A and B. Just to make that outcome more 'certain' you make your number span very, very, ..very large. You guess .0009000000000001 for a small number and 9999999999999999999999999900009.000 for your very large number, that on appearances is a large span. How can I miss? Well, I'm not going to write it out, but let's say on this round my guess is .000001 times your smallest range number. That's a million fold miss. Outside the range. You could say, well its because you didn't adjust your range properly, or perhaps it wasn't a good shot. We play again but this time I'm sure you're going lower, and you do, you go lower by a trillion times and higher by a trillion times, but I was shooting for the very reasonable , in this case, 1. Your highest number was a trillion times lower, and could not have been higher, as this was the limit of your box.
    These are examples, but we can see that unless there is agreement, between player and host, in theory, the player can always go beyond the host. The only way to prevent this scenario is if there is, again, agreement not to do so.
    This should make sense. We also see that the distribution is irrelevant, as it is the limit that is crucial , hence my remark that this was "punted" and overlooked.
    You assume theoretically, that the number generator is unbounded. But as I already indicated, this can't work as you're drawing from an infinite , limitless set. No random generator will be able to generate that command, it is unparsable. To do so, it would have to know what the limit was, and the answer is, "there isn't one",.. a paradox. The problem I suppose reduces to what is human capacity for scribbling numbers on paper, but that hardly sounds mathematical? But the only way out is if we "agree" not to have a duel of large numbers. Hence my point of "the colluding players."

  • @Matthew Kosak

    You write: "So if x games are played, in how many games do you expect that the two bounds of G and A-B to overlap? You say it's a finite probability, then what is it?"

    A Gaussian distribution has no bounds: any Gaussian variable has a non-zero probability of lying in any non-empty interval [A,B]. If we choose a Gaussian with mean zero and standard deviation 1, the probability distribution is:

    P(G) = 1/sqrt(2pi) e^(-G^2/2)

    The probability that G lies between A and B is:

    P(G between A and B) = integral from A to B of P(G) dG = (erf(B/sqrt(2)) – erf(A/sqrt(2)))/2

    The error function, erf, is monotonically increasing, so for any A, B with B greater than A, P(G between A and B) will be greater than zero.

    If x games were played with exactly the same A and B (by different players, or a single player with amnesia) then the expected number of games in which G was between A and B would be:

    x P (G between A and B) = x (erf(B/sqrt(2)) – erf(A/sqrt(2)))/2

    If x games were played with different values for A and B (A_1, B_1, A_2, B_2, … A_x, B_x), then the expected number of games in which G was between A and B would be:

    P (G between A_1 and B_1) + P (G between A_2 and B_2) + … + P (G between A_x and B_x)

    = (erf(B_1/sqrt(2)) – erf(A_1/sqrt(2)))/2 + (erf(B_2/sqrt(2)) – erf(A_2/sqrt(2)))/2 + … + (erf(B_x/sqrt(2)) – erf(A_x/sqrt(2)))/2

    You write: "Who or where rather, does it say such a distribution will occur? You assume the playing strategy of the player."

    The whole point of the puzzle is to tell the player what strategy they should use. I have just shown that the strategy of using a Gaussian distribution for G will succeed, following Jason's general approach. Any other distribution such that P(G) is non-zero for all G will also work; this is merely demonstrating one concrete example.

    You write: " You guess .0009000000000001 for a small number and 9999999999999999999999999900009.000 for your very large number, that on appearances is a large span. How can I miss? Well, I'm not going to write it out, but let's say on this round my guess is .000001 times your smallest range number. That's a million fold miss. Outside the range."

    You have set A = .0009000000000001, B = 9999999999999999999999999900009.000, and then you have said G happens to be .000001 times A when the game is played. If you were following the concrete example of a winning strategy where G is a Gaussian variable with mean 0 and standard deviation 1, then the probability that G was between the A and B you have specified will be close to 0.5.

    In your example, G happens to lie outside [A,B]. Of course that is possible: no one has claimed that the strategy makes it impossible for G to lie outside [A, B]. We have merely said that, using the correct strategy, there is a non-zero probability of G lying inside [A,B]. In the case you have described, that non-zero probability was about 0.5. That it's also possible for G to lie outside [A,B] is irrelevant. The strategy is not required to succeed in every single game.

    You write: "These are examples, but we can see that unless there is agreement, between player and host, in theory, the player can always go beyond the host."

    To repeat: no one has promised that the player will NEVER pick a G outside of [A,B]. All that the strategy guarantees is that for each game, P(G between A and B) is non-zero.

    You write: "But as I already indicated, this can't work as you're drawing from an infinite , limitless set. No random generator will be able to generate that command, it is unparsable. To do so, it would have to know what the limit was, and the answer is, "there isn't one",.. a paradox."

    No, it's not a "paradox". Erica Klarreich already described a simple, practical way to generate random numbers with no bound. I will quote her exactly:

    "Flip a coin repeatedly until it comes up heads. If the first head appears on the Nth throw, choose G to be a random number between N-1 and N. (To include negative numbers among the possibilities for G, we can do one more coin toss to decide whether to make G positive or negative.)"

    The result of this process can be any real number whatsoever. The probability that G will be very high is very small, but it never becomes zero.

  • Following my previous comment, the next thing one may consider is that the problem is bounded, not unbounded, in the role of the host. To see this, and following convention, consider the difference between the random variables A and B each drawn from the real number line, and the specific data points a and b, drawn from A and B respectively. As we approach the puzzle it begins, "I write down…". The host has not written down the letters A and B, he has written down the numbers a and b. He has made his choice. Now the contestant must see what he can do.

  • Almost Final Words

    Okay, it’s almost time to wrap up the first Insights puzzle. Let me summarize the important insights, and point out the mistaken notions that may still be preventing some people from seeing the solution.

    The Pure Math Question

    The main puzzle is an existence problem in pure math – does there exist a strategy that guarantees that you – the player – can guess the larger number as your final choice after seeing one of the two numbers, no matter what two numbers the host selects?

    The answer is yes, and this fact has been rigorously proved here for the unbounded case. The barriers to seeing the solution are:

    1) The pair of numbers A and B have already already chosen for you by the host using an unknown ‘black box’ process. There is no requirement that they come from a uniform distribution. In fact, a uniform distribution over the entire real line is impossible.

    2) There are an infinite number of winning strategies, all of which involve sometimes switching your guess to the other hand based on the number seen. One way this can be done involves generating an arbitrary guide number G having a non-zero probability of being present in any interval on the real line. There are an infinite number of distributions that have this property, and they can be based on trigonometric ratios, exponentials, Gaussian curves or even by simple coin-tossing. The guide number G must also be generated independent of A and B.

    3) The solution only proves that the probability of winning is more that 50 percent. The actual probability does not need to be calculated, and cannot be calculated without knowing the host’s number generating algorithm. The exact (non-zero) probability of G lying between any given interval can however be calculated, based on the actual distribution the player uses.

    4) For any qualifying distribution selected by the player, the probability of winning may be extremely close to 50% for a given pair of numbers, but it is always higher than 50% by a finite amount.

    5) The infinity of the real line has the same cardinality as the infinity of any interval within it (aleph-one). So for any pair of numbers A and B on the real number line, we can find a pair of numbers A1 and B1 within any bounded interval such that the probability of a guide number lying between A and B under a given distribution on the real line is exactly equal to the probability of a guide number lying between A1 and B1 under a uniform distribution.

    6) As a consequence of the above, there is not much of a difference in the practicality of achieving success between the unbounded and bounded cases. This surprising fact is explained in detail below.

    What about winning in real life?

    I think one of the reasons that some people have difficulty with the unbounded case is that the infinity of the real line seems so vast, that it seems impossible to win in a real-world game. This is certainly true in an adversarial game, where the host is actually trying to prevent the player from winning. But as a consequence of point 5 above, this is also true in the bounded case, as we saw in bonus problem 3. In the adversarial case, whether bounded or unbounded, the host can choose two numbers that are so close together, say 10^-30 apart in the bounded case, that the player’s chances of winning are reduced to very close to 50%. Only cosmic beings playing at the rate of a googolplex games per second and having all eternity can realize the win in such a case. However, this does not undermine the fact that the “pure math” winning strategy does exist even in the adversarial case. “What’s the use of this?” a hard-nosed realist may ask. The pure mathematician will say “It’s the principle of the thing” or as G. H. Hardy put it, “what is useful above all is technique, and mathematical technique is taught mainly through pure mathematics.”

    The utility of this technique bears full flower in the non-adversarial case, where the host is indifferent, as for example in a game show setting. When the host’s pair of numbers are selected at random from any reasonable distribution over the entire real number line without explicit intent to cause the player to fail, the win in the unbounded case is as easy as in the bounded case. I carried out a simulation of 100,000 cases where the host’s numbers A and B were drawn from the entire real number line using the function y=ln(x/(1-x)) with x being a uniformly distributed random variable between 0 and 1, where the player used G=tan(π(x-0.5)) to generate G. The player still won 62.7% of the time. In fact, because the median of both these distributions are centered around zero, the “median” strategy suggested by the very first commenter, Sarah L. of using 0 as the guide number succeeds, as expected, 75% of the time.

    Besides having a common median, another problem with the above distributions is that they have a sharp peak very close to zero (neither of the two generated numbers above 10 in my simulation!). To correct for these problems, I changed the generating distributions to y=(10^15)^ ln(x/(1-x)) with a 70% bias towards negative numbers for the generating distribution and G=10^(tan(π(x-0.5))) with a 70% bias towards positive numbers for G. Now I obtained some hugely negative and positive numbers (absolute values of 10^150 or more in both cases). The player still won 65% of the time. The “dynamic median” strategy which I had suggested as the best for a repeated game won 75% of the time even though the median fluctuated wildly initially.

    So it is just as easy to win in the unbounded case with a non-adversarial opponent as it in the bounded case. Why should this be? Well, any distribution that has appreciable probability density over a wide range on the real line will often generate a pair of numbers that are far apart, and there is a good probability that a guide number generated by almost any distribution will be able to lie between them.

    I offer the following corrections to hitherto uncorrected comments made by people, not as criticism but to allow them and others to overcome false impressions. As they say in reminder letters, please ignore these messages if you have already corrected these impressions ☺.

    Walt Donovan said (and this was echoed by Max Rockbin, Mark B, Matthew Kosak and others):
    “Once you bound the range of A and B, though, then the probability of finding A<G<B becomes becomes greater than zero and non-infinitesimal, and you can then, and only then, achieve >50% chance of success.”

    This is not true as we saw above. What is more important than bounded vs. unbounded is adversarial vs. non-adversarial. The results are the same in the bounded and unbounded cases.

    Walt also said:
    As the problem was originally stated, A and B are unbounded real numbers. Thus, the probability that one can find a G between A and B is an infinitesimal epsilon, greater than zero but smaller than any positive number. And the chance of success is 50% + epsilon, which converts to 50% as a real number.

    This is not correct. There is no such thing as an “infinitesimal epsilon, greater than zero but smaller than any positive number.” For any A and B, the epsilon is an actual positive number, with an infinity of smaller numbers still present between it and 0. The chance of success always remains measurably greater than 50%.

    @Pauli
    Benford’s law is interesting and would have been relevant if the numbers were restricted to numbers with a fixed number of digits. This is not the case in this problem.

    @sagef
    I noticed in your code that you had used RANDBETWEEN(1,10). Didn’t you mean RANDBETWEEN(0,10)? The former will give you success only 74.69% instead of 75% ☺

    @Giovanni
    Thanks! I agree, it is definitely a most fascinating problem. I think we are easily deceived not so much by statistical concepts, but by infinities. Most of the mental difficulties here relate to this.

    The three mantras that need to be chanted ad infinitum ☺ are:
    1. Om – There are no uniform distributions on the real number line.
    2. Om – The probability of a number falling between A and B is not (B-A)/infinity. That quantity is undefined.
    3. Om – The actual probability of a number falling between A and B depends on the distribution and can certainly be finite.

    @Greg Egan
    I can’t thank you enough. You have been stellar! Anyone with specific unresolved difficulties please consult Greg's many explanatory comments. It is surely addressed there, clear as a bell.

    @All
    Comments will be closed here in 3-4 days. Please rush in any unresolved difficulties or feedback before then. Thank you all.

  • @Greg Egan and @Pradeep Mutalik

    Let’s take the first point, that you state that it is a Gaussian distribution. My reply to that is to look at the rule imposed by the game “you have no idea how I generate my number.” Does that sound like you know it is a number within your distribution or within a known distribution? It follows that if I know there is a probability of G falling within A and B, then I know something about how it is generated. Which isn’t possible according to the game rules. The whole exercise was about information limitations, and also getting order from nothingness. But if @Pradeep Mutalik would like to adjust the initial rule to say “I generate my number from a range of x to y” and you also must choose your guess between x and y, that completely changes the problem and the solution as well. It never says the player will choose between x and y, and of course these would have to be actual real numbers to have any meaning to a player, thus the median is also known. You have no idea where such a distribution is centered, nor what the other player’s range is, even though you know your own. If this is confusing it is because the rules are self-contradictory, but unfortunately, that is how the problem was designed.

    The Gaussian distribution implies a fair distribution, either by large randomizing (of outliers) or by controlling input around an average. And the whole point of such a curve is weightedness or higher probability of outcomes around a center, and actually in a fair game we wouldn’t expect our number to be unduly influenced. What is the Gaussian distribution of numbers from a twenty sided die? Are these distributed around an average? How do you have a bell curve when each number in the real number set is just as probable as the next? In a coin toss there is no Gaussian, there are two peaks. Either way, it presumes that the player is not deliberately acting or could not play the game in exactly the manner I illustrated “always shooting for the highest or lowest G possible” the crazy strategy, but as the solution states, he or she can choose ANY G and still win. The player does so, and follows the rules but still gets only .5 odds. Regarding @Pradeep’s recent clarification 8.2.15 he states “..The pure math solution exists even in the adversarial case.” So @Pradeep’s solution is said to apply in ALL cases of play. So the example I provide where the player continues to guess a G that is as large as humanly possible, turns the game into a “get outside of A-B”, which he can do endlessly when facing an opponent with less creativity or an inferior black box generator with a different range of allowable x-y.

    You state: “The whole point of the puzzle is to tell the player what strategy they should use. I have just shown that the strategy of using a Gaussian distribution for G will succeed, following Jason's general approach.”
    No, the objective is to evaluate the solution to the problem as the article states. We therefore have to examine the solution itself, test it, and then consider how it might provide a winning strategy for the player to achieve better than .5 odds in guessing the higher number. @Jason and @Pradeep Mutalik have summarized a mathematical method that is claimed to always work for the player. By appearances, the solution would seem to work for any G and any A or B selected. However,
    1) What is not apparent, is that the solution depends on G and any A or B to be within the same distribution. They either are 100% (p=1) within the same number field or distribution or there is at least a finite non-zero probability (p ≠0) that they will fall within the same distribution or field.
    2) The solution also depends on the fact that the number ranges from which G and A and B are selected , which are themselves, discreet ranges, have a finite probability of overlap.
    If #2 is false, then it’s irrelevant if #1 is true. The two players can be playing in the same field of numbers and never “see” each other, in terms of the probability of G(falling between A and B) as you’ve said. But also, in the sense that the player never experiences any, that is a non-zero probability of increased success with the “solution.” The player doesn’t “see” the advantage with the method.
    That’s because (unbeknownst to either party) they’re ranges don’t overlap, which I call the “non-overlap problem.” This is a real case (with innumerable permutations) as I demonstrate with the one test case.

    Again, at @Pradeep’s recent clarifications 8.2.15, I’m not saying the number range need be bounded, it is the individual range for the G number that is critical, so I’ve been improperly grouped with the others on this point.

    We can agree that there is some distribution of the numbers which are even common to the player and host. In other words, they can select their numbers from the same pot. However the game requires that the player and host select independently and without knowledge of eachother’s guesses. That in my mind, is an expressly stated rule or game edict which should be underlined. If we imagine that these ranges are arbitrary shapes, dispersed over the common field of numbers shared by player/host, we can see that they must overlap in their boundaries if the solution is real. I note that no information is given that suggests a finite boundary on how often this overlap occurs or if it does occur. For all I know the two players can go off forever on a non-overlapping pathway. Why? Because it is never limited, mathematically or otherwise for them NOT to do so. We don’t know the probability in this case, but we would know that probability in a real world game. If we said the both use the same type of black box, or they both draw numbers from the same pool. Etc etc. or even if they both used the coin flip model you quoted from @Erika. We don’t have that information, and I gave you just one example (of limitless ones) in which the range of G doesn’t just randomly fall outside of A B. It never gets there because neither you nor I know what the probability of it getting there is. BTW, the distribution you are using , be it Gaussian or otherwise, forces the player and host to select from a common “all real numbers set” and also to select their guess ranges or limits at some arbitrary distribution where as you say G falls within A and B some of the time, (at least some finite number). It also is irrelevant if it is non-zero. We could say 1’s are out and also 5’s that wouldn’t make any difference.

    So our objective is to evaluate the solution not necessarily give advice to players. (Players are theoretical anyway). I’ve found exceptions to that solution but in fairness I’ve also illustrated how those exceptions would work if we had more information and what cases it would work in.

    If you’re looking for the missing information, it’s the p (of overlap) of the two theoretical ranges or sub-pots that player and host select their guesses. It's also the issue that I've not seen a reply from @Pradeep Mutalik on specifically, but perhaps he will. I've not said that this issue is only relevant to a bounded range, (@Pradeep's 8.2.15) I said the two players generate their own ranges independently via their black boxes, and we have no idea what they will be. That is a very BIG piece of missing information in this problem. It is also a source of the paradox.

  • @ Pradeep Mutalik who says:
    "@sagef
    I noticed in your code that you had used RANDBETWEEN(1,10). Didn’t you mean RANDBETWEEN(0,10)? The former will give you success only 74.69% instead of 75% ☺"

    you are absolutely correct sir, good catch! imho
    and yup i was getting on average around 74.69%

  • @Matthew Kosak

    You write: "Let’s take the first point, that you state that it is a Gaussian distribution. My reply to that is to look at the rule imposed by the game “you have no idea how I generate my number.” Does that sound like you know it is a number within your distribution or within a known distribution?"

    The PLAYER has no idea how the HOST generates A and B. We are told not to assume that the two numbers are drawn from any probability distribution, and the actual statement in the puzzle “You have absolutely no idea how I generated these two numbers” is the HOST telling the PLAYER about A and B. The puzzle itself says nothing about G, because the whole idea of G only appears in the proposed solution.

    The PLAYER can do whatever they like to decide whether to swap hands or not. The question is what strategy they should adopt if they want to improve their chance of winning. There is absolutely nothing in the rules that says the PLAYER can't pick a number G from a probability distribution that is non-zero across the entire real line, such as a Gaussian.

    You write: "It follows that if I know there is a probability of G falling within A and B, then I know something about how it is generated. Which isn’t possible according to the game rules."

    You seem to be very confused about the rules of the game. Of course the player knows something about how G is generated, because it is the player who decides to generate G by a method of their choosing.

    You write: "Either way, it presumes that the player is not deliberately acting or could not play the game in exactly the manner I illustrated “always shooting for the highest or lowest G possible” the crazy strategy, but as the solution states, he or she can choose ANY G and still win."

    If the player chooses G from any probability distribution that is non-zero across the entire set of real numbers, there will be a non-zero probability of G falling between A and B. A non-zero probability is not the same thing as winning in every case. You keep giving examples where the player does something and happens not to win. That's beside the point. There is no strategy that guarantees that G will always fall between A and B, but there is a strategy that produces a non-zero probability that G will fall between A and B.

    You write: "So our objective is to evaluate the solution not necessarily give advice to players."

    The solution consists precisely of advice to the player. If the player does anything else except follow the strategy offered by the solution, then the results of the game are completely irrelevant to evaluating the solution.

  • Continuing from my previous two posts, please would you dear reader act as puzzle adjudicator and copy down the two numbers a and b that the host has selected? We don't want him changing them later! That would make him a bit of a random variable! What are his two numbers that you have copied down? 10 and 15? 1002 and 1002.5? Only joking, you mustn't tell me, that would ruin the puzzle.

    Previously we arrived at the point where the contestant must decide what to do. From what the contestant has heard he believes he might have a chance with this puzzle if he can generate a random number g that has a chance of falling between the host's a and b, so he heads over to the two walk-in store cupboards that hold the random number generator machines, and since the host said he (the host) could choose any two numbers the contestant ignores the cupboard marked [Bounded] and walks in to the store cupboard marked [Unbounded (-infinity to +infinity)]. Loads of machines in there, all marked with labels [Such-and-such distribution]. But try as he might he cannot find one marked [Uniform distribution], pity it felt like that would be a good choice. But he does notice that all the machines have a little green sticker saying, [Area under curve certified to be 1]. The one in the middle of the room, marked [Gaussian distribution] has the same green sticker, and seems to be well used. It has loads of buttons and levers to adjust the distribution and its centre, the contestant decides to concentrate on zero centred curves, picks up the instruction booklet, and on its pages reads: [Curve given by y = f(x); Area under curve from -infinity to +infinity = 1, i.e. integral from -infinity to +infinity of f(x) = 1] (The green sticker image is printed on that page too.) He realizes the machine is saying that the area under the curve is finite (and converges to 1) even in the limit of allowing x to go from -infinity to +infinity.

    Turning a page he reads, [Area under curve from -infinity to x = integral from -infinity to x of f(x) = probability that number generated g is less than or equal to x].

    Turning another page he reads, [Probability that number generated lies between two numbers a and b is (integral from -infinity to b of f(x)) – (integral from -infinity to a of f(x)) = integral from a to b of f(x) = area under curve between a and b]. He realizes also that this area is greater than 0 if a not equal to b.

    The contestant thinks this sounds highly promising! The machine is designed so that it has a non-zero probability of generating a random number that will fall between two fixed numbers a and b! Isn't that just what he wanted to find? Trouble is he remembers he's only going to know one of the host's two numbers, let's say it's a. He decide to have a think about that, and have go with this machine in a minute. But first he decides to double-check in case he missed the machine marked [Uniform distribution]

  • @Pradeep

    FYI I had to post using square brackets because your security software blocked the exact same post written with double-quotes

  • @Colin

    As you can see, other comments do display double quotes without a problem.
    If you generated the document in a word-processor, it may have converted the quotes to "smart quotes" which might have triggered the rejection?
    In general, it is best to convert word processor documents to plain text before cutting and pasting from them, as this prevents unwanted characters from being embedded in them.

  • @Pradeep

    I know it's a bit off-topic, but I wrote it in Notepad so they were definitely not smart quotes. I've tried again just now with same result. A big red cross appears to tell me I've been blocked and at the bottom the page says: CloudFlare Ray ID: 21036932d6340a84 • Your IP: 86.167.77.155 • Performance & security by CloudFlare. I'll email the text file to quanta@ in case it's of interest.

  • @colin

    Sorry about that, I know this has happened before, and is probably the result of overly sensitive security software. I'll ask for it to be checked out.

  • @All,
    Here's the comment that Colin was trying to submit. The content is the same as his above comment submitted at 8/3/2015 at 8:50 am, but this will be easier to read.

    —————————————
    Continuing from my previous two posts, please would you dear reader act as puzzle adjudicator and copy down the two numbers a and b that the host has selected? We don't want him changing them later! That would make him a bit of a random variable! What are his two numbers that you have copied down? 10 and 15? 1002 and 1002.5? -0.004 and -0.0051? Only joking, you mustn't tell me, that would ruin the puzzle.

    Previously we arrived at the point where the contestant must decide what to do. From what the contestant has heard he believes he might have a chance with this puzzle if he can generate a random number g that has a chance of falling between the host's a and b, so he heads over to the two walk-in store cupboards that hold the random number generator machines, and since the host said he (the host) could choose any two numbers the contestant ignores the cupboard marked "Bounded" and walks in to the store cupboard marked "Unbounded (-∞ to +∞)". Loads of machines in there, all marked with labels "Such-and-such distribution". But try as he might he cannot find one marked "Uniform distribution", pity it felt like that would be a good choice. But he does notice that all the machines have a little green sticker saying, "Area under curve certified to be 1". The one in the middle of the room, marked "Gaussian distribution" has the same green sticker, and seems to be well used. It has loads of buttons and levers to adjust the distribution and its centre, the contestant decides to concentrate on zero centred curves, picks up the instruction booklet, and on its pages reads: "Curve given by y = f(x); Area under curve from -∞ to +∞ = 1, i.e. integral from -∞ to +∞ of f(x) = 1" (The green sticker image is printed on that page too.) He realizes the machine is saying that the area under the curve is finite (and converges to 1) even in the limit of allowing x to go from -∞ to +∞.

    Turning a page he reads, "Area under curve from -∞ to x = integral from -∞ to x of f(x) = probability that number generated g is less than or equal to x".

    Turning another page he reads, "Probability that number generated lies between two numbers a and b is (integral from -∞ to b of f(x)) – (integral from -∞ to a of f(x)) = integral from a to b of f(x) = area under curve between a and b. He realizes also that this area is greater than 0 if a ≠ b.

    The contestant thinks this sounds highly promising! The machine is designed so that it has a non-zero probability of generating a random number that will fall between two fixed numbers a and b! Isn't that just what he wanted to find? Trouble is he remembers he's only going to know one of the host's two numbers, let's say it's a. He decide to have a think about that, and have go with this machine in a minute. But first he decides to double-check in case he missed the machine marked "Uniform distribution"…

  • @Greg Egan

    The solution to the game states that I may choose any G I like, correct? It is a GUESS. I the PLAYER, choose my GUESS to be between 1 and 10. Always, without fail.

    Now, let’s consider again how this strategy will play out in the solution. We have not yet described how the HOST will select their A and B which is a critical point. The HOST can select the two numbers “A and B” from any range of numbers. “We do not know how these numbers are generated.” So the host pulls an A and B from the range of numbers, 1000.00 to 9999999999999.00. That is a substantial range from which to draw, and they’re pulled randomly. The HOST can choose what the range of real numbers are because these are never limited by the game.

    We play once. (@Pradeep Mutalik indicated that it would work for just one time). I won’t bother to go through the rules described above in the “solution”, but it’s rather evident. It should be obvious that my guess will never fall within A and B? There is zero probability of that occurring. And the consequence is that I lose my slight probability of my strategy improving my odds. In other words "case 3" doesn't exist. So the claim that the solution improves one's odds is not correct.

    I don’t have to know what the HOST will do, or how the numbers are generated. I am complying with the solution strategy, and yet the probability is exactly .5.
    Now, many of you will automatically harp on the fact that this is not really “playing” a game, I’m not even trying, correct? Well, nobody said we had to try, nor do the rules stipulate what kind of G, my guess number has to be. I’m completely within the rules, and yet the solution fails. But that is just a theoretical case, one out of countless theoretical permutations. Remember, “you don’t know how A and B were selected” and you certainly don’t know how the G number was generated.

    I’ll change my reply when you’ve stipulated what ranges I must select from for my G (which means I don’t guess that number) and also what ranges are allowed for A and B (which means we know a median). But then you’ve changed the rules.

  • @Matthew Kosak,

    When I said that the solution works the very first time, I was quite clear that it is your probability of success that is greater than 50%. You could still lose that first game, of course, as in any game involving probabilities, but you will win more often than you lose.

    Now here's what you have misunderstood about the strategy:
    1. Your guess must be generated out of a distribution, like the Gaussian, that extends over the entire number line. So you cannot, "always without fail" choose a number between 1 and 10. If the game is repeated (even if it is just in your imagination) your next guess must come out of the distribution you are using, and consequently could be any number whatsoever, and need not be in the 1-10 range.
    2. Your host does not know what number you have selected. Your guess in a single game could certainly fall within the range 1-10. However the host does not know this. He could have selected numbers higher than 1000, but he could just by chance have also selected a pair of numbers lying on either side of your guess, which would result in you winning.

    Since there is always a chance of this happening on any try, no matter what range he selects, your chances of winning are greater than 50%.

    If your guess G can range over the entire real number line, which the solution stipulates that it must, there is always a finite chance that case 3 will happen.

    P.S. I have also mentioned elsewhere (in a reply to Eric Wong) that you must want to win and apply the strategy correctly. That is always assumed in such solutions because if you want to fail, or don’t apply the strategy correctly as you seem to imply in the last part of your last comment, then nothing can help you, to paraphrase Schiller.

  • two questions:
    #1.
    for bonus question #1 there is a 25% edge over the game if one uses the exact medium number to assist in determining the hand that holds the highest number after the contents of one hand is shown. there is only a 17% edge over the game if one uses a random number (that is between and inclusive of the bounds of the allowed numbers) as an 'approximate' 'estimate' of a 'substitute' for the exact medium number. there is an 8% difference between the method of using an exact medium as your 'guesstimate' 'guide' and a random number (that is between and inclusive of the bounds of the allowed numbers).
    my question is, is that 8% difference in a sense a measure of the 'significance' of precision with respect to the state of our 'sophisticated' mathematical knowledge with respect to this problem relative to the state of our 'sophomoric' lack of mathematical knowledge with respect to the problem? in other words, is 8% a measure of the edge the use of math and logic has over the use of math-less logic has over such a problem.

    #2
    i had a sense that (but not a certainty) if one were to choose a guide number for the problem before one was shown the 'clue' number in a hand over waiting to choose a guide number after being shown the 'clue' number that one would come up with a much better (more objective, not prejudiced) guide number. true or false?
    if true, is that an indicator that objectivity over possibly biased logic is advantageous for this problem when it comes to choosing a guide number?

  • @sagef,
    #1
    The exact median used as a guide number will give you 75% success only under the following two conditions:
    1. You know, with 100% certainty, the exact median of some distribution that the host uses to generate A and B.
    2. The host’s two numbers A and B are randomly selected out of this distribution, are independent of each other in any given trial, and you know this to be true with 100% certainty.

    If these conditions are satisfied, then yes, the extra 8 odd % success rate can be attributed to the information you have about the host’s selection process. It can also be used as a measure of your knowledge. For example, if you did not know the exact median but only knew that it was somewhere between 5 and 6, your success rate would drop to somewhere between 66.7 and 75%.

    Note that neither of these two conditions are satisfied in the original problem, and under those conditions, the random guide number method is the only game in town, at least on the first try. The random number method is also more robust in another sense: it can give you the possibility of a win on any trial, no matter what the host’s pair of numbers are. This is not true of the median guide number method.

    I would not use the comparison "use of math and logic" vs. "use of math-less logic". Math and logic are used together in both instances.

    #2
    You should not be influenced by the 'clue' number in choosing your guide number. Greg and I wrote to Colin P. on 7/29/2015 in the am about this.

    You should generate your guide number out of your chosen distribution, independent of the 'clue' number. The timing of this makes no difference as long as you can ignore the 'clue' number. If you think you will be tempted to somehow change the guide number based on the clue number, by all means, generate the guide number first.

  • i haven't gone through the question or solution but my initial reaction was 'ok, they choose two integers between 0 and infinity, and put one in each hand'. you choose one hand and look—if its not infinity then its the other hand.

  • @Pradeep Mutalik

    You wrote: "you must want to win and apply the strategy correctly."

    My random selection of a number between 1-10 “without fail” was actually a winning strategy. I had it on good authority that my “opponent” was a fair player, and was limiting her guesses to between 1-100.

    Regarding some other points:

    You stated that you’d simulated the game “100,000 times” and the solution worked, i.e. giving >50% odds. I’d be very interested to know, in a basic way, how that was done.

    1) If the distribution you are simulating is not limited by a range, then is it possible for the simulation you are using to create a number that falls outside this range? This goes back to the point I made earlier, that it is not possible for a computer to have a limited distribution range and yet at the same time, to exceed that range.
    2) Can the guess number G in your simulation fall outside the range of the distribution for the other numbers A and B?

    Knowing more about your simulation might help elucidate some important issues with the problem, and clear up much confusion, as I strongly suspect that the simulation that you may be using is basically not equivalent to the problem given, which involves a PLAYER and a HOST. Two individuals. In the simulation I suspect that you are generating, simultaneously, all three numbers within the same range. And if we were to correlate that case with the initial situation of the game you proposed, that kind of mathematical simulation is not equivalent even theoretically to the PLAYER-HOST problem. What should be apparent is that the automatic inclusion of the guess number G within the A and B distribution is a COMPLETELY different problem than what was initially stated. I believe it is a source of the paradox.

    As I said (post above), the problem as stated does not limit the guess G to a specific distribution of A and B : “1) it is not necessary that they are connected by their ranges, as I mention “collusion” as a problem before, and 2) it is not necessary or required that the player’s G number be within the range of the host. So many of the realists, are conditionally implying that G is somehow connected with numbers A and B.”

    If the distribution you are simulating is encompassing the guess number G, and A and B, this is highly relevant to the point I made earlier regarding “the overlap problem”.
    “1) What is not apparent, is that the solution depends on G and any A or B to be within the same distribution. They either are 100% (p=1) within the same number field or distribution or there is at least a finite non-zero probability (p ≠0) that they will fall within the same distribution or field.
    2) The solution also depends on the fact that the number ranges from which G and A and B are selected , which are themselves, discreet ranges, have a finite probability of overlap.
    If #2 is false, then it’s irrelevant if #1 is true. The two players can be playing in the same field of numbers and never “see” each other, in terms of the probability of G(falling between A and B)..”

    I think the paradoxical nature of this problem lies in the fact that it is undefined or is defined with contradictory assertions. As I suspected earlier, if that is the case, “it is set up to fail.”
    As you said “if you want to fail, or don’t apply the strategy correctly as you seem to imply in the last part of your last comment, then nothing can help you, to paraphrase Schiller.” Contrary to your assertion, it is not I who is aiming to fail. Success will be achieved when we shed a great deal more light on the paradox implicit in this problem.
    Cheers

  • Asymptotically Fading Words

    I thought I’d touch on one more point: if the dynamic median method gives better results than the random guide number method in the bounded cases, and even in my unbounded simulation, is there any advantage in using the latter at all?

    The answer is yes, and this point was addressed by Marcelo pretty early on in the original problem post. For the median or dynamic median to work, the host’s two numbers have to be independent of each other, and the original problem did not have this constraint. If A and B are not independent than the median will not work or will work less efficiently. So the random guide number method is more general than the median one and should be preferred if nothing is known about the host’s numbers.

    Comments will be closed here on Monday Aug. 10, so if there are any last minute comments or questions please send them in quickly.

    @Matthew Kosak
    Your comment again displays a misunderstanding of the winning strategy. There are no overlap problems and no implicit paradoxes in it.

    The distribution of G extends over the entire real number line from -∞ to +∞. Hence it has a finite probability of overlapping any numbers that the host chooses. So your point #2 of “the overlap problem” is always true.

    You asked about my simulations.

    First of all, simulations are completely unnecessary. You can mathematically prove that such a distribution for G can allow it to have a finite probability to be between any A and B whatsoever.

    If the host’s numbers are drawn from a bounded range, the probability of G lying in that range can be exactly calculated because the player knows the exact distribution he or she is using.

    If the host’s numbers are drawn from an unbounded range, by definition, there has to be an overlap with the player’s numbers which are also drawn from an unbounded range.

    That’s all we need to consider.

    Nevertheless, I did do the simulations for the unbounded case because that is the case where some people think that the possibility of G to overlap A and B is nil. I have described my simulations quite completely. The number G was drawn using one distribution, and A and B were drawn using another. Both distributions ranged from -∞ to +∞ (in theory) and at least from about -10^150 to 10^150 in practice. In both cases a random number was picked between 0 and 1, and two separate functions, as described, were used to generate G on the one hand, and A and B on the other, respectively.

    Finally, I would also like to point out the volte-face you did in your last two comments for the case where your strategy was to select a number between 1-10.

    In your comment on 2015/08/03 at 11:52 pm you said “So the host pulls an A and B from the range of numbers, 1000.00 to 9999999999999.00.”

    In your last comment on 2015/08/08 at 1:50 am you completely changed this to “I had it on good authority that my “opponent” was a fair player, and was limiting her guesses to between 1-100.”

    This does not give confidence that you are writing your comments in good faith. It seems as though all you want to do is to raise objections to the solution without making any effort to understand it, in spite of it being patiently explained to you many times. If this is indeed the case, there is nothing left to say.

  • @Pradeep Mutalik and @All

    You write:
    “Your comment again displays a misunderstanding of the winning strategy. There are no overlap problems and no implicit paradoxes in it.”

    This is mostly a theoretical problem (but not entirely) and I believe that I’m addressing its theoretical issues. You have repeatedly framed the problem as though there is in fact a Player and Host, (this was in the initial wording of the problem itself, but also in other relevant descriptions) which I interpret to be, ultimately, two different probability distributions, not one. However, I have repeatedly pointed out, both in my 8.8.15 and 8.2.15 comments-that this is a very different scenario than what the solution presumes to be based upon. The solution, (theoretical or otherwise) assumes a single, Gaussian distribution that must be identical for Player and HOST, a scenario that is fundamentally different than the problem stated.

    I stated earlier in my August 2, 2015 post, what I suspected to be the actual problem of two independent or unknown distributions:
    “So you, the game host, and I, the player set out into the vast world of "real numbers", and even in this vast space you may believe there is a chance that my number G will fall within, your two choices, A and B. Just to make that outcome more 'certain' you make your number span very, very, ..very large. You guess .0009000000000001 for a small number and 9999999999999999999999999900009.000 for your very large number, that on appearances is a large span. How can I miss? Well, I'm not going to write it out, but let's say on this round my guess is .000001 times your smallest range number. That's a million fold miss. Outside the range. You could say, well it’s because you didn't adjust your range properly, or perhaps it wasn't a good shot. We play again but this time I'm sure you're going lower, and you do, you go lower by a trillion times and higher by a trillion times, but I was shooting for the very reasonable , in this case, 1. Your highest number was a trillion times lower, and could not have been higher, as this was the limit of your box.
    These are examples, but we can see that unless there is agreement, between player and host, in theory, the player can always go beyond the host. The only way to prevent this scenario is if there is, again, agreement not to do so.”

    In the game in which both PLAYER and HOST play without knowledge of each other’s guesses or selections, this is an example scenario (August 2, 2015 example I give above) that one can encounter, and there are innumerable variations; such that the probability would be zero of the guess number G landing between the HOST’s A and B. Again, you can see from my example above, that the players are playing “in good faith” and yet because of the way the problem is designed, they have no means of actually knowing where the other player is in the field. It is meant as a theoretical example, just as your simulation is a theoretical example. How can one compute a probability in this case?

    I’m simply giving a scenario based upon the “rules of the game” as you stipulated them initially in your post. But there are many other examples that make it impossible to calculate the probability (G between A and B) and I believe, provide exceptions to the solution. And , regarding your point(s) about my “volte face” or my apparent “lack of good faith” in my comments, my examples of two other game scenarios, are just other examples of games that could be played when “you don’t know how A and B are generated” and your opponent has no information about how you generate your guess number G. She could very well have had a range of 1-100 from which to select her A and B, or a host could use a black box to pull an A and B from the range of numbers, 1000.00 to 9999999999999.00, as I said 2015/08/03 at 11:52 pm. Or my theoretical opponent could select a 1-100 range in one game and use the 1000.00 to 9999999999999.00 in another. But one has no information to tell them that the host must choose a specific range, which will overlap their own, so their actual probability of guessing where that range is, might be 0, (and probably is) and I believe mathematically you’re on thin ice to claim otherwise. Now, unless you actually respond and show otherwise, I don’t believe that brining up cases to support my argument for exception to the solution, impinge upon my “good faith” in my replies here.

    The probability of finding G within A and B in your simulation would be finite, but assumes that the overlap is unity. The problem explicitly does not state that the guess G number must fall within the range of the range for A and B (the HOST’s) so the scenario I provide gives innumerable exceptions.

    You raised a number of theoretical issues, one of which was: “where is the information coming from?” To which I replied, August 2, 2015 @7:52, that the missing information would be the probability (or non-probability) of overlap of the two ranges in any given game (unbeknownst to the player or HOST). I highly doubt that this can be construed as you write: “not making an effort to understand the problem.” You assume a p=1 for this overlap by assuming all numbers, G, A and B must originate from the Gaussian. BUT, it is not because you’ve recognized such a probability exists.

    So before you close the big iron gates on further “issues” regarding the solution, I don’t believe it would be too much to ask how you might theoretically calculate the probability (G falling between A and B) in the scenario above, when there are two independent distributions? Believe me, I am attempting to “make the effort” and I’m aware there is a great deal of “patience” required in attempting to get a scientifically valid point across.

  • @Matthew Kosak
    Again your comment is filled with misunderstandings and incorrect ideas about the solution. Greg and I have pointed out these mistakes to you several times.

    Let us address the theoretical issues, as you say. For real-world considerations refer to the 'What about winning in real life" section of my August 2 post.

    The following statements of yours are INCORRECT, and show a lack of understanding of the solution:

    -"The solution, (theoretical or otherwise) assumes a single, Gaussian distribution that must be identical for Player and HOST.."
    WRONG. The solution only assumes a distribution for the PLAYER. The host's distribution, if any, is unknown and can be anything whatsoever. The player's distribution for G should have a finite probability of selecting a number lying in any interval from -∞ to +∞, of which the Gaussian distribution is one of many different options.

    -"There are many other examples that make it impossible to calculate the probability (G between A and B) and I believe, provide exceptions to the solution.."
    WRONG. It is always possible to calculate this probability since the chosen distribution is fully specified by the player.

    As an example, suppose I am the player and my distribution is a Gaussian with a mean of 0 and an SD of 10^15:

    For A = 1 and a B = 100, the probability of G lying between A and B (per Wolfram Alpha) is 3.95 x 10^-149
    For A = 1000 and B = 9999999999999.00, the probability of G lying between A and B is 0.00399

    Note that both numbers are positive, and will give a greater than 50% chance for the player to win.
    This is true of any two real numbers generated by the host.

    CASE CLOSED.

Comments are closed.