Friday, October 21, 2005

Attention Math Geeks

As most of you probably know, I'm very much into odds and handicapping and the like. I got into an argument with a random stranger on the bus about the Pick 3 lottery, and it was maddening: I am absolutely convinced that he is wrong and I am right, but I cannot actually demonstrate it because I lack the probability background.

I won't describe the argument itself for fear of prejudicing you, but here's the question:

If I run the "Pick 3" experiment over and over again, eventually I will observe every possible outcome from 000 to 999. The question is: how many times, on average, should I expect to run the experiment before I have observed all of those results? Or, a closely related question: what is the chance that I will have seen all 1000 results before I have run the experiment 10,000 times?

I can actually come up with a pretty close approximation of the answer to this question by writing a program that defines an array of 1000 entries, all initially set to 1; whenever the test returns a number, the corresponding entry is set to zero. The program terminates and reports its results once the sum of all the numbers in the array is zero.
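In Python, such a program might look like the following sketch (my own illustration of the idea just described, not code from anyone in this thread):

```
import random

def draws_until_all_seen(base=1000):
    """Draw uniformly from 0..base-1 until every value has appeared;
    return how many draws it took."""
    flags = [1] * base        # 1 = not yet seen, as described above
    unseen = base
    draws = 0
    while unseen:
        n = random.randrange(base)
        draws += 1
        if flags[n]:
            flags[n] = 0
            unseen -= 1
    return draws

runs = [draws_until_all_seen() for _ in range(1000)]
print("average draws needed:", sum(runs) / len(runs))
print("fraction done within 10,000 draws:", sum(r <= 10000 for r in runs) / len(runs))
```

Running something like this suggests the average is around 7,500 draws, with the 10,000-draw cutoff met most of the time.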

So I'm actually less interested in the actual answer to the question than in the "how" of answering it. I need to know the formula that an actual probability person would use to determine this algebraically.

At issue is the correctness of my guiding maxim as a gambler, that no betting strategy can give a series of negative expectation bets an overall positive expectation.

13 comments:

Anonymous said...

Interesting question. Here are some experimental results, generated by a quickie Perl script I hacked together. In the table below, the "base" is the number of possible outcomes (for example, 1000 in the case of Pick 3).

On average, in order to generate every possibility in pick 3, you'll need to play about 7,500 times. The probability that you would have to play more than 17,000 times in order to get every combination is around 1 in 10,000. (See below)

Note how the "average base multiple" (the "Avg/base" column below) increases with the size of the base. Very interesting. If this were Pick 2 (with 100 combinations) instead of Pick 3, on average about 520 attempts would be necessary to get all the combinations, which is only about 5 times the base of 100. For Pick 3, you need about 7.5 times the base.

Notice how the multiple itself keeps growing as the base grows. This might be a tip-off that a harmonic series is involved.
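That hunch is right, for what it's worth: this is the classical "coupon collector" problem, and the expected number of draws needed to see all of m equally likely outcomes is known to be m times the m-th harmonic number, i.e. m x (1/1 + 1/2 + ... + 1/m). A quick check against the simulated averages (a Python sketch of mine, not the Perl script):

```
from math import fsum

def expected_draws(m):
    """Coupon-collector expectation: m * (1 + 1/2 + ... + 1/m)."""
    return m * fsum(1 / k for k in range(1, m + 1))

for m in (2, 10, 100, 500, 1000):
    print(f"base {m}: {expected_draws(m):.2f} expected draws")
```

For base 1000 this gives about 7485, right in line with the 7511.47 average in the table below.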

Source code available for the script upon request. Pasting it here is a little too ugly.

Each base was run 10,000 times; "tries" counts the random draws needed before every number from 0 to base-1 had appeared at least once.

Base    Avg tries   Avg/base   Min tries (x base)   Max tries (x base)
2       2.98        1.49       2 (1.00)             15 (7.50)
3       5.53        1.84       3 (1.00)             30 (10.00)
4       8.31        2.08       4 (1.00)             43 (10.75)
5       11.41       2.28       5 (1.00)             50 (10.00)
6       14.75       2.46       6 (1.00)             62 (10.33)
7       18.19       2.60       7 (1.00)             76 (10.86)
8       21.69       2.71       8 (1.00)             80 (10.00)
9       25.42       2.82       9 (1.00)             96 (10.67)
10      29.28       2.93       10 (1.00)            142 (14.20)
11      33.24       3.02       11 (1.00)            142 (12.91)
12      37.34       3.11       12 (1.00)            148 (12.33)
100     518.85      5.19       253 (2.53)           1298 (12.98)
500     3391.39     6.78       2053 (4.11)          7764 (15.53)
1000    7511.47     7.51       4523 (4.52)          17018 (17.02)

heatkernel said...

Exact answer:
Prob(m,N)=(Sum over q from q=0 to q=m of (-1)^q C(m,q)(m-q)^N)/m^N.
(notation explained below)

I use the term "outcome" for each of the possibilities of 1 pick (0 to 999 in the original problem), and "trial" for each of the possibilities of picking N times (picking 10000 times in your example).

Assume that all the outcomes of a single pick are equally likely. Obviously it changes the answer if they are not.

To simplify, change the problem by assuming that there are only 2 outcomes, "0" and "1" instead of 1000 outcomes, "000" through "999".

The probability Prob(2,N) of seeing all the outcomes in a trial of N picks is

Prob(2,N)=1-Prob(NOT seeing all the outcomes in the trial)

The probability of not seeing all the outcomes is the probability that 1) every pick comes up "0" or 2) every pick comes up "1".

For each of 1) and 2), the probability is 2^{-N}.

Therefore,
Prob(2,N)=1-(prob of 1)-(prob of 2)=1-2x2^{-N}
=1-2^{-N+1}.
We can also write
Prob(2,N)=(2^N-2x1^N)/2^N.

Now, scale up from 2 outcomes to 3 outcomes, which introduces a new wrinkle. The outcomes are "0", "1", "2", say. Again, we have

Prob(3,N)=1-Prob(NOT seeing all the outcomes in the trial)

to put it another way

Prob(3,N)=1-(Number of possible trials in which at least one outcome fails to appear)/(Number of possible trials)

We have to count the total number of trials in which at least one of the outcomes never appears. For any one of "0", "1", or "2", the number of trials in which that particular outcome never appears is 2^N. However, 3x2^N overcounts the total number of trials in which at least one outcome doesn't appear. The reason is that we have counted twice the trials in which _2_ outcomes fail to appear. Therefore, we have to add back in the number of trials in which _2_ outcomes fail to appear. There are 3 outcomes in which 2 outcomes fail to appear, because a trial in which 2 outcomes fail to appear is of the form "0000...", "1111..." or "2222...."

So we obtain for the answer
Prob(3,N)=1-(3x2^N-3)/3^N
or, to put it another, more informative way
Prob(3,N)=(3^N-3x2^N+3x1^N)/3^N

The technique is called "inclusion-exclusion", because we alternately include too much, exclude too much, and so on, until we reach the correct answer.

Now, scale up from 3 outcomes to m outcomes. (In the original problem, m=1000). We have to introduce the notation C(m,q), "m choose q" for the number of ways of choosing q objects from a set of m objects. We have
C(m,q)=m!/((m-q)!q!).

Note that C(m,0)=1, by convention.

Again, we have

Prob(m,N)=1-Prob(not seeing all the outcomes in the trial)

At the qth stage (q goes from 0 to m-1) of the inclusion/exclusion argument, we alternately add or subtract C(m,q) x (m-q)^N trials. The reason is that we are counting the number of trials in which q given outcomes fail to appear: there are C(m,q) ways of choosing those q outcomes, and given the choice, there are m-q outcomes which may appear at each of the N steps of the trial. We end up with

Prob(m,N)=(Sum over q from q=0 to q=m of (-1)^q C(m,q)(m-q)^N)/m^N.

You can easily write a program that will evaluate the sum in the numerator for any given values of N,m. In the special case N=m, there is a closed-form for the sum, namely,

(Sum over q from q=0 to q=m of (-1)^q C(m,q)(m-q)^m)=m!.

So we get
Prob(m,m)=m!/m^m.
This makes sense because, in the special case m=N, the number of trials in which all outcomes appear is the number of permutations of m objects, which is m!. There are m^m total trials, giving the closed form answer by another method.

There may be a closed form for Prob(m,N), but I do not know it.

heatkernel said...

erratum:
"There are 3 outcomes in which 2 outcomes fail to appear, "

should read

"There are 3 _trials_ in which 2 outcomes fail to appear, "

heatkernel said...

On second thought, I wouldn't recommend using the exact answer for any m and N as large as 1000 and 10,000. The computer is probably going to store the intermediate results in such a way that the errors will result in your getting meaningless answers in the end. Maybe a better thing to do is make a graph of P(m,N) for m taking small values (from 1 to 10 say) and N going from 1 to 10m. Some basic pattern should emerge.

Then there's the possibility that the sum in the exact answer P(m,N) does have a closed form which I just don't recognize. A combinatoricist would be best equipped to deal with that possibility, and I know one I can ask.
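For what it's worth, the rounding worry goes away in a language with exact big-integer arithmetic. Here is a Python sketch of mine (an illustration, not a script from this thread) that evaluates the inclusion-exclusion sum exactly:

```
from fractions import Fraction
from math import comb, factorial

def prob_all_seen(m, N):
    """Exact Prob(m,N): the chance that N draws from m equally likely
    outcomes hit every outcome at least once (inclusion-exclusion)."""
    total = sum((-1) ** q * comb(m, q) * (m - q) ** N for q in range(m + 1))
    return Fraction(total, m ** N)

# Sanity check against the closed form Prob(m,m) = m!/m^m:
assert prob_all_seen(6, 6) == Fraction(factorial(6), 6 ** 6)

# The original question (takes a moment -- the integers get enormous):
print(float(prob_all_seen(1000, 10000)))  # roughly 0.95
```

So 10,000 plays cover all 1000 outcomes most of the time, but by no means always.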

Unknown said...

Interesting. I'll comb through this when I have more time. Thanks.

Anonymous said...

You may have to adjust your definition of "average", but I think the number you seek is presented here:

www.saliu.com/Saliu2.htm#Table

It appears what you want is 692.

heatkernel said...

This Saliu guy seems like a crackpot. Whatever conclusion he is trying to support, it is incomprehensible to me.

It appears, moreover, that in this document he is considering a different problem from the one you posed. What he is considering is the chance that a given pick in the lottery, say xyz, where x, y, z are digits from 0 to 9, will appear in the first N drawings. This is a much easier problem to solve. The probability that xyz appears at least once in a trial of N drawings is
1-(total number of trials in which xyz fails to appear)/(total number of trials).
The total number of trials (the denominator in the above formula) is just 1000^N. The reasoning is that at each drawing there are 1000 outcomes, and two trials are different precisely when their outcomes differ at some drawing n<=N. The total number of trials in which xyz fails to appear, by similar reasoning, is 999^N. So, the probability of seeing xyz in the trial of N drawings is
1-(999/1000)^N=(1000^N-999^N)/1000^N.
You can easily solve for N to figure out the number of drawings necessary to obtain xyz with any desired probability.
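Concretely, setting 1-(999/1000)^N = p and solving gives N = log(1-p)/log(999/1000). A quick sketch (mine, just to tie this back to the 692 quoted above):

```
from math import ceil, log

def drawings_needed(p, outcomes=1000):
    """Smallest N with 1 - ((outcomes-1)/outcomes)**N >= p."""
    return ceil(log(1 - p) / log((outcomes - 1) / outcomes))

print(drawings_needed(0.5))  # 693: a coin-flip chance of seeing your pick
```

which is essentially the "692" in that table; the difference is just rounding.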

Unknown said...

It's very interesting that you found this page, because this guy is actually dealing with the very question that led to the argument on the bus. The webpage advances an argument similar to the one my acquaintance on the bus was making.

Heatkernel, I imagine you are probably not a gambler, and to a non-gambler this is going to seem like gibberish (which it mostly is).

But what this guy is trying to describe and confirm is the perception among some gamblers that a series of negative expectation bets can actually have an overall positive expectation. Degenerate gamblers essentially need this belief in order to rationalize why they should continue betting even though it is ruining their lives. They refer to whatever magical explanation they come up with as their "betting system."

It is my belief, though I have never been able to prove it in the general case, that this is not true. What guys like this generally do is throw up some series of charts and statistics and then sort of claim at the end that "Voila!" they've come up with a way to bet a sow's ear getting three and a half points at home, and win a silk purse.

The chart is, in a weird way, sound, but the conclusions that he is trying to get you to draw from the chart are not. Take it to the roulette table for a second and see what I mean. Let's say you're playing roulette and you have a 1 in 32 chance of winning each spin.

After about 21 spins, there is a 50% chance that you've won at least once. So he's asserting the break-even point for a payout is 21:1. And you can sort of follow the thinking. But it's wrong.

The reason this is enticing as a gambler is that if I look at this I can tell myself that I'm going to go play roulette until I win once.

If I win on the first spin, I'm up 31 bets. After 2 spins, 30, etc., until we get to the point where I've spun 32 times and still haven't won. Well, most of the time, well over half the time, I'm never going to get there. So about 64% of the time, I win money.

Trouble is, the other 36% of the time you've lost more money than you're ever going to win back. So the overall expectation is still (in this fake case) zero. In the casino, of course, it's negative.
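That's easy to check directly. A sketch (my own; it assumes $1 bets, a fair 31:1 net payout for the "fake" zero-expectation case, and the 21:1 figure from the chart) of the expected result of "spin until the first win, at most 32 spins":

```
def expectation(payout, m=32, max_spins=32):
    """Expected net result of betting $1 per spin on a 1-in-m shot,
    stopping at the first win or after max_spins spins."""
    p = 1 / m
    ev = 0.0
    for n in range(1, max_spins + 1):
        # first win on spin n: collect `payout`, having paid n-1 losing bets
        ev += (1 - p) ** (n - 1) * p * (payout - (n - 1))
    ev += (1 - p) ** max_spins * (-max_spins)   # never won at all
    return ev

print(expectation(31))  # ~0.0 (up to float noise): the fair "fake case"
print(expectation(21))  # about -6.4: the chart's "break-even" payout loses
```

The stop-at-first-win rule changes how often you walk away a winner, but it can't flip a negative expectation positive.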

heatkernel said...

I think I finally understand what you are after. In order to show that you are right, you don't need to answer the rather difficult question you posed in your original post. All you need to do is compute the total number of trials where a given outcome (say 0) occurs at the nth place, multiply that number by your net win/loss if 0 occurs at the nth place, sum those projects from n=1 to N (N is 32 in your example) and divide by the total number of outcomes. If you do this you will obtain a series which evaluates to a closed form. Anyway, I know this is a bit vague, but I'm too tired to see straight at the moment. Tomorrow I'll post the formula.

heatkernel said...

I mean, "products" not "projects". I hate how you can't edit these posts.

Don Q. said...

I still haven't given up on your original question, which I think is pretty interesting, but re: betting systems ...

Why is it that gamblers have this belief that there is a failsafe betting system? I have heard of only one that makes any mathematical sense, and it requires a huge bankroll. It is not a guaranteed system, but you will most likely win if your bankroll holds out.

It goes something like this. Take something that is slightly worse than a 50-50 bet -- betting on red in roulette. You have a 47.37% chance of hitting and an even-money payout. Play this game long enough and you will notice that you are losing about 5.26% of the money you wager. BUT, the idea goes, you can beat the odds if you double your bet after each loss and keep doubling until you win; you will eventually win (unless you don't).

For example, I lay down a $1 bet on red. It loses. Next time I bet $2. It loses. Next time I bet $4. It loses. Next time, I bet $8, and it wins. Net winnings ($8) - net losings ($7) = $1. This system will work so long as you don't break your bankroll, but that's a pretty big if.
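Here's a sketch of one round of that system (my own illustration; it assumes $1 base bets, American-roulette red at 18/38, and a bankroll deep enough for 20 doublings):

```
import random

def martingale_round(max_bets=20, p_win=18/38):
    """Bet $1 on red, double after each loss; stop at the first win
    or after max_bets straight losses."""
    bet, lost = 1, 0
    for _ in range(max_bets):
        if random.random() < p_win:
            return bet - lost    # even-money win recovers all losses, +$1
        lost += bet
        bet *= 2
    return -lost                 # the whole doubling sequence lost

results = [martingale_round() for _ in range(100_000)]
print("rounds won:", sum(r == 1 for r in results))
print("worst round:", min(results))
```

Nearly every round wins its $1; the rare round that doesn't loses $1,048,575 in one go, which is where the expectation hides.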

Don Q. said...

Further clarification.

I am fully aware that my "betting system" does not work. If I had a bankroll of, say, $1,048,576 (2^20), I could make 20 consecutive doubling bets. The chance of losing 20 red-black bets in a row is (20/38)^20 --> approximately 376,000 to 1 against. Very unlikely. If I were willing to put all that money in play, I would very likely win my $1.

Worth noting, though, is that I would likely lose my entire bankroll if I kept playing my betting game until I had won 1 million dollars. If I play the game 1 million times, my 1 in 376,000 chance will show up about 2.66 times.
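Putting exact numbers on that (a sketch using the figures above):

```
p_bust = (20/38) ** 20            # lose 20 even-money roulette bets in a row
print(round(1 / p_bust))          # about 376,000, as above
bust_cost = 2 ** 20 - 1           # the $1,048,575 doubling sequence
per_round = (1 - p_bust) * 1 - p_bust * bust_cost
print(per_round)                  # about -$1.79 expected per round
print(per_round * 1_000_000)      # chasing $1M this way: about -$1.8M expected
```

So each "nearly sure" $1 round actually carries a negative expectation of almost two dollars.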

heatkernel said...

Two parts to this post. The first part is to prove that you are clearly right to say that a sequence of negative expectation bets must have negative expectation, provided the individual bets are independent. This is simply because the expectation of several bets on independent events is the sum of the expectations on the individual events. Therefore, the expectation of the sequence of bets with negative expectation is the sum of negative numbers, hence the expectation of the sequence is negative.

The second part is to deal with the specific example of a lottery or roulette game you brought up. The difficulty here is that the bets are not independent, so the above reasoning doesn't apply directly. The reason is that the bets you place depend on the outcomes of previous bets. If you win on the first roulette spin, you stop playing, and therefore the expectation of the 31 remaining bets becomes 0. In general, if you win the nth bet, the expectation of each of the remaining 32-n bets becomes 0.

It is, however, possible to calculate the overall expectation as follows. Let

a trial be the action taken to determine whether you win/lose one bet (spinning the wheel or picking the lotto number from 000 to 999)

an experiment be a sequence of trials of fixed length

m be the total number of outcomes of a trial (m=32 in the roulette example, 1000 in the lottery)

N be the number of trials (N=32 in your roulette example, 10,000 in the lottery example) in an experiment.

n denotes an integer ranging from 1 to N, inclusive.

D be the number of dollars you receive when you win a trial. We may assume that you pay $1 when you lose a trial. (D=21 in your roulette example).

0) We make two simplifying assumptions: first, that regardless of when you win your first bet, you keep taking bets until you have bet N times, even though you do not receive or pay any money for the bets after your first win; second, that you consistently bet on the same outcome, say 1 on the roulette wheel, or 000 in the lottery. Neither assumption has any effect on your expectation.

1) The total number of possible experiments is m^N, each having equal probability.

2) The number of possible experiments in which you win first at the nth trial is
(m-1)^(n-1) x m^(N-n)
The reason is that we are assuming that the outcome you always bet on first appears at the nth bet. That leaves m-1 outcomes for each of the n-1 preceding trials, and m outcomes for each of the N-n subsequent trials.
The number of possible experiments in which you never win is (m-1)^N

3) The aggregate payoff from the experiments in which you first win at the nth trial is
(D-n+1) x (m-1)^(n-1) x m^(N-n)
We obtain this by multiplying the result of 2), the number of experiments in which you first win at the nth trial, by the payoff from any such experiment, which is clearly D-(n-1)=D-n+1: you collect D dollars on the win, having paid $1 for each of the n-1 preceding losses.
The aggregate payoff from the experiments in which you never win is (-N)(m-1)^N.


4) By 1), 2), and 3), the expected payoff is
(-N)((m-1)/m)^N + (Sum from n=1 to N of (D-n+1) x (m-1)^(n-1) x m^(N-n))/m^N
= (-N)((m-1)/m)^N + (1/m) x (Sum from n=1 to N of (D-n+1) x ((m-1)/m)^(n-1)),
which we obtain by summing 3) over n from 1 to N, adding the never-win term, and dividing everything by the total number of experiments.

The sum in 4) splits into two sums, each of which has a closed form. Both sums are taken over n from 1 to N. The first sum (with the constants taken out in front) is

5) (1/m)(D+1) (Sum from n=1 to N of ((m-1)/m)^(n-1)).

The second is
6) -(1/m) (Sum from n=1 to N of n x ((m-1)/m)^(n-1)).

The sum in 5) is the sum of a truncated geometric series.
It has a closed form given by substituting x=(m-1)/m into the formal identity
1+x+...+x^(N-1)=(1-x^N)/(1-x).

Therefore 5) is
(1/m)(D+1)(1-((m-1)/m)^N)/(1-((m-1)/m)).
The denominator 1-((m-1)/m) is just 1/m, so 5) is more simply written as
(D+1)(1-((m-1)/m)^N).

The sum in 6) is the "derivative" of a geometric series. More precisely, let x be an indeterminate. Then
1+x+...+x^N=(1-x^(N+1))/(1-x).
Differentiating both sides formally with respect to x, we obtain
1+2x+3x^2+...+Nx^(N-1)=
(Nx^(N+1)-(N+1)x^N+1)/(1-x)^2.
Now the sum in 6) is the left-hand side of this equality with x=(m-1)/m. So 6) is
-(1/m)(N((m-1)/m)^(N+1)-(N+1)((m-1)/m)^N+1)/(1-((m-1)/m))^2.
The denominator (1-((m-1)/m))^2 is just (1/m)^2; dividing by it cancels the 1/m factor in front and leaves an extra factor of m. The expression simplifies to
((m-1)/m)^N x (N+m) - m


The sum of 5) and 6) is therefore
(D+1)(1-((m-1)/m)^N) + ((m-1)/m)^N x (N+m) - m
= D - D((m-1)/m)^N + 1 - ((m-1)/m)^N + ((m-1)/m)^N x (N+m) - m
= D+1-m + [((m-1)/m)^N][N+m-(D+1)]. (7)

Therefore, the expected payoff is (-N)((m-1)/m)^N (the contribution from the experiments in which you never win) plus (7), which gives
D+1-m + [((m-1)/m)^N][m-(D+1)]
= (D+1-m)(1-((m-1)/m)^N). (8)

Now, suppose N=m, as in the roulette example you described in your last post. Then the expected winning from an experiment is obtained by setting N=m in (8):
(D+1-m)(1-((m-1)/m)^m). (9)

Note that log[((m-1)/m)^m] is m(log(m-1) - log m). By the mean value theorem, log(m-1) - log m equals minus the derivative of the log function at some value between m-1 and m. The derivative of the log function is 1/x, so for all but the first few values of m we can estimate log(m-1) - log m as -1/m. Therefore, log[((m-1)/m)^m] is approximately -1, and ((m-1)/m)^m is approximately 1/e, which is approximately 0.367. So (9) is approximately
(D+1-m)(1-0.367)=(D+1-m)(0.633).

For example, if in the roulette example (m=32) you set the ratio of payoff for a win to payoff for a loss to 21, so that D=21, then the expected payoff is approximately
(21+1-32)(0.633)=(-10)(0.633)=-6.33.

It may be somewhat surprising that the break-even point is not D=m, but D=m-1. The reason for this is that you stop after the mth loss, thereby avoiding further bets which would be expected to lose you even more money. At least for the case m=2 (a coin toss), it is easy to see that the break-even point (the value of D with 0 expectation) is D=1: there is a 1/2 probability that the first toss comes up heads (you are betting heads, let's say), in which case you win $1, and a 1/4 probability that the first and second tosses both come up tails, in which case you lose $2. (When the first toss is tails and the second heads, you make $0.) So your expected gain with D=1 is (1/2)x1 - (1/4)x2 = 0.
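A quick numerical check of that closed form (a sketch of mine; the function names are not from the thread):

```
def expected_payoff_direct(m, N, D):
    """Sum payoffs over 'first win at trial n' plus the never-win case."""
    p = 1 / m
    ev = sum((1 - p) ** (n - 1) * p * (D - n + 1) for n in range(1, N + 1))
    return ev + (1 - p) ** N * (-N)

def expected_payoff_closed(m, N, D):
    """The closed form (8): (D+1-m)(1-((m-1)/m)^N)."""
    return (D + 1 - m) * (1 - ((m - 1) / m) ** N)

for m, N, D in [(32, 32, 21), (32, 32, 31), (2, 2, 1), (1000, 10000, 999)]:
    assert abs(expected_payoff_direct(m, N, D)
               - expected_payoff_closed(m, N, D)) < 1e-9

print(expected_payoff_closed(32, 32, 21))  # -6.38; the -6.33 above is the 1/e estimate
```

Both routes agree, and D = m-1 does come out to exactly zero expectation.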
