We are working on projects in which people either receive an email invitation, register on a web site, or simply play and expect an instant result. What is the best way to compute the chance of winning?
Case 1: People register, then at a certain date a prize draw (extraction) takes place. Ah, this is the best and simplest situation. You have the list of players and the list of prizes. Just do an index = Random(0,number_of_players) and, if index is smaller than number_of_prizes, you know the person won a prize. A second random draw over the available prizes determines the prize itself. The win probability is
number_of_prizes/number_of_players.
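
The two draws described above can be sketched in a few lines of Python. This is only an illustration: the function names `player_wins` and `pick_prize` are made up for this example, and it assumes the upper bound of `Random(0, number_of_players)` is exclusive.

```python
import random


def player_wins(number_of_players, number_of_prizes, rng=random):
    """Return True with probability number_of_prizes / number_of_players."""
    # randrange(n) returns an integer in [0, n); assumed to match
    # Random(0, number_of_players) with an exclusive upper bound.
    index = rng.randrange(number_of_players)
    return index < number_of_prizes


def pick_prize(prizes, rng=random):
    """Second draw: pick one of the available prizes and remove it from the pool."""
    return prizes.pop(rng.randrange(len(prizes)))


if __name__ == "__main__":
    prizes = ["mug", "t-shirt", "phone"]  # hypothetical prize list
    if player_wins(number_of_players=1000, number_of_prizes=len(prizes)):
        print("Won:", pick_prize(prizes))
    else:
        print("No prize this time")
```

Note that each player is tested independently here, so the number of winners varies from run to run. If exactly number_of_prizes winners are needed, picking them in one go with `random.sample(players, number_of_prizes)` would be one alternative.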
Case 2: People enter a site that allows them to play instantly and win a prize. Here the problem is trickier. While the algorithm is basically the same, prizes over people, you don't have the total number of players. There are some subcases here based on the following parameters:
- The win campaign lasts for a certain amount of time or is indefinite
- The prizes must all be given at the end of the campaign or not
- The players play after they have received an invitation (email, SMS, etc.) or arrive at random, from ad clicks or while looking for information on the site
Let's assume that the campaign has no fixed end date. The only solution is to pick a win probability and stick with it until you run out of prizes. You would normally compute it from the number of people playing over a period of time, in other words the speed at which people play. However, in this case the only thing influencing the chosen probability is psychological: how many people need to win for the campaign to have a marketing effect?
Now, if the campaign does have a finite duration, you would use the speed at which people play to estimate the total number of players. Let's assume you know people visit your site at an average rate of 1000 per hour. You then measure what percentage of them play, which gives an estimate of the number of players per hour, and you multiply that number by the total number of hours in the campaign. Again, we get to the prizes over people formula.
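
As a rough sketch of that arithmetic: the 1000 visitors per hour comes from the paragraph above, while the 20% play rate, the 7-day campaign and the 100 prizes are made-up numbers for illustration only. The function names are hypothetical too.

```python
def estimate_total_players(visitors_per_hour, play_fraction, campaign_hours):
    """Players per hour (visitors * fraction that plays) times campaign length."""
    return visitors_per_hour * play_fraction * campaign_hours


def win_probability(number_of_prizes, number_of_players):
    """Prizes over people, capped at 1 and safe against an empty estimate."""
    if number_of_players <= 0:
        return 0.0
    return min(1.0, number_of_prizes / number_of_players)


if __name__ == "__main__":
    # Assumed: 20% of visitors play, the campaign lasts 7 days, 100 prizes.
    players = estimate_total_players(1000, 0.20, 7 * 24)
    print(f"Estimated players: {players:.0f}")
    print(f"Win probability: {win_probability(100, players):.4%}")
```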
However, it is very important to know how people decide to participate in the extraction! If it is just a game added to one's site, then people come and go based on that site's popularity and its hourly/daily distribution (since the number of visitors fluctuates). So computing the rate from just the first hour of people coming to the site and playing doesn't help, but the first day might, and the first week even more so. One week of statistical data is best for estimating the number of people over time. Then the formula is
number_of_prizes_available_per_week/people_visiting_per_week, where the number of prizes available per week is either the total number of prizes divided by the number of weeks in a finite campaign, or an arbitrary number chosen by the campaign creator.
If, instead, people are invited to play, say after an email promotion campaign, then they will come as soon as they read their email. That means they will flock to your site in the first hours, then just trickle in over the next week, then nothing. So estimating the total number of players from the first hour or day is not really feasible unless you are certain of the statistical distribution of people playing games after email campaigns. That is difficult to know, as different messages, designs and game types might attract more or fewer people.
A hybrid can also exist, with a game on a site that people are also invited to play over email. Then all the parameters from above must be used. In any case, the best estimation I can think of comes from the total number of players in similar campaigns. The more similar the better.
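
If a similar past campaign tells you what share of the players usually shows up by a given point, the total can be scaled up from a partial count. A minimal sketch, assuming such a share is known; the function name and the 75% first-day share below are illustrative assumptions, not measured data.

```python
def estimate_total_from_partial(observed_players, expected_share_so_far):
    """Scale a partial player count up to a campaign total.

    expected_share_so_far is the fraction of all players that a similar
    past campaign had seen by the same point in time (assumed to be known).
    """
    if not 0 < expected_share_so_far <= 1:
        raise ValueError("expected_share_so_far must be in (0, 1]")
    return observed_players / expected_share_so_far


if __name__ == "__main__":
    # Assumed: 7500 players on day one, and similar email campaigns
    # saw 75% of all their players on the first day.
    print(estimate_total_from_partial(7500, 0.75))
```

The estimate is only as good as the assumed share, which is why the similarity of the reference campaign matters so much.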
But what if ALL the prizes must be given away, as required by law or simple common sense (so as not to be seen as keeping some for yourself or your friends)? Then one can adjust the win probability to suit the extraction speed. The same prizes over people formula is used, but only on the remaining values. The probability of winning is given by
number_of_remaining_prizes/number_of_remaining_people.
But that has some disadvantages. If the total number of participants is badly estimated, the result is a roller coaster of probabilities. People playing in the first part of the campaign end up either advantaged or disadvantaged compared to the people in the last part, as the estimated total number of players is adjusted over time to compensate for the bad initial estimate.
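
The remaining-prizes-over-remaining-people rule could look like the sketch below. The function name and the guard clauses (no prizes left, fewer expected players than prizes) are my own additions, not something the formula itself specifies.

```python
def adaptive_win_probability(remaining_prizes, estimated_remaining_players):
    """number_of_remaining_prizes / number_of_remaining_people, kept within [0, 1]."""
    if remaining_prizes <= 0:
        return 0.0  # nothing left to win
    if estimated_remaining_players <= remaining_prizes:
        return 1.0  # more prizes than expected players: everybody wins
    return remaining_prizes / estimated_remaining_players


if __name__ == "__main__":
    # Illustrative values: 25 prizes left, 2500 players still expected.
    print(f"{adaptive_win_probability(25, 2500):.2%}")
```

The probability is only as stable as the estimate of remaining players fed into it, which is exactly the weakness described above.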
Let's do a small example:
| | Day 1 | Day 2 | Day 3 | Day 4 | Day 5 | Day 6 | Day 7 |
|---|---|---|---|---|---|---|---|
| People playing | 7500 | 1000 | 400 | 300 | 200 | 300 | 100 |
| Percentage | 75% | 10% | 4% | 3% | 2% | 3% | 1% |
| Estimated total players | 15000 | 25000 | 12000 | 11000 | 10000 | 10000 | 10000 |
| Estimated remaining players | 7500 | 16500 | 3100 | 1800 | 600 | 200 | 100 |
| Remaining prizes (day start) | 100 | 50 | 47 | 42 | 36 | 27 | 13 |
| Win probability | 0.66% | 0.30% | 1.52% | 2.33% | 6.00% | 13.50% | 13.00% |
As you can see, the people who played first were pretty much screwed, because the total was expected to be 15000 players and the distribution closer to linear. After half of that expected total played on the first day, panic made the organizers raise the expected total to 25000 while they thought about what to do. Then they realised that the distribution of players is shaped by the fact that people play right after reading their emails and will probably not come back to play again. They adjusted the win probability every day and, as you can see, it pays to play in the last days.
But what would have happened if 1) they had known that the percentage distribution of players after an email campaign would be 75, 10, 4, 3, 2, 3, 1, and 2) they had known that the total number of players would be a fixed percentage of all emails sent, so that they estimated 10000 people playing with the right distribution?
| | Day 1 | Day 2 | Day 3 | Day 4 | Day 5 | Day 6 | Day 7 |
|---|---|---|---|---|---|---|---|
| People playing | 7500 | 1000 | 400 | 300 | 200 | 300 | 100 |
| Percentage | 75% | 10% | 4% | 3% | 2% | 3% | 1% |
| Estimated total players | 10000 | 10000 | 10000 | 10000 | 10000 | 10000 | 10000 |
| Estimated remaining players | 2500 | 1500 | 1100 | 800 | 600 | 300 | 100 |
| Remaining prizes (day start) | 100 | 25 | 15 | 11 | 8 | 6 | 3 |
| Win probability | 1.00% | 1.00% | 1.00% | 1.00% | 1.00% | 1.00% | 1.00% |
Even though the number of remaining prizes over the remaining players was recomputed every day, the probability stayed constant at 1%. Of course, one could say "Why didn't they stick to their 0.66% probability and be done with it?" Like this:
| | Day 1 | Day 2 | Day 3 | Day 4 | Day 5 | Day 6 | Day 7 |
|---|---|---|---|---|---|---|---|
| People playing | 7500 | 1000 | 400 | 300 | 200 | 300 | 100 |
| Percentage | 75% | 10% | 4% | 3% | 2% | 3% | 1% |
| Estimated total players | Not important | | | | | | |
| Estimated remaining players | Not important | | | | | | |
| Remaining prizes (day start) | 100 | 43 | 40 | 38 | 37 | 35 | 34 |
| Win probability | 0.66% | 0.66% | 0.66% | 0.66% | 0.66% | 0.66% | 0.66% |
Everything is perfectly honest, except that they were left with a third of the prizes on hand. Now they have to give them to charity and be suspected of doing this on purpose for whatever distant relative works at that charity.
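
To compare the fixed rate with the daily recomputation without relying on luck, one can compute the expected number of leftover prizes for each. The sketch below uses the daily player counts from the tables above; the function names are hypothetical, and it works with expected values (fractions of prizes) rather than simulating individual draws.

```python
PLAYERS_PER_DAY = [7500, 1000, 400, 300, 200, 300, 100]  # from the tables above
TOTAL_PRIZES = 100


def expected_leftover_fixed(players_per_day, prizes, probability):
    """Expected prizes left when every player wins with the same fixed probability."""
    expected_awarded = min(prizes, sum(players_per_day) * probability)
    return prizes - expected_awarded


def expected_leftover_adaptive(players_per_day, prizes, estimated_total_players):
    """Expected prizes left when the probability is recomputed at the start of
    each day as remaining prizes over estimated remaining players."""
    remaining_prizes = float(prizes)
    remaining_players = estimated_total_players
    for players_today in players_per_day:
        if remaining_prizes <= 0 or remaining_players <= 0:
            break
        probability = min(1.0, remaining_prizes / remaining_players)
        remaining_prizes -= min(remaining_prizes, probability * players_today)
        remaining_players -= players_today
    return remaining_prizes


if __name__ == "__main__":
    fixed = expected_leftover_fixed(PLAYERS_PER_DAY, TOTAL_PRIZES, 100 / 15000)
    adaptive = expected_leftover_adaptive(PLAYERS_PER_DAY, TOTAL_PRIZES, 10000)
    print(f"Fixed 100/15000 rate: about {fixed:.1f} prizes left")
    print(f"Daily recomputation with a 10000 estimate: about {adaptive:.1f} prizes left")
```

With these inputs the fixed rate leaves roughly a third of the prizes, while the daily recomputation leaves only the few prizes reserved for the estimated players who never showed up.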
Well, think about it, and let me know what you think. Are there smarter solutions? Is there a web repository of statistical data for things like that?