Number abundance estimator

Tjips · Post by **Tjips** » Tue Oct 26, 2010 3:18 pm

I don't know whether this has been derived before, but I've derived it now, so here it is.

I derived an expression (technically 3 of them) which gives the ensemble probability that a square is a specific number. What I mean by ensemble probability is that if you were to generate a large number of boards and look at what number appeared in a specific square, this expression would give the abundances of the different numbers on that square in the ensemble. Basically it tells you the probability that the square would contain the number n if you chose a board at random and looked in that square.

The expression looks like this:

P_a(n) = [a choose n]*[d^n]*[(1-d)^(a-n+1)]

where a is the number of neighbouring squares (a=3 for corner squares, a=5 for edge squares and a=8 for the rest of the board); d is the mine density (#mines/area); and n is the number in question (n in 0..a, so at most 8).
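For anyone who wants to play with the numbers, here is a minimal sketch of the expression above in Python. The function name `number_probability` and the expert density of 99 mines on 30x16 are my own assumptions for illustration, not something stated in the post.

```python
from math import comb


def number_probability(a: int, n: int, d: float) -> float:
    """P_a(n) = C(a, n) * d^n * (1 - d)^(a - n + 1)."""
    if not 0 <= n <= a:
        return 0.0  # a square with a neighbours cannot show a number above a
    return comb(a, n) * d**n * (1 - d) ** (a - n + 1)


# Assumed expert density: 99 mines on a 30x16 board.
d = 99 / 480
for name, a in (("corner", 3), ("edge", 5), ("middle", 8)):
    probs = [number_probability(a, n, d) for n in range(a + 1)]
    print(name, [f"{p:.4f}" for p in probs])
```

A handy sanity check: for any a, the values for n = 0..a sum to 1-d, the probability that the square is not a mine at all.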

When applied to an expert board and compared with the observed number abundances from a run of 10 000 000 random boards, it shows very good agreement with the data. It is only off by about 19% when used to predict the number of boards with an 8 on them (observed: 8316/10000000, predicted: 10189/10000000). Exactly how well this fits is debatable, but imo it is a good approximation.
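The predicted figure for 8s can be reproduced with a few lines. This is a sketch under two assumptions of mine: that the expert board is 30x16 with 99 mines, and that the expected number of 8s over the whole run is a fair stand-in for the number of boards with an 8 on them (8s are rare enough that two on one board should hardly ever happen).

```python
from math import comb

WIDTH, HEIGHT, MINES = 30, 16, 99  # assumed expert layout
BOARDS = 10_000_000                # run size quoted in the post
OBSERVED_EIGHTS = 8316             # observed count quoted in the post

d = MINES / (WIDTH * HEIGHT)
p_eight = comb(8, 8) * d**8 * (1 - d) ** (8 - 8 + 1)  # P_8(8)
interior = (WIDTH - 2) * (HEIGHT - 2)                 # only a=8 squares can show an 8
predicted_eights = p_eight * interior * BOARDS

print(round(predicted_eights))  # → 10189
overshoot = (predicted_eights - OBSERVED_EIGHTS) / predicted_eights
print(f"{overshoot:.1%} over, relative to the prediction")
```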

How was this expression derived?
Quite simply, actually. It is just:

P(n) = [probability square is not a mine]*[probability that only n squares around the square are a mine]*[number of ways to put n mines down around the square]

The terms are shuffled a bit in getting from this simple logic to the expression, but you can easily see the parts represented in the above expression.
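Written out, the shuffling looks like this. It is a sketch that assumes every square is a mine independently with probability d, which is the independence assumption the whole estimate rests on:

```latex
\begin{align*}
P_a(n) &= \underbrace{(1-d)}_{\text{square is not a mine}}
          \cdot \underbrace{d^{\,n}\,(1-d)^{a-n}}_{\text{a given set of } n \text{ neighbours are mines, the other } a-n \text{ are not}}
          \cdot \underbrace{\binom{a}{n}}_{\text{ways to pick the } n \text{ neighbours}} \\
       &= \binom{a}{n}\, d^{\,n}\, (1-d)^{a-n+1}
\end{align*}
```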

How well does it predict the different numbers?
Well, comparing it with a single run gives some idea of how well it works.

For the different values of n it estimates as follows: (middle, edge, corner)
0: 2% over, 0.8% over, 0.4% over
1: 0.2% under, 0.5% under, 0.5% under
2: 1% under, 0.5% under, 0.04% under
3: 0.5% under, 0.7% over, 2% over
4: 1.2% over, 3.2% over
5: 4.2% over, 6.8% over
6: 8.3% over
7: 13.4% over
8: 18.4% over
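A comparison like the one above can be rerun on a small scale with a quick Monte Carlo. This is only a sketch, not the code behind the figures in the post: the 30x16 board with 99 mines, the function names and the board count are all my own assumptions, and a few hundred boards is far too few to say anything about the high numbers.

```python
import random
from collections import Counter
from math import comb

WIDTH, HEIGHT, MINES = 30, 16, 99  # assumed expert layout


def neighbours(x, y):
    """Yield the in-bounds neighbours of (x, y)."""
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            nx, ny = x + dx, y + dy
            if (dx or dy) and 0 <= nx < WIDTH and 0 <= ny < HEIGHT:
                yield nx, ny


def simulate(boards, seed=0):
    """Count number occurrences, keyed by (neighbour count a, number n)."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(WIDTH) for y in range(HEIGHT)]
    adjacency = {cell: list(neighbours(*cell)) for cell in cells}
    counts = Counter()
    for _ in range(boards):
        mines = set(rng.sample(cells, MINES))  # uniformly random mine placement
        for cell in cells:
            if cell not in mines:
                adj = adjacency[cell]
                counts[len(adj), sum(nb in mines for nb in adj)] += 1
    return counts


def predicted(a, n, d=MINES / (WIDTH * HEIGHT)):
    """The estimator from the post: C(a, n) * d^n * (1 - d)^(a - n + 1)."""
    return comb(a, n) * d**n * (1 - d) ** (a - n + 1)


if __name__ == "__main__":
    boards = 200
    counts = simulate(boards)
    # Number of squares in each group: corner (a=3), edge (a=5), middle (a=8).
    squares = {3: 4, 5: 2 * (WIDTH - 2) + 2 * (HEIGHT - 2), 8: (WIDTH - 2) * (HEIGHT - 2)}
    for a in (8, 5, 3):
        for n in range(a + 1):
            observed = counts[a, n] / (boards * squares[a])
            print(f"a={a} n={n} observed={observed:.5f} predicted={predicted(a, n):.5f}")
```

Grouping squares by their neighbour count a gives the middle/edge/corner split for free, since those are the only three values a can take.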

Are there really only these three groups? Middle, edge, and corner?
Yes, according to the data... Bear in mind, though, that the statistical significance drops sharply for the higher numbers. 8000/480 (roughly 8000 observed 8s spread over 480 squares) gives counts of only order 10 for the individual squares...
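To put a rough number on that, here is a back-of-the-envelope sketch. It assumes plain Poisson counting noise (my assumption, not something tested here), where the relative scatter on a count of N is about 1/sqrt(N):

```python
from math import sqrt

observed_eights = 8000  # rough total number of 8s, as quoted above
squares = 480           # squares on the board, as used above

per_square = observed_eights / squares
relative_noise = 1 / sqrt(per_square)  # Poisson: std / mean = 1 / sqrt(mean)
print(f"{per_square:.1f} eights per square, about {relative_noise:.0%} counting noise")
```

So per-square differences in the 8s would have to be quite large before they stood out from the noise in a run of this size.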

How do the abundances look for these 3 groups?
Attached... (the red diamonds peeking out in places are the observed values)

What now?
Well, I would like to somehow go from this to something related to 3bv and the like, but so far I'm not having much luck. It seems that the independence assumption falls apart badly when trying to predict the "most likely contribution" to [3bv-opening]. Taking an approach assuming independence gives a result of 101 for the most likely [3bv-openings]. Taking a more involved approach, bringing in some dependence between the squares surrounding the square in question, gives a slightly more pleasing result of 138. This is still quite a bit removed from the actual value, which is more like 160... Any ideas would be good, coz I'm not really all that experienced with this sort of statistical reasoning.

P.S. Something interesting I discovered that I didn't know before: if a square 1 away from a side is a 0, then the square next to it on the side must also be a 0. Not mind blowing, but interesting.
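The reason is that every neighbour of the side square is either the 0 itself or one of the 0's neighbours, so none of them can be a mine. A brute-force spot check on random boards (not a proof) could look like the sketch below; the 30x16 layout with 99 mines and the function names are my own assumptions.

```python
import random

WIDTH, HEIGHT, MINES = 30, 16, 99  # assumed expert layout


def numbers(mines):
    """Map every non-mine square to its count of neighbouring mines."""
    return {
        (x, y): sum((x + dx, y + dy) in mines
                    for dx in (-1, 0, 1) for dy in (-1, 0, 1) if dx or dy)
        for x in range(WIDTH) for y in range(HEIGHT) if (x, y) not in mines
    }


def claim_holds(boards=200, seed=0):
    """Return False as soon as a 0 one away from a side has a non-0 beside it on the side."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(WIDTH) for y in range(HEIGHT)]
    for _ in range(boards):
        grid = numbers(set(rng.sample(cells, MINES)))
        for x in range(WIDTH):  # top and bottom sides
            for inner, side in ((1, 0), (HEIGHT - 2, HEIGHT - 1)):
                if grid.get((x, inner)) == 0 and grid.get((x, side)) != 0:
                    return False
        for y in range(HEIGHT):  # left and right sides
            for inner, side in ((1, 0), (WIDTH - 2, WIDTH - 1)):
                if grid.get((inner, y)) == 0 and grid.get((side, y)) != 0:
                    return False
    return True


print(claim_holds())
```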