new benchmark: 3bv for flaggers

Elmar · Post by **Elmar** » Thu Jul 30, 2009 9:56 pm

I think it's time us flaggers fought back and find a suitable replacement for 3bv.

That's why I suggest we discuss how we can create a benchmark for the number of clicks a flagger would need to solve a given board.
Unfortunately this is quite a complex problem. It has characteristics of the Travelling Saleman Problem, i.e. it is NP-complete. Which basically means it could potentially take very long to find the optimal solution.

Therefore, I believe we will have to resort to heuristics. This means simple, fast algorithms that will find a good solution but not necessarily the optimum.

I have discussed this problem briefly in the past with Christoph Nikolaus and we came up with some algs, that I will post here.
The end result should be a fast algorithm that calculates an intuitive new board benchmark. I hope somebody will be able to implement it and ultimately it would hopefully be included in the clone.

Please feel free to discuss, comment, improve or post your own approaches.

Elmar · Post by **Elmar** » Thu Jul 30, 2009 10:02 pm

Greedy ZiNi
(assumes UPK)

1. calculate premium for each cell (open or closed) not containing a mine:
premium= [adjacent 3bv] - [adjacent unflagged mines] - 1_[if cell is closed] - 1
determine the benifit of performing a double click over a left click.
'1' represents the double click assuming 1.5 technique

2. find the (top-left most) cell with the highest premium
a. if premium non-negative perform solve:
ZiNi=ZiNi+[adjacent unflagged mines] +1
a premium of 0 might add evenually benificial flags, which still means a benifit over a left click
b. if premium is negative open top left most cell
ZiNi=ZiNi+1
top-left-most to be unambigous

3. if closed cells remain start from 1.

Pros:

first manual test gave quite good improvement over 3bv (126 3bv-> 94 ZiNi)
can possibly find optimal solve
simple, fast

Cons:

depends heavily on UPK, not comparable to actual solve
other cells with same or lower premium might sometimes lead to better solve
will not necessarily find optimum

Elmar · Post by **Elmar** » Thu Jul 30, 2009 10:02 pm

Random ZiNi
(assumes UPK)

this is basically an extension to the Greedy ZiNi that uses multiple random attempts to find a better solve

(assumes UPK)

1. calculate premium for each cell (open or closed) not containing a mine:
premium= [adjacent 3bv] - [adjacent unflagged mines] - 1_[if cell is closed] - 1
determine the benifit of performing a double click over a left click.
'1' represents the double click assuming 1.5 technique

2. randomly select a cell with the highest premium
a. if premium non-negative perform solve:
ZiNi=ZiNi+[adjacent unflagged mines] +1
a premium of 0 might add evenually benificial flags, which still means a benifit over a left click
b. if premium is negative open random cell
ZiNi=ZiNi+1
top-left-most to be unambigous

3. if closed cells remain start from 1.

4. if ZiNi has improved during last n runs start again.

Remarks:

instead of ramdomness selection of cell could also be based on lowest adjacent premium
number of runs could be reduced by exploiting different trees for isolated islands seperately
selection could include n cells with highest premium (including lower premia)

Pros:

can possibly find optimal solve

Cons:

depends heavily on UPK, not comparable to actual solve
will not necessarily find optimum
possibly slow

Elmar · Post by **Elmar** » Thu Jul 30, 2009 10:04 pm

Human ZiNi
(assumes UPK)

1. open all openings
ZiNi= [number of openings]
to be unambigous regarding the starting point of the solve

2. calculate premium for each open cell:
premium= [adjacent 3bv] - [adjacent unflagged mines] - 1
determine the benifit of performing a double click over a left click.
'1' represents the double click assuming 1.5 technique

3. find the (top-left most) cell with the highest premium
a. if premium non-negative perform solve:
ZiNi=ZiNi+[adjacent unflagged mines] +1
a premium of 0 might add evenually benificial flags, which still means a benifit over a left click
b. if premium is negative open top left most cell
ZiNi=ZiNi+1
top-left-most to be unambigous

4. if closed cells remain start from 2.

Pros:

simple, fast
humanlike approach to the board hence comparable with actual solve

Cons:

very unlikely to find the optimum

Tommy · Post by **Tommy** » Sat Aug 01, 2009 1:10 pm

Hi,
good ideas there!
I've been thinking about that a couple of months ago, but didn't have time.
An idea:
Does this stat really have to be unambiguous with regard to the starting point of the solve? If so, we aren't realistically going to find an alg that doesn't use UPK.
With incomplete information at the start, we are going to have to get the information "along the way". So, the result the algorithm produces is influenced by the "path" taken, and so also whatever we take as a "starting point" (in quotes because "path" and "starting point" need not refer to mouse path or a position on the field).

We need to define what exactly the algorithm is supposed to do.
What runtime is acceptable? Is this something that is calculated
- during a game
- afterwards
or
- afterwards if the player is interested?

In the first case, we are talking something that will not consume significantly more resources than, say, 3bv.
In the second case, an overhead of maybe half a second is acceptable (on slow machines).
In the third, probably at maximum 10-30 seconds (again, on slow machines).
(on expert)

Something else I thought about is preprocessing the board somehow, and thinking in terms of 3bv clusters to reduce the amount of things that need to be processed. Basically that would divide the problem into a couple of smaller subproblems (the individual clusters) that could be solved in an acceptable amount of time.

Tjips · Post by **Tjips** » Sun Aug 02, 2009 5:33 pm

Ah, that old chestnut: How do we gauge the difficulty of a board?

Well this thread got me thinking about it once agian, and I must say that I feel more qualified to answer this question now, than the previous times I dabbled with it. Just so that you guys understand the lingo in which I think (so I don't look like I'm babbling

), I'm just gonna state the way in which I view the problem.

The idea of "difficulty" is slightly missunderstood. A board with a specific 3BV for instance isn't really any more difficult to solve than a board with some other 3BV. The word "difficulty", as we use it, actually refers to how difficult it is to get a fast time on a board. And this is what leads me to restate the problem for myself as follows:

How much work (i.e. how many clicks) is there to be done on a specific board?

The basic idea being that one can complete less work in a faster time. Now for non-flaggers this is easy to answer. The work to be done is simply the 3BV of the board. But for flaggers it is a different story. If we where to look at any board from a flaggers perspective we would realise that the amount of work you do in solving a board depends on the order in which you do all the little jobs around the board. Thus, the question of how much work there is to be done on a board then becomes someting wholely different. We are now forced to ask the ultimate question in minesweeper: "What is the most efficient way of solving a minesweeper board?"

*end of David Attenborough style self-naration* xD

Well this problem I have thought of quite a bit in the past, and I've figured out a few things. First off, there is no one technique which will be most efficient on every board. The point is, however to find that technique, that style, that way of approaching a game that will yeild good results most of the time. This is basically what a minesweeper player is: A machine for churning out styles of play which produce good results most of the time.

One of the most promising ideas I've encountered for finding this "ideal style" is basically to evolve it. The idea is simple: You give a computer a large set of rules, ranging from very general (like: Always move into the unopened regions of the field) to very speicific (like: Don't left click on a 1 which is 3 squares from the side and shares it's mine with a two on the second row from the side) and let it play minesweeper according to these rules and let it systematically evolve a style which is generally effective.

This sort of approach to complex problems is a proven one. The only trouble we have at the moment is figuring out how those rules look. (This is why I've been trying to do some combinatorix on minesweeper to try and figure out what rules to give such a program)

"But this isn't telling us how much work is on a board!!!!"

True, but this will tell us what approach to let our algorithm take to get to that optimal solution, and that is my point for this post.

Basically we need to equip our algorithm with a set of rulesets which we know are close to reality under certain (much more easily computed) conditions. One such factor one could use to discern between two possible approaches is 3BV, because it is very likely that the styles producing best results on two different 3BV ranges, will be notably different.

Well, that's my ten cents...

Tommy · Post by **Tommy** » Mon Aug 03, 2009 7:20 am

I like that a lot Bertie

Take into account though that not only clicks matter, path is easily as important (which is why 3bv/s is affected strongly by the 3bv of the board itself).

Tjips · Post by **Tjips** » Tue Aug 04, 2009 9:38 am

Thanks

The way to take path into account is by including rules which operates from the square which was last solved, i.e. "solve the closest pattern to the pattern which was just solved". We also of course need to take into account that when solving you don't really turn around too often (thus you have something akin to momentum). But this I think should emerge from the evolutionary method.

I also had a thought on how we can get a set of starting rules to put in. We ask the top players what rules they play by. I know I can contribute 4 rules to the pot

.Other rules we could extract from the usual minesweeper community interactions (like the idea of chording). Then we just need to write the evo program.

Tommy · Post by **Tommy** » Tue Aug 04, 2009 10:20 am

I can program, but I dont know whether I'll have enough time, in the next week I won't, after that, quite possibly.

I only just started with c++ though, do you think the overhead of using a scripting language is acceptable? I'd much rather (and quicker) do it in python

Tjips · Post by **Tjips** » Fri Aug 14, 2009 9:53 pm

I have C++ experience, so I'll be able to kick in from that perspective. I also think a good way to structure it would be to have two or three sets of rules, each differing by one rule, play the same board after which each of the "odd one out" rules gets points according to it's performance. The way we choose the rule sets is by randomly selecting rules, but weighted using those points. How's that sound

.

Cryslon · Post by **Cryslon** » Sat Aug 22, 2009 8:37 pm

Please test my implementation of ZiNi.

@Bertie: how would you create the rules? IMHO it's not easy task...

(edit) deleted old version

Cryslon · Post by **Cryslon** » Mon Aug 31, 2009 4:11 pm

ZiNi calc 0.1 contains stupid bug: openings may be hit several times and it affects premia of border cells.

(edit) deleted old versions

Cryslon · Post by **Cryslon** » Tue Sep 01, 2009 3:26 pm

Quiet ZiNiCalc (yeah i know i shoulda just implement an option but i'm too lazy).

zinicalcq.zip: (7.4 KiB) Downloaded 3333 times

Tommy · Post by **Tommy** » Wed Sep 02, 2009 1:45 pm

On another note, what does ZiNi actually stand for?
Zimmermann-Nikolaus?

RonnyDeWinter · Post by **RonnyDeWinter** » Fri Oct 02, 2009 10:28 pm

Tjips wrote: Thanks

The way to take path into account is by including rules which operates from the square which was last solved, i.e. "solve the closest pattern to the pattern which was just solved". We also of course need to take into account that when solving you don't really turn around too often (thus you have something akin to momentum). But this I think should emerge from the evolutionary method.

I also had a thought on how we can get a set of starting rules to put in. We ask the top players what rules they play by. I know I can contribute 4 rules to the pot .Other rules we could extract from the usual minesweeper community interactions (like the idea of chording). Then we just need to write the evo program.

I also like these ideas of Bertie, but I would like to take it even further. The only way to really be able to compare boards is not in clicks, path, 3BV, openings and patterns......but in time! In other words you'll have to convert all actions needed to solve the board into time variables. And the only way to do this is to make a solver that adds time for all psychical limitations of a highly trained human minesweeper (for instance using the current fastest players as a reference for move speed, click speed, reaction time). It's complex but I think it could result into something much more accurate than the current methods and it could also be used to determine the 3BVs difficulty of a board or RQP.

Not only we could finally compare times with times and not times with clicks, but in the end we could even turn it into a very fun split screen feature to play AGAINST this humanlike solver (with a nice slide bar between 0% and 120% WR speed) so you can set it at a speed that you can actually beat it.

My fingers are itching to try and create such a thing, but since it would probably cost me weeks or months, I'll first have a good night sleep about it.

Tommy · Post by **Tommy** » Sun Oct 18, 2009 11:32 am

Holy crap.

Do you realize how much one could learn by playing against something like that?

And even better than split-screen - one screen, two cursors

Squares that both have opened/not opened yet appear as usual - the rest has 80% your view of it and 20% the solver's. That way, you can see exactly what the solver is doing in the background, which is better if you want to learn how to imitate it, which is probably a good idea.

Cryslon · Post by **Cryslon** » Mon Oct 19, 2009 6:12 am

Tommy wrote:Do you realize how much one could learn by playing against something like that?

I fear one could learn how to create perfect fakes. But hey, progress is unstoppable, if you manage to write this humanlike solver, Ronny, it will be great thing.

And i should admit that my implementation of RandomZiNi is awfully bad, because the probabilities of choosing different squares with equal premia aren't equal. I don't have enough time and motivation to fix it now, but if you wanna test R.ZiNi or RH.Zini you should rewrite randomizing procedure.

RonnyDeWinter · Post by **RonnyDeWinter** » Thu Oct 29, 2009 1:18 am

Cryslon, maybe you could help with this brain twister about HumanZini that I can't figure out yet:

Assuming that my experiments are right, HumanZiNi seems to be a much better indicator on expert level than on intermediate level. Do you have any ideas what's causing this?

(It probably has the same reason as why people use relatively much less flags on intermediate record boards than on expert record boards, but I can't figure out how this negatively effects HumanZini's effectiveness.

)

Cryslon · Post by **Cryslon** » Thu Oct 29, 2009 6:52 am

RonnyDeWinter wrote:Cryslon, maybe you could help with this brain twister about HumanZini ...

Hm... To make some judgements we should firstly know what do you mean by 'better indicator'.

The main weakness of ZiNi and H.ZiNi is the way they choose square to click when there are no non-negative premia. Non-random ZiNi's always choose top left square and it may miss places where it could have won some clicks. [name removed]' 11 board is a perfect example showing how H.ZiNi may fail: there's a pattern which could be solved with flags with sup1 IOE, but H.ZiNi doesn't notice it because of bad strategy of choosing square to click.

Also, H.Zini is much more vulnerable than ZiNi, because while ZiNi sees closed squares, H.ZiNi does not. H.ZiNi opens all openings at first, and if there's no flagger-friendly piece adjacent to one of the openings, it starts to open board from top left corner. So the conclusion is: the fewer the number of places where flagging is more effective than NF, the bigger the chance that H.ZiNi misses these places.

That's all i can say about ZiNi without knowledge of your methods to study it.

Also, i think Elmar could add sth too...

RonnyDeWinter · Post by **RonnyDeWinter** » Fri Oct 30, 2009 9:44 am

In short I've tested the effectiveness of hZini by looking at which algorithm showed the biggest deviation between the normal distribution of random boards and all worlds ranking boards. Could it be that because people flag significantly less and differently than the hZini protocol on Intermediate that we don't see much difference between the effectiveness of 3BV and hZini on Intermediate?

Tommy · Post by **Tommy** » Fri Oct 30, 2009 1:29 pm

I'd say because mostly NF is probably most efficient on int even if the premia are positive (or zero), because path is much much more important on int.
Maybe try changing the premia descision from >=0 to >0 or even >1 (only flag if the premium is positive, or only flag if the premium is greater that one) and see if that's a better fit.

Cryslon · Post by **Cryslon** » Mon Nov 02, 2009 7:14 am

@Ronny: well, all those normal distributions and standard deviations are good for some theoretical research, but imo there are another and somewhat more important characteristics of board benchmarks.

For limits we need benchmark which doesn't allow dreamboards. Ie, benchmark should show low results on boards which you can finish much faster than your personal best.

For speed measures we need benchmark which shows almost constant results for benchmark/time.

Obviously, 3BV fails both criteria: it's possible to construct extremely easy board with high 3BV (even with very high 3BV), and 3BV/s is higher on higher 3BV. Don't understand me wrong: 3BV is pretty good in first approximation, but surely shouldn't be used for such things as board limiting.

I think ZiNi is pretty good thing for setting limits. At least, i don't know VERY easy boards with high ZiNi value, and imo such a board is the only thing which would discredit idea of ZiNi limits. There are some vulnerabilities in ZiNi, so you can construct pretty easy board with >100 ZiNi, but that board still takes by far more time to finish than those in dreamboard thread.

@Tommy: i think your suggestion contradicts with main ZiNi's point. Your modification just makes ZiNi a bit closer to 3BV, which isn't good imo.

Tjips · Post by **Tjips** » Fri May 14, 2010 1:19 am

Cryslon wrote:@Bertie: how would you create the rules? IMHO it's not easy task...

Well after carefully studying what Elmar and Christoph have done in the ZiNi algorithms (which I haven't really done up till today

) I think I've found a simple waay of implementing different rules. The key idea is that of a premium as used in the mentioned ZiNi algorithms. Simply put a rule can fall into one of two categories, (a) Rules that change the value of the premium, usually a rule would correspond to a term in the premium's formula, and (b) Rules that change the way the premium is used/implemented, like only computing premia for squares within a certain radius, or solving for premia >0 or >=0 etc.

I think we could relatively easily find a way to express most of the rules we can think of in this framework. This also leaves room for the addition of a few more ad hoc rules if it where necessary.

Cryslon · Post by **Cryslon** » Sun Jul 04, 2010 5:31 pm

I've fixed implementation of Random ZiNi.

Tjips · Post by **Tjips** » Sat Sep 04, 2010 4:27 pm

Did a few runs with my implementations a bit ago.... Not a serious post, just wanna link the image in the IRC channel

EDIT: Seems to be for int

Authoritative Minesweeper

new benchmark: 3bv for flaggers

new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers

Re: new benchmark: 3bv for flaggers