
Cataloguing the guestbook

Posted: Mon Mar 26, 2012 12:49 am
by aradesh
There's been a fair bit of talk about the guestbook and the forums lately, and I've thought before it would be nice - generally for looking things up and for historical purposes - to put the contents of the guestbook into a nice database format.

I'd be willing to do the work for this - but I have almost no experience with databases and this sort of thing.

Could I have some advice on what would be a nice standardized format to store all the posts from the GB, so that it would be easy to program a website to do things like searching with conditions on the name, contents of the post, restrictions on time period, etc.? For example, I might wish to see every post Kamil made on the GB, to follow his progress from beginner to pro.

I could probably hack some tables and rudimentary things together in Python, but it would likely be inflexible and not particularly useful for other people.

What do you guys think?

Re: Cataloguing the guestbook

Posted: Mon Mar 26, 2012 11:03 am
by Tommy
Sounds interesting!

The HTML from http://www.minesweeper.info/downloads/Guestbook11.html, for instance, should be easy enough to parse.

Writing the script that parses the old GB HTML files and stores them in some database would already be huge. Damien would have to write the frontend, since that would be hosted on ms.info. But that isn't much work: a bit of SQL would be fast enough for our purposes (i.e., I don't think we need to bother with a full-text index yet; we can just use LIKE).
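
To make the "just use LIKE" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `posts`, its columns, and the helper functions are assumptions made up for illustration; nothing here reflects how ms.info actually stores anything, and the real column list would still need to be agreed on.

```python
import sqlite3


def create_db(path=":memory:"):
    """Open a database and create a hypothetical `posts` table if missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS posts (
            id        INTEGER PRIMARY KEY,
            name      TEXT NOT NULL,   -- nickname as typed by the poster
            posted_at TEXT NOT NULL,   -- ISO 8601 string, so text order = time order
            body      TEXT NOT NULL
        )
        """
    )
    return conn


def add_post(conn, name, posted_at, body):
    """Store one parsed guestbook entry."""
    conn.execute(
        "INSERT INTO posts (name, posted_at, body) VALUES (?, ?, ?)",
        (name, posted_at, body),
    )


def search(conn, name=None, text=None, start=None, end=None):
    """Return (name, posted_at, body) rows matching every filter that is given.

    name  : nickname, compared case-insensitively (ASCII letters only)
    text  : substring of the post body, matched with LIKE
            (note that % and _ inside `text` act as wildcards)
    start : inclusive lower bound on posted_at
    end   : exclusive upper bound on posted_at
    """
    clauses, params = ["1=1"], []
    if name is not None:
        clauses.append("LOWER(name) = LOWER(?)")
        params.append(name)
    if text is not None:
        clauses.append("body LIKE ?")
        params.append(f"%{text}%")
    if start is not None:
        clauses.append("posted_at >= ?")
        params.append(start)
    if end is not None:
        clauses.append("posted_at < ?")
        params.append(end)
    sql = (
        "SELECT name, posted_at, body FROM posts WHERE "
        + " AND ".join(clauses)
        + " ORDER BY posted_at"
    )
    return conn.execute(sql, params).fetchall()
```

With something like this, `search(conn, name="Kamil")` would list one player's posts in date order, and `search(conn, text="record", start="2005-01-01", end="2006-01-01")` would narrow by content and time period. Values are always passed as parameters rather than pasted into the SQL string.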

Re: Cataloguing the guestbook

Posted: Wed Mar 28, 2012 11:07 pm
by thefinerminer
What a great idea!

Many years ago, when I knew nothing about programming, I started copying and pasting posts into Excel (and adding dates, name of player, category for easy sorting). I got bored after about 500 entries. Parsing it should be pretty simple, although:

1) The 2009-2011 guestbook (I have it zipped but haven't uploaded yet) includes replies to posts, so the parse code will need to be modified from what is used to parse 2000-2008.
2) I would like a database column for playerid added to the code. After the guestbook is parsed, a specialist (me?) can sort the posts by name and add in the playerid to non-spam posts. This may be useful as I know everyone's nicknames and can identify who did what (to help future people quote them for articles, etc., or to understand who was arguing with whom).
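
A rough sketch of how these two points could look in a table, again with Python's sqlite3. Only `playerid` comes from the request above; `parent_id` (for replies), `is_spam`, and both helper functions are hypothetical names chosen for illustration, and the real 2009-2011 export may call for something different.

```python
import sqlite3

# Hypothetical schema: one row per guestbook entry or reply.
SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES posts(id),  -- NULL for top-level posts
    playerid  INTEGER,                       -- NULL until identified by hand
    is_spam   INTEGER NOT NULL DEFAULT 0,    -- 1 marks a spam post
    name      TEXT NOT NULL,
    posted_at TEXT NOT NULL,
    body      TEXT NOT NULL
);
"""


def unidentified_names(conn):
    """List (name, post_count) for non-spam posts that have no playerid yet,
    sorted by name so that similar nicknames end up next to each other."""
    return conn.execute(
        "SELECT name, COUNT(*) FROM posts "
        "WHERE playerid IS NULL AND is_spam = 0 "
        "GROUP BY name ORDER BY name COLLATE NOCASE"
    ).fetchall()


def assign_playerid(conn, playerid, nicknames):
    """Attach one playerid to every non-spam post written under any of the
    given nicknames. Matching ignores case for ASCII letters only.
    Returns the number of posts updated."""
    nicknames = list(nicknames)
    if not nicknames:
        return 0
    placeholders = ", ".join("LOWER(?)" for _ in nicknames)
    cur = conn.execute(
        "UPDATE posts SET playerid = ? "
        f"WHERE is_spam = 0 AND LOWER(name) IN ({placeholders})",
        [playerid, *nicknames],
    )
    return cur.rowcount
```

The idea is that the parser leaves `playerid` empty, and whoever knows the nicknames works through `unidentified_names(conn)` and calls `assign_playerid` once per player with all of that player's known aliases.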

Re: Cataloguing the guestbook

Posted: Thu Mar 29, 2012 1:01 am
by aradesh
thefinerminer: that would be awesome. and a terribly tedious job for you! but it would be cool - in that you'd be able to figure out when the same person has used a different name and connect them together with the same playerid.

also if i write a script to get all the current GB and comments, you can restart the GB with 0 pages.

Re: Cataloguing the guestbook

Posted: Thu Mar 29, 2012 9:02 am
by EWQMinesweeper
hehehe - i recall the one or the other post with a different nickname. i guess i might be able to help there too.

maybe we should make a list of columns that will be helpful in the future when searching a certain gb post. what do you think?