ascorbic.net/catbox

What is the chatterbox archive?

Simply put, it is a fully searchable database of the enlightening conversations that happen in the chatterbox. There is also a page of vaguely interesting statistics about the messages in the archive.

How does it work?

The archive is written in PHP, with a MySQL database as the backend. It's been completely rewritten since I took it over from wonko.

Once a minute, a cron job calls a PHP script that grabs The Universal Message XML ticker from E2, which contains the latest things said in the chatterbox. PHP's DOM XML parser breaks the ticker up into individual messages, which are then inserted into the database.
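A minimal sketch of that polling step, written in Python rather than the archive's PHP and using an in-memory SQLite database in place of MySQL. The ticker layout (`msg` elements with `id`, `author` and `time` attributes), the table and column names, and the idea of skipping already-stored message ids are all assumptions made for illustration; the real ticker and schema may look quite different.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical ticker payload: the real E2 ticker's element and
# attribute names may differ.
SAMPLE_TICKER = """<messages>
  <msg id="101" author="alice" time="2007-09-01 12:00:00">hello [everything2]</msg>
  <msg id="102" author="bob" time="2007-09-01 12:00:30">hi [alice|you]</msg>
</messages>"""


def create_schema(conn):
    """Create the (assumed) messages table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY, author TEXT, said_at TEXT, body TEXT)"
    )


def store_ticker(conn, xml_text):
    """Parse one ticker document and insert any messages not yet stored.

    Returns the number of newly inserted rows.
    """
    before = conn.total_changes
    root = ET.fromstring(xml_text)
    for msg in root.iter("msg"):
        # INSERT OR IGNORE skips ids that an earlier poll already stored,
        # since consecutive polls are likely to overlap.
        conn.execute(
            "INSERT OR IGNORE INTO messages (id, author, said_at, body) "
            "VALUES (?, ?, ?, ?)",
            (int(msg.get("id")), msg.get("author"), msg.get("time"), msg.text or ""),
        )
    conn.commit()
    return conn.total_changes - before
```

Keying on the message id means the script can be run every minute without worrying about how many messages each fetch repeats.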

Every hour the statistics script is run. This runs a series of queries on the database to generate those fascinating stats. These are saved to a static file that is included on the front page.
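The hourly job could look something like the following Python sketch. It is only an illustration: the two statistics shown (total message count and the three busiest speakers), the `messages` table with its `author` column, and the plain-text output format are assumptions, and SQLite stands in for MySQL.

```python
import sqlite3


def build_stats_fragment(conn):
    """Run a couple of aggregate queries and return them as plain text."""
    total = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
    top = conn.execute(
        "SELECT author, COUNT(*) AS n FROM messages "
        "GROUP BY author ORDER BY n DESC, author ASC LIMIT 3"
    ).fetchall()
    lines = ["Total messages: %d" % total]
    lines += ["%s: %d" % (author, n) for author, n in top]
    return "\n".join(lines) + "\n"


def write_stats_file(conn, path):
    """Save the fragment to a static file for the front page to include."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(build_stats_fragment(conn))
```

Writing the result to a static file means the comparatively expensive aggregate queries run once an hour instead of on every page view.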

There is a full text index of the messages in the archive, so you can type natural language queries into the search box and all the matching messages appear, sorted by date. You can also search by the name of the user who spoke, or restrict it further by entering both.
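As a rough illustration of how the two search fields might combine, here is a Python sketch that builds a parameterised MySQL query. The table and column names (`messages`, `body`, `author`, `said_at`) are hypothetical, the sort direction is a guess, and `%s` is the placeholder style used by the common Python MySQL drivers; `MATCH ... AGAINST` is MySQL's full-text search syntax, which defaults to natural language mode.

```python
def build_search_query(text=None, author=None):
    """Return (sql, params) for a search by text, by author, or by both."""
    clauses, params = [], []
    if text:
        # Relies on a FULLTEXT index on the (assumed) body column.
        clauses.append("MATCH (body) AGAINST (%s)")
        params.append(text)
    if author:
        clauses.append("author = %s")
        params.append(author)
    if not clauses:
        raise ValueError("need search text, an author, or both")
    sql = (
        "SELECT author, said_at, body FROM messages WHERE "
        + " AND ".join(clauses)
        + " ORDER BY said_at"
    )
    return sql, tuple(params)
```

Passing the user's input as parameters, not pasting it into the SQL string, keeps odd characters in a search from breaking the query.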

If you join a conversation in the middle, and want to catch up, you can always see the latest messages in the archive at http://ascorbic.net/catbox/latest.

In order to make the XML acceptable to the parser, the messages are stored with hard links intact. They are converted to real links to E2 when they are displayed. You can see a variant of the function I use to do this here. Feel free to add it to your own site, for E2-style hard links and pipe links.
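For a rough idea of what such a conversion involves, here is a sketch in Python. It is not the archive's own function, which is PHP. It assumes hard links look like `[node title]` and pipe links like `[node title|shown text]`, that stored messages are unescaped plain text, and that node pages live under the URL prefix in `E2_TITLE_URL`; that prefix in particular is an assumption and may need changing.

```python
import html
import re
from urllib.parse import quote

# Assumed URL pattern for E2 node pages; adjust if the site uses another.
E2_TITLE_URL = "https://everything2.com/title/"

# [title] or [title|label]; nested brackets are deliberately not handled.
_LINK = re.compile(r"\[([^\[\]|]+)(?:\|([^\[\]]+))?\]")


def link_hardlinks(message):
    """Return message as HTML with hard links and pipe links made clickable."""
    out = []
    pos = 0
    for m in _LINK.finditer(message):
        # Escape the plain text between links so it is safe to display.
        out.append(html.escape(message[pos:m.start()]))
        title = m.group(1).strip()
        label = (m.group(2) or title).strip()
        out.append(
            '<a href="%s%s">%s</a>'
            % (E2_TITLE_URL, quote(title, safe=""), html.escape(label))
        )
        pos = m.end()
    out.append(html.escape(message[pos:]))
    return "".join(out)
```

For example, `link_hardlinks("see [cool node|this]")` produces `see <a href="https://everything2.com/title/cool%20node">this</a>`.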

The witty little chatterbox topics are also now tracked, and appear at the top of each page in the archive, with topic changes highlighted. Topic changes are also stored as messages in the database, under the special user name @topic. Search for that name and you can see all the topics. Magic!

What is it running on?

Since I took over the running of the archive in April 2002, it has moved between a few machines. First it was on my company's server in Telehouse: a 900MHz Athlon, I think, running Red Hat 7.2. When I left that job at the beginning of June 2002, I put it on the only spare computer I had at the time, a rev-A bondi-blue iBook running Mac OS X, and stuck it at the end of my cable modem. Running it from an old laptop over 802.11 was sketchy as hell, and my flatmates kept on putting it to sleep or shutting it, so I dug out an ancient IBM P90 with 16MB of very temperamental RAM and a 500MB HD, and put Debian on it. The hardware was useless, but it almost did the job and was the best available at the time.

When I moved I gave up on that, and the archive went to a shared server provided by phpwebhosting.com, who seemed good enough. It was a dual 1.4GHz PIII with a gig of RAM and 64GB SCSI RAID, running Red Hat. They became crappy, so I moved to a virtual private host, which gave me root again. It was pretty slow, mainly due to the lack of memory, and ran Debian stable. It was using turck-mmcache to cache the pages and database results, but that stopped being maintained, so it wouldn't work with newer versions of Apache. I moved it to memcached. My next trick was to separate the old (i.e. no longer on the first page) messages from the latest. Old messages are stored as page-sized blocks using memcached (packaged for Debian by our very own jaybonci).

In September 2007 I dumped the virtual private server when they shut me down without warning for alleged late payment (they didn't even send me a bill). After paying the bill, I decided I'd had enough of them, so I moved back to my company's servers. It's now on our old-ish dual Xeon server, which sits in a colo facility at the top of Merchant Venturers Building in Bristol. It's the first time in years that it's been on a machine that I've actually seen.

So, happy chatting. You are being logged (hopefully).