Fault-tolerant hosting

September 14, 2006
Let's say you have a number of servers at a hosting company. If one of the servers becomes unresponsive, said hosting company (hereafter SHC) has a “control panel” that allows you to request either a hard reset or a KVMoIP connection to the doomed machine. In order that the server will recognise the KVMoIP (or, more correctly, the keyboard and mouse parts), SHC will hard reset your machine for you.

Let's say further one of your servers - they call it Server A - stops responding and you request a KVMoIP to it, but when you connect you find you're looking at Server B. So you call up SHC and tell them they've connected the KVMoIP to the wrong server.

“No”, they reply. “You must have labelled the machines wrongly or have otherwise confused which of servers A and B correspond to your names for them. If you request a KVMoIP for Server B you will find yourself looking at Server A.”

So you request a KVMoIP for Server B, but when you connect you find you're still looking at Server B. So you call SHC and tell them you're still seeing Server B. Whilst you're on the phone, Server B (which, if it wasn't obvious, is running fine) is unceremoniously powered off and rebooted, a process which ends with the machine booting to the chilling message “Operating System not found”.

After a little more time - during which it becomes apparent that the KVMoIP attached to Server B is either improperly connected or faulty and has to be replaced - the RAID interface for Server B is available through the KVMoIP and shows that one disk has catastrophically failed (most likely due to being switched off whilst running) and cannot be rebuilt. A replacement will be needed.

Meanwhile, Server A is still unresponsive, and SHC again explains that the KVMoIP that showed Server B was attached to Server A and there must be some three- or four-way confusion where Server A points to Server B, Server B points to... wait... anyway, some other server, possibly will point to Server A. So you should request a reboot for Server A and see which server is in fact rebooted and that will show which one it really is.

A reboot for Server A is thus requested, and the machine promptly reboots and, in a stroke of luck, returns to normal operation. No clue as to why it crashed, mind you.

And thus, back to Server B and the big question: how many cock-punches does the person who tells you it will cost £60/hour for an engineer to swap in the replacement hard disk deserve? And how many more if you knew it was a hot-swappable machine where the drives slide out the front?

Vera Duckworth and Huey Lewis?

December 16, 2005
I've been working on the Cricket game at work. On the actual game itself as well as the web interface, which is nice because my C++ was getting very rusty. It was a bit of a drop in the deep end mind you, as the first thing I had to fix was the botched Duckworth-Lewis calculations in the game, which change the score the second-batting team needs to achieve to win when a game is shortened by rain.

Actually, that said, the problem was less with coding than with finding out how to implement the “D/L”. Whilst almost all descriptions of it say something along the lines of “[t]he D/L method is relatively simple to apply” they forget to clarify “as long as your score matches one of the examples we've supplied”.

The main problem is that the D/L is actually sooper-seekrit and all they sell is the resulting look-up table. The previous programmer had not only used an old table but also just part of it, averaging out the missing values - meaning the game's calculations were generally off. Compounding that, it was always adding its adjustments, which meant that on the occasions where the new target was supposed to be smaller it was in fact much larger.
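For illustration, here is a minimal sketch of the published Standard Edition target rule, which scales the target down when the second team has fewer resources and only adds runs when it has more. It is not the game's actual code. The function name is hypothetical, the resource percentages `r1` and `r2` are assumed to have been looked up already from the licensed table, and the `g50` default of 235 is an assumption about the average 50-over score in use at the time.

```python
import math


def revised_target(team1_score, r1, r2, g50=235):
    """Return the score Team 2 needs to win under the Standard Edition rule.

    team1_score: runs scored by the team batting first
    r1, r2:      resource percentages available to Team 1 and Team 2
                 (assumed to come from the official look-up table)
    g50:         assumed average 50-over total; 235 is an assumption here
    """
    if r2 < r1:
        # Team 2 has fewer resources: scale Team 1's score DOWN.
        par = team1_score * r2 / r1
    elif r2 > r1:
        # Team 2 has more resources: add runs in proportion to the excess.
        par = team1_score + g50 * (r2 - r1) / 100
    else:
        # Equal resources: no adjustment at all.
        par = team1_score
    # The par score is rounded down; the target to win is one more.
    return math.floor(par) + 1
```

With Team 1 making 250 from 100% of its resources and Team 2 left with 80%, this gives a target of 201, lower than the unadjusted 251. Code that always adds its adjustment would push that target up instead, which is the bug described above.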

The other problem is the difficulty in confirming my interpretation of the calculations. I'm all but certain I grok it, but the lack of absolute certainty led me to put up a few examples and solicit feedback from the players. People were mainly happy but a couple were convinced I'd got half of them wrong - though without supplying any calculations to prove it, just general “feelings” that things were wrong. But then, as the Wikipedia page notes: “the D/L method can produce results that are somewhat counterintuitive”.

I think we're going to go live with it Monday. Even if it's slightly wrong, it's an improvement over the current one.

I'm probably not going to take that much notice of it anyway

September 27, 2005
From this morning's trawl of the spam-trap generic work email inbox.
On the 1st of november , we will have to pay for the use of our MSN and email accounts unless we send this message to at least 18 contacts on your contact list. It's no joke if you don't believe me then go to the site ( www.msn.com ) and see for yourself. Anyways once you've sent this message to at least 18 contacts , your msn dude will become blue. please copy and paste don't forward cos people won't take notice of it otherwise!

Also, the word gullible isn't in the dictionary.

Where the Sun don't shine

September 06, 2005
Sun Microsystems is an interesting company. They've done a lot for open source stuff with their cash behind both Java and OpenOffice. On the other hand they tried to persuade the world that “the network is the computer” and everyone should buy expensive Sun servers to power “cheap” thin clients. They also came up with Network the Dog, who seems to have all but been erased from the Internets these days - I guess he's one dog who was just for Christmas.

I cut my *nix teeth admining a Solaris box though, so I had some fond memories - until today.

As a few people know, I work for a small online games company you've never heard of. We do online sports management sims based on the lower, amateur side of sports. The oldest game we have is Sunday League which has been around since mid-2000 and takes its impetus from Sunday League Football wherein a number of variously talented lads with degrees of fitness attempt to injure each other on the field of play before getting lathered in the pub.

Long before I arrived Cat Games decided to trademark Sunday League. This is, sadly, a long, involved process which is one of many ways for lawyers to squeeze money out of you. It gets even worse when a big company like Sun decides to take a pop at you. A care package came back from the lawyers today - Sun is apparently blocking our trademark on the basis that people might get confused; that they would think Sun were involved in Sunday League. That, to be more specific, when people heard “Sunday League” they might think of a “league” or range of “Sun Ray” or “Sun Blade” products and, I don't know, sign up for our game instead of buying an over-priced web server.

We just think they're bat-shit insane.

Feedback

September 05, 2005
Some people put too much into their feedback...
He also spelt off wrong

(and he also spelt “off” incorrectly)

POBCAK

August 13, 2005
A user emailed in with a problem yesterday morning. It was a fairly simple issue - instead of the login page, they'd bookmarked a page inside the game. By unlucky happenstance, it was the page that was no longer operative in the new update. So instead of being bounced to the login page they just got an error.

So they emailed the address given, IN ALL CAPS, complaining. I replied, explaining they should bookmark and login at the front page, not random pages in the game.

Then our Internets connection was killed at work (current scuttlebutt is that BT themselves may have deep-sixed their own service in order to get overtime fixing it) so it wasn't until I got home that I found the second and third emails they'd sent, lamenting our lack of response and describing the situation as “bloody silly now!!!!!!!!!!!!!!!!!!!!!!!!!!!” and “not good enough!!!!!!!!!!!”.

I'm guessing from their aol.com address they don't have a whit of an idea how to control the spam filter >_<

Top 5 least favourite feedback messages

June 03, 2005
5. any feedback/error report by an English speaker featuring txtspk or atrocious spelling.
4. people who assume everything is a vast conspiracy to get money out of their wallets without doing any work.
3. people who report an error and what it was or where it happened but not both.
2. Americans who think there's some vast gulf in customer service between Europe and the US: “Maybe your performance is acceptable in Europe, but in the U.S. it is not.”
1. people who say “I had an error” and nothing else.

The longest day

June 03, 2005
Oi vey, what a long day yesterday was.


From the department of residual paranoia

May 07, 2005
Checking the freshly-minted Logwatch output from the now resurrected ex-hacked server I saw three logins through SSH. One was from the office IP (which would be me), one from my home IP (uh... me again) and one from a Spanish IP with a number of password failures before one success. Oh noes! Spanish hax0rs!?

Well... there was the possibility it was our guy in Spain, so I IM'd him.
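The pattern that set off the alarm (several password failures from one IP, then a success) is easy to check for mechanically. Below is a rough sketch that assumes the usual OpenSSH "Failed password for ..." and "Accepted password for ..." log lines. The function name, the threshold and the sample log lines (which use documentation-only IP addresses) are all made up for illustration; this is not what Logwatch itself does.

```python
import re
from collections import defaultdict

# Matches typical OpenSSH auth log lines such as
# "Failed password for root from 203.0.113.7 port 4242 ssh2".
LINE = re.compile(
    r"(Failed|Accepted) (?:password|publickey) for "
    r"(?:invalid user )?(\S+) from (\S+) port \d+"
)


def suspicious_logins(lines, min_failures=3):
    """Return (ip, user, failure_count) for each successful login that was
    preceded by at least min_failures failed attempts from the same IP."""
    failures = defaultdict(int)
    flagged = []
    for line in lines:
        match = LINE.search(line)
        if not match:
            continue
        outcome, user, ip = match.groups()
        if outcome == "Failed":
            failures[ip] += 1
        else:
            if failures[ip] >= min_failures:
                flagged.append((ip, user, failures[ip]))
            # A success resets the running count for that IP.
            failures[ip] = 0
    return flagged


# Hypothetical sample data, not taken from the real server.
SAMPLE_LOG = [
    "May  7 03:12:01 box sshd[101]: Failed password for root from 203.0.113.7 port 4242 ssh2",
    "May  7 03:12:05 box sshd[101]: Failed password for root from 203.0.113.7 port 4243 ssh2",
    "May  7 03:12:09 box sshd[101]: Failed password for root from 203.0.113.7 port 4244 ssh2",
    "May  7 03:12:14 box sshd[101]: Accepted password for root from 203.0.113.7 port 4245 ssh2",
    "May  7 09:00:00 box sshd[202]: Accepted publickey for admin from 192.0.2.10 port 5000 ssh2",
]

if __name__ == "__main__":
    for ip, user, count in suspicious_logins(SAMPLE_LOG):
        print(f"{ip} logged in as {user} after {count} failures")
```

A hit from this is only a prompt to go and ask someone, since a colleague who has forgotten their password looks exactly the same in the logs.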


H4x0r3d

May 04, 2005
Someone managed to get a rootkit on one of the boxes at work. The first clue we had was when the password was changed. Since I log in through ssh using no-password authentication this neither affected me nor stopped me logging in. At first I thought maybe someone else at work had changed the password - but then it turned out that every command that made use of dates threw up a segfault - notably ls -l.
