Let's say you have a number of servers at a hosting company. If one of the servers becomes unresponsive, said hosting company (hereafter SHC) have a “control panel” that allows you to request either a hard reset or a KVMoIP connection to the doomed machine. In order that the server will recognise the KVMoIP (or, more correctly, the keyboard and mouse parts) SHC will hard reset your machine for you.
Let's say further one of your servers - they call it Server A - stops responding and you request a KVMoIP to it, but when you connect you find you're looking at Server B. So you call up SHC and tell them they've connected the KVMoIP to the wrong server.
“No”, they reply. “You must have labelled the machines wrongly or have otherwise confused which of servers A and B correspond to your names for them. If you request a KVMoIP for Server B you will find yourself looking at Server A.”
So you request a KVMoIP for Server B, but when you connect you find you're still looking at Server B. So you call SHC and tell them you're still seeing Server B. Whilst you're on the phone, Server B (which, if it wasn't obvious, is running fine) is unceremoniously powered off and rebooted, a process which ends with the machine booting to the chilling message “Operating System not found”.
After a little more time - during which it becomes apparent that the KVMoIP attached to Server B is either improperly connected or faulty and has to be replaced - the RAID interface for Server B is available through the KVMoIP and shows that one disk has catastrophically failed (most likely due to being switched off whilst running) and cannot be rebuilt. A replacement will be needed.
Meanwhile, Server A is still unresponsive, and SHC again explains that the KVMoIP that showed Server B was attached to Server A and there must be some three- or four-way confusion where Server A points to Server B, Server B points to... wait... anyway, some other server, which possibly points to Server A. So you should request a reboot for Server A and see which server is in fact rebooted and that will show which one it really is.
A reboot is thus requested for Server A, which promptly reboots and, in a stroke of luck, returns to normal operation. No clue as to why it crashed, mind you.
And thus, back to Server B and the big question: how many cock-punches does the person who tells you it will cost £60/hour for an engineer to swap in the replacement hard disk deserve? And how many more if you knew it was a hot-swappable machine where the drives slide out the front?
Fault-tolerant hosting