Our first production HA failover
Posted by martijnl on February 13, 2007
Yesterday brought a quite unexpected event: our very first HA failover in production. Although we had tested it and seen it work a number of times in our test environment, it was something else to see it happen in production. We weren’t expecting an HA failover, so it caught us off guard when all of a sudden 14 servers went down.
After reviewing the logs we found that the servers in question had been moved because of a failover. It turned out that one of the hosts lost its network connection for three seconds (we still don’t know why) and that HA decided to power off and move all the servers as a precaution.
We can safely say that HA works.
PC Blade Daily Links 2007-02-14 - PC Blade Daily - Practical News and Views on Centralized Computing said
[...] Documenting a Virtualization Project: Our First Production HA Failover “We weren’t looking for an HA failover when all of a sudden 14 servers went down … We can safely say that HA works.” [...]
Pranav said
What kind of HA system do you guys use? Heartbeat-DRBD? Or is it something else? I have implemented HA for our product too, so I was curious.
martijnl said
We use the VMware HA option. See http://www.vmware.com/products/vi/features.html#c836 for details.