Finished Chicken

“Eggs in one basket” is a meme, not a tangible, real thing.  It’s an irrational but real fear.  Let’s kill it, people!  Eggs are laid by chickens.  Right?

Applications in virtual machines are not like eggs, and hosts are not baskets.  An egg is easy to break, and when it gets broken it’s finished for good.  You can’t put an egg back together.

A modern computer application, unlike an egg, can be resilient to breaking, and even when it does break it can be restored to normal in minutes.

Don’t be scared of something going wrong: embrace it, for it will happen, my friend.  When you face this eventuality you are set free.  You will build resilient, automated and highly available systems.  If you think of applications as eggs, where any failure is a catastrophe, you’re screwed.

What about the basket?  The biggest fear in virtualization is that, with multiple applications running on one host, any host outage hits all of them at the same time and so has a big impact on service.

The pressure is on the host to be as “up” as possible, with five nines (99.999%) desired.  This is wrong.  Plain.  Wrong.  The Host is not the problem; it’s our thinking that is wrong.  Here’s why:

  • All applications can fail at any time for many reasons.  Applications should be resilient to failure through human techniques such as Electrify The Fence and protective redundancy.  If one node dies, the service keeps running at reduced capacity.  If your service can’t afford this kind of resiliency and it has Single Points of Failure (SPoF), then your availability might be in the 80% range:  that’s a fact of life, just as buying a Ferrari kit car with a Toyota engine means – guess what? – IT AIN’T A FERRARI.
  • Face up to the main cause of outages: me and you; us dopey humans.  Our success is measured by the Mean Time Between Cock-Up (MTBCU).  For those unfamiliar with my English vernacular, a “Cock-Up” is pulling the wrong cable, rebooting the wrong server, and eating bacon with jam.
  • If you think a Host = Basket, then you’re wrong.  Your face is so close up to the paper you can see a word but not a sentence, and certainly not the whole story.  At last week’s London VMUG I heard two people refer to the Basket as the platform/vendor or a stack of blades.  That’s more like it.  I prefer to think of the basket as The System, a top-to-bottom layer of technology stacks (compute, network, storage) with resilience and recovery built in, where you can measure availability from the view of the service consumer.  In The System I expect apps to break, servers to be rebooted incorrectly, networks to flap, disks to break: and because of this, I build in resilience and recovery to cope.
  • Until Administrators are rewarded for higher ROI through higher consolidation ratios, instead of being severely penalised for the impact of failure, virtualization will stay at 30% and people will equate Eggs in one Basket with “How to get fired”.
  • Nobody, not you or me, knows when Too Many is Too Many.  Is one egg in a basket too many?  What about two?  What about thirty-two?  What about three hundred?  How do you calculate the line-that-must-not-be-crossed?  Yes, this is a risk calculation, but you show me an Admin who has set Maximum VMs Per Host because of a Probability and Impact calculation…
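For anyone who does fancy trying that Probability and Impact calculation, here is a rough Python sketch of what it might look like.  Every name and number in it is my own assumption for illustration (failure rate, recovery time, cost per VM-minute, the yearly "risk budget"); plug in your own.  It treats impact as VMs × recovery minutes × cost per VM-minute, and multiplies that by how often you expect a host to fail.

```python
# Back-of-the-envelope "Probability x Impact" sketch.
# All parameters are hypothetical inputs, not measured figures.

def expected_outage_cost(vms_per_host, host_failures_per_year,
                         recovery_minutes, cost_per_vm_minute):
    """Expected yearly cost of host failures for one host."""
    # Impact of a single host failure: every VM on it is down until recovered.
    impact_per_failure = vms_per_host * recovery_minutes * cost_per_vm_minute
    # Probability (expressed as a yearly failure rate) times impact.
    return host_failures_per_year * impact_per_failure


def max_vms_per_host(risk_budget_per_year, host_failures_per_year,
                     recovery_minutes, cost_per_vm_minute):
    """Largest VM count whose expected yearly outage cost fits the budget."""
    cost_per_vm = host_failures_per_year * recovery_minutes * cost_per_vm_minute
    return int(risk_budget_per_year // cost_per_vm)


if __name__ == "__main__":
    # Same host, same failure rate; only the recovery time changes.
    automated = max_vms_per_host(1000, 0.5, recovery_minutes=5, cost_per_vm_minute=2.0)
    manual = max_vms_per_host(1000, 0.5, recovery_minutes=240, cost_per_vm_minute=2.0)
    print("automated recovery:", automated, "VMs per host")
    print("manual recovery:   ", manual, "VMs per host")
```

With these made-up inputs, the thing that moves the line is recovery time, not how "up" the host is: cut recovery from hours to minutes and the same risk budget lets you put far more VMs on one host.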

So there you have it: applications are not eggs and hosts are not baskets.  The fear of “Eggs in one basket” is irrational and unfounded IF YOU ACCEPT FAILURE HAPPENS AND BUILD RESILIENCE AND RECOVERY INTO THE SYSTEM.  Note those important words:

1. ACCEPT FAILURE

2. BUILD RESILIENCE AND RECOVERY

3. THE SYSTEM
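
To put rough numbers on The System, here is a minimal Python sketch of the textbook series/parallel availability sums.  It is not a model of any real platform: the helper names and the 99% figures are mine, and it assumes failures are independent, which real stacks never quite manage (us dopey humans tend to break several things at once).

```python
from math import prod


def series(*availabilities):
    """Availability of a chain where every component must be up.

    Each component is effectively a single point of failure.
    """
    return prod(availabilities)


def redundant(availability, copies):
    """Availability of a service that survives if at least one of
    `copies` independent, identical nodes is up."""
    return 1 - (1 - availability) ** copies


if __name__ == "__main__":
    # Hypothetical stack: compute, network, storage, each 99% available.
    fragile = series(0.99, 0.99, 0.99)
    # Same stack, but every layer is doubled up.
    resilient = series(redundant(0.99, 2), redundant(0.99, 2), redundant(0.99, 2))
    print(f"single points of failure: {fragile:.4%}")
    print(f"redundant layers:         {resilient:.4%}")
```

The point of the sketch: stacking single points of failure multiplies availability down, while redundancy at each layer multiplies the failure probability down.  You get the nines from The System, not from polishing one host.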