Pandora's Box of Clouds

I’ve been reading some of James Hamilton’s publications recently: he’s one of the clever chaps behind Amazon Web Services (AWS).  Here’s the one-line bio from his blog:

James is a Vice President and Distinguished Engineer on the Amazon Web Services team where he is focused on infrastructure efficiency, reliability, and scaling.

What struck me about Hamilton’s work is his combination of pragmatism, engineering and operations: now that’s my idea of a holy trinity!

Hamilton works for AWS and is obviously heavily involved in making the service a game changer, and has been successful at it, but how is he doing this?

If you look at Hamilton’s publication list you can get some clues to the human weaponry that AWS have.  Here’s just one example:

Data Center Efficiency Best Practices (pdf)

This looks beyond mere servers to the whole data center.  Slide 5 points at a $200M, 15MW facility with a 15-year lifetime and $100M of servers (50,000 of them) with a 3-year lifetime.  The monthly costs are $5.6M, of which $2.3M is related to power, which confirms the approximate rule of thumb that for every $ of compute there is a $ of power.
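To get a feel for how capital figures like these turn into a monthly bill, here is a rough Python sketch. It is my own simplification, not the model behind the slide: `monthly_cost` and its `annual_rate` parameter are hypothetical names, and the sketch leaves out the power bill and other operating costs, so it is not expected to reproduce the $5.6M total.

```python
def monthly_cost(capital: float, years: float, annual_rate: float = 0.0) -> float:
    """Spread a capital cost over `years` as equal monthly payments.

    annual_rate is a hypothetical cost of money; 0.0 means plain
    straight-line amortization.
    """
    months = years * 12
    if months <= 0:
        raise ValueError("years must be positive")
    if annual_rate == 0:
        return capital / months
    r = annual_rate / 12  # monthly rate
    # Standard level-payment (annuity) formula.
    return capital * r / (1 - (1 + r) ** -months)


facility = monthly_cost(200e6, 15)  # $200M facility over 15 years
servers = monthly_cost(100e6, 3)    # $100M of servers over 3 years
print(f"facility: ${facility / 1e6:.2f}M per month")
print(f"servers:  ${servers / 1e6:.2f}M per month")
```

Straight-line, that comes to roughly $1.11M a month for the facility and $2.78M for the servers: the servers cost half as much to buy but are written off five times faster, so they dominate the monthly capital charge.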

Slide 24 talks about the Co-operative Expendable Micro-slice Servers (CEMS), a joint project with Rackable to produce efficient servers, and he makes a great point: instead of growing bad facilities, why not move to containers and repurpose the data center as more valuable office space?

I’d just like to note that at the recent VMworld09 a Cisco Unified Computing System of 512 blades ran between 30,000 and 40,000 VMs.

AWS are recognizing that their data center is core to their business and IT is a core competency to selling stuff better than the competition.  So they invest in IT and brains to make them better than the competition.  I don’t think that there is a vendor on the planet that could do a better job than them today, even considering Microsoft and Google.

After reading James Hamilton’s work I could not imagine AWS using someone else’s weaponry, such as VMware’s vCloud, in the Cloud arms race.  I’m sure AWS have looked at vCloud, but they would look at more than just the API and would burn through the marketing with their efficiency and effectiveness laser focus.  If using vCloud improves the PUE and DCiE figures compared to the home-grown system, then would vCloud be a better solution than AWS home-grown CEMS and Xen?

How about we consider someone who isn’t AWS but wants to compete with AWS in the Public Cloud space.  First of all there are two great barriers to entry to providing a Public Cloud service:

  1. Investment Capital. From the numbers above, you can see the hundreds of millions that AWS have _already_ spent, and today’s economy is not one in which to raise capital on the markets.
  2. Intellectual Capital. Smart people like James Hamilton are scarce.

You could argue that vCloud gives other Public Cloud builders a “leg up” by letting them buy the weaponry that reduces these barriers to entry and can get them towards an AWS-like service at a lower investment.  But is there really a shortcut to becoming AWS or better? If there is, I reckon AWS are the only people that know what it is, and they are unlikely to share it.

For the companies that are getting into the public cloud business I can only see them complementing AWS if they offer some service that AWS doesn’t.  If they try to offer the same service at a lower price than AWS: how?  They must be more efficient and effective than AWS, and I can only see that happening through brains, hard work, time, and significant investment.  No short cut.

So if you can’t compete publicly with AWS, then that leaves the internal/Private Cloud (which AWS will be eating into steadily: did you see Virtual Private Cloud?).  If AWS are eating into the Private Cloud now, then they are putting pressure on internal IT service providers.

If you thought there was a big wall between Public and Private cloud, think again: VPC is a huge hole in the wall, and your internal application users are looking longingly at AWS through that hole.  The Inter Cloud is coming!

There’s a lot of talk in Cloud circles about APIs, standards and virtualization, but I wonder how many internal/Private Cloud proponents are applying the same laser focus on efficiency and effectiveness like James Hamilton is doing at AWS?  Surely the efficiency of a cloud is more important than if you and I use the same name for it?

If you think that an internal/private cloud is just another VMware cluster with some fancy self-service catalogue wrapped around it, then think again.  Out of all the “what is a cloud?” discussions, I want to see more focus on money / power / useful work efficiency which results in hard, tangible $$ on the costs to deliver and price to buy.

My question to anyone thinking about an internal/private cloud:

Is buying top end hardware and software from an arms dealer the way to achieve that efficiency and effectiveness and offer a competitive cloud service?

I think that it is possible, but it isn’t guaranteed.  There are a bunch of things you must do to provide a competitive cloud solution, across purchasing, architecture, engineering, operations, sales, marketing – it’s a big bet.  Being effective means your service is good, but that’s not enough.  You have to be efficient at delivering that effective service.  You need to be good at both to beat the competition.

This is important because if internal/Private Clouds become commonplace in a business vertical (e.g. banking) then they will become a competitive differentiator: an efficient and effective large-scale cloud will cost $millions less and drive $millions more revenue than the inefficient and ineffective competition.

Imagine the Banking CIOs playing golf at Wentworth, with most people envious of the CIO with the most efficient / effective Inter-Cloud, and the majority laughing at the CIO with the inefficient / ineffective Private Cloud.

The efficiency numbers are industry-standard metrics (PUE, DCiE, DCP and more) and will be easy to produce.  At last, IT will have objective numbers that can be compared across the industry: great for business, bad for Inefficient IT.
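As a small illustration of how simple the two headline metrics are, here is a Python sketch. PUE is total facility power divided by IT equipment power, and DCiE is its inverse expressed as a percentage. The function names and the meter readings are made up for the example, not taken from any real facility.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("it_equipment_kw must be positive")
    return total_facility_kw / it_equipment_kw


def dcie(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Data Center infrastructure Efficiency: IT share of total power, in percent."""
    if total_facility_kw <= 0:
        raise ValueError("total_facility_kw must be positive")
    return 100.0 * it_equipment_kw / total_facility_kw


# Hypothetical meter readings: 1,500 kW drawn at the utility feed,
# 1,000 kW of which actually reaches the IT equipment.
total_kw, it_kw = 1500.0, 1000.0
print(f"PUE  = {pue(total_kw, it_kw):.2f}")
print(f"DCiE = {dcie(total_kw, it_kw):.1f}%")
```

With those readings the PUE is 1.50 and the DCiE is 66.7%: a third of the power is going on cooling, distribution losses and everything else that isn’t compute.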

And it gets worse for Inefficient IT: if you have an inefficient and ineffective internal/Private Cloud, and we all know about it from the numbers, then, now that we have abstracted the business services from the infrastructure/platform, it’s really easy to just migrate the workloads, either piecemeal or en masse.

Migration to Cloud is inevitable.  It’s not a binary switch from current, internally provided IT to externally provided Cloud.  There is at least a step in the middle called Private Cloud, and perhaps a more hybrid destination of Inter Cloud.  But Pandora’s box of clouds is open: there’s no going back, and the hard numbers will show the winners and losers.

9 Comments

  1. Brad Hedlund says:

    I really enjoyed reading this Steve.

    Thought & Question: James Hamilton points out the “Container”, in which you can effectively buy PUE in a box. Won’t that be the easy way out for what you call “Inefficient IT”? With containers, IT can buy a James Hamilton DC in a box and simply plug it in to their Private Cloud management tools for their “differentiating workloads”, and move the commodity non-differentiating workloads to AWS?

    Cheers,
    Brad

  2. Good point, there are lots of Hamilton-esque things I couldn’t fit into the architecture but you can see in his work. The other parts I didn’t have room for are the server/cluster design aspects (it’s ok to fail, just make it invisible to the customer) and operational aspects (restart/reboot/rebuild etc). I didn’t mean to try and make JH one dimensional, but somehow I managed it by mistake ;-)

    I think if you wanted JH in a box then that box would have to capture those design and operational practices too. Put it this way: a container built using JH design + ops will be more efficient and effective than a non-JH container.

    Make sense? Thanks for pulling me up on it!

  3. John, somehow Akismet thought you were spam! hahahahah! as if! ;-)

  4. Brad Hedlund says:

    Well, PUE has many complicated dimensions in itself. I would argue that speaking of James Hamilton’s work in strictly a PUE sense is not minimizing or diminishing it in any way. PUE, in my opinion, is the hard part (for an average IT shop) – whereas Cisco, VMware, EMC/NetApp for example are making things easy on the design/ops side.

    True, buying Containers does not turn your DC James Hamilton-esque overnight, but it’s a big and relatively easy step in the right direction. Containers take care of the PUE angle, while Cisco UCS + vSphere can address the design/ops angle. The combination of the three, Containers+UCS+vSphere, may not get you everything JH has built at AWS, but it could get pretty close, at least “good enough” for the Private Cloud in a typical Enterprise DC.

  5. The last missing bit is the all important operations. I think we’re all trying to fix that with tools, which is part of the solution, but I wonder if there’s a simpler version of “ITIL for the data center”, or more likely something like Cobit/ISO20000 – a middle ground between the tech-heads and the process-primadonnas…

  6. gblnetwkr says:

    In addition to having smart people and using paper plate servers instead of fine china, AWS has the advantage of a very heterogeneous work load. This work load lets them run at very high levels of utilization with very little “safety stock”.

    By comparison the typical data center is subject to the “politics of the DC” – if the DC runs out of compute, the CIO gets fired. If the DC has too much compute, no problem.

    So the business efficiency of the typical data center is way lower than AWS.

    However, DCs are starting to experience what manufacturing experienced 20 years ago when they got rid of inventory and slashed safety stock. We expect that various forms of secure cloud bursting (AWS VPC, GOOG GAE, CohesiveFT VPN3) mean that instead of holding safety stock as Capex in the DC we will see peak loads handled by Opex in the cloud.

    This will be uncomfortable at first, as were the first “just in time” efforts. However, the impetus to drive down Capex is very strong and we will figure it out.

    So, while we should learn lessons from people like James Hamilton, the goal will be not to duplicate what he has done, but to leverage it, on demand, in a way that drives up the utilization and business efficiency of our DC.

  7. Daniel Baird says:

    What i’d love to see is the vCloud API become a fully adopted open standard. Then Amazon can just add the vCloud API as an option to EC2. Telcos can build their offerings on VMware (or Citrix Xen if they get something decent going). No Hotel California! Actually, i’d *want* my VMs running on different platforms, with the vCloud API only for checking in and out. If VMware ever has another bug like that 30-day-expiry one, and i’m running VMs on Amazon’s EC2 as well, then they won’t be hit.

    I read somewhere that there are 3 computers on the Space Shuttle, with hardware and software designed/written by independent teams of engineers, but all with the same requirements spec and API definition. This is a brilliant idea. If one computer’s output is different from the other two, it’s ignored. Bugs in hardware and software are expected, planned for.

  8. I love that shuttle comment! When I get asked “Hey Steve, Vendor Y has just updated Product X – will it still work with Product Z?” my answer is: yes, if they coded to the API and *that* didn’t change and they did regression testing ;-)
