Provoking IT from Good to Great
How real life became my worst nightmare

How I screamed when I realised what I had done
I have one Cisco Unified Computing System (UCS) running 3,200 compute nodes on 64 half-slot Cisco B200 blades in 8 Cisco 5108 Chassis.
A critical part of my Unified Computing System is simple cabling: I only need 2 cables from each chassis’s 2104 Fabric Extender to each Cisco 6120XP Fabric Interconnect, for a total of 4 cables per chassis to provide highly available unified fabric for both LAN and SAN networking.
That’s a total of 32 cables for all 3,200 compute nodes (1 per 100), 64 blades, 8 chassis and all LAN and SAN traffic.
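As a quick sanity check on the cabling figures above, here is a minimal Python sketch. It uses only the numbers quoted in this post; the variable names are illustrative and not part of any Cisco tooling.

```python
# Figures quoted in the post; names are illustrative only.
CHASSIS = 8
BLADES = 64
COMPUTE_NODES = 3200
CABLES_PER_CHASSIS = 4  # as stated: 4 cables per chassis in total

total_cables = CHASSIS * CABLES_PER_CHASSIS
nodes_per_cable = COMPUTE_NODES // total_cables
nodes_per_blade = COMPUTE_NODES // BLADES

print(f"total cables: {total_cables}")
print(f"compute nodes per cable: {nodes_per_cable}")
print(f"compute nodes per blade: {nodes_per_blade}")
```

The cable count scales only with the number of chassis, not with the number of blades or compute nodes, which is why the ratio stays so low.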
A consultant came to see me today to give us an insight into industry best practices, to see if we could improve our compute platform.
The consultant said we should have separate, redundant cards, cables and ports for each class of networking traffic – VMs, COS, vMotion, Fibre Channel. Why?
Because that was the industry best practice, and it was “less risky”.
I asked more about the risk; after all, people have been trusting ESX virtual switches to keep traffic separate in software before it gets to the physical NIC. What was the specific risk, in terms of exploit, probability and impact? No answer from the consultant.
What about the benefits of such a change? What added value does this give me? No answer from the consultant.
What about the costs of such a change? Saving the Silent Consultant embarrassment, here’s what I think I’d have to spend:
- Assume I need four physical NICs: then I can’t use my blades, so that’s a non-starter, as I’d have to invest in less-efficient, larger servers to cope with all those NICs and HBAs, meaning at least double my cost and conservatively adding over $100k in CapEx.
- Even if I spent more money on the servers to get four pNICs, according to the consultant I’d actually need eight pNICs, which could be four dual-port NIC cards. Let’s add $2k per host just to buy them, never mind install, configure and maintain them. Total $128k.
- Add the cost of the network ports + switches – that would be 64 hosts x 8 ports = 512 ports, which is about 512 / 48 ~ 22 access layer switches ($4k each) – that’s a grand total of $88k.
- I’d also need separate dual-port HBA cards, let’s call that $2k again for each host – another lump of $128k.
At this point we’re already running to a grand total increase in CapEx of $444k to add no value and mitigate no risks.
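The running total above can be reproduced with a short Python sketch. All prices and counts are the rough estimates from the list; the dictionary keys and variable names are made up for illustration. One caveat: straight division of 512 ports by 48 ports per switch gives about 11 switches, so the sketch takes the quoted figure of 22 as a given input rather than deriving it.

```python
import math

# Rough estimates quoted in the list above; names are illustrative only.
HOSTS = 64
PORTS_PER_HOST = 8
PORTS_PER_SWITCH = 48
SWITCH_PRICE = 4_000
SWITCHES_QUOTED = 22  # figure used in the post, taken as given (not derived)

capex = {
    "larger servers": 100_000,
    "NIC cards": HOSTS * 2_000,
    "access layer switches": SWITCHES_QUOTED * SWITCH_PRICE,
    "HBA cards": HOSTS * 2_000,
}

total_ports = HOSTS * PORTS_PER_HOST
# Minimum switch count by port capacity alone, for comparison with the quoted 22.
min_switches = math.ceil(total_ports / PORTS_PER_SWITCH)
total = sum(capex.values())

for item, cost in capex.items():
    print(f"{item:>22}: ${cost:,}")
print(f"{'total':>22}: ${total:,}")
print(f"ports: {total_ports}, minimum 48-port switches: {min_switches}")
```

Changing `SWITCHES_QUOTED` or any of the per-host prices shows how sensitive the total is to each assumption.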
I don’t even want to think of the OpEx of such a change, but again off the top of my head:
- Labour cost to architect, purchase, install and configure the new environment.
- With all those added components and complexity, the increased risk (probability and impact) of operational mistakes on my environment.
- Cost of additional monitoring (switches, NICs and HBAs), which also runs into tens of thousands of dollars.
And then I woke up. I was dreaming! Life’s not like that at all! I’m not running Unified Computing…woah! Things are the other way around! I’ve already spent all that money… OH NOOOOOOOOOOO!
Oh My God! You mean I’m actually spending all that money for no return in real life? Please don’t let it be true, I couldn’t stand the shame! How will I explain this waste of money, time and resource to my senior execs, and to my customers who are footing this expensive bill?
Perhaps it’s not too late, what’s the number for Cisco?
Disclaimer: OK, I made this all up, but you get my point, dontcha?
PS: Stu (@vinternals) pointed out that I missed off FC, and I also realised I missed off power. Let’s call it a cool $500k to end the discussion, but know that the real figure is, scarily, much more and mostly invisible to the people who influence compute purchases.
This entry was posted by Steve Chambers on 25 September, 2009 at 18:40, and is filed under UCS, good2great.
about 11 months ago
It’s exactly these kinds of costs that the average server guy doesn’t think of, and that Cisco would do well to focus on in their value proposition. I think you may have missed the cost of the additional FC switches / ports too.
about 11 months ago
You’re right! How could I forget FC? Stick another few tens of thousands on then! Let’s call it a cool $500k.
AND I missed the power! How much power for all those extra servers, NICs, HBAs, switches… to produce the same amount of “Useful Work”? The DCiE / DCP would be much worse…
Thanks for commenting Stu!
about 11 months ago
There you go making sense again Stevie! The consultant said that because that’s the way we have always done it! We are resisting CHANGE…again! We need to learn to just go with the flow.
Dave
about 11 months ago
Good post and absolutely a good shot for UCS. Many people do not foresee the hidden and long-term costs. They will only realize it when they own or run the entire DC and look at the bill seriously. The consultant requires more training or study before visiting the customer.
about 11 months ago
What about the little guy? Isn’t there a value prop for the guy/gal who only needs eight B200s in two chassis to start up a new ESX cluster to show the company that they can become great by purging all these secondary cards?
Perhaps it’s just a convergence discussion, but the value prop to me is where a competing product no longer makes sense. E.g. an HP C7000 loaded with 8, and you need to add the 9th compute blade and now need to add $500K of a new chassis and interconnects to slide that 9th in.
I see the UCS value greater than the parts but the customers still see it as server = blade.
about 11 months ago
I agree that UCS ain’t for everyone, John. At the recent London VMUG someone said that to me direct, and also spoke about the “cost of change”. My response is that I would never advocate UCS for every case, just like I never _used_ to think that ESX was right for everything. After doing a number of UCS business cases recently, admittedly for the top-end enterprises, when you look at ALL of the hard numbers then the “premium” tag just doesn’t cut it: the way businesses buy and depreciate their capital assets, and the way they charge their operational expenses, adds up to a scarily large number in these orgs (especially power and labour) and, worst of all, the “Useful Work” output is scarily low. I don’t think everyone is quite aware of the storm that is coming to large enterprise IT yet…
about 11 months ago
Most (all?) commentators for and against UCS haven’t, AFAIK, built a real business case with customers. The scary thing about when we (Cisco) do this, is how horrible the numbers look for the incumbent architecture – I wish I could bottle the looks and gasps that I’ve seen in meeting rooms in the past few weeks…
about 11 months ago
Here’s my issue with UCS, actually it’s not an issue as much as just a statement. It’s brand-spankin’ new and I don’t feel comfortable running my services using servers that are relying on a standard that isn’t even finalized yet (FCoE). We are a full HP and Cisco shop and right now cabling is one of our biggest nightmares. But I’m not willing to save CAPEX dollars by using servers that are first generation. In 5 years maybe.
about 11 months ago
Thank you for your post! Dare I ask if you are an “influencer” or “decision maker”?
What’s your business plan?
Wow?
about 11 months ago
@Richard Boswell
Hi Richard,
Assuming you’re using, at least in part, HP Blade Systems, have you considered Flex10 with CX4 uplinks to reduce the amount of cabling? Although this configuration would still have more cables (e.g. for fibre) than a UCS solution, it could be a good solution to your cabling woes.
Just a thought..
Cheers,
Simon
about 11 months ago
I know people who use this ‘best practice’ as a way of avoiding having to deal with lack of trust and poor communication between the service owners that would otherwise have to share a common network connection.
I’ve been on too many incident bridges not to know that failure to communicate changes to shared services leads to outages, but that doesn’t mean that not sharing is the right answer.
Regards
Simon
about 11 months ago
Richard,
FCoE was ratified a few months ago. I believe you are referring to Cisco’s DCE, which is not an IEEE standard.
We are preparing for a POC on the UCS platform in the next couple of weeks. Some of our concerns at the moment are multi-tenancy and scaling beyond 256 VLANs per POD. But hopefully we can address those concerns during the POC.
I think five years is a very long time to wait to adopt a new technology, especially if you have a lot of competition and need to differentiate yourself from them.
about 11 months ago
FCoE is not only ready on Cisco; you can also buy it from Brocade, which does the same. I do not see a major problem with FCoE in this case.
about 11 months ago
Hearing the term “best practice” should set off alarm bells. It’s usually said by people who don’t understand what they’re talking about. They’re just parroting what they’ve been told. I was in the VI3.5 fast track course and was pelting the instructor with questions. Why this, why that, how does X work…. He could only come back with “it’s best practice, why would you want to go against that?”
On FCoE: I like how Cisco released UCS with all the DCE and FCoE magic safely contained inside. The standard takes a while to get fully adopted, then it takes more time for the interop bugs to be worked out. Later, when DCE/FCoE is more mature and ready for prime time, it can be let out of the UCS walled garden. No need to wait for all the FCoE/DCE love, you can get all the goodness right away with UCS. Savings right now (for largish installations) plus a good tech roadmap. Who else offers that?