Provoking IT from Good to Great
Intel, UCS and vSphere: 10 advances since 2005, and why they matter

I remember when things were simple
I remember 2005, when HP brought out the excellent DL585, with its NUMA architecture and no front-side bus (FSB). Things have changed significantly since then: they have got better, but also more complicated. Here’s a layman’s description of what has happened in the past four years.
Back in 2005…
…ESX2 ran really well on that old DL585, and customers were getting impressive returns on their investment through consolidation ratios of up to and beyond 40:1 (guests:host). It seemed like everyone in London was buying them; they were awesome.
There was one minor complication: when installing ESX2 on the DL585, you had to go into the BIOS and decide whether to run it as a NUMA or a non-NUMA machine. It was something about “Interleaved Memory”, if I remember correctly. Do you know the difference between NUMA and non-NUMA? How will it affect your virtual machines? And what about HyperThreading? Are you sure you want to turn it on? Changing it back means powering off the machine… The communities were full of questions: “How do I set it?”, “How do I know if it’s been set?”, “What happens if it is (not) set?”, and so on.
But that was about it. Nothing more exciting than that. Check that memory was evenly installed across all four nodes, then get out the ESX2 installation CD.
ESX2 was nice and easy to understand, too: strict co-scheduling, no DRS, no HA. Life was simple. Even I could understand it. We talked about Farms and the MUI (pronounced moo-eee). Customers were glad to see me back then. How times change!
Fast forward to 2009…
…and it’s all a bit more complicated: it’s better, no doubt about that, but it’s also more complicated and more dynamic. There’s nothing to worry about as long as you are diligent, have common sense, and believe that the Devil is in the detail. If you’re slapdash and couldn’t-care-less, then I predict you’ll be one of those people back on the communities, shrieking “how do I…?” and “what’s the impact of…?”. Tsk, tsk.
Take the Cisco UCS hardware: it is stateless, simple and designed for virtualization, right? Don’t be deceived by appearances: inside lives a diabolically clever bit of kit from Intel called the Xeon 5500 (Nehalem) CPU. This thing does so much more than the old CPUs, and this is where the Devil lurks and needs to be tamed, especially when you mix in UCS and ESX4:
- The CPU can do virtualization: that wasn’t possible on the old machines. Intel’s VT-x gives ESX an alternative to para-virtualization and binary translation. You need to turn it on in the BIOS, though. Don’t worry if you forget: ESX will take care of it and default to binary translation.
- The Memory Management Unit is virtualized: instead of using VMware’s software shadow page tables, guest address translation can be done in hardware with Intel’s Extended Page Tables (EPT). There are times you don’t want this, but again ESX will make the decision for you. Don’t forget to turn on EPT in the BIOS, though.
- The CPU can speed up and down: Turbo Boost raises the clock speed “when needed”. Hmm, how does that affect performance, or at least the predictability of performance? You can turn it on or off in the BIOS. And how does that affect your watts per server, rack and cluster?
- You can run concurrent threads with HyperThreading: the old dilemma was “turn it on or off?” Nowadays the VMware performance geniuses say “leave it on, we’ll work it out for you” – so don’t forget to turn it on in the BIOS.
- You can run concurrent processes with MultiCore: six-core CPU sockets are now the norm. So instead of a 4-core server in 2005, we can have 24 cores, all with HyperThreads, in 2009. 4, 8, 16, 32: I guess that’s in the vicinity of Moore’s Law.
- The CPU can change the way it looks with FlexMigration: this works with vSphere’s Enhanced vMotion Compatibility (EVC) to relax the vMotion rules between different CPUs, by masking all but the CPU features common to every host in a cluster. Turn this on in the BIOS. This is important: it means you can introduce these new CPUs into clusters of older servers with older CPUs, still vMotion between them, and still benefit from the 5500’s performance. Smart! I told you!
- vSphere can switch off hosts it doesn’t need using Distributed Power Management (DPM): when everyone goes home at 5pm on a Friday and the systems idle back to under 10% utilization, vSphere uses DRS to consolidate the workloads onto a few hosts and puts the unused ones into standby, powering them back on when demand returns.
- Cisco’s UCS plays tricks on the Intel CPU: by inserting a special Cisco application-specific integrated circuit (ASIC), you can get up to 384GB per blade!
- SAN and LAN go over the same adapter: the new Converged Network Adapters (CNAs) use the new Fibre Channel over Ethernet (FCoE) industry standard.
- The new CNAs are virtualized: the new Cisco Palo CNA can present virtual NICs and virtual HBAs to the operating system, such as ESX.
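DPM’s Friday-evening consolidation is, at heart, a packing problem: fit the (small) evening demands onto as few hosts as possible and put the rest into standby. The sketch below uses plain first-fit-decreasing bin packing to illustrate the idea; it is not VMware’s actual DPM algorithm, which performs its own cost/benefit analysis and respects HA constraints. The host size, target utilization and VM demands are all hypothetical.

```python
def pack_hosts(vm_demands_mhz, host_capacity_mhz, target_utilization=0.7):
    """Return the CPU load (MHz) on each host that stays powered on.

    First-fit decreasing: place the busiest VMs first, each on the first
    host that still has room below the target utilization.
    """
    limit = host_capacity_mhz * target_utilization
    loads = []  # one entry per powered-on host
    for demand in sorted(vm_demands_mhz, reverse=True):
        if demand > limit:
            raise ValueError(f"VM demand {demand} MHz exceeds per-host limit {limit} MHz")
        for i, load in enumerate(loads):
            if load + demand <= limit:
                loads[i] = load + demand
                break
        else:
            loads.append(demand)  # no room on any running host: keep another one on
    return loads


CLUSTER_SIZE = 8
HOST_MHZ = 8 * 2930  # hypothetical host: 8 cores at 2.93 GHz

# Hypothetical Friday-evening demands: mostly idle VMs plus a few busier ones.
friday_evening = [300] * 30 + [1500] * 3 + [4000]

active = pack_hosts(friday_evening, HOST_MHZ)
standby = CLUSTER_SIZE - len(active)
print(f"{len(active)} hosts stay on, {standby} can go to standby")
# prints: 2 hosts stay on, 6 can go to standby
```

Keeping a target utilization below 100% leaves headroom for Monday morning, when the packing runs in reverse and hosts come back out of standby.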
Why you should care about these “low-level details”
All of the above need a human to decide whether they should happen, either by configuring something or by ordering something. The human making the decision needs to understand its impact on their virtualization solution, for two reasons:
Firstly, performance. This is what most people clamour about, chiming in unison: “What’s the optimal engineered configuration for maximum performance?”
Any performance engineer worth their salt will snort in derision at a vague question like that. Performance of which workload? Use the best default for running vSphere: turn all the features on and let ESX work it out (after all, the VMware and Intel engineers have already tested it to death for you). If you insist, then decide whether it is worth the cost to test for the corner cases where the default is not desirable.
Secondly, all these features mean your system is not static anymore: it’s completely dynamic.
If you turn on all the features as a baseline configuration across your clusters, and let the hardware and software work together to run your workloads as well as they can, then life should be quiet for your operations guys. If you have different ESX admins running around applying inconsistent configurations across the cluster, then life will be exciting and communities.vmware.com will be in your bookmarks.
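Spotting those inconsistent configurations is a simple diff of each host against the agreed baseline. The Python sketch below assumes you have already collected each host’s BIOS settings somehow (how you do that depends on your hardware and tools); the setting names, values and host names are hypothetical stand-ins for that data.

```python
# Hypothetical baseline, following the advice above: turn the features on
# and let ESX work it out. Names and values are illustrative only.
BASELINE = {
    "VT-x": "enabled",
    "EPT": "enabled",
    "HyperThreading": "enabled",
    "TurboBoost": "enabled",
}


def find_drift(hosts, baseline):
    """Return {host: {setting: (expected, actual)}} for every mismatch."""
    drift = {}
    for host, settings in hosts.items():
        diffs = {
            key: (expected, settings.get(key, "missing"))
            for key, expected in baseline.items()
            if settings.get(key, "missing") != expected
        }
        if diffs:
            drift[host] = diffs
    return drift


# Hypothetical collected settings for a three-host cluster.
cluster = {
    "esx01": dict(BASELINE),                                   # matches baseline
    "esx02": {**BASELINE, "HyperThreading": "disabled"},       # someone turned HT off
    "esx03": {k: v for k, v in BASELINE.items() if k != "EPT"},  # EPT not reported
}

for host, diffs in find_drift(cluster, BASELINE).items():
    for key, (expected, actual) in diffs.items():
        print(f"{host}: {key} expected {expected}, found {actual}")
```

Run regularly, a report like this keeps the cluster at one baseline, which is what the capacity manager needs in order to trust the data.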
Consider the person doing the capacity management (you do have someone collecting data, forecasting, modeling and reporting, right?): they will be confused by the data if the configurations are applied incorrectly and inconsistently. Missing or poor capacity and configuration management has a direct impact on return on investment, because you get fewer guests per host or cluster, and it costs you more to manage.
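To put a number on “fewer guests per host”, here is a quick calculation. The 40:1 ratio comes from the DL585 days described earlier; the guest count and the lower 30:1 ratio are hypothetical, chosen only to show how a drop in consolidation ratio turns into extra hardware.

```python
import math


def hosts_required(guests: int, guests_per_host: int) -> int:
    """Hosts needed to run a number of guests at a given consolidation ratio."""
    return math.ceil(guests / guests_per_host)


GUESTS = 400  # hypothetical estate size

well_managed = hosts_required(GUESTS, 40)  # consistent config, 40:1
poorly_managed = hosts_required(GUESTS, 30)  # hypothetical drop to 30:1
print(f"{well_managed} hosts vs {poorly_managed} hosts: "
      f"{poorly_managed - well_managed} extra servers to buy, power and manage")
```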
The moral of the story is that, regardless of your role in virtualization (architect, admin, ops analyst, capacity manager or whatever), you really need to know about the above features and what they mean for your view of virtualization.
References
- Read my article on how clever ESX is when it comes to hardware features.
- Rodos is the first guy to use UCS in Australia and has posted lots of screenshots.
- Steve Kaplan wrote a great article on How vSphere, Intel services and integration tools create a ‘triple threat’.
- Read about the Intel Xeon 5500.
- Read about the Cisco UCS blades.
- A straight-to-the-point technical overview of UCS.
- Read about vSphere features.
- Learn more about the new industry standard Fibre Channel over Ethernet (FCoE).
- Scott Lowe has written some fantastic stuff about UCS, including his UCS training notes plus discussion around FCoE, on his blog.
This entry was posted by Steve Chambers on 20 August, 2009 at 00:20, and is filed under Operations, UCS, vSphere.

about 1 year ago
Steve, check my post (http://rodos.haywood.org/2009/08/vsphere-on-ucs-screenshots.html), which shows the actual BIOS screenshots where many of these features can be enabled.
Also note that these BIOS settings are currently not part of the server’s service profile. I think you would agree that this needs to be addressed.
Great stuff.
Rodos
about 1 year ago
@Rodos
Hey chief, I might steal… no, in fact, I am going to steal… that picture (I’ll link back + link to your article). Great work!
As for the BIOS settings, that’s in the system to be done… I’m checking on its priority.