23. May 2009 · 5 comments · Categories: Barriers
These are my boxes, not yours, MINE!

Imagine the scene: you’ve just analyzed the latest Capacity Planner results and have a number for Total Storage Required, covering both the consolidated servers and your projection for net-new servers over the next 24 months.  The number might as well be expressed in Terror-Bites, because the Storage Manager, who owns the storage, is giving you a funny look.

  1. We don’t have any space for you / it will be available in six months
  2. We only have space for you on Tier 2 storage
  3. It is going to cost you $$ per GB
  4. You can only use Fibre Channel
  5. Your LUN sizes have to be 33GB each
  6. Common problems with storage
  7. Working with the Storage Manager

In fact, I’ve seen a trend over the years where VCPs are buying their own storage and plugging it in themselves as a way of overcoming this barrier.  Is this the right way?  If it gets you the payback within the timeline you stated in the business case, with acceptable operational risk, then why not?  Be warned, though: if you want to run Tier 1 applications on your vSphere environment, those business owners might demand that the storage is managed by the storage team.

We don’t have any space for you / it will be available in six months

When you write your business case for virtualization, you work out the TCO, then the ROI, and then the timeline for the payback.  In the payback will be a Break-Even point: before that point, VMware is Too Expensive, and after that point it is all upside!  But does your payback timeline take into account the availability of storage?
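To see how a storage delay moves the Break-Even point, here is a minimal sketch.  The function name, the flat monthly saving and the figures are all assumptions for illustration, not numbers from a real business case; a real model would also carry running costs and phased migrations.

```python
def break_even_month(upfront_cost, monthly_saving,
                     storage_delay_months=0, horizon_months=60):
    """Return the first month in which cumulative savings cover the
    upfront cost, or None if that never happens within the horizon.

    Assumption: no savings accrue until shared storage is available,
    because nothing can be consolidated before then.
    """
    cumulative = -upfront_cost
    for month in range(1, horizon_months + 1):
        if month > storage_delay_months:
            cumulative += monthly_saving
        if cumulative >= 0:
            return month
    return None


# Hypothetical figures: 240,000 up front, 20,000 saved per month.
print(break_even_month(240_000, 20_000))                          # 12
print(break_even_month(240_000, 20_000, storage_delay_months=6))  # 18
```

In this toy model a six-month wait for storage pushes the Break-Even point out by the same six months, which is worth stating explicitly in the payback timeline.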

This barrier assumes that your organization has existing SAN facilities which you purchase per GB, i.e. you fund the purchase of HBAs and an amount of storage capacity, and then the Storage Management team provides the rest (cabling to their switches, managing their fabric, single initiator zoning, and creating the meta/LUNs).

Without shared storage from your Storage Management team, you can’t do vMotion, DRS or HA.

  • vMotion = no planned downtime (improved availability = $$)
  • DRS = active workload balancing to increase guest:host ratio (improved ROI = $$)
  • HA = reduced unplanned downtime (improved availability = $$)

So, if the Storage Management service won’t be ready for six months/too long, then what to do?  Here are some ways around this:

  • Offer to buy the array for them out of your budget, but they still do the rest of the service.  You now receive a much lower service charge because you are providing the large CapEx item, and they are providing the fabric and storage management.  Making this work financially might be a challenge :-/
  • Buy your own fabric and array, but let the Storage Management team deploy and run it.  This means you pay the CapEx, but a drastically reduced Storage Management service charge.
  • Buy and run your own fabric and array.  This takes your Storage Management team out of the picture completely.  Watch out for politics, but if the business case supports it, for example because this is the quickest route to payback, then it might be very appealing.

We only have space for you on Tier 2 storage

Tier 1 storage is the top-line, active-active arrays that are very high performing: think Symmetrix.  Tier 2 storage is often described as “cost-effective”, though it is still very capable and not all that cheap: think Clariion.

What capacity, availability and performance characteristics do you require from your storage: have you defined these in your business case, and your design?  Consider:

  • If you have run a Capacity Planner project, you should have a projection for storage capacity requirements for consolidated servers, and a projection for net-new, future deployments.  Will this fit into a Tier 2 array?
  • What availability characteristics are required by the applications running atop vSphere?  If they require 99.999% uptime, does a Tier 2 array offer that?  Will you need to deploy SRM?  Does your Tier 2 support SRM?
  • What performance characteristics are required by the applications running atop vSphere?  You may know the needs of individual applications like Oracle OLTP, but what about the impact of mixed workloads?
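Two of the questions above reduce to simple arithmetic, sketched below.  The helper names and the capacity figures are hypothetical; substitute your own Capacity Planner projection and the usable capacity quoted by the Storage Management team.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def allowed_downtime_minutes(availability_pct):
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def capacity_check(consolidated_gb, net_new_gb_per_month, months, usable_gb):
    """Return (required_gb, fits) for a straight-line growth projection."""
    required = consolidated_gb + net_new_gb_per_month * months
    return required, required <= usable_gb


# 99.999% uptime leaves roughly five minutes of downtime a year.
print(round(allowed_downtime_minutes(99.999), 2))  # 5.26

# Assumed figures: 8 TB consolidated, 250 GB/month net-new for 24 months,
# against 15 TB usable on the Tier 2 array.
print(capacity_check(8_000, 250, 24, 15_000))      # (14000, True)
```

If the array cannot credibly stay within that downtime budget, or the projection does not fit, that belongs in the business case before anyone signs up for Tier 2.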

Tier 2 storage is very capable, but check that it meets your requirements and your budget.  Obvious, I know, but maybe you can live with Tier 2 for now and plan for Tier 1 storage next?

It is going to cost you $$ per GB

This can be a real TCO/ROI killer.  If you are getting your storage from your Storage Management team, they might price it to you in $/GB, and might add fabric costs on top of that.  What costs might there be?

  • One-off set up costs for the fabric: does this include HBAs and fabric ports?  If there are no free ports and another switch is required, do you have to buy that?  Does it include the man hours to configure all of these things?
  • If there is no space on the array(s), do you have to purchase a new array?
  • Does the $/GB include backup and recovery?
  • Is Thin Provisioning used, and does this reduce the $/GB?

So, the initial $/GB needs to be investigated in detail, but beware: this might be a real barrier to your TCO/ROI case.
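As a rough illustration of how one-off costs inflate the headline rate, the sketch below folds them into an effective $/GB.  The cost categories and every figure are assumptions made up for the example; ask your Storage Management team for the real line items.

```python
def effective_cost_per_gb(capacity_gb, quoted_price_per_gb, one_off_costs):
    """Quoted $/GB plus one-off costs spread across the purchased capacity.

    one_off_costs: dict of label -> amount (hypothetical line items).
    """
    total = capacity_gb * quoted_price_per_gb + sum(one_off_costs.values())
    return total / capacity_gb


# Assumed one-off items, in dollars.
one_off = {
    "hbas": 12_000,
    "fabric_ports": 8_000,
    "extra_switch": 25_000,
    "labour": 5_000,
}

# 10 TB quoted at $10/GB becomes $15/GB once the one-off items are included.
print(effective_cost_per_gb(10_000, 10.0, one_off))  # 15.0
```

The same function can be rerun with the switch or labour removed to see which line item does the most damage to the TCO.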

You can only use Fibre Channel

There are three ways to use shared storage with vSphere:

  1. NAS / NFS – the cheapest option, works great for many workloads.
  2. iSCSI – if it’s possible in your organization, this might reduce the $/GB
  3. Fibre Channel – ubiquitous, performant but usually highest $/GB

Back to the business case, do you need Fibre Channel?  What are your availability, performance and capacity requirements?  If you have Fibre Channel requirements and Fibre Channel budget then all is good!  But if you only have NAS money, then…

Your LUN sizes have to be 33GB each

There used to be a great picture doing the rounds of two cavemen playing tug-o-war, with the caveman on the left shouting “SMALLER LUNS” and the caveman on the right shouting “BIGGER LUNS”.

The practice of 33GB LUNs is fine for Windows OS, but not for a datacenter OS that aggregates compute, storage and networking.  The size of LUN you require to put a VMFS datastore on it is not 33GB and instead is influenced by a number of factors:

  • What size of disks will the VMs be putting on the datastores, and will you be separating OS disks from Data disks?
  • How many VMs will share one datastore, and how many hosts will share each datastore?  Think of metadata actions (snapshots, etc) and LUN locking.
  • What kind of backup/recovery solution do you have – will you be mirroring LUNs?

These are just a few, but typical LUN sizes go from 300GB to 1TB, depending on the above and other local factors.
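A back-of-the-envelope sizing can make the argument against 33GB LUNs concrete.  This is only a sketch under stated assumptions: each VM needs its virtual disk plus a swap file roughly equal to its memory, and a fixed fraction of the datastore is kept free for snapshots and growth.  The function name, the 20% headroom and the VM figures are all hypothetical.

```python
def datastore_size_gb(vms_per_datastore, avg_vmdk_gb, avg_vm_memory_gb,
                      headroom_fraction=0.2):
    """Estimate the LUN size needed for one VMFS datastore.

    headroom_fraction is the share of the datastore left free for
    snapshots and growth (an assumed planning figure, not a rule).
    """
    used = vms_per_datastore * (avg_vmdk_gb + avg_vm_memory_gb)
    return used / (1 - headroom_fraction)


# Assumed: 15 VMs per datastore, 30 GB disk and 2 GB memory each.
print(round(datastore_size_gb(15, 30, 2)))  # 600
```

With these made-up inputs the answer lands at about 600GB, in the middle of the typical range above and nowhere near 33GB.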

Common Problems with Storage

If you have got as far as paying for your storage, and it’s up and running – I would bet a pint of Tetley’s that you experience some performance problems, and here are some common problems and resources:

  • Incorrect fabric zoning.  You *must* use single-initiator zoning for ESX – that means, for every HBA (remember, for one server, each HBA is on a different fabric – so HBA0 is on Fabric-A, and HBA1 is on Fabric-B) you need to create a zone for that HBA to both storage processor targets.  If you don’t do this, your HBA will see bus-resets from other SCSI devices and performance will be horrible.
  • Incorrect fail-over policy.  All hosts in a cluster that connect to the same array *must* have the correct and consistent fail-over policy applied, be it Most Recently Used or Fixed Path.  If you get this wrong, you are at risk of things like “LUN Thrashing” and performance will be horrible.
  • Incorrect Array Configuration.  Array vendors have weird and wonderful configurations to correctly work with ESX.  It might be setting an ESX device access type to Linux, or setting “Option 3 to A” – whatever that means!  Always check the array vendor docs and see a reference architecture (EMC are good at this).
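To make the single-initiator zoning rule concrete, here is a small sketch that builds one zone per HBA, containing that HBA and the storage processor ports on its fabric.  It is vendor-neutral and produces plain data rather than switch commands; the host name, zone naming scheme and WWPNs are invented placeholders.

```python
def single_initiator_zones(host, hbas, targets_by_fabric):
    """Build one zone per HBA: a single initiator plus its fabric's targets.

    hbas: dict of hba_name -> (fabric, initiator_wwpn)
    targets_by_fabric: dict of fabric -> list of storage processor WWPNs
    """
    zones = []
    for hba_name, (fabric, wwpn) in sorted(hbas.items()):
        zones.append({
            "name": f"{host}_{hba_name}",
            "fabric": fabric,
            # Exactly one initiator, followed by the SP targets it may see.
            "members": [wwpn] + list(targets_by_fabric[fabric]),
        })
    return zones


# Placeholder WWPNs: HBA0 on Fabric-A, HBA1 on Fabric-B, each fabric
# reaching one port on both storage processors.
hbas = {
    "hba0": ("Fabric-A", "10:00:00:00:00:00:00:a0"),
    "hba1": ("Fabric-B", "10:00:00:00:00:00:00:b1"),
}
targets = {
    "Fabric-A": ["50:00:00:00:00:00:0a:01", "50:00:00:00:00:00:0b:01"],
    "Fabric-B": ["50:00:00:00:00:00:0a:02", "50:00:00:00:00:00:0b:02"],
}

for zone in single_initiator_zones("esx01", hbas, targets):
    print(zone["fabric"], zone["name"], zone["members"])
```

Because no zone ever holds two initiators, one host's HBA cannot see resets generated by another's, which is the point of the rule.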

The only way to avoid these is to work closely with your Storage Manager from day 1.
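One check that is easy to agree with the Storage Manager up front is fail-over policy consistency across a cluster.  The sketch below assumes you have already collected each host's policy into a dictionary, by whatever means you use to inventory hosts; the host names and the helper are hypothetical, and the expected policy should come from the array vendor's documentation.

```python
def hosts_with_wrong_policy(policies, expected):
    """Return the hosts whose fail-over policy differs from the expected one.

    policies: dict of host name -> policy string as recorded in your inventory
    expected: the policy the array vendor documents for this array
    """
    return sorted(host for host, policy in policies.items()
                  if policy != expected)


# Hypothetical inventory for a three-host cluster.
cluster = {
    "esx01": "Fixed",
    "esx02": "Fixed",
    "esx03": "Most Recently Used",
}

print(hosts_with_wrong_policy(cluster, expected="Fixed"))  # ['esx03']
```

Any host this flags is a candidate cause of LUN thrashing and should be corrected before the cluster goes into production.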

Working with the Storage Manager

When I work on virtualization projects, the first thing I do is call a meeting of all heads of department and walk them through the business case, show the executive support, and what is required of each of them.  Then they allocate me a subject matter expert in their team, and the show is on the road.

In the case of the Storage Manager, here’s what I’d do from start to finish:

  1. Check the business case details – $/GB, characteristics, timelines.
  2. Check the Bill of Materials – are we ordering the right kit, there’s no overlap, does the budget work, how’s the timeline?
  3. Check the Technical Design – from VM, through ESX, through the Fabric, to the Array: get the teams together and get it right.
    1. Read the Fibre Channel SAN Configuration Guide
    2. Read the iSCSI SAN Configuration Guide
  4. Check the Architecture & Test Plans – how will it be implemented, how will it be tested?  Test for availability, capacity, performance – fill up those LUNs and see what happens ;-)
  5. Check the Operational Run Book – who will do what and when?  Creating new VMDKs, creating new LUNs, fault isolation and the ticketing system.
  6. Now you are ready to build it, test it.  What is required for sign off?
  7. How close to budget are you?  Is the business case still correct?  Do you need more moolah?

If you work with Storage Managers from the start, then you have a great chance of overcoming this barrier.  You can always go it alone, but in a large enterprise that might be controversial: it depends on your organization’s culture, and your realistic appraisal of whether you have the money, skill, experience and time to manage your own storage.

5 Comments

  1. John Gannon says:

    Steve, do you see any tools on the horizon that will help make storage management less painful? With cloud computing on the horizon, I imagine the pain around storage and its relationship with virtualization will intensify. Would be interested in your thoughts.

  2. This is something that is getting better with improved visibility from the vCenter end, and also better visibility from the storage end with technologies like NPIV where you can identify specific VMs traversing the fabric and accessing LUNs. I think it will be even better with Cisco UCS as the different protocols are collapsing onto the same ethernet pipes. For fault isolation I’d go for something like Splunk to get complete correlation from end-to-end. It’s worth asking Chad Sakac and his team?
