Recently, someone pointed out to me that it was high time for me to write a follow-up to my “What Will VAAI Do for You?” posting. Written shortly after the launch of VMware vSphere 4.1, it is my most-read post on the Everything VMware at EMC Community.
Since I wrote it, though, VMware has released vSphere 5, which (among a slew of other great features) includes version 2 of VAAI, now called the vSphere Storage APIs for Array Integration. (You’re not mistaken: they’ve changed the words behind the name but kept the acronym the same — the acronym has achieved widespread common usage and changing it would only cause confusion.)
VAAI allows ESXi servers to work more effectively with VAAI-enabled storage arrays. In most cases this means offloading storage-related tasks from the server to the array, but there’s more to it than just that, as I’ll explain in this posting.
VAAI v1 Review
In vSphere 4.1, VAAI was only supported for block-based datastores. Specifically, this meant the features only worked on VMFS filesystems or raw device mappings (RDMs) that the ESX(i) server connected to using one of the block storage protocols (Fibre Channel, iSCSI, or FCoE). There were no VAAI features that worked with NFS-based datastores.
VAAI v1 had three features:
- Full Copy
- Block Zero
- Hardware-Assisted Locking
Since I wrote these up in detail previously, I’m going to take the lazy man’s way out and suggest that you read my original What Will VAAI Do for You? posting to get more information on these three features.
VAAI v2 Overview
Version 2 of VAAI, running on vSphere 5, expands support for block-based storage, adding two new features (Thin Provisioning Stun and Thin Provisioning Block Reclamation) to the original three, further improving ESXi’s effectiveness with VMFS and RDMs.
Additionally, VAAI v2 adds three new features for NFS-based datastores:
- NFS Full Copy
- NFS Extended Stats
- NFS Reserve Space
This post covers the new block storage features in detail. The new NFS features will be covered in a separate posting titled “What Will VAAI v2 Do for You? Part 2 of 2: NFS“, which will be published on Monday 26 March.
VAAI v2 for Block Storage
The original three features from VAAI v1 are still here, working pretty much the same way they always have. The only real change to these features has been adding some consistency to the naming of the features, as follows:
- Full Copy is now called Hardware-Assisted Copy
- Block Zero is now called Hardware-Assisted Zero
- Hardware-Assisted Locking’s name remains unchanged
The functionality of the features and the benefits they provide remain unchanged; only the names have changed.
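If you’d like to see which of these primitives your array actually supports for a given device, ESXi 5 can report per-device VAAI status from the command line. The sketch below assumes shell access to an ESXi 5 host; the `naa` device identifier is a placeholder you’d replace with one of your own devices.

```shell
# List attached devices, including an overall VAAI support status for each.
esxcli storage core device list

# Show per-primitive VAAI status for a single device.
# In the output: ATS = Hardware-Assisted Locking, Clone = Hardware-Assisted
# Copy, Zero = Hardware-Assisted Zero, Delete = block reclamation (UNMAP).
# (The device ID below is a placeholder -- substitute a real naa identifier.)
esxcli storage core device vaai status get -d naa.60000000000000000000000000000001
```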
Thin Provisioning Stun
Thin Provisioning is a great storage feature that allows for better overall utilization of storage and reduction of storage costs. It does, however, carry with it an increased chance of encountering out-of-space errors. The severity of an out-of-space error is entirely dependent on what you’re doing.
For example, if a fileserver encounters an out-of-space condition, it keeps running. It can’t write any new files until some space becomes available (either by files being deleted or the filesystem being expanded), but it continues to serve the files that were saved previously.
Databases, on the other hand, do not react well to out-of-space conditions. The main concern of a database application is the integrity of the data itself, and it won’t knowingly do anything that could compromise that integrity. When a database attempts to write data and receives an out-of-space error, some databases will switch to a “read-only” mode to ensure they don’t inadvertently cause any problems (though this mode-switch will most likely cause problems for the applications that depend on writing to the database). Most, however, will simply shut down the database instance, figuring that doing nothing at all to the data is the best way to ensure not messing it up.
The thing is, when using shared storage to serve multiple thin-provisioned volumes from the same pool, out-of-space errors can often clear themselves up fairly quickly. The condition may clear because another volume in the pool frees up some space, or because the storage array needed a moment to automatically add additional extents to the particular volume in question.
Most problems with thin-provisioned volumes occur because the application is unaware that it’s running on thin storage (which is sort of the point of thin provisioning). Because the storage has presented the datastore as a 500GB volume, the application takes it at its word and believes that it has 500GB to work with — even if the array has only physically allocated 200GB for the datastore at the moment.
This is where Thin Provisioning Stun comes in. Thin Provisioning Stun (TPS), when used with a TPS-capable array, allows the ESXi server to be aware that it’s using a thinly-provisioned datastore. Individual VMs can be set to “dodge” out-of-space errors. When the VM is about to commit a write that would cause an error, the VM freezes as-is (is “stunned”) and raises a warning. When the condition clears up, the VM unfreezes (resumes) exactly where it was, as if nothing had happened. (This is part of the beauty of virtualization — this ability to “freeze it exactly as-is”, down to the state of the memory, is made possible by the fact that this is a virtual machine.)
Let me walk through a hypothetical example. Let’s say that I’m running a SQL database for an order-taking system. The database runs in a VM that’s on a 100GB thin-provisioned datastore. The datastore has 50GB of actual space allocated to it at the moment, of which 49.999GB are used. There isn’t enough space to write the data for another order.
Without Thin Provisioning Stun, the next order that comes in will:
- Get lost
- Cause the database to shut down, requiring manual intervention before my site can take more orders
With Thin Provisioning Stun, that same next order would “stun” the VM, keeping the order data in memory. My storage array could then have time to add new extents to the datastore, at which point the VM would resume function, having preserved my customer’s order. Yes, the order-taking database experienced a brief “hiccup”, but most business owners would agree that having to display a “Sorry, we’re momentarily unable to take new orders, please try again in a minute” message is better than losing an order that a customer has taken the time to enter.
Thin Provisioning Block Reclamation
There’s been a lot of confusion about this particular feature, so I’ll do a two-part explanation: “The Idea” and “Current State of the Feature”.
We were just discussing thin provisioning above. This feature is designed to further improve the efficiency of thin-provisioned storage for vSphere. If multiple thinly-provisioned datastores share the same storage pool on the array, the different datastores don’t always play nice with each other.
The goal is that space is available from the pool for whichever datastore might need it now. When that datastore no longer needs the extra space, it should be freed up — the blocks returned to the list of those available for use by other datastores.
In practice, this isn’t what actually happens. What actually happens when a file is deleted on a VMFS filesystem is that the blocks the file used are “zeroed out” — overwritten with zeros. This erases the file, but keeps the space as part of the datastore. It’s possible to reclaim the space for use in the thin provisioning pool, but it’s a manual process that must be performed on the storage array. The difficulty of the process and the amount of time it takes will vary based upon your storage vendor (provided, naturally, that your array has a thin provisioning space reclamation feature).
The idea of the Thin Provisioning Block Reclamation feature is that, when a file is deleted, VMFS would simply un-map the blocks, freeing them to be returned to the storage pool immediately (provided that the feature is turned on and that you’re using an array that supports the feature, naturally).
Current State of the Feature
As it turns out, it’s not actually an easy thing to implement this particular feature — it requires vSphere to work with storage at even deeper levels than the other VAAI features do.
Shortly after the release of vSphere 5, VMware realized that the feature wasn’t everything it needed to be just yet and withdrew support for Thin Provisioning Block Reclamation, advising that customers disable the feature. (The feature itself worked, but was having a negative performance effect on arrays, causing some operations like Storage vMotion to time out.)
So, to be clear, this means that — as of the time of this writing — no storage vendor supports the automated Thin Provisioning Block Reclamation feature (despite what I hear some of them are telling customers), as VMware doesn’t support it themselves.
That said, VMware just released vSphere 5.0 U1, which does a couple of things related to this feature. First, it disables automated Thin Provisioning Block Reclamation by default (vSphere’s normal default is to enable all VAAI features), saving customers from having to reconfigure it themselves.
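As a sketch, you can check this default from the ESXi command line via the `VMFS3.EnableBlockDelete` advanced setting. This assumes shell access to a 5.0 U1 host; verify the option path against VMware’s documentation for your build before changing anything.

```shell
# Show the current value of the automated block-reclamation switch.
# A value of 0 means automated reclamation is disabled, the new
# default in vSphere 5.0 U1.
esxcli system settings advanced list --option /VMFS3/EnableBlockDelete
```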
Second, it adds back partial Block Reclamation functionality.
Partial functionality? What does that mean?
In this case it means that, as a manual process, you can have vSphere issue a SCSI `UNMAP` command (it’s an option to ESXi’s `vmkfstools` command). VMware recommends doing this only during a maintenance window, as the process has an adverse effect on the I/O performance of the datastore.

Also, you need to be careful with the command. It takes a numeric argument, which is interpreted as the percentage of the datastore’s free space that you want vSphere to reclaim. Details on running `vmkfstools` to perform the `UNMAP` can be found in this VMware KnowledgeBase article.
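To make that concrete, here’s a minimal sketch of the manual reclaim on an ESXi 5.0 U1 host. The datastore name is a placeholder, and the 60 is just an example percentage — pick a value appropriate for your environment.

```shell
# Change into the root of the VMFS datastore to be reclaimed
# ("MyDatastore" is a placeholder name).
cd /vmfs/volumes/MyDatastore

# Ask vmkfstools to UNMAP up to 60% of the datastore's free space.
# Run this only during a maintenance window: the operation creates a
# temporary file on the datastore and generates significant I/O load.
vmkfstools -y 60
```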
Until Next Time…
That sums up the five block storage features of VAAI v2:
- Hardware-Assisted Copy
- Hardware-Assisted Zero
- Hardware-Assisted Locking
- Thin Provisioning Stun
- Thin Provisioning Block Reclamation