Switch to SAN, but BE CAREFUL!

I am a big fan of Storage Area Network (“SAN”) technology. SAN technology offers high-performance, highly flexible storage for physical and virtual systems, and it is particularly valuable for enabling the advanced features of virtualization platforms such as VMware, XenServer, and Hyper-V, to name a few. 

The moral of this story is that SANs have some very different management requirements compared to traditional Direct Attached Storage (“DAS”).  Many, if not most, IT staff in small to medium-sized organizations are most familiar with throwing hard disks into physical servers in single-disk or RAID configurations.  Now that they are purchasing and deploying SAN technology, we find that many problems occur when they do not fully understand how to implement and manage these new beasts. 

I would suggest that the difference between DAS and SAN technology is similar in spirit to the difference between a passenger car and a high-performance race car.  Most passenger cars are designed to be slower and more forgiving than race cars; if you get into trouble, it’s much easier for the average driver to recover.  Race cars will do many things that passenger cars won’t, but if you push them too far you need very specific skills to recover, or you will spin out of control.  SANs are, in my opinion, similar: they will do many things that DAS won’t, but if you don’t design and manage them properly you may quickly find yourself spinning out of control. 

Here are a couple of things you need to know.

  1. Most popular SANs offer some impressive features such as “Thin Provisioning.”  Thin provisioning is one of those features that you will absolutely love, that is, until you don’t.  It allows you to provision gobs of storage – more, in fact, than you actually have – to physical and virtual systems.  For example, you might have a SAN with 2 terabytes of physical storage, but then “provision” ten individual 2-terabyte “volumes” and present them to your physical and virtual servers.  Your servers will see this as a combined total of 20 terabytes of storage. This is great, but it requires you to be very careful.  The reason is that you have offered 20 terabytes of “virtual” storage while you really only have 2 TB of actual, physical storage.  So while your systems believe there is 20 TB of available storage, you have to ensure that you do not attempt to put more than 2 TB of physical data on this system, or bad things will happen. 

    What bad things? Well, in most cases, once you fill up the physical space the volume will become unusable.  You must monitor the system carefully and pay close attention to it so that if you start reaching capacity, you’ll know in time to do something about it (a simple sketch of this kind of monitoring follows this list).  Our own SANs alert us when total available free space drops to 20%.  We have from time to time been between 70% and 90% utilization of our total capacity, and while everything runs just fine, we know we have to watch this carefully.  If we hit 100% (intentionally or otherwise), the result could be catastrophic: the storage could immediately shut down and the recovery might not be quick. 

    This happened to a client last week. They started a backup, and when they realized that it was causing one of their thinly provisioned SAN volumes to fill up to capacity, they immediately stopped the backup and deleted the data.  Unfortunately, the volume still filled to 100% and immediately shut down.  Fortunately, this volume was mirrored to a second SAN, and since there was a little more room on the mirror volume, the workload actually stayed online until we could assist the client in recovering from the problem.  If the mirrored side had filled to 100% as well, the result would have been an immediate failure of a critical SQL Server workload. 

    The client didn’t realize that copying all of this backup data to a Windows Server VM would fill up the SAN volume, nor did they realize that simply deleting that data from the VM would not necessarily free it on the SAN.  We were able to configure a new volume and replace the full half of the mirror with an empty volume, but only because there was some un-provisioned space available to use for this purpose.

  2. And that brings me to my second point: deleting data from a physical or virtual server that is using SAN storage will not, in many cases, free the physical space on the SAN.  Again, DAS is like the passenger car: if you over-commit your storage, you can simply press the brake (a.k.a. the delete key) and recover immediately. With SAN storage, once the space is committed you may have to perform specific tasks to free it up (see the reclamation sketch after this list).  That might require special tools or utilities, or even adding more physical disks.  In extreme cases it may require that you remove and recreate the over-committed SAN volumes, and that is time-consuming and painful.  In this scenario an ounce of prevention is worth a hundred pounds of cure. 

    We have been running our SAN over-provisioned for years with no ill effects or degradation in performance…but we know we are “on the bubble,” and we carefully monitor and maintain this situation.
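
To make the over-commitment math and the monitoring idea concrete, here is a minimal sketch in Python. It is purely illustrative: the numbers mirror the 2 TB / 20 TB example above, the 20% alert threshold matches the one we use, and the function name is my own invention. A real SAN exposes pool statistics through its vendor’s management API or SNMP, not through a function like this.

```python
# Illustrative only: the arithmetic behind thin-provisioning over-commitment
# plus a simple "alert when physical free space is low" check.

TB = 1024 ** 4  # bytes in one terabyte (binary)

def check_thin_pool(physical_capacity, physical_used, provisioned_total,
                    free_alert_pct=20):
    """Report over-commitment and warn when physical free space runs low."""
    free = physical_capacity - physical_used
    free_pct = 100.0 * free / physical_capacity
    overcommit_ratio = provisioned_total / physical_capacity

    print(f"Provisioned {provisioned_total / TB:.0f} TB against "
          f"{physical_capacity / TB:.0f} TB physical "
          f"({overcommit_ratio:.0f}x over-committed)")
    print(f"Physical space used: {physical_used / TB:.1f} TB "
          f"({free_pct:.0f}% free)")

    if free_pct <= free_alert_pct:
        print("ALERT: free physical space at or below threshold; "
              "add capacity or move data before the pool fills.")

# The example from the article: ten 2 TB thin volumes carved from a
# 2 TB pool that currently holds 1.7 TB of real data (85% used).
check_thin_pool(physical_capacity=2 * TB,
                physical_used=1.7 * TB,
                provisioned_total=10 * 2 * TB)
```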
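
And here is an equally hedged sketch of the reclamation side. On a Linux guest, the fstrim utility asks the storage stack to release blocks the filesystem has already freed, but that request only reaches the SAN if every layer in between (the virtual disk format, the hypervisor, and the array itself) passes SCSI UNMAP/TRIM through; many combinations do not, which is exactly why “just delete it” so often fails to free space on the array. The mount point below is an example, not something from the client’s environment, and Windows guests have their own equivalent (Optimize-Volume with -ReTrim). Check your vendor’s documentation before relying on any of this.

```python
# Hypothetical example: ask the storage to reclaim blocks freed by the
# filesystem. Whether the SAN actually gets the space back depends on the
# virtual disk type, the hypervisor, and the array's UNMAP/TRIM support.
import subprocess

def reclaim_freed_blocks(mountpoint):
    """Run fstrim (from util-linux) against a mount point and show its report."""
    result = subprocess.run(
        ["fstrim", "-v", mountpoint],  # -v prints how many bytes were trimmed
        capture_output=True, text=True, check=True,
    )
    print(result.stdout.strip())

if __name__ == "__main__":
    reclaim_freed_blocks("/data")  # "/data" is an illustrative mount point
```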

So please be aware that SAN technology is wonderful, but it’s important to learn some new skills to keep these systems performing their best.
