High Availability and Fault Tolerance Part Two

In my last post on High Availability and Fault Tolerant (HA/FT) servers, we talked a little bit about redundant power, meaning you have more than one source of electricity to run your servers. But there are numerous other internal threats that can cause unplanned server outages.

After backup power, the next level of redundancy comes in the servers themselves. Most server-class machines have redundant components built right in, such as hard drives and power supplies. This means that right off the shelf, these systems have some level of Fault Tolerance (FT) built in, which can keep applications and data available when a component fails. However, unplanned outages can still occur when a non-redundant component fails, or when multiple components fail.

Remember that High Availability means that if a virtual or physical machine goes down, it will automatically restart and come back online. Fault Tolerance means that multiple components can fail with no loss of data and no interruption of application availability.

To take HA/FT to a higher level, we can turn to one of several products available on the market. Companies like Vision Solutions (Double Take) provide software that lets you create a standby server. More sophisticated products from VMware and Stratus allow you to mirror applications and data on identical servers using a concept known as lock-step. Lock-step means that applications and data are processed in real time across two hosts. With these products, multiple components or even an entire server can fail and your applications continue to be available to users.

With Double Take software from Vision Solutions, IT staff can create a primary and standby server pair in which all of your data is replicated from the primary to the standby server in real time. This is a sufficient solution for most small to medium enterprises. However, if the primary server fails, there is still a brief interruption in application availability while the failover to the standby server occurs. In special situations that require the highest levels of High Availability and Fault Tolerance, we turn to solutions from VMware or Stratus. These provide a scenario where multiple components can fail on multiple servers and your application will continue to run.

Determining which approach is right for you is really an economic decision based on the cost of downtime. If you can’t put a dollar value on what it costs your business per hour or per day when a critical application is unavailable, then that application probably isn’t sufficiently critical for you to spend a lot of money on an HA/FT solution. If you do know what that cost is, then, just like buying any other kind of business insurance, you can make a business decision as to how much money you can justify spending to protect against that risk of loss.
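
To make that insurance comparison concrete, here is a minimal Python sketch of the arithmetic. The function names, the three-year planning horizon, and every dollar and hour figure are hypothetical placeholders for illustration, not numbers from any vendor or real deployment; substitute your own estimates.

```python
def expected_downtime_loss(cost_per_hour, outage_hours_per_year, years=3):
    """Estimated loss from downtime over the planning horizon (hypothetical model)."""
    return cost_per_hour * outage_hours_per_year * years


def is_justified(solution_cost, cost_per_hour, outage_hours_per_year, years=3):
    """True if the HA/FT solution costs no more than the downtime loss it is meant to avoid.

    Simplifying assumption: the solution eliminates all of the expected downtime.
    """
    return solution_cost <= expected_downtime_loss(
        cost_per_hour, outage_hours_per_year, years
    )


# Hypothetical example: an application that costs $5,000 per hour when it is
# down and is expected to be unavailable 8 hours per year without HA/FT.
loss = expected_downtime_loss(5000, 8)       # 5000 * 8 * 3 years
print(loss)
print(is_justified(60000, 5000, 8))          # a $60,000 solution
print(is_justified(150000, 5000, 8))         # a $150,000 solution
```

Under these made-up numbers, the expected three-year loss is $120,000, so a $60,000 solution would be easy to justify while a $150,000 one would not. A real analysis would also account for how much downtime a given product actually prevents, since a standby-server failover still incurs a brief interruption.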
