Long Distance Disaster Recovery?

October 10th, 2007

Know What Virtualization Is, But What Is Next? - Chapter 04

To be honest, I have been avoiding this subject. Long distance disaster recovery is one of the weak points in virtualization, for a number of technical and a few financial reasons. Some of the technical reasons often cited against long distance recovery solutions are that they:

  • Do not scale well, since every per-instance archival/restore solution available today falls short
  • Are vendor specific
  • Require extensive bandwidth
  • Have limited vertical scale (imaging issues)
  • Have limited horizontal scale (DASD allocation issues)
  • Are hard to manage, monitor, and control
  • Are inflexible once implemented

The financial reasons often cited against long distance recovery are:

  • Infrastructure that is underutilized or idle
  • Network bandwidth required over distance

As with any solution, or should I say situation, the world is changing, and vendors are running after issues (cough, dollar signs). VMworld 2007 was no exception in this regard: the hot topics for the super-sessions, where attendance ran into the multiple 100s per session, centered on disaster recovery, site management, and to a lesser degree image scaling. Unfortunately, the concept of total scale is still not quite enterprise level for my taste. For example, every single disaster recovery solution presented would have issues beyond 250 host servers, more than 1500 virtual instances, or 10 or more terabytes of actually allocated DASD per site.

Wait, wait, some of you are already saying, only pretty big corporations are doing that! Well, in point of fact, everyone doing virtualization is looking at more, not less, virtualization, so realistic scaling for long distance disaster recovery solutions should support 1000 or more virtual hosts and at least 10,000 virtual instances across an enterprise, and distances should be not in the 10s or even 100s of miles, but in the 1000s of miles. Yes, 1000s of miles. Think of real disasters! A disaster recovery site should, or could, be 1000 miles away, even on a different tectonic plate if possible. In other words, how far can a hurricane travel over land? Not far, but the flooding and storm damage often reaches 100s of miles inland.

Archival/restore options per virtual instance just do not scale, so storage-array based methods will dominate the virtualization industry, and this is not a predictive comment, but a fact. All the relevant vendors know this, never mind the fact that we, as clients of virtualization, have been yelling about this for the last 2 to 3 years.
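To put rough numbers on the bandwidth side of that scaling argument, here is a back-of-envelope sketch. The 10 TB per-site figure comes from above; the 5% daily change rate, the 155 Mbps link, the 8-hour replication window, and the 70% link efficiency are my own illustrative assumptions, not vendor figures or measurements.

```python
# Back-of-envelope replication sizing. Every default and input below is an
# illustrative assumption, not a measurement or a vendor specification.

def transfer_hours(data_tb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Hours to move data_tb (decimal) terabytes over a link_mbps link.

    efficiency is the assumed fraction of the nominal link speed actually
    achieved once protocol overhead and long-distance latency take their cut.
    """
    bits = data_tb * 1e12 * 8               # terabytes -> bits
    usable_bps = link_mbps * 1e6 * efficiency
    return bits / usable_bps / 3600


def required_mbps(changed_tb_per_day: float, window_hours: float,
                  efficiency: float = 0.7) -> float:
    """Nominal link speed (Mbps) needed to ship one day's changes in window_hours."""
    bits = changed_tb_per_day * 1e12 * 8
    return bits / (window_hours * 3600) / efficiency / 1e6


if __name__ == "__main__":
    site_tb = 10.0                  # allocated DASD at one site
    daily_change_tb = site_tb * 0.05  # assumed 5% of the data changes per day
    print(f"Initial seed over 155 Mbps: {transfer_hours(site_tb, 155):.0f} hours")
    print(f"Link for daily changes in 8 h: {required_mbps(daily_change_tb, 8):.0f} Mbps")
```

Under those assumptions, seeding a single 10 TB site takes roughly 205 hours, and just keeping up with the daily changes calls for roughly a 198 Mbps link. Multiply by the number of sites and the bandwidth conversation with management starts itself.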

But I digress. For long distance recovery as a concept to take off, regardless of where the recovery site is situated, a few things need to happen beyond the scaling discussion:

  • Standardized use and implementation of image scaling
  • Standardized use of thin-disk methods for DASD allocation
  • Standardized use of storage-array level snapping, cloning, etc.
  • Convince management that no matter what is done, bandwidth will be required, maybe even a dedicated storage area network, cough, cough
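The standardized storage-array snapping item in the list above is easiest to picture as one common interface that every array vendor implements, the way every disk vendor implements SCSI. The sketch below is purely hypothetical: `Snapshot`, `SnapshotProvider`, `ToyArray`, and `replicate` are names I made up for illustration, they are not any vendor's real API, and the toy array simply keeps blocks in memory.

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Hypothetical vendor-neutral, point-in-time copy of one volume."""
    volume: str
    taken_at: float
    blocks: dict[int, bytes] = field(default_factory=dict)


class SnapshotProvider(ABC):
    """Hypothetical interface each array vendor would implement."""

    @abstractmethod
    def export_snapshot(self, volume: str) -> Snapshot: ...

    @abstractmethod
    def import_snapshot(self, snap: Snapshot) -> None: ...


class ToyArray(SnapshotProvider):
    """In-memory stand-in for a storage array; not any vendor's real API."""

    def __init__(self, name: str):
        self.name = name
        self.volumes: dict[str, dict[int, bytes]] = {}

    def export_snapshot(self, volume: str) -> Snapshot:
        # Copy the block map so later writes to the volume do not alter the snapshot.
        return Snapshot(volume, time.time(), dict(self.volumes[volume]))

    def import_snapshot(self, snap: Snapshot) -> None:
        # Copy again so the target array owns its data independently of the source.
        self.volumes[snap.volume] = dict(snap.blocks)


def replicate(src: SnapshotProvider, dst: SnapshotProvider, volume: str) -> Snapshot:
    """Ship a snapshot between any two conforming arrays, whoever built them."""
    snap = src.export_snapshot(volume)
    dst.import_snapshot(snap)
    return snap
```

Because `replicate` depends only on `SnapshotProvider`, the source and target arrays could come from different vendors, which is the whole point of standardizing at this layer.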

I am not going to explain the bandwidth issue; it is obvious. Very long distance disaster recovery models need bandwidth, and of course no one wants to hear that, but it is true. Storage-array networking is needed to implement a number of emerging virtualization technologies; welcome to emerging virtualization life.

Moving on… standardized use of anything is good from a practical perspective, because it eliminates one of our big issues: vendor-specific implementations. We want storage-array snapshots to be compatible, just as SCSI is compatible today, right? Or better, if we can get it. We want NetApp at one site to migrate storage-array snapshots to EMC, for example. Don't laugh, it will happen, but the storage-array vendors don't like the idea. This cross-vendor model would also address migrations to newer platforms and different models of implementation.

Image scaling, which is not here yet to any realistic degree, is the idea of leveraging DASD that is, in concept, read-only. For example, 90 plus percent of the operating system footprint in a virtual instance is static, so I should be able to reuse it over and over across instances, and isolate per instance only the DASD that actually changes. Dang, does this sound like a container model? It should! Now combine image scaling with thin-disking methods, where the operating system thinks it has 100GB for data, but the actual allocation on the storage array is only what is needed plus a growth-factor offset, so only unique data consumes DASD. For example, if a given instance is only using 20GB, it really only has a 20 plus GB footprint on the storage array. This reduces cost and allows for better utilization of DASD resources, which should make the accounting geeks happy. Did I really just say accounting geek?
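Here is a minimal copy-on-write sketch of image scaling combined with thin-disking, assuming a simple block-level model: a shared read-only base image is stored once, each instance keeps only the blocks it has written, and the guest still sees its full logical capacity. The class and function names, the 8GB shared OS image, the 10% growth offset, and the 1000-instance count are illustrative assumptions; the 100GB logical and 20GB used figures come from the example above.

```python
class BaseImage:
    """Read-only OS image, stored once on the array and shared by every instance."""

    def __init__(self, blocks: dict[int, bytes]):
        self._blocks = dict(blocks)

    def read(self, idx: int) -> bytes:
        # A zero byte stands in for a never-written (zeroed) block.
        return self._blocks.get(idx, b"\x00")


class InstanceDisk:
    """Per-instance view: reads fall through to the base image unless this
    instance wrote the block; writes land only in a private delta."""

    def __init__(self, base: BaseImage, logical_blocks: int):
        self.base = base
        self.logical_blocks = logical_blocks  # capacity the guest OS believes it has
        self.delta: dict[int, bytes] = {}     # only these blocks consume DASD

    def read(self, idx: int) -> bytes:
        if idx in self.delta:
            return self.delta[idx]
        return self.base.read(idx)

    def write(self, idx: int, data: bytes) -> None:
        if not 0 <= idx < self.logical_blocks:
            raise IndexError(f"block {idx} is outside the logical capacity")
        self.delta[idx] = data


def footprint_gb(instances: int, logical_gb: float, used_gb: float,
                 shared_os_gb: float, growth: float = 0.10) -> tuple[float, float, float]:
    """Array capacity for thick disks, thin disks, and thin disks plus image scaling."""
    thick = instances * logical_gb
    thin = instances * used_gb * (1 + growth)
    scaled = shared_os_gb + instances * (used_gb - shared_os_gb) * (1 + growth)
    return thick, thin, scaled


if __name__ == "__main__":
    base = BaseImage({0: b"kernel", 1: b"libs"})
    vm_a = InstanceDisk(base, logical_blocks=100)
    vm_b = InstanceDisk(base, logical_blocks=100)
    vm_a.write(1, b"patched libs")
    print(vm_a.read(1), vm_b.read(1))  # vm_b still sees the shared base block
    print(footprint_gb(1000, 100, 20, 8))
```

Under those assumptions, 1000 instances drop from 100,000GB thick-provisioned to about 22,000GB thin-provisioned, and to about 13,200GB once the shared OS image is stored only once.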

For those keeping score on long distance disaster recovery (not the geek name-calling), the last issue is monitoring, management, and control. Well, that is the real kicker: without universal standardization of storage-array models across vendors, or at least across all significant vendors, such a tool or methodology is lacking. Well… actually… that is changing, but it is in the embryonic stage at best. VMware has a new toy, VMware Site Recovery Manager (SRM), and other virtualization vendors will follow soon. SRM aims to solve the key issue that plagues our topic of discussion: the lack of monitoring, control, and management. However, since VMware SRM does not yet employ thin-disking or image scaling, the opportunity for someone else to snatch this market niche away from VMware is obvious, no?
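As a taste of what the monitoring piece has to do, here is a minimal sketch of a replication-lag check against a recovery point objective (RPO). It is not how SRM works and not any product's API; `ReplicaStatus`, `rpo_violations`, and the assumption that each datastore can report the time of its newest replicated snapshot at the recovery site are hypothetical, for illustration only.

```python
import time
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class ReplicaStatus:
    datastore: str
    last_replicated: float  # epoch seconds of the newest snapshot present at the DR site


def rpo_violations(statuses: Iterable[ReplicaStatus], rpo_seconds: float,
                   now: Optional[float] = None) -> List[Tuple[str, float]]:
    """Return (datastore, lag_seconds) for every replica older than the RPO,
    worst first, so the most exposed datastores are at the top of the report."""
    now = time.time() if now is None else now
    late = [(s.datastore, now - s.last_replicated)
            for s in statuses
            if now - s.last_replicated > rpo_seconds]
    return sorted(late, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    now = 1_000_000.0
    fleet = [
        ReplicaStatus("vmfs-finance", now - 600),   # 10 minutes behind
        ReplicaStatus("vmfs-web", now - 7_200),     # 2 hours behind
        ReplicaStatus("vmfs-mail", now - 90_000),   # 25 hours behind
    ]
    for name, lag in rpo_violations(fleet, rpo_seconds=3_600, now=now):
        print(f"{name}: {lag / 3600:.1f} h behind (RPO 1.0 h)")
```

At 10,000 instances, a report like this has to be driven from storage-array replication state rather than per-instance jobs, which is the same scaling argument made earlier.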


Entry Filed under: A Proper Virtual World, Virtual System Management
