When it comes to OpenStack, the open source cloud platform, a lot of folks are wondering about disaster recovery. That is: If a disaster takes out a data center, how can an OpenStack private cloud or public cloud come back to life? A session on this topic at the OpenStack Summit today in Portland, Ore., did not go into individual device failures. Rather, this was a true full-blown disaster discussion, led by Michael Factor, a distinguished IBM (NYSE:IBM) engineer.

To get started, think about:

  • What data needs to be copied, how often and to where?
  • How are you going to recover the infrastructure (where?) and the data? Is the backup infrastructure operational all the time or just on-demand?
  • Recovery Point Objective (RPO): This is how much recent data you can afford to lose, measured in time. The closer you are to zero, the more expensive it is. Zero means you've been copying the data in real time from your primary to your backup location. "Some workloads require RPO of zero, others don't."
  • Recovery Time Objective (RTO): This is how long it takes you to recover. Here again, if your RTO is zero -- that is, you are recovering in real time -- your expenses go through the roof.
  • Application Consistency: Think about whether the copied data is in a state the application can actually restart from.
  • Storage Systems Support: Think about a storage controller. "Most storage sub-systems support synchronous replication." This enables an RPO of zero, he noted. Asynchronous replication means some data will be lost in a disaster. Also, check out consistency groups -- which are essential for both asynchronous and synchronous replication.
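To make the RPO trade-off concrete, here is a minimal Python sketch. It is not from the talk; it assumes a simplified model in which asynchronous replication runs on a fixed cycle, so the worst-case loss is one full interval plus the time needed to ship that cycle's changes. The names `ReplicationPlan`, `worst_case_rpo_s` and `meets_rpo` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReplicationPlan:
    """Hypothetical description of how a volume is replicated."""
    synchronous: bool
    interval_s: float = 0.0  # time between asynchronous replication cycles
    transfer_s: float = 0.0  # time to ship one cycle's changes to the backup site


def worst_case_rpo_s(plan: ReplicationPlan) -> float:
    """Worst-case data loss, in seconds, under the simplified model above."""
    if plan.synchronous:
        # Every write is acknowledged only after it reaches the backup site.
        return 0.0
    # A write made just after a cycle starts is safe only once the *next*
    # cycle has started and finished shipping.
    return plan.interval_s + plan.transfer_s


def meets_rpo(plan: ReplicationPlan, target_rpo_s: float) -> bool:
    """True if the plan's worst-case loss fits within the target RPO."""
    return worst_case_rpo_s(plan) <= target_rpo_s


# Hourly asynchronous replication with a 10-minute transfer window.
hourly = ReplicationPlan(synchronous=False, interval_s=3600, transfer_s=600)
print(worst_case_rpo_s(hourly))      # 4200.0 seconds, i.e. 70 minutes
print(meets_rpo(hourly, 3600))       # False: a one-hour RPO is not met
print(meets_rpo(ReplicationPlan(synchronous=True), 0))  # True
```

Real storage subsystems behave in more complicated ways, but the sketch shows why an RPO of zero pushes you toward synchronous replication.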

That's the background. So how can CSPs and enterprises ensure they have good disaster recovery capabilities in place for OpenStack? There are three approaches:

  • Generic Approach: This involves scripting on top of native OpenStack mechanisms. He mentioned a range of pros and cons: Consistency can be a big challenge, RPO is fairly high and RTO can be high, too. Complexity is moderate. (Note: Do your homework; I'm paraphrasing his comments.)
  • Application-Specific Approach: In this case the assumption is the application knows how to provide DR for itself. For instance, databases can ship a log -- a record of their changes -- to a secondary site. OpenStack really doesn't get involved. The admin manages the switch-over to the secondary site. It can give good RPO and RTO and good consistency. But it depends on the application, and it works only within the confines of that application. Complexity is moderate; it depends on the details of the application.
  • Storage Approach: For instance, you can use pre-pooled volumes and set up a copy relationship (synchronous or asynchronous) between them. He mentioned another approach but, true confession, I was taking photos. This approach is typically used by banks for financial transactions that require no loss of data. But it can be fairly complicated to manage, and there's no way to integrate it with OpenStack today. "It's not for the faint of heart to put together," he said.
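Why is consistency such a challenge for the generic, scripted approach? The following Python sketch is my own illustration, not something shown in the session. `FakeCloud` is a hypothetical in-memory stand-in, not an OpenStack client, and `backup_one_by_one` is a made-up helper. It snapshots two volumes one after the other while the application keeps writing, which is what a simple script would do without a consistency group.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Snapshot:
    volume: str
    taken_at: int  # logical clock tick when the snapshot was taken
    data: str      # volume contents at that moment


@dataclass
class FakeCloud:
    """Hypothetical in-memory cloud; NOT a real OpenStack API."""
    volumes: Dict[str, str]
    clock: int = 0

    def write(self, volume: str, data: str) -> None:
        self.clock += 1
        self.volumes[volume] = data

    def snapshot(self, volume: str) -> Snapshot:
        self.clock += 1
        return Snapshot(volume, self.clock, self.volumes[volume])


def backup_one_by_one(
    cloud: FakeCloud,
    names: List[str],
    between: Optional[Callable[[FakeCloud], None]] = None,
) -> List[Snapshot]:
    """Snapshot each volume in turn, as a simple DR script might."""
    snaps = []
    for name in names:
        snaps.append(cloud.snapshot(name))
        if between is not None:
            between(cloud)  # the application keeps running during the backup
    return snaps


def app_keeps_writing(cloud: FakeCloud) -> None:
    # One transaction that touches both volumes.
    cloud.write("db", "balance=90")
    cloud.write("journal", "txn=1")


cloud = FakeCloud({"db": "balance=100", "journal": "txn=0"})
snaps = backup_one_by_one(cloud, ["db", "journal"], between=app_keeps_writing)
for s in snaps:
    print(s.volume, s.taken_at, s.data)
```

In this run the `db` snapshot still holds the old balance while the `journal` snapshot already records the transaction, so the pair does not describe any single point in time. A consistency group is meant to close exactly that gap by capturing all member volumes at the same instant.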

One thing not covered: Third-party backup and disaster recovery tools built specifically for OpenStack or those that interoperate with OpenStack. One example, I think, might involve RightScale but I need to check on that.