Who actually enjoys thinking about backup plans when working with an organization’s data? There is additional management overhead in configuring the backup jobs, monitoring them, and looking into issues when you get alerted to failures. Then you need to maintain your backups as the environment grows over time; the backup plans you configured 6 months ago may not be the best way to protect your environment now. The environment tends to take a performance hit while backups are running, so after-hours schedules are normally used. Disaster recovery, off-premises copies, archives, replication, tape… not to mention coming up with the resources to handle all of this. Who wants to pay for a backup plan when everything is running just fine?! All this work, and if everything goes well you’ll never need to restore!
How do you convey the risk to management when they’ve never felt the burn of a failed RAID controller, a host failure, an extended power outage, or a hurricane?
It’s not fun, we get it. You know what else isn’t fun? Scrambling to get a server online after data corruption, looking for identical hardware to restore to, or sending workers home after hearing it’s going to take 4-6 hours before services are available again.
I’m speaking from experience here – going through that is the worst. Talk about a completely helpless feeling for all parties involved.
We can all agree we want to avoid unplanned downtime, but how do we start the conversation to design a backup strategy?
RTO and RPO are a great place to start. These can be defined for the whole environment or more granularly, according to how critical each service is to the business.
RTO, or Recovery Time Objective, is the target time you set for the recovery of IT and business activities after a disaster strikes. Another way to describe this: “our business can survive 4 hours of system downtime” or “our RTO is 2 days.”
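RPO, or Recovery Point Objective, is the maximum amount of data loss, measured as a window of time, that you’re able to accept when you restore from the most recent recovery point. In other words, it helps define how often you need to create recovery points: an RPO of 1 hour means creating a recovery point at least every hour.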
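To make the two objectives concrete, here is a minimal Python sketch that checks a proposed backup plan against a service's RTO and RPO. Everything in it is an assumption for illustration: the ServiceObjective class, the check_plan function, the "file-server" service, and the numbers are hypothetical and not tied to any backup product. It also simplifies worst-case data loss to the gap between two recovery points.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ServiceObjective:
    """Recovery targets for one service (all values here are hypothetical)."""
    name: str
    rto: timedelta  # longest outage the business can tolerate
    rpo: timedelta  # largest window of data loss the business can tolerate


def check_plan(objective, backup_interval, estimated_restore_time):
    """Compare a proposed backup plan against a service's RTO and RPO.

    Simplification: worst-case data loss is treated as the full gap
    between two recovery points, ignoring how long the backup job runs.
    """
    return {
        "meets_rpo": backup_interval <= objective.rpo,
        "meets_rto": estimated_restore_time <= objective.rto,
    }


# Hypothetical service: 4 hours of tolerable downtime, 1 hour of tolerable data loss
file_server = ServiceObjective("file-server", rto=timedelta(hours=4), rpo=timedelta(hours=1))

# Proposed plan: a nightly backup that takes roughly 3 hours to restore
result = check_plan(
    file_server,
    backup_interval=timedelta(hours=24),
    estimated_restore_time=timedelta(hours=3),
)
print(result)
```

In this example the nightly schedule restores quickly enough to meet the RTO, but a 24-hour gap between recovery points misses a 1-hour RPO, so the schedule would need to run more often for that service.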
RPO and RTO are two key metrics that give you a goal to work towards when deciding how much to invest in your backup and recovery strategy. Another item is retention, or how long you’d like to hold on to recovery points. There is no wrong way to do this; each company is different. Some have compliance standards they must adhere to; others have services that need 99.999% uptime. I’ve worked with companies that only want to hold on to the most recent recovery point for most of their environment, yet need to hold on to all recovery points for just one service.
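A per-service retention rule like that one can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the RETENTION_POLICY table, the points_to_keep function, and the "finance-db" service name are hypothetical, and a real backup product would express retention in its own settings.

```python
from datetime import datetime

# Hypothetical policy: how many recovery points to keep per service.
# None means "keep every recovery point".
RETENTION_POLICY = {
    "default": 1,        # most services: only the most recent recovery point
    "finance-db": None,  # one service: hold on to all recovery points
}


def points_to_keep(service, recovery_points, policy=RETENTION_POLICY):
    """Return the recovery points to retain for a service, newest first."""
    # Services without their own entry fall back to the default rule
    limit = policy.get(service, policy["default"])
    newest_first = sorted(recovery_points, reverse=True)
    if limit is None:
        return newest_first
    return newest_first[:limit]


# Hypothetical recovery points taken on three consecutive days
points = [datetime(2024, 1, 1), datetime(2024, 1, 3), datetime(2024, 1, 2)]
print(points_to_keep("web-server", points))
print(points_to_keep("finance-db", points))
```

Keeping the rule in one table makes it easy to review retention alongside RTO and RPO whenever the environment changes.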