Data backup is one of those things that everybody talks about, few people do, fewer people do well, and fewer still have actually tested.
What makes a good backup strategy?
To me, a backup strategy needs to have a few qualities.
- It has to be easy. If it’s not, you won’t keep up with it.
- It has to be reliable. Backing up your data won’t do you any good if your backups aren’t good.
- It has to be redundant. Backups can go bad too.
- It has to be recoverable from. If you’re encrypting your backups and you forget your key, they’re useless.
Now, what brought me here, and how did I attain those goals?
First, why so
I’ve been paranoid about data loss for a long time, and I spent a good deal of time and effort trying to figure out what the best strategy would be for me that met all of the requirements that I outlined above. But why was I so paranoid to begin with?
When I was a freshman in college I experienced my first hard drive failure. My Western Digital hard drive suddenly gave up the ghost, taking with it all of my software that I had written over the years (much of it written in x86 assembly) with it. Try as I may, I couldn’t recover anything. I would have paid anything then to get that data back, but being a poor college student professional data recovery wasn’t an option. With no real backups, my entire digital life to this point was wiped clean.
It was at this point that I learned the importance of backups. I didn’t, however, learn the importance of a good backup strategy. To that end, I would burn CDs and email myself copies of things that were REALLY important. Other important things were zipped up and stored on another hard drive. Sometimes I’d just copy and paste a folder somewhere else. I’d have multiple copies of things floating around, and no real way to tell which was the most recent, or most correct.
At the time, I thought that this worked. Mostly because I just didn’t know any better. Had I experienced a drive failure during that period, I’d have been sent on a wild goose chase through my old email, unlabeled physical media, and folders upon folders of copies of various files and zip archives. I’ve since seen the error my ways. In part because I’ve gotten smarter, and in part because technology has gotten smarter.
A new strategy is born
My new backup strategy is much, much more robust, easy to manage, and easy to recover from.
- OS Drive - OS, applications, basic configurations and settings (160gb OCZ Vertex 2 SSD)
- Secondary Drive - VMWare Virtual Machine (1tb Seagate)
- External Array - All other documents (2nd Gen Drobo w/ 4x 2tb Seagate)
- External Drive - Local backups (1.5tb Seagate)
- External Enclosure - Enclosure for the External Drive (SentrySafe QA0121 Fire-Safe Waterproof Data Storage Chest)
- Printer/Scanner - Document digitizing (Epson Artisan 835)
How Does This Work Together?
Every 4 hours, CrashPlan backs up changes to the Virtual Machine on the Secondary Drive to the External Drive (encrypted with TrueCrypt), the Drobo, and to CrashPlan+.
Once per week, DroboCopy copies the Virtual Machine to the Drobo. This is done to give me an instantly available copy-and-paste snapshot of the server to get back up while I recover the most recent version through a CrashPlan restore.
In real-time, CrashPlan watches for changes to anything of high value on the Drobo and backs those changes up to the External Drive CrashPlan+.
Those HVFs, in addition to source code, pictures, tax returns, and the like, include scans of important physical documents (product warranties, contracts, receipts, etc.) from the Artisan 835. The original physical documents are kept in a separate fire/water-proof safe. In addition, using the Artisan, I create hard copies of digital documents (receipts and the like) for physical storage.
Source code is also stored in git repositories on the Virtual Machine so that I have full revision history for any project that I’m working on (my old CVS repositories have been deprecated and converted to git repositories).
What does all of this do for me? I have several points of recovery available, and aside from the OS and applications which have no irreplaceable value, I have no more than 4 hours of unrecoverable data. This all took quite a while to set up, but the peace of mind is worth it. The backups are pretty much out-of-site, out-of-mind, and I never have to worry about a manual step to protect my data.
Is it excessive? Perhaps, but I never worry about losing another piece of important data again.
There are a couple of things that I’d like to improve, but they’re not critical. One, I’d like to upgrade to a DroboFS to remove the dependency of having my Drobo physically attached to my primary machine. Second, I really wish CrashPlan would allow me to add machines to my account without buying a family subscription. I only have one more machine I’d like to add (the Virtual Machine), and the cost of a family subscription just isn’t worth it when I work around that limitation by backing up the entire machine (since it’s just a set of files). It’s just annoying.
I’ve since upgraded from the Gen2 Drobo mentioned above to a DroboFS (2x 3Tb, 2x 2Tb, 1x 1Tb with dual-drive redundancy). In addition to the speed benefits and the obvious benefits of being a NAS, my paranoia during rebuilds for the array makes dual-drive redundancy a must have. Unfortunately, the DroboFS is currently having a lot of different issues (though none that seem to be putting my data at risk). I have a support ticket in with Data Robotics and hopefully they can address the issues.