Massively Parallel Backup

Finished part I of the Unix System Administration class yesterday. Heard some interesting bits from the prof on how national laboratories dealing with problems of missing hard drives are moving to NetBoot scenarios, where the NetBoot server is behind “GGG” (guards, gates and guns). He has just finished purchasing, and is preparing to install, $38,000 worth of Apple Xserves and Xserve RAIDs for a massively redundant, super-secure NetBoot deployment… designed to service four (count ’em!) users.

Talking a lot about backup techniques in the class this week. Question: at what point does a system become so large that backups simply defy the laws of physics and the limits of current technology? The prof described a system he had worked on with hundreds of servers, each with many terabytes of storage – around 200TB of data in all. The engineers there were among the best in the world at building high-speed parallel networks, super-efficient load-balancing servers, etc. And they owned some of the largest and fastest tape silos in the world. But no matter how much money they threw at the problem, they were not able to back up more than 75TB per week. Full nightly backups were simply not going to happen for them.
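Some rough arithmetic on those two numbers, as a sketch only. It assumes decimal terabytes and gives a "nightly" run a generous full 24 hours; both are assumptions for illustration, not figures from the class.

```python
# Back-of-the-envelope throughput check, using decimal units (1 TB = 10**12 bytes).
TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate(bytes_moved, seconds):
    """Average bytes per second needed to move bytes_moved in the given time."""
    return bytes_moved / seconds


total_data = 200 * TB      # total data, from the story
weekly_ceiling = 75 * TB   # the most they could back up in one week

# Average rate implied by the 75TB-per-week ceiling.
achieved = sustained_rate(weekly_ceiling, 7 * SECONDS_PER_DAY)

# Assumption: a "nightly" full backup gets the whole 24 hours.
needed_nightly = sustained_rate(total_data, SECONDS_PER_DAY)

# How long one full pass over all the data takes at the ceiling.
days_for_full_pass = total_data / weekly_ceiling * 7

print(f"achieved: {achieved / 1e6:.0f} MB/s")
print(f"needed for a nightly full: {needed_nightly / 1e9:.2f} GB/s")
print(f"one full pass at the ceiling: {days_for_full_pass:.1f} days")
```

Under those assumptions the ceiling works out to roughly 124 MB/s sustained, a nightly full would need about 2.31 GB/s, and a single full pass takes close to 19 days.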

In my own little world, finally, after all these years, have a nightly backup system in place for the whole house – a modified version of the rsync scripts I use for birdhouse and journalism, which at all times keep both a complete bit-perfect current mirror and also parallel dirs for each of the past 30 days containing changed or deleted files. But for the home network, replaced the version of rsync that ships with OS X with the binary from RsyncX, which preserves HFS+ attributes and metadata.
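For the curious, the mirror-plus-dated-dirs idea can be sketched with rsync's standard --backup and --backup-dir options. This is not the actual script: the paths, the "current" and "changed" directory names, and both helper functions are made up for illustration, and it assumes a locally mounted backup volume.

```python
from datetime import date, timedelta


def rsync_command(source, dest_root, today):
    """Build an rsync argv that mirrors source and sets aside changed/deleted files."""
    # Hypothetical layout: dest_root/current is the mirror,
    # dest_root/changed/YYYY-MM-DD holds that day's replaced or deleted files.
    changed_dir = f"{dest_root}/changed/{today.isoformat()}"
    return [
        "rsync",
        "--archive",                    # recurse; preserve permissions, times, links
        "--delete",                     # remove files from the mirror that left the source
        "--backup",                     # keep the old copy instead of discarding it...
        f"--backup-dir={changed_dir}",  # ...by moving it into today's dated directory
        f"{source}/",
        f"{dest_root}/current/",
    ]


def expired_dirs(dir_names, today, keep_days=30):
    """Return dated directory names (YYYY-MM-DD) more than keep_days days old."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in dir_names:
        try:
            day = date.fromisoformat(name)
        except ValueError:
            continue  # skip anything that is not a dated directory
        if day < cutoff:
            expired.append(name)
    return expired


# Example with made-up paths and dates.
cmd = rsync_command("/Users", "/Volumes/Backup", date(2005, 3, 31))
old = expired_dirs(["2005-02-28", "2005-03-01", "2005-03-31", "notes"], date(2005, 3, 31))
print(" ".join(cmd))
print(old)
```

The argv list could be handed to subprocess.run, and the expired names deleted afterwards. Keeping the dated directories beside the mirror rather than inside it means --delete never touches them.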

Music: The Carter Family :: The East Virginia Blues
