[Beowulf] Surviving a double disk failure
David Mathog mathog at caltech.edu
Fri Apr 10 13:15:54 PDT 2009
Billy Crook <billycrook at gmail.com> wrote:

> As a very, very, general rule, you might put no more than 8TB in a
> raid5, and no more than 16TB in a raid6, including what's used for
> parity, and assuming magnetic, enterprise/raid drives. YMMV, Test all
> new drives, keep good backups, etc...

Thankfully I don't have to do this myself, not having data anywhere near that size to cope with, but it seems to me that backing up a nearly full 16TB RAID is likely to be a painful, expensive exercise.

Going with tape first... The fastest tape drives that I know of are Ultrium 4s at 120 MB/s. In theory that could copy 1GB every 8.3 seconds, 1TB every 8300 seconds (AKA 138 minutes, or a bit over 2 hours), and for that 16 TB data set, something over 32 hours (closer to 37 at exactly 120 MB/s). Except that there is no tape with that capacity. Max listed is still 800 GB, so it would take 20 tapes. And really obtaining a sustained 120 MB/s from the RAID to the tape is likely extremely challenging. In any case, it looks like this calls for a tape robot of some sort, with many drives in it. Not cheap. On the plus side, transporting a box of 20 tape cartridges to "far away" is not particularly difficult, and they are fairly impervious to abuse during shipment.

The other obvious option is to replicate the RAID. Now if the duplicate RAID is on site, connected by a 1000baseT network, one could obtain a very similar transfer rate - and a full backup would take just as long as for the single tape drive (neglecting rewind and cartridge change times). This at the expense of still losing all the data in some sort of sitewide disaster.

I can imagine, and suspect somebody has this already, implementing a specialized disk->disk connect, such that one would plug Raid A into Raid B, and all N disks in A could copy themselves in parallel onto all N disks in B at full speed. Assuming 1TB disks and a sustained 75 MB/s read from A and write to B, the whole copy would be done in about 222 minutes. Not exactly the blink of an eye, but a heck of a lot better than 32 hours.

Placing the backup RAID physically offsite would improve the odds of the data surviving, but reduce the bandwidth available, and moving the copied RAID physically offsite after each backup is a recipe for short disk lives.

Since all of the obvious options are so slow, I expect most sites are doing incremental backups. Which is fine, until the day comes when one has to restore the entire data array from two years' worth of incremental backups. Or maybe folks carry the tape incremental backups to the offsite backup RAID and apply them there?

Is there an easier/faster/cheaper way to do all of this?

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
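
For anyone who wants to rerun the back-of-the-envelope numbers above with their own drive speeds, here is a minimal Python sketch. It assumes decimal units (1 TB = 10^12 bytes) and perfectly sustained rates, which real hardware will not deliver, and the function and variable names are made up for illustration. With the figures from this message it gives about 37 hours for the 16 TB tape copy (consistent with "something over 32 hours"), 20 cartridges, and about 222 minutes for the parallel disk-to-disk copy.

```python
# Decimal storage units, as used by drive and tape vendors.
MB = 10**6
GB = 10**9
TB = 10**12


def transfer_seconds(data_bytes, rate_bytes_per_s):
    """Time to move data_bytes at a perfectly sustained rate (idealized)."""
    return data_bytes / rate_bytes_per_s


def tapes_needed(data_bytes, tape_capacity_bytes):
    """Whole cartridges required, rounding up (integer ceiling division)."""
    return -(-data_bytes // tape_capacity_bytes)


# Figures taken from the message: 16 TB array, 120 MB/s tape, 800 GB cartridges.
array_size = 16 * TB
tape_rate = 120 * MB
tape_capacity = 800 * GB

tape_hours = transfer_seconds(array_size, tape_rate) / 3600
tape_count = tapes_needed(array_size, tape_capacity)

# Parallel disk-to-disk copy: every 1 TB disk copies at 75 MB/s at the same
# time, so the total time is that of a single disk regardless of N.
disk_minutes = transfer_seconds(1 * TB, 75 * MB) / 60

print(f"tape copy:  {tape_hours:.1f} hours on {tape_count} tapes")
print(f"disk copy:  {disk_minutes:.0f} minutes")
```

Swapping in a different rate or array size only requires changing the constants near the bottom; rewind, cartridge changes, and network overhead are deliberately ignored.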
