[Beowulf] OT: recoverable optical media archive format?


Reuti reuti at staff.uni-marburg.de
Tue Jun 8 12:03:59 PDT 2010


Hi,

On 08.06.2010 at 19:44, David Mathog wrote:

> This is off topic so I will try to keep it short:  is there an
> "archival" format for large binary files which contains enough error
> correction so that all original data may be recovered even if there
> is a little data loss in the storage media?
>
> For my purposes these are disk images, sometimes .tar.gz, other times
> gunzip -c of dd dumps of whole partitions which have been "cleared" by
> filling the empty space with one big file full of zeros, and then
> deleting that file.  I'm thinking of putting this information on DVDs
> (only need to keep it for a few years at a time) but I don't trust
> that media not to lose a sector here or there - having watched far too
> many scratched DVD movies with playback problems.
>
> Unlike an SDLT with a bad section, the good parts of a DVD are still
> readable when there is a bad block (using dd or ddrescue) but of
> course even a single missing chunk makes it impossible to decompress
> a .gz file correctly.  So what I'm looking for is some sort of
> .img.gz.ecc format, where the .ecc puts in enough redundant
> information to recover the underlying img.gz even when sectors or
> data are missing.   If no such tool/format exists then two copies
> should be enough to recover all of an .img.gz so long as the same
> data wasn't lost on both media, and if bad DVD sectors always come
> back as "failed read", never ever showing up as a good read but
> actually containing bad data.  Perhaps the frame checksum on a DVD is
> enough to guarantee that?

Besides splitting the file, I would suggest generating some par/par2
files. This format was originally used on Usenet as a reliable way to
transfer binary attachments. I.e. first you split your file into e.g.
10 pieces and generate 5 par/par2 recovery files for them. Then you
need any 10 out of these 15 in total to be good to recover the
original file.

http://en.wikipedia.org/wiki/Parchive
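
To illustrate why "any 10 out of 15" can work, here is a minimal toy
sketch in Python. It is not the PAR2 algorithm or file format; it is a
simple erasure code over the prime field GF(257) using polynomial
interpolation, and the names P, encode and recover are made up for
this example. It also assumes you know which blocks are missing, which
is the "failed read" case described in the question.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial through `points`, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data_blocks, n_parity):
    """Return n_parity parity blocks for k equal-length data blocks.

    Parity symbols can take the value 256, so parity blocks are lists
    of ints rather than bytes.
    """
    k = len(data_blocks)
    parity = [[] for _ in range(n_parity)]
    for column in zip(*data_blocks):
        # Data byte i of this column is the polynomial's value at x = i.
        points = list(enumerate(column))
        for p in range(n_parity):
            parity[p].append(_interpolate(points, k + p))
    return parity


def recover(surviving, k):
    """Rebuild the k data blocks from a {block_index: block} dict.

    Any k surviving blocks (data or parity) are sufficient.
    """
    items = sorted(surviving.items())[:k]
    if len(items) < k:
        raise ValueError("need at least k surviving blocks")
    xs = [index for index, _ in items]
    data = [bytearray() for _ in range(k)]
    for column in zip(*(block for _, block in items)):
        points = list(zip(xs, column))
        for d in range(k):
            data[d].append(_interpolate(points, d))
    return [bytes(block) for block in data]


if __name__ == "__main__":
    payload = bytes(range(200))
    blocks = [payload[i * 20:(i + 1) * 20] for i in range(10)]
    stored = dict(enumerate(blocks + encode(blocks, 5)))
    for lost in (0, 3, 4, 9, 12):  # lose any 5 of the 15 blocks
        del stored[lost]
    assert b"".join(recover(stored, 10)) == payload
```

Each byte position is treated as a degree-9 polynomial sampled at 15
points, so any 10 samples determine it again. For real archives the
par2 tools are the better choice, since they also carry checksums to
detect which blocks are damaged.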

-- Reuti


> Thanks,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech



