[Beowulf] dedupe filesystem
Lawrence Stewart stewart at serissa.com
Fri Jun 5 12:09:40 PDT 2009
On Jun 5, 2009, at 1:12 PM, Joe Landman wrote:

> Lux, James P wrote:
>
> It only looks at raw blocks. If they have the same hash signatures
> (think like MD5 or SHA ... hopefully with fewer collisions), then
> they are duplicates.
>
>> maybe a better model is a “data compression” algorithm on the fly.
>
> Yup this is it, but on the fly is the hard part. Doing this
> comparison is computationally very expensive. The hash calculations
> are not cheap by any measure. You most decidedly do not wish to do
> this on the fly ...
>
>> And for that, it’s all about trading between cost of storage space,
>> retrieval time, and computational effort to run the algorithm.
>
> Exactly.

I think the hash calculations are pretty cheap, actually. I just timed sha1sum on a 2.4 GHz Core 2 and it runs at 148 megabytes per second on one core (reading from the disk cache). That is substantially faster than the disk transfer rate. If you have a parallel filesystem, you can parallelize the hashes as well.

-L
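
For anyone who wants to repeat the measurement without sha1sum, here is a rough Python sketch. It times SHA-1 over an in-memory buffer on one core (comparable in spirit to hashing from the disk cache) and shows the raw-block comparison described in the quoted text. The 4096-byte block size and the names `sha1_throughput_mb_per_s`, `dedupe_blocks`, and `rebuild` are assumptions made for illustration, not anything from a real dedupe filesystem, and the numbers will differ by machine.

```python
import hashlib
import time

BLOCK_SIZE = 4096  # assumed block size; the thread does not specify one


def sha1_throughput_mb_per_s(total_bytes=64 * 1024 * 1024, chunk=1024 * 1024):
    """Hash total_bytes of in-memory data on one core and return MB/s."""
    buf = bytes(chunk)  # zero-filled buffer; SHA-1 speed does not depend on content
    h = hashlib.sha1()
    start = time.perf_counter()
    for _ in range(total_bytes // chunk):
        h.update(buf)
    elapsed = time.perf_counter() - start
    return (total_bytes / 1e6) / elapsed


def dedupe_blocks(data, block_size=BLOCK_SIZE):
    """Split data into fixed-size raw blocks and keep each distinct block once.

    Returns (store, recipe): store maps SHA-1 digest -> block bytes, and
    recipe lists the digests in the order needed to rebuild the data.
    Blocks with equal digests are treated as duplicates, which ignores the
    (small) chance of a hash collision mentioned in the thread.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha1(block).digest()
        store.setdefault(digest, block)  # first copy wins, later copies are dropped
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe):
    """Reassemble the original byte string from the block store."""
    return b"".join(store[d] for d in recipe)


if __name__ == "__main__":
    print(f"SHA-1: {sha1_throughput_mb_per_s():.0f} MB/s on one core")
    data = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE + b"A" * BLOCK_SIZE
    store, recipe = dedupe_blocks(data)
    print(len(recipe), "blocks written,", len(store), "unique blocks stored")
```

Since each block is hashed independently, the loop in `dedupe_blocks` could be spread across cores or storage nodes, which is the parallel-filesystem point above.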
