
[Beowulf] Re: dedupe filesystem


Dave Love d.love at liverpool.ac.uk
Mon Jun 29 05:30:31 PDT 2009


Ashley Pittman <ashley at pittman.co.uk> writes:

> If you relied on the md5 sum alone there would be collisions and those
> collisions would result in you losing data.

The question is whether the probability of a collision is significant
compared with other causes of data loss -- presumably hardware failure,
since no-one seems to put figures on software reliability.  As far as I
remember, the SHA-1 calculation for Plan 9's Venti¹, which no-one seems
to have mentioned, concludes that collisions can safely be ignored for
petabyte-scale filesystems.
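For a rough sense of the numbers, the standard birthday bound gives P(collision) ≈ n(n-1)/2 / 2^b for n distinct blocks and a b-bit hash. The sketch below is my own back-of-the-envelope estimate, not the figure from the Venti paper: the 1 PB store, the 8 KB block size and the function name `collision_probability` are all assumptions chosen for illustration, and it assumes uniformly distributed hashes over non-adversarial data (MD5 in particular has known practical attacks for deliberately constructed collisions).

```python
import math


def collision_probability(total_bytes: float, block_size: int, hash_bits: int) -> float:
    """Birthday-bound estimate of P(at least one collision) among distinct blocks.

    Assumes every block is distinct and the hash behaves like a uniform
    random function (accidental collisions only, no adversary).
    """
    n = total_bytes / block_size              # number of distinct blocks stored
    pairs = n * (n - 1) / 2                   # block pairs that could collide
    # 1 - exp(-pairs / 2**b); expm1 keeps precision when the value is tiny.
    return -math.expm1(-pairs / 2.0 ** hash_bits)


PETABYTE = 10**15  # assumed store size
for name, bits in [("MD5", 128), ("SHA-1", 160)]:
    p = collision_probability(PETABYTE, 8192, bits)
    print(f"{name}: {p:.1e}")
```

Under these assumptions the estimate comes out at roughly 2e-17 for MD5 and 5e-27 for SHA-1, which is the sort of comparison against hardware error rates the argument above turns on.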

Ob-Beowulf:  You can run Venti on GNU/Linux,² but I don't know how the
current implementation performs.  Also, GlusterFS has a `data
de-duplication translator' on its roadmap, which I didn't see mentioned.
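To illustrate the idea behind such systems, here is a toy content-addressed block store in the spirit of Venti's use of SHA-1 fingerprints as block addresses. It is a hypothetical sketch, not Venti's or GlusterFS's code or API: the names `DedupStore`, `write_file` and `read_file`, the fixed 8 KB blocks and the `verify` flag are all assumptions. With `verify=True` it does a byte-for-byte comparison whenever a digest already exists, which is the extra check you would need if relying on the hash alone is unacceptable.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: each block is keyed by its SHA-1 digest."""

    def __init__(self, verify: bool = True):
        self.blocks: dict[bytes, bytes] = {}  # digest -> block contents
        self.verify = verify                  # byte-compare on a digest match?

    def put(self, block: bytes) -> bytes:
        """Store a block at most once and return its digest (its address)."""
        digest = hashlib.sha1(block).digest()
        existing = self.blocks.get(digest)
        if existing is None:
            self.blocks[digest] = block
        elif self.verify and existing != block:
            # A genuine collision: fail loudly rather than alias two blocks.
            raise ValueError("SHA-1 collision detected")
        return digest

    def get(self, digest: bytes) -> bytes:
        return self.blocks[digest]


def write_file(store: DedupStore, data: bytes, block_size: int = 8192) -> list[bytes]:
    """Split data into fixed-size blocks; return the list of block digests."""
    return [store.put(data[i:i + block_size]) for i in range(0, len(data), block_size)]


def read_file(store: DedupStore, digests: list[bytes]) -> bytes:
    """Reassemble a file from its block digests."""
    return b"".join(store.get(d) for d in digests)


store = DedupStore()
data = b"A" * 8192 * 3 + b"B" * 100
refs = write_file(store, data)
print(len(refs), len(store.blocks))  # 4 references, 2 unique blocks stored
```

The three identical "A" blocks are stored once, so the file needs four references but only two stored blocks. Real systems typically also use content-defined (variable-size) chunking, which this sketch deliberately omits.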

--
1. http://plan9.bell-labs.com/sys/doc/venti/venti.html
2. http://swtch.com/plan9port/



