[Beowulf] dedupe filesystem
hahn at mcmaster.ca
Fri Jun 5 06:52:55 PDT 2009
>> have tiered storage today, but in the future i can see a need to have
>> a storage pool with SATA and a storage pool with SAS or faster drives
>> in it.
IMO, this is a dubious assertion. I bought a couple of incredibly cheap
desktop disks for home use a couple of weeks ago: just Seagate 7200.12's.
these are of the latest 500GB/platter generation, so they have high density
and thus bandwidth:
sure, your application may require low latency. but bandwidth is easy.
>> Some of the researchers where I am, work on data for months.
my organization's current policy is to be fairly stingy with /home and /work,
neither of which has any timeout. /scratch currently has a 1-month timeout,
which unfortunately tends to be too short to encourage people to use it.
>> Is this something better solved with pre/post-amble copies or through
we currently have a periodic crawler that collects data on each filesystem,
hashing each file's contents so that people can't game the timeouts with touch.
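
To illustrate the idea (this is not the crawler described above, whose
implementation isn't shown), here is a minimal Python sketch. It assumes the
crawler keeps a record from the previous run and ages each file from the time
its content hash was first seen, so touch alone does not reset the clock. The
names crawl, expired, file_digest and TIMEOUT_SECONDS, the use of SHA-256, and
the in-memory dict are all hypothetical choices made for the example.

```python
import hashlib
import os
import time

# Assumed value: roughly the 1-month /scratch timeout mentioned above.
TIMEOUT_SECONDS = 30 * 24 * 3600


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def crawl(root, previous, now=None):
    """Walk root and return {path: (digest, first_seen)}.

    previous is the record returned by the last crawl. If a file's content
    hash is unchanged, its first_seen time is carried over, so changing only
    the mtime (e.g. with touch) does not make the file look new.
    """
    now = time.time() if now is None else now
    current = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only hash regular files; skip symlinks and special files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                digest = file_digest(path)
            except OSError:
                continue  # unreadable, or removed while the crawl was running
            old = previous.get(path)
            if old is not None and old[0] == digest:
                current[path] = (digest, old[1])  # same content: keep old age
            else:
                current[path] = (digest, now)     # new or modified content
    return current


def expired(record, now=None, timeout=TIMEOUT_SECONDS):
    """Return the sorted paths whose content has been unchanged past timeout."""
    now = time.time() if now is None else now
    return sorted(p for p, (_d, seen) in record.items() if now - seen > timeout)
```

In this sketch a run would call crawl(root, previous) with the saved record
from the last run, then pass the result to expired() to list purge candidates.
Hashing every file on every pass costs a full read of the filesystem, which is
the price of not trusting mtime.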
> The best of both worlds would certainly be a central, fast storage filesystem,
> coupled with a hierarchical storage management system.
I'm not sure - is there some clear indication that one level of storage is
not good enough?
> Oh wait, it might exist already... Well, at least it's in the works: Sun and
> CEA are working on implementing such an HSM for Lustre 2.0. See
> http://wiki.lustre.org/images/8/8b/AurelienDegremont.pdf for details.
this seems like a bad design to me. I would think (and I'm reasonably
familiar with Lustre, though not an internals expert) that if you're going to
touch Lustre interfaces at all, you should simply add cheaper, higher-density
OSTs, and make more intelligent placement/migration heuristics. I guess that
CEA already has a vast investment in some existing HSM, so can't do this.
regards, mark hahn