[Beowulf] distributed file storage solution?

Bill Broadley bill at cse.ucdavis.edu
Mon Dec 11 17:53:58 PST 2006


Eric Thibodeau wrote:
> You can look into OpenAFS, but be warned that you have to know infrastructure software quite well (LDAP+Kerberos). It's cross-platform and can be distributed, but I don't think it's up to multiple writes on different mirrors.
> 

Indeed.  There are many tough compromises in distributed filesystems, because
the goals conflict.  Coherency vs. performance is a big one: you pretty much
get one or the other.  Locking is another ugly one: databases and some other
applications assume byte-range locking, which is sometimes available and
sometimes not, and many Unix programs assume POSIX locking, again only
sometimes available.  So, unfortunately, it's easy to ask for a distributed
filesystem that does not exist.
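
To make the locking issue concrete, here is a minimal sketch (in C; the
path /sharedfs/db.dat is made up and error handling is simplified) of the
POSIX byte-range locking that databases expect.  On some distributed
filesystems this fcntl() succeeds and is enforced cluster-wide, on others it
fails outright, and on a few it succeeds without actually excluding writers
on other nodes:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        struct flock fl;
        int fd = open("/sharedfs/db.dat", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        /* Lock bytes 0..4095 for writing; a database assumes this
           excludes writers on *every* node, not just this one. */
        fl.l_type   = F_WRLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start  = 0;
        fl.l_len    = 4096;

        if (fcntl(fd, F_SETLKW, &fl) < 0) {
            perror("fcntl(F_SETLKW)");  /* e.g. ENOLCK on some network fs */
            return 1;
        }

        /* ... critical section: read-modify-write the locked range ... */

        fl.l_type = F_UNLCK;            /* release the lock */
        fcntl(fd, F_SETLK, &fl);
        close(fd);
        return 0;
    }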

I'll provide my current brain dump on the various pieces I've been tracking.
I'm sure there are some inaccuracies included, but hopefully they are small
ones.  As always, comments and corrections are welcome.

A high-level overview of OpenAFS:
* OpenAFS is distributed, but not p2p.
* performs well (assuming cache friendliness, and a single client accessing
   the same files/directories)
* scales well for reads, because read-only volumes can be replicated
* has a universal namespace (see the sketch after this list)
* places little trust in a client (getting root on a client != the ability
   to read all files)
* allows for transparent volume migration (the client doesn't complain when
   a volume is migrated)
* perfect coherency (via a callback/subscription model)
* supports Linux, OS X, and Windows (among others)
* relatively complex.
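
To illustrate the universal namespace and transparent migration: applications
use ordinary POSIX calls on an /afs path, the same path works from any AFS
client in the world, and it keeps working while an admin does a "vos move"
underneath.  A minimal sketch (the cell name and path are made up):

    #include <stdio.h>

    int main(void)
    {
        FILE *f;
        char line[256];

        /* /afs/<cell>/... is the same path on every AFS client; the
           client's cache manager finds whichever fileserver currently
           holds the volume, so an admin's "vos move" is invisible here. */
        f = fopen("/afs/example.edu/home/alice/results.txt", "r");
        if (!f) { perror("fopen"); return 1; }

        while (fgets(line, sizeof line, f))
            fputs(line, stdout);
        fclose(f);
        return 0;
    }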

NFS in contrast:
* isn't distributed (unless you count automount)
* has loose coherency (poll-based attribute caching; see the sketch after
   this list)
* no replication (corrections?)
* doesn't scale easily
* volume migration isn't easy (NFSv4 claims to enable this; I've yet to see
   it demonstrated in the real world)
* is mostly Unix-specific (Microsoft had an NFS client, but MS EOL'd it?)
* relatively simple.
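
A sketch of what loose coherency means in practice (the path and timings are
made up; exact behavior depends on mount options such as actimeo): an NFS
client revalidates its cache by polling file attributes, so a reader holding
a file open can trail a writer on another node by many seconds:

    #include <stdio.h>
    #include <unistd.h>

    /* Node A appends to a shared log over NFS; this program on node B
       tries to follow it. */
    int main(void)
    {
        FILE *f;
        char line[256];

        f = fopen("/home/shared/log.txt", "r");   /* NFS mount */
        if (!f) { perror("fopen"); return 1; }

        for (;;) {
            if (fgets(line, sizeof line, f)) {
                fputs(line, stdout);
            } else {
                /* With close-to-open consistency, data written on node A
                   may not show up here until we close and reopen, or the
                   attribute cache times out (roughly 3-60s by default).
                   An AFS callback would invalidate the cache immediately. */
                clearerr(f);
                sleep(1);
            }
        }
    }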

Lustre:
* client-server
* scales extremely well; seems popular on the largest of clusters
* can survive hardware failures, assuming more than one block server is
   connected to each set of disks
* Unix only
* relatively complex.

PVFS2:
* client-server
* scales well
* cannot survive a block server death
* Unix only
* relatively simple
* designed for use within a cluster (see the sketch after this list).
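
Within a cluster, PVFS2 is typically driven through MPI-IO (ROMIO ships a
PVFS2 driver).  A minimal sketch, assuming a PVFS2 volume reachable as
/mnt/pvfs2 (the path is made up); each rank writes a disjoint 1 MB block,
which the I/O servers absorb in parallel:

    #include <mpi.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_File fh;
        char buf[1 << 20];                    /* 1 MB per rank */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* The "pvfs2:" prefix tells ROMIO to talk to PVFS2 directly,
           bypassing the kernel VFS. */
        MPI_File_open(MPI_COMM_WORLD, "pvfs2:/mnt/pvfs2/out.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);

        /* Each rank writes its own block at a disjoint offset. */
        memset(buf, rank & 0xff, sizeof buf);
        MPI_File_write_at(fh, (MPI_Offset)rank * sizeof buf,
                          buf, sizeof buf, MPI_BYTE, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }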

OceanStore:
* p2p
* claims scalability to billions of users
* highly available/Byzantine fault tolerant
* complex
* slow
* in the prototype stage
* requires use of an API (AFAIK it is not available as a transparently
   mounted filesystem).

So the end result (from my skewed perspective) is:
* NFS is hugely popular, easy, not very secure (at least by default), and
   has poor coherency, but for things like sharing /home within a cluster it
   works reasonably well.  It seems most appropriate for LAN usage; to most
   people, diskless implies NFS (and it works well within a cluster or LAN).
* Lustre and PVFS2 are popular for sharing files in larger clusters where
   more than a single file server's worth of bandwidth is required.  I
   believe both scale well in bandwidth, but each allows only a single
   metadata server, so they will ultimately scale only as far as a single
   machine for metadata-intensive workloads (lock-intensive, directory-
   intensive, or file creation/deletion-intensive; see the sketch after
   this list).  Granted, this also allows for exotic hardware solutions
   (like solid-state storage) if you really need the performance.
* AFS is popular for internet-wide file service; researchers love the
   ability to run an application that requires 100 different libraries
   anywhere in the world.  Sysadmins love it because they can migrate
   volumes without having to notify users or schedule downtime.  I believe
   performance is usually somewhat worse than NFS within a cluster (because
   of higher overhead), and usually significantly better outside a cluster
   (better caching and coherency).
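
As a concrete example of a metadata-intensive workload (the path is made
up): the loop below creates and deletes thousands of empty files, and every
open() and unlink() is a round trip to the one metadata server, no matter
how many data servers you add.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Create and delete N empty files; each create and each unlink is
       pure metadata traffic, so N clients running this hammer the single
       metadata server while the data servers sit idle. */
    int main(void)
    {
        char path[128];
        int i, fd;

        for (i = 0; i < 10000; i++) {
            snprintf(path, sizeof path, "/mnt/cluster/scratch/f%d", i);
            fd = open(path, O_CREAT | O_WRONLY, 0644);
            if (fd < 0) { perror("open"); return 1; }
            close(fd);
            unlink(path);
        }
        return 0;
    }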

I'm less familiar with the various commercial filesystems like Ibrix.

Hopefully others will expand and correct the above.



