Large FOSS filesystems, was Re: [Beowulf] 512 nodes Myrinet cluster Challanges

Craig Tierney ctierney at
Fri May 5 11:36:01 PDT 2006

Dan Stromberg wrote:
> On Thu, 2006-05-04 at 12:16 -0600, Craig Tierney wrote:
>> Dan Stromberg wrote:
>>>> Oops, sorry, English is not my native language and I can make
>>>> mistakes :-) I liked pvfs before and I love pvfs2 now.
>>>> Well, I think the problems are those you are mentioning. First, it
>>>> goes a bit slower than, let's say, nfs or something like gfs over gnbd
>>>> (for small clusters)... in any case it is not so slow. The other
>>>> is that the nodes that are metadata or I/O servers have
>>>> to be up, which means that the probability of file system failure is higher.
>>>> The advantages are many: parallel I/O is a plus, not only for MPI programs
>>>> but also for normal tasks. If you try to convert the format of a lot
>>>> of images you can split the work between nodes, but this is an advantage
>>>> only if your file system can handle that, which is obviously not the case
>>>> for nfs.
>>>> In other words, pvfs2 is free, great and useful. It works well as a
>>>> scratch area and it uses resources that otherwise are not visible
>>>> to the user. And for Myrinet users it goes over gm, which is nice.
>>> On a somewhat related note, are there any FOSS filesystems that can
>>> surpass 16 terabytes in a single filesystem - reliably?
>> What do you want to do with your 16 TB?  Does PVFS2 not meet your needs 
>> or your level of reliability? What don't you find reliable about it?
> We want to store scientific datasets - and we actually wanted more like
> 30T, but had to settle for less.
>> I expect that xfs would work just fine.  The question is, how can you 
>> access it?  You can export it with NFS, but the performance doesn't scale.
> If it'd be at least semi reliable, this application would probably be
> fine with that.

My concern wouldn't be the stability of the xfs filesystem.  We have
used it for almost 5 years now in the configuration discussed above.
The filesystems weren't as large as 16 TB (no more than 2 TB each), but
that was so we could spread the load over several servers.

My concern with this setup isn't xfs; it is the stability of
the storage.  Also, if there is a disk hiccup (which will happen),
repairing a 16 TB filesystem takes a long time.  With a distributed
filesystem (PVFS2, Ibrix, etc.) you would only have to fix the one
affected volume, not the entire filesystem.  There may be some filesystem
consistency checks after the repair, but not to the extent of a full
filesystem check.
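
A rough back-of-the-envelope sketch of that argument follows. Everything
numeric here is a made-up assumption for illustration, not a measurement:
the check rate, the split into eight 2 TB volumes, and the idea that
repair time scales linearly with the amount of data checked (real repair
tools depend heavily on metadata and inode counts).

```python
# Back-of-the-envelope comparison of repair scope.
# All numbers are hypothetical assumptions, not measurements.

def repair_hours(size_tb, scan_rate_mb_s):
    """Hours to check size_tb terabytes at scan_rate_mb_s MB/s,
    assuming check time scales linearly with the data checked."""
    size_mb = size_tb * 1_000_000  # 1 TB = 10^6 MB (decimal units)
    return size_mb / scan_rate_mb_s / 3600

TOTAL_TB = 16          # the size discussed in this thread
VOLUMES = 8            # assumed: eight 2 TB volumes behind a distributed filesystem
SCAN_RATE_MB_S = 100   # assumed effective check rate

# One big filesystem: a disk hiccup means checking all of it.
monolithic = repair_hours(TOTAL_TB, SCAN_RATE_MB_S)

# Distributed filesystem: only the affected volume needs repair.
one_volume = repair_hours(TOTAL_TB / VOLUMES, SCAN_RATE_MB_S)

print(f"single {TOTAL_TB} TB filesystem: {monolithic:.1f} h")
print(f"one of {VOLUMES} volumes: {one_volume:.1f} h")
```

Under these assumptions the ratio between the two cases is simply the
number of volumes, which is the point: the repair window shrinks with
the size of the piece you have to fix.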


>> Why FOSS (not to start a flame war)?  What if AcmeFS was reasonably 
>> priced and did what you needed it to do?
> We already have a commercial solution that's working pretty well, so if
> we go commercial, we might return to that.  FOSS tends to improve faster
> though, and if you get it with a support contract, it seems pretty
> win-win.
