Large FOSS filesystems, was Re: [Beowulf] 512 nodes Myrinet cluster Challenges

Gerry Creager N5JXS gerry.creager at tamu.edu
Sat May 6 06:05:28 PDT 2006


We have several years of xfs experience now, and have not seen any 
filesystem-related issues on systems up to 6TB.  We used a 
distributed storage model until recently, but with the CoRAID AoE 
protocol hardware in place we now have over 30TB to start stringing 
together on one consistent (virtual) hardware platform.  So far it's 
xfs, with no problems.
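
For anyone wondering how the AoE shelves actually get strung together, here
is a minimal sketch of the idea: let the aoe driver expose the shelves as
block devices, pool them with LVM2, and put one xfs filesystem on top.  The
device names, volume group name and mount point below are made up for
illustration, not our actual layout, and the script is untested -- treat it
as pseudocode that happens to run.

#!/usr/bin/env python
"""Sketch: aggregate CoRAID AoE block devices into one large xfs filesystem.

Assumptions (not our exact setup): the aoe kernel module is loaded and the
shelves show up as /dev/etherd/e0.0, /dev/etherd/e1.0, ...; LVM2 and
xfsprogs are installed.  Run as root, after double-checking device names.
"""
import subprocess

AOE_DEVICES = ["/dev/etherd/e0.0", "/dev/etherd/e1.0"]   # hypothetical shelves
VOLUME_GROUP = "vg_scratch"                               # hypothetical names
LOGICAL_VOLUME = "lv_scratch"
MOUNT_POINT = "/scratch"

def run(cmd):
    # Print and execute one command, aborting on failure.
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)

# Label each AoE device as an LVM physical volume.
for dev in AOE_DEVICES:
    run(["pvcreate", dev])

# Pool the physical volumes into one volume group ...
run(["vgcreate", VOLUME_GROUP] + AOE_DEVICES)

# ... carve out a logical volume spanning all free extents ...
run(["lvcreate", "-l", "100%FREE", "-n", LOGICAL_VOLUME, VOLUME_GROUP])

# ... and put a single xfs filesystem on top of it.
lv_path = "/dev/%s/%s" % (VOLUME_GROUP, LOGICAL_VOLUME)
run(["mkfs.xfs", lv_path])
run(["mkdir", "-p", MOUNT_POINT])
run(["mount", "-t", "xfs", lv_path, MOUNT_POINT])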

gerry

Craig Tierney wrote:
> Dan Stromberg wrote:
> 
>> On Thu, 2006-05-04 at 12:16 -0600, Craig Tierney wrote:
>>
>>> Dan Stromberg wrote:
>>>
>>>>> Oops, sorry, English is not my native language and I can make
>>>>> mistakes :-) I liked pvfs before and I love pvfs2 now.
>>>>> Well, I think the problems are the ones you mention: first, it
>>>>> goes a bit slower than, say, nfs or something like gfs over gnbd
>>>>> (for small clusters)... though in any case it is not that slow. The
>>>>> other is that the nodes acting as metadata or I/O servers all have
>>>>> to stay up, which means the probability of a filesystem failure is
>>>>> higher [see the availability sketch after this quote].
>>>>>
>>>>> The advantages are many. Parallel I/O is a plus, not only for mpi
>>>>> programs but also for ordinary tasks: if you want to convert the
>>>>> format of a lot of images you can split the work between nodes
>>>>> [see the work-splitting sketch after this quote], but that is an
>>>>> advantage only if your filesystem can handle it, which is obviously
>>>>> not the case for nfs.
>>>>>
>>>>> In other words, pvfs2 is free, great and useful. It works well as
>>>>> a scratch area, and it uses resources that are otherwise not visible
>>>>> to the user. And for myrinet users it runs over gm, which is nice.
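
To make the quoted reliability point concrete, here is a back-of-the-envelope
sketch (the server count and per-server uptime are invented numbers, purely
for illustration): if every metadata/I-O server has to be reachable, the
filesystem is only available when all of them are up at once.

# Hypothetical figures -- not measurements from any real cluster.
n_servers = 8        # pvfs2 metadata + I/O servers that all must be up
p_up = 0.99          # availability of a single server
print(p_up ** n_servers)   # ~0.92, noticeably worse than any single server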
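
And a minimal sketch of the image-conversion example from the quote: every
node sees the same shared scratch directory, so each one can simply take its
own slice of the file list.  The scratch path, the node index/count arguments
and the call to ImageMagick's convert are all assumptions for illustration.

#!/usr/bin/env python
"""Sketch: convert many images on a shared (e.g. pvfs2) scratch area,
splitting the work across nodes.  Run on each node as:
    python convert_slice.py <node_index> <node_count>
"""
import glob
import subprocess
import sys

node_index = int(sys.argv[1])    # 0-based index of this node
node_count = int(sys.argv[2])    # total number of participating nodes

# Every node sees the same file list because the scratch area is shared.
images = sorted(glob.glob("/pvfs2/scratch/*.tif"))

# Take every node_count-th file, starting at this node's offset.
for path in images[node_index::node_count]:
    out = path.rsplit(".", 1)[0] + ".png"
    subprocess.check_call(["convert", path, out])   # assumes ImageMagick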
>>>>
>>>> On a somewhat related note, are there any FOSS filesystems that can
>>>> surpass 16 terabytes in a single filesystem - reliably?
>>>
>>> What do you want to do with your 16 TB?  Does PVFS2 not meet your 
>>> needs, or not meet your reliability requirements? What don't you find 
>>> reliable about it?
>>
>>
>> We want to store scientific datasets - and we actually wanted more like
>> 30T, but had to settle for less.
>>
>>> I expect that xfs would work just fine.  The question is, how can you 
>>> access it?  You can export it with NFS, but the performance doesn't 
>>> scale.
>>
>>
>> If it'd be at least semi-reliable, this application would probably be
>> fine with that.
> 
> 
> My concern wouldn't be the stability of the xfs filesystem.  We have 
> used it for almost 5 years now in the configuration discussed above.
> The filesystems weren't as large as 16TB (no more than 2TB), but
> that was so we could spread the I/O load across several servers.
> 
> My concern with this setup isn't xfs; it would be the stability of
> the storage.  Also, if there is a disk hiccup (which will happen),
> repairing a 16 TB filesystem takes a long time.  With a distributed 
> filesystem (PVFS2, Ibrix, etc.) you would only have to repair the one 
> affected volume, not the entire filesystem.  There may be some filesystem 
> consistency checks after the repair, but not to the extent of a full 
> filesystem check.
> 
> Craig
> 
> 
> 
> 
>>
>>> Why FOSS (not to start a flame war)?  What if AcmeFS was reasonably 
>>> priced and did what you needed it to do?
>>
>>
>> We already have a commercial solution that's working pretty well, so if
>> we go commercial, we might return to that.  FOSS tends to improve faster
>> though, and if you get it with a support contract, it seems pretty
>> win-win.
>>
>>
>>
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit 
> http://www.beowulf.org/mailman/listinfo/beowulf

-- 
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University	
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843


