[Beowulf] NFS shared file system

Ted Sariyski tsariysk at craft-tech.com
Mon Dec 5 08:37:05 PST 2005


Hi Mark,

Thanks for your comments. We combined the clusters into a single one in
anticipation of jobs running on more than ~60 CPUs, and in the hope of
getting better utilization of the existing resources. While in the 3x30
state, the typical CPU load was ~1-5% and CFD IO was ~700-1000 req/s. It
required a lot of NFS tuning, but once tuned it worked fine. Now I am
trying to find out what the cheapest solution for a 90-node cluster is.
One solution we have been offered, in the 20k-30k range, is a 3 TB Fibre
Channel nSTORE SAN with Montilio's RapidFile I/O engine. Does anybody have
experience with it? Lustre, GPFS, HP's SFS, Panasas, etc. will be our next
long-term step.

Thanks, Ted



Mark Hahn wrote:

>>Each cluster had its own head node and its own cheap, in-house built
>>RAID exported over GB NFS. Recently we combined the existing clusters
>
>when it was in the 3x30 state, did you do any measurements of the raid's
>internal performance, and performance when under "normal" load by the nodes?
>also, have you characterized the IO load of your CFD application?
>
>>into one and the first problem we have is with the mass storage:
>>occasionally it cannot handle the IO load. My question is, if I buy a
>>commercial NAS, what are the chances that after that I'll need to replace
>>GB with Myrinet (e.g.)?
>
>well, the better question is why you got rid of two of the IO nodes - 
>or did you?
>
>>clusters but from what I read in this newsgroup my understanding is
>>that 90 nodes is a small cluster and I didn't expect scalability
>>problems at this level.
>
>the traffic here is somewhat specialized, of course - people doing 16-node
>clusters are not having any problems, and so don't speak up ;)
>
>90 nodes is clearly enough to show real scaling problems if the load is 
>reasonably intensive and from multiple nodes simultaneously.  is it safe 
>to assume you've done the basic first steps in tuning (lots of nfsd's,
>perhaps also higher AC parameters on the client side, probably not using
>the default 32K packets?)
>
>>If commercial storage optimized for IO is a
>>solution, what is the price I'm facing? Any recommendations?
>
>depends on what your IO goals are.  do you insist on a single filesystem
>implemented across multiple server nodes?  if so, you have to look into 
>cluster-fs things like Lustre, GPFS, HP's SFS, Panasas, etc.  the overhead
>(dollars and brains) is nontrivial.
>
>I would probably split the workload across three independent NFS servers,
>and also try some basic tuning.  these are cheap, easy to do and will
>definitely improve performance.
>
>more speculative things:
>
>	- use LACP or related techniques to provide more bandwidth out 
>	of the NFS server(s).  this will probably not improve the bandwidth
>	seen by a single node, but should come close to doubling the
>	aggregate.
>
>	- try out fscache - this is an add-on layer being promulgated by 
>	RH which creates a local disk cache to unload your NFS.
>
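
The arithmetic behind the three-server suggestion can be sketched roughly as
follows. This is only a back-of-envelope illustration, not anything from the
thread: the server names are hypothetical, the round-robin assignment is one
arbitrary way to split 90 nodes, and it assumes (the message does not say so
explicitly) that the ~700-1000 req/s figure was per server in the 3x30 state.

```python
def assign_nodes(node_count, servers):
    """Map each node index to a server, round-robin (an assumed policy)."""
    return {node: servers[node % len(servers)] for node in range(node_count)}


def combined_load(per_server_range, server_count):
    """Scale a (low, high) req/s range by the number of merged servers."""
    low, high = per_server_range
    return low * server_count, high * server_count


# Hypothetical server names; the thread does not name the machines.
servers = ["nfs1", "nfs2", "nfs3"]

mapping = assign_nodes(90, servers)
nodes_per_server = {
    name: sum(1 for target in mapping.values() if target == name)
    for name in servers
}

# ASSUMPTION: 700-1000 req/s was seen by each of the three original servers.
single_server_load = combined_load((700, 1000), 3)

print("nodes per server after split:", nodes_per_server)
print("req/s if one server takes everything:", single_server_load)
```

Under that assumption, a single consolidated server would face roughly three
times the request rate each server handled before, while the split keeps each
server at about the 30-node load that was already known to work once tuned.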
