[Beowulf] how large of an installation have people used NFS with? would 300 mounts kill performance?

Wed Sep 9 22:11:38 PDT 2009

> Our new cluster aims to have around 300 compute nodes. I was wondering
> what is the largest setup people have tested NFS with? Any tips or

well, 300 is no problem at all.  though if you're talking to a 
single Gb-connected server, you can't hope for much BW per node...

> comments? There seems no way for me to say if it will scale well or
> not.

it's not too hard to figure out some order-of-magnitude bandwidth
requirements.  how many nodes need access to a single namespace 
at once?  do jobs drop checkpoints of a known size periodically?
faster/more ports on a single NFS server get you fairly far 
(hundreds of MB/s), but you can also aggregate across multiple NFS
servers (if you don't need all the IO in a single directory...)
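
as a rough sketch of that kind of estimate: the numbers below (1000 MB
checkpoint per node, a 10-minute window, 125 MB/s raw for a Gb link) are
made-up assumptions, not measurements, and real NFS throughput will be
lower than the raw link rate.

```python
import math

# Illustrative assumption: 1 Gb/s is 125 MB/s raw; NFS will deliver less.
GBIT_MBPS = 125.0

def per_node_share(server_mbps, nodes):
    """MB/s each node gets if every node hits one server at once."""
    return server_mbps / nodes

def checkpoint_demand(nodes, checkpoint_mb, window_s):
    """Aggregate MB/s needed for every node to write one checkpoint
    of checkpoint_mb within window_s seconds."""
    return nodes * checkpoint_mb / window_s

def servers_needed(demand_mbps, server_mbps):
    """Number of NFS servers needed if load spreads evenly across them."""
    return math.ceil(demand_mbps / server_mbps)

if __name__ == "__main__":
    nodes = 300
    print("per-node share, one Gb server: %.2f MB/s"
          % per_node_share(GBIT_MBPS, nodes))
    demand = checkpoint_demand(nodes, checkpoint_mb=1000, window_s=600)
    print("checkpoint demand: %.0f MB/s" % demand)
    print("Gb-connected servers needed: %d"
          % servers_needed(demand, GBIT_MBPS))
```

with those assumed numbers a single Gb server gives each node well under
1 MB/s when all 300 are active, which is why the questions above matter
more than the node count itself.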

> I have been warned of performance hits but how bad will they be?

NFS is fine at hundreds of nodes.  nodes can generate a fairly high 
load of, for instance, getattr calls, but that can be mitigated some
with an acregmin setting.
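
a crude way to see why acregmin helps, assuming a simplified model where
each client revalidates a cached regular file's attributes at most once
per acregmin seconds (the hot-file count is hypothetical, and real
clients back off toward acregmax for files that don't change):

```python
def getattr_upper_bound(nodes, hot_files_per_node, acregmin_s):
    """Rough upper bound on getattr calls/s reaching the server.

    Simplified model (an assumption, not the exact client behaviour):
    each node polls hot_files_per_node regular files continuously, and
    the attribute cache lets at most one getattr per file through every
    acregmin_s seconds.
    """
    if acregmin_s <= 0:
        raise ValueError("acregmin_s must be positive in this model")
    return nodes * hot_files_per_node / acregmin_s

if __name__ == "__main__":
    # 3 s is the usual Linux default for acregmin; 30 s is an example.
    for acregmin in (3, 30):
        rate = getattr_upper_bound(nodes=300, hot_files_per_node=10,
                                   acregmin_s=acregmin)
        print("acregmin=%ds -> up to %.0f getattr/s" % (acregmin, rate))
```

the tradeoff is that a larger acregmin means clients may see stale
attributes for longer.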

> Infiniband is touted as a solution but the economics don't work out.

depends on how much bandwidth you need...

> Assume each of my compute nodes have gigabit ethernet AND I specify
> the switch such that it can handle full line capacity on all ports.

but why?  your fileservers won't support saturating all nodes' links 
at once, so why a full-bandwidth fabric?  the fabric backbone only 
needs to match the capacity of the storage (I'd guess 10G would be 
reasonable, unless you really ramp up the number of fat fileservers.)
or do you mean the fabric is full-bandwidth to optimally support MPI?
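
to put numbers on the storage side of that, a minimal sketch (the
fileserver count and per-server rate are hypothetical, and this ignores
MPI traffic entirely):

```python
def edge_capacity_gbps(nodes, node_gbps):
    """Total bandwidth if every node link were saturated at once."""
    return nodes * node_gbps

def storage_capacity_gbps(fileservers, server_gbps):
    """Most the fileservers can push into the fabric in aggregate."""
    return fileservers * server_gbps

def oversubscription(nodes, node_gbps, backbone_gbps):
    """Ratio of node edge capacity to backbone capacity."""
    return edge_capacity_gbps(nodes, node_gbps) / backbone_gbps

if __name__ == "__main__":
    # Assumed: 4 fileservers at 2 Gb/s each (e.g. two bonded Gb ports).
    storage = storage_capacity_gbps(fileservers=4, server_gbps=2.0)
    print("storage can source about %.0f Gb/s" % storage)
    print("300 Gb nodes over a 10G backbone: %.0f:1 oversubscribed"
          % oversubscription(300, 1.0, 10.0))
```

for storage traffic alone, a 10G backbone already exceeds what those
assumed fileservers can deliver, even though the node edge is 30:1
oversubscribed.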

> If not NFS then Lustre etc options do exist. But the more I read about

yes - I wouldn't resort to Lustre until it was clear that NFS wouldn't do.
Lustre does a great job of scaling content bandwidth and capacity all
within a single namespace.  but NFS, even several instances, is indeed
a lot simpler...