[Beowulf] Re: how large of an installation have people used NFS with? would 300 mounts kill performance?

Thu Sep 10 11:13:02 PDT 2009

On Sep 10, 2009, at 10:44 AM, Rahul Nabar wrote:

> On Wed, Sep 9, 2009 at 3:38 PM, Greg Keller <Greg at keller.net> wrote:
>> For example, lots of nodes reading and writing
>> different files in a generically staggered fashion,
>
> How do you enforce the staggering? Do people write staggered I/O codes
> themselves? Or can one alleviate this problem by scheduler settings?
Although there's probably a way to enforce it at the app level or in the
scheduler, either would require specific knowledge of which jobs (and
nodes) are accessing which files, how, and when.  I was thinking that if
it's largely embarrassingly parallel jobs that start/stop independently
and have somewhat randomized IO, then there is some natural staggering.
If the app starts on all nodes simultaneously and they all start
reading/writing the same files at nearly the same time, then staggering
is probably impossible and a parallel FS is worth investigating.
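
For what app-level staggering could look like, here is a minimal sketch
in Python.  It assumes each process already knows its own rank and the
total node count (from the scheduler or MPI, say).  The names
stagger_delay, staggered_io, window_s and jitter_s are made up for
illustration and are not from any particular scheduler or library.  It
only spreads out the start of an I/O phase; it does nothing for nodes
that all need the same file at the same instant.

```python
import random
import time


def stagger_delay(node_rank, num_nodes, window_s=30.0, jitter_s=2.0, rng=None):
    """Return how many seconds this node should wait before starting I/O.

    Hypothetical helper: ranks are spread evenly across window_s, and a
    small random jitter is added so nodes do not stay in lock-step.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    if not 0 <= node_rank < num_nodes:
        raise ValueError("node_rank must be in [0, num_nodes)")
    rng = rng or random
    # Evenly spaced slot for this rank inside the window.
    slot = window_s * node_rank / num_nodes
    # Random offset in [0, jitter_s] on top of the slot.
    return slot + rng.uniform(0.0, jitter_s)


def staggered_io(node_rank, num_nodes, do_io, sleep=time.sleep, **kwargs):
    """Wait for this node's staggered delay, then run do_io()."""
    delay = stagger_delay(node_rank, num_nodes, **kwargs)
    sleep(delay)
    return do_io()
```

With 300 nodes and a 30 second window, consecutive ranks start about
0.1 seconds apart instead of all hitting the NFS server at once.  The
window and jitter values are guesses and would need tuning against
whatever the server can actually absorb.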

>
>> Lustre or eventually pNFS if things get ugly.  But not all NFS servers are
>> created equal, and a solid purpose built appliance may handle loads a
>> general purpose Linux NFS server won't.
>
> Disk array connected to generic Linux server? Or standalone
> Fileserver? Recommendations?
>
> What exactly does a "solid purpose built appliance" offer that a
> Generic Linux server (well configured) connected to an array of disks
> does not offer?
Joe's post is spot on here.  Don't let legend and lore scare you off:
NFS can do great things on current generic and special purpose servers
with the right config and software.  There's nothing in your  
configuration and usage summary that screams NFS killer to me.  If you  
use generic or special purpose servers, you can repurpose them as part  
of a parallel FS if you need to.

Purpose built *appliances* generally give you:
 - Simple setup and admin GUI
 - Replication and other fancy features HPCC doesn't normally care about
 - Zero flexibility if you change course and head towards a parallel FS
 - A singular support channel to complain to if things go badly (YMMV)

None of those matter to me more than the money they cost, so I buy  
standard servers and run standard Linux NFS on internal RAID
controllers with no HA, and have occasional crashes and issues I can't  
resolve cleanly.  We are perpetually looking for a "next step" to get  
better support/stability, but it's good enough for our 300 and 600  
node systems at the moment.

>
> -- 
> Rahul

Cheers!
Greg