[Beowulf] Re: scratch File system for small cluster

Thu Sep 25 09:52:40 PDT 2008

Glen,

I have had great success with the *right* 10GbE NIC and NFS.  The
important things to consider are:

1. How much bandwidth will your backend storage provide?  With 2 x 4Gb
   FC I'm guessing the best case is about 600MB/s, but likely less.
2. What access patterns do the "typical apps" have?
   - All nodes read from a single file (no problem for NFS, and
     fscache may help even more).
   - All nodes write to a single file (NFS may need some help or may
     be too slow when tuned for this).
   - All nodes read and write to separate files (NFS is fine if the
     files aren't too big for the OS to cache reasonably).
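
To put the backend guess in context, here is a rough back-of-the-envelope
sketch.  It assumes the 600 figure above is in MB/s and that the ~32 nodes
from the original question all stream at the same time; the function and
variable names are made up for illustration, and the 10GbE number is the
raw line rate with no protocol overhead.

```python
def gbit_to_mbyte_per_s(gbit_per_s):
    """Convert a raw line rate in Gb/s to MB/s (decimal units, no protocol overhead)."""
    return gbit_per_s * 1000 / 8

BACKEND_MB_S = 600      # best-case backend guess from the post, read as MB/s (assumption)
NODES = 32              # node count from the original question

nic_raw = gbit_to_mbyte_per_s(10)          # raw 10GbE line rate
bottleneck = min(BACKEND_MB_S, nic_raw)    # the slower stage limits the whole path
per_node = bottleneck / NODES              # fair share if all nodes stream at once

print(f"10GbE raw line rate: {nic_raw:.0f} MB/s")
print(f"bottleneck:          {bottleneck:.0f} MB/s")
print(f"per-node share:      {per_node:.2f} MB/s")
```

Under these assumptions the FC backend, not a single 10GbE link, is the
limiting stage, which is why adding IO servers in front of the same
storage would not be expected to help much.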

The number of IO servers really is a function of how much throughput
you have on the backend disks, on the frontend network, and through
the kernel/filesystem goo.  My experience is that a 10GbE NIC from
Myricom can easily sustain 500-700MB/s if the storage behind it can
and the access patterns aren't evil.  Other NICs from large and small
vendors can fall apart at 3-4Gb/s, so be careful and test the network
before assuming your FS is the troublemaker.  There are cheap switches
with 2 or 4 10GbE CX4 connectors that make this much simpler and safer,
with or without the parallel FS options.
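
As an illustration of "test the network first", here is a minimal sketch
of a single-stream TCP throughput check using only the Python standard
library.  Everything in it (function names, chunk size, defaults) is
hypothetical.  As written it runs sender and receiver on one machine over
loopback, so it only demonstrates the method; for a real test the
receiving half would run on another node, and a dedicated tool such as
iperf or netperf is the more usual choice, since Python itself may not be
able to saturate a 10GbE link.

```python
import socket
import threading
import time

CHUNK = 1 << 20  # 1 MiB per send

def sink(server_sock, result):
    """Accept one connection and count bytes until the sender closes."""
    conn, _ = server_sock.accept()
    received = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            received += len(data)
    result["bytes"] = received

def measure_throughput(total_bytes=64 * CHUNK, host="127.0.0.1"):
    """Send total_bytes over one TCP stream; return (bytes_received, MB/s)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))           # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {}
    receiver = threading.Thread(target=sink, args=(server, result))
    receiver.start()

    payload = b"\0" * CHUNK
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        sent = 0
        while sent < total_bytes:
            client.sendall(payload)
            sent += CHUNK
    receiver.join()                  # wait until the receiver has drained everything
    elapsed = time.perf_counter() - start
    server.close()
    return result["bytes"], result["bytes"] / elapsed / 1e6

if __name__ == "__main__":
    nbytes, mb_per_s = measure_throughput()
    print(f"received {nbytes} bytes at {mb_per_s:.0f} MB/s")
```

If a raw stream like this cannot get near the expected rate between two
nodes, the file system on top of it is unlikely to do better.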

Depending on how big/small and how "scratch" the need is... a big  
tmpfs/ramdisk can be fun :)
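
For the tmpfs idea, a small sketch of how a job script might choose a
scratch directory.  It assumes /dev/shm is a tmpfs mount, which is common
on Linux but not guaranteed, and falls back to the regular temp
directory; the helper name and candidate list are made up for
illustration.  Keep in mind that anything written to tmpfs consumes RAM
(and possibly swap) on that node.

```python
import os
import shutil
import tempfile

def pick_scratch_dir(needed_bytes, candidates=("/dev/shm", tempfile.gettempdir())):
    """Return the first writable candidate with at least needed_bytes free, else None."""
    for path in candidates:
        if os.path.isdir(path) and os.access(path, os.W_OK):
            if shutil.disk_usage(path).free >= needed_bytes:
                return path
    return None

if __name__ == "__main__":
    # Ask for 1 GiB of scratch space.
    print(pick_scratch_dir(1 << 30))
```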

Good luck!
Greg

On Sep 25, 2008, at 9:01 AM, beowulf-request at beowulf.org wrote:

> Date: Thu, 25 Sep 2008 09:40:54 -0400
> From: Glen Beane <Glen.Beane at jax.org>
> Subject: [Beowulf] scratch File system for small cluster
> To: "beowulf at beowulf.org" <beowulf at beowulf.org>
>
> I am considering adding a small parallel file system (~5-10TB) to my
> small cluster (~32 2x dual core Opteron nodes) that is used mostly by
> a handful of regular users.  Currently the only storage accessible to
> all nodes is home directory space which is provided by the Lab's IT
> department (this is a SAN volume connected to the head node by 2x FC
> links, and NFS exported to the compute nodes).  I don't have to
> "worry" about the IT provided SAN space - they back it up, provide
> redundant hardware, etc.  The parallel file system would be scratch
> space (and not backed up by IT).  We have a mix of home grown apps
> doing a pretty wide range of things (some do a lot of I/O, others
> don't), and things like BLAST and BLAT.
>
> Can anyone out there provide recommendations for a good solution for
> fast scratch space for a cluster of this size?
>
> Right now I was thinking about PVFS2.  How many I/O servers should I
> have, and how many cores and how much RAM per I/O server?
> Are there other recommendations for fast scratch space?  (It doesn't
> have to be a parallel file system; something with less hardware would
> be nice.)
>
> --
> Glen L. Beane
> Software Engineer
> The Jackson Laboratory
> http://www.jax.org
