[Beowulf] recommendations for a good ethernet switch for connecting ~300 compute nodes

Rahul Nabar rpnabar at gmail.com
Thu Sep 3 04:06:05 PDT 2009


On Wed, Sep 2, 2009 at 10:54 PM, Joshua Baker-LePain<jlb17 at duke.edu> wrote:
> On Wed, 2 Sep 2009 at 10:29pm, Rahul Nabar wrote
>>
> Speccing storage for a 300 node cluster is a non-trivial task and is heavily
> dependent on your expected access patterns.  Unless you anticipate
> vanishingly little concurrent access, you'll be very hard pressed to service
> a cluster that large with a basic Linux NFS server.

Thanks Joshua! The question is, what are my alternatives?

Software: Change from NFS to xxx?
Hardware: Go for an external NetApp storage box?
Others.......?

>
> Whatever you end up with for storage, you'll need to be vigilant regarding
> user education.  Jobs should store as much in-process data as they can on
> the nodes (assuming you're not running diskless nodes) and large jobs should
> stagger their access to the central storage as best they can.

Nope. Not diskless nodes. Nodes have a local OS and /scratch space. But
user files and executable installations reside on a central NFS store.
Luckily my usage patterns are such that there is no new code
development on this particular cluster. I have tight control over
exactly which codes are running (DACAPO, VASP, GPAW).
Thus, so long as I compile, wrapper-script, and optimize each of the
codes, the users can do no harm.
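
To make the wrapper-script idea concrete, here is a rough sketch of the kind
of wrapper I have in mind, assuming Python 3.8+ is available on the nodes. It
stages a job's input onto node-local scratch, runs the code there, and copies
the results back, with a random delay before each touch of the central store
so that many nodes do not hit it at once. The function name run_staged, the
max_delay parameter, and the directory layout are all made up for
illustration; this is not what is actually deployed.

```python
import os
import random
import shutil
import subprocess
import tempfile
import time


def run_staged(cmd, input_dir, output_dir, scratch_root="/scratch", max_delay=30.0):
    """Run cmd in a node-local copy of input_dir, then copy results back.

    Hypothetical helper: the name, arguments, and defaults are assumptions
    made for this sketch, not part of any existing tool.
    """
    # Stagger the initial read from central storage so that nodes starting
    # together do not all hit the NFS server at the same instant.
    time.sleep(random.uniform(0.0, max_delay))

    # Private working directory on the node's local scratch disk.
    workdir = tempfile.mkdtemp(prefix="job_", dir=scratch_root)
    try:
        local = os.path.join(workdir, "run")
        shutil.copytree(input_dir, local)  # stage in from central storage

        # The code runs with its working directory on local disk, so its
        # in-process files never touch the central store.
        result = subprocess.run(cmd, cwd=local)

        # Stagger the write-back as well, then stage out.
        time.sleep(random.uniform(0.0, max_delay))
        shutil.copytree(local, output_dir, dirs_exist_ok=True)
        return result.returncode
    finally:
        # Always clean up local scratch, even if the job failed.
        shutil.rmtree(workdir, ignore_errors=True)
```

A job script would then call something like
run_staged(["vasp"], input_dir, output_dir), where the command and both
directories are placeholders for whatever the site actually uses. The random
delay is the crudest possible form of staggering; a scheduler-aware scheme
would probably do better.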

-- 
Rahul



