[Beowulf] recommendations for a good ethernet switch for connecting ~300 compute nodes
gmkurtzer at gmail.com
Thu Sep 3 13:16:48 PDT 2009
On Wed, Sep 2, 2009 at 11:18 PM, Mark Hahn <hahn at mcmaster.ca> wrote:
>> That brings me to another important question. Any hints on speccing
>> the head-node?
> I think you imply a single, central admin/master/head node. this is a very
> bad idea. first, it's generally a bad idea to have users on a fileserver.
> next, it's best to keep cluster-infrastructure
> (monitoring, management, pxe, scheduling) on a dedicated admin machine.
> for 300 compute nodes, it might be a good idea to provide more than one
> login node (for editing, compilation, etc).
To expand on Mark's comment...
I would SPEC at least two systems as head/master nodes and either spread
the required services across them (e.g. management, monitoring and other
sysadmin tasks on one, and scheduling on the other) OR put all of the
services on a single master and run a shadow master for redundancy.
I would not put users on either of these systems.
If you were using Perceus...
I would either create an interactive VNFS capsule (including compilers,
additional libs, etc.) or build a larger, more bloated compute VNFS
capsule and use that on all of the nodes.
In this scenario, all nodes could run stateless *and* diskful, so if
you need to change the number of interactive nodes, you can do it with
a simple command sequence:
# perceus vnfs import /path/to/interactive.vnfs
# perceus node set vnfs interactive n000[0-4]
# perceus vnfs import /path/to/compute.vnfs
# perceus node set vnfs compute n0[005-299]
Have your cake and eat it too. :)
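Before re-pointing nodes at a different capsule, it can help to check that the interactive and compute name ranges don't overlap and together cover the whole cluster. This is a minimal POSIX shell sketch, assuming nodes are named n0000 through n0299 and that five of them are interactive (both are assumptions for illustration). It only computes and prints the two groups; it never calls perceus.

```shell
#!/bin/sh
# Hypothetical sanity check: split node names n0000..n0299 into an
# interactive group and a compute group, then confirm the groups are
# disjoint and together cover every node. No perceus commands are run.

TOTAL=300        # assumed cluster size
INTERACTIVE=5    # assumed number of interactive (login) nodes

# Generate zero-padded node names, e.g. n0000, n0001, ...
interactive=$(seq -f 'n%04g' 0 $((INTERACTIVE - 1)))
compute=$(seq -f 'n%04g' "$INTERACTIVE" $((TOTAL - 1)))

# Names appearing in both groups (should be zero).
overlap=$(( $(printf '%s\n%s\n' "$interactive" "$compute" | sort | uniq -d | wc -l) ))

# Distinct names across both groups (should equal TOTAL).
covered=$(( $(printf '%s\n%s\n' "$interactive" "$compute" | sort -u | wc -l) ))

echo "interactive: $(echo "$interactive" | head -n 1) .. $(echo "$interactive" | tail -n 1)"
echo "compute:     $(echo "$compute" | head -n 1) .. $(echo "$compute" | tail -n 1)"
echo "overlap=$overlap covered=$covered"
```

To resize the login pool, change INTERACTIVE and rerun; the printed first and last names show where each group's range should start and end.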
The file system needs to be built to handle the load of the apps. With
300 nodes, you can go from the low end (Linux RAID and NFS) to a
higher-end NFS solution, up to a parallel file system, or maybe even
one of each (NFS and parallel), as they solve somewhat different problems.