[Beowulf] Some beginner's questions on cluster setup
jac67 at georgetown.edu
Thu Jul 9 08:16:02 PDT 2009
The diskless provisioning system is definitely the way to go. We use a
cluster toolkit called Jesswulf, which is available at
By default it runs on RedHat/Centos/Fedora systems, though it has been
ported to Ubuntu and SuSE without too much trouble. Perseus/Warewulf
also work well. We also teach cluster courses, which may be helpful.
To answer some of your questions, I prefer the read-only NFSROOT
approach with a small (less than 20 MB) ramdisk. We use this on all of
our clusters (about 7 clusters) and it works fine. We even use it on
heterogeneous systems. One cluster has a mix of P4 Xeons, dual-core
Opterons, and quad-core Xeons, all using the same NFSROOT, so you simply
update one directory on the master node and *all* of the compute nodes
have the new software. We love it! We simply either compile the kernel
or make the initrd with hardware support for all of the nodes. We often
use different hardware for the master and compute nodes, without issue.
The only thing that we don't mix is 32-bit and 64-bit. We have a couple of
32-bit clusters and the rest are 64-bit.
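As a rough illustration of the read-only NFSROOT idea, here is a minimal Python sketch that builds the kind of /etc/exports entry a master node might use to share one root directory with every compute node. The path /srv/nfsroot and the subnet 10.0.0.0/24 are made-up placeholders, not Jesswulf's actual layout, and the option list is only one plausible choice.

```python
def exports_line(path, subnet, options=("ro", "no_root_squash", "no_subtree_check")):
    """Build one /etc/exports entry sharing `path` with `subnet`.

    "ro" makes the export read-only, so compute nodes cannot modify the
    shared root; the other options are common choices for an NFS root
    and are assumptions here, not requirements.
    """
    return f"{path} {subnet}({','.join(options)})"


if __name__ == "__main__":
    # Hypothetical NFSROOT directory and cluster subnet.
    print(exports_line("/srv/nfsroot", "10.0.0.0/24"))
```

Since every node mounts the same directory, updating software means changing files under that one path on the master.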
The main issue that you need to deal with is having a fast enough
storage system for parallel jobs that generate a lot of data. We use the
local hard drives in the compute nodes for "scratch" space and we have
some type of shared file system. On the small clusters, we use NFS, but
on the bigger clusters we use Glusterfs with Infiniband, which has
proven to be very nice. If you are running MPI jobs with lots of data,
you might want to consider adding Infiniband. Even the cheap ($125)
Infiniband cards give much better performance than standard Gigabit Ethernet. And
you can always run IP over IB for applications or services that need
You mention that you don't think that you will have too much MPI
traffic, but that you will be copying the results back to the master.
This is when we see the highest load on our NFS file systems: when all of
the compute nodes are writing at the same time, even on small clusters
(less than 20 nodes). We've found that a clustered file system like
Glusterfs keeps I/O wait much lower than NFS does when copying lots of
files. You may consider picking up some of the cheap IB cards
($125) and switches ($750 for 8 ports, $2400 for 24 ports) in order to do
some relatively inexpensive testing. Here is one place where you can
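To see why simultaneous writes hurt, here is a back-of-the-envelope sketch in Python. It assumes the master has a single link that all nodes share evenly, and it ignores protocol and disk overhead, so real numbers will be worse. The 1000 MB result size is a made-up figure for illustration.

```python
def per_node_mb_per_s(link_gbit, nodes):
    """Best-case share of the master's link per node, in MB/s."""
    link_mb_per_s = link_gbit * 1000 / 8  # 1 Gbit/s is 125 MB/s at most
    return link_mb_per_s / nodes


def copy_seconds(result_mb, link_gbit, nodes):
    """Best-case time for every node to copy `result_mb` back at once."""
    return result_mb / per_node_mb_per_s(link_gbit, nodes)


if __name__ == "__main__":
    # 20 nodes sharing one gigabit link into the master.
    print(per_node_mb_per_s(1, 20))   # 6.25 MB/s per node
    # Hypothetical 1000 MB of results per node.
    print(copy_seconds(1000, 1, 20))  # 160.0 seconds
```

With 20 nodes on one gigabit link, each node gets at most about 6 MB/s while they all write together, which is why the storage path to the master matters even when MPI traffic is light.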
I'd be happy to talk to you. My phone number is below and you have my
Advanced Research Computing &
High Performance Computing Training
> I'm new to the list & also to cluster technology in general.
> I'm planning on building a small 20+ node cluster, and I have some basic
> We're planning on running 5-6 motherboards with quad-core amd 3.0GHz
> phenoms, and 4GB of RAM per node.
> Off the bat, does this sound like a reasonable setup
> My first question is about node file & operating systems:
> I'd like to go with a diskless setup, preferably using an NFS root for each
> However, based on some of the testing I've done, running the nodes off of the
> NFS share(s) has turned out to be rather slow & quirky.
> Our master node will be running on a completely different hardware setup
> than the slaves, so I *believe* it will make it more complicated & tedious
> to set up & update the nfsroots for all of the nodes (since it's not simply a
> matter of 'cloning' the master's setup & config).
> Is there any truth to this, or am I way off?
> Can anyone provide any general advice or feedback on how best to set up a
> diskless node?
> The alternative that I was considering was using (4GB?) USB flash drives to
> drive a full-blown, local OS install on each node...
> Q: does anyone have experience running a node off of a usb flash drive?
> If so, what are some of the pros/cons/issues associated with this type of
> My next question(s) is regarding network setup.
> Each motherboard has an integrated gigabit nic.
> Q: should we be running 2 gigabit NICs per motherboard instead of one?
> Is there a 'rule-of-thumb' when it comes to sizing the network requirements?
> (i.e.,'one NIC per 1-2 processor cores'...)
> Also, we were planning on plugging EVERYTHING into one big (unmanaged)
> gigabit switch.
> However, I read somewhere on the net where another cluster was physically
> separating NFS & MPI traffic on two separate gigabit switches.
> Any thoughts as to whether we should implement two switches, or should we be
> ok with only 1 switch?
> The application we'll be running is NOAA's wavewatch3, in case anyone has
> any experience with it.
> It will utilize a fair amount of NFS traffic (each node must read a common
> set of data at periodic intervals),
> and I *believe* that the MPI traffic is not extremely heavy or constant
> (i.e., nodes do large amounts of independent processing before sending
> results back to master).
> I'd appreciate any help or feedback anyone would be willing & able to offer...