[Beowulf] Experience of using multiple network devices on a node in cluster
ajt at rri.sari.ac.uk
Mon May 16 08:48:32 PDT 2005
Mark Hahn wrote:
>>We have implemented clusters using one interface for parallel traffic
>>(Score) and one for general purpose/NFS traffic.
> segregating traffic is a common suggestion, but I don't really understand
> why it would be sensible. a node is unlikely to be running some mixture
> of MPI and I/O jobs, at least the normal kind of node (dual).
> control/monitoring really ought to be minimal in bandwidth (per-node), no?
I used a single network fabric at first, relying on the switches to
segregate the network traffic: we have a 64-node diskless Beowulf
cluster based on the EPCC 'BOBCAT' model. You are right that the
control/monitoring bandwidth is minimal, but we use openMosix to
load-balance and the I/O can be very high while processes are migrated.
I think it is essential to throttle the bandwidth used by oM process
migration: in fact, we initially ran MPI on the 'NFS' network and left
the full bandwidth of the second fabric for oM process migration. When
we were using a single network fabric and the cluster was busy, we had
problems with NFS timeouts and it was difficult to control the cluster.
Using two network fabrics has eliminated the problem completely...
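For what it's worth, a minimal sketch of how the split can be set up:
give each node one hostname per fabric, keep NFS (and MPI) traffic on
the first interface, and reserve the second for oM migration. All
interface names, subnets, and hostnames below are hypothetical, not
taken from our BOBCAT setup:

```shell
# /etc/hosts -- one name per fabric per node (hypothetical addressing)
#   eth0: NFS + MPI + control network
#   eth1: dedicated openMosix process-migration network
cat >> /etc/hosts <<'EOF'
192.168.0.1   node01
10.0.0.1      node01-mig
192.168.0.2   node02
10.0.0.2      node02-mig
EOF

# Mount home directories explicitly over the first fabric, so NFS
# traffic never competes with migration traffic.
mount -t nfs node01:/export/home /home

# MPI also stays on the first fabric: the machinefile lists the
# eth0 hostnames, leaving eth1's full bandwidth for oM migration.
cat > machines.LINUX <<'EOF'
node01
node02
EOF
```

Which fabric carries what is just a naming convention: whichever
hostname a service resolves determines the interface its traffic uses.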
Dr. A.J.Travis, | mailto:ajt at rri.sari.ac.uk
Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt
Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751
Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687