[Beowulf] using two separate networks for different data streams
ajt at rri.sari.ac.uk
Fri Jan 27 10:47:18 PST 2006
Douglas Eadline wrote:
> Indeed, an excellent question. It seems logical, does it really help though
> (or do I just feel clever about using the extra Ethernet Port) I can see
> that if you have a lot of monitoring traffic that might cause an issue,
> but I have never tested that notion as well. Of course it all depends...
> I wonder if a dual Ethernet node would be better served by something like
> a FNN (http://aggregate.org/FNN/) Tim Mattox can probably weigh in on
One of the first systems I saw that used dual Ethernet networks, as opposed
to just channel bonding multiple NICs, was the EPCC BOBCAT:
Although this system has now been dismantled, it inspired me to build a
similar cluster here at the Rowett:
The most important feature of a 'BOBCAT' architecture Beowulf is the use
of 'diskless' compute nodes with separate dual network fabrics for the
'system' and 'application' traffic. The 'diskless' nodes are really
'dataless' because they have scratch disks for /tmp and swap, but no
operating system installed.
This approach is useful because it means that you can still control the
Beowulf cluster via the 'system' network even if the 'application'
network becomes saturated. The traffic is segregated between the two
private network fabrics.
In fact, the system I built here has three NICs in the servers and uses
NAT on the head node to allow compute nodes to make outgoing connections
to the internet from the private cluster network via the LAN so that,
for example, our Folding@home jobs on the nodes can download work units.
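
As a rough illustration of the NAT arrangement described above, the Python
sketch below only builds the commands one might run on a Linux head node; it
does not execute anything. Note the assumptions: the post does not say how
NAT was configured, so iptables masquerading is a guess, and the interface
name 'eth0' for the LAN side, the /24 netmask, and the choice of the 'system'
network as the one being NATed are all hypothetical.

```python
import ipaddress


def nat_commands(private_net="192.168.0.0/24", lan_iface="eth0"):
    """Return shell commands (as strings) that would masquerade a private
    cluster network behind the head node's LAN interface.

    Assumptions (not from the original post): iptables is the NAT
    mechanism, the netmask is /24, and the LAN interface is 'eth0'.
    """
    net = ipaddress.ip_network(private_net)
    if not net.is_private:
        raise ValueError(f"{net} is not a private network")
    return [
        # Let the kernel forward packets between interfaces.
        "sysctl -w net.ipv4.ip_forward=1",
        # Rewrite the source address of outgoing traffic from the nodes.
        f"iptables -t nat -A POSTROUTING -s {net} -o {lan_iface} -j MASQUERADE",
    ]


if __name__ == "__main__":
    for cmd in nat_commands():
        print(cmd)
```

Printing the commands rather than running them keeps the sketch safe to try
on any machine; the output would still need checking against the real
interface names before use on a head node.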
This system works very well and, incidentally, suggests that poor
performance of 'diskless' compute nodes with NFS-mounted root filesystems
might have more to do with saturation of the cluster interconnect by
HPC 'application' traffic than with NFS congestion, at least on a 64-node
cluster. I'm aware that NFS does not scale up very well to large clusters:
No flames!
Our cluster has three networks:

220.127.116.11   LAN          100Base-T  (public)
192.168.0.0      System       100Base-T  (private)
192.168.1.0      Application  Gigabit    (private)

The compute nodes have two NICs connected to the private networks. The
servers have three NICs connected to the private networks and the LAN.
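
To make the split between the two private fabrics concrete, here is a
minimal Python sketch that reports which fabric a given node address belongs
to. The network addresses are the two private ones listed above; the /24
netmasks and the names FABRICS and fabric_of are assumptions made for the
example, not details of the actual cluster configuration.

```python
import ipaddress

# The two private fabrics from the list above; /24 netmasks are assumed.
FABRICS = {
    "system": ipaddress.ip_network("192.168.0.0/24"),
    "application": ipaddress.ip_network("192.168.1.0/24"),
}


def fabric_of(address):
    """Return 'system', 'application', or None for an IPv4 address string."""
    addr = ipaddress.ip_address(address)
    for name, net in FABRICS.items():
        if addr in net:
            return name
    return None  # Not on either private fabric (e.g. a LAN address).


if __name__ == "__main__":
    for ip in ("192.168.0.12", "192.168.1.12", "10.0.0.1"):
        print(ip, fabric_of(ip))
```

A check like this could be used in a monitoring or job-launch script to make
sure control traffic is addressed to a node's 'system' interface and MPI-style
traffic to its 'application' interface.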
Dr. A.J.Travis, | mailto:ajt at rri.sari.ac.uk
Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt
Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751
Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687