[Beowulf] torus versus (fat) tree topologies

Dan Kidger daniel.kidger at quadrics.com
Sun Nov 14 08:07:49 PST 2004


On Saturday 13 November 2004 12:55 am, Greg Lindahl wrote:
> On Fri, Nov 12, 2004 at 06:02:08PM -0500, Patrick Geoffray wrote:
> > How do you close your Torus without long cables ? Unless you stack your
> > nodes in a circle, you will need long cables.
>
> BlueGene's trick is to attach every other node to the end, and then
> come back with the unused ones. So:
>
> Nodes: 0 1 2 3 4
>
> Connections: 0 -> 2 -> 4 -> 3 -> 1 -> 0
>
> No long cable. This is probably a classic solution not invented by IBM.

This is pretty common and certainly not an IBM invention. I believe most 
Dolphin SCI Linux clusters are cabled like this.  (In these cases I have also 
seen the nodes numbered following the interconnect position rather than 
their physical position in the rack.)
    The disadvantage is getting your head round the cabling in the second 
dimension - you need a good diagram to follow.  (It gets harder still with a 
3d torus.)
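To make the folded cabling concrete, here is a minimal Python sketch (my own 
illustration, not anyone's cabling tool) that generates the connection order 
Greg described: walk up the even-numbered positions, then back down the odd 
ones, so no cable ever spans more than two rack slots.

    # Folded-ring cabling order for a 1d torus of n nodes:
    # up the even positions, then back down the odd ones,
    # so every cable spans at most two physical slots.
    def folded_ring(n):
        evens = [i for i in range(n) if i % 2 == 0]
        odds = [i for i in range(n - 1, -1, -1) if i % 2 == 1]
        return evens + odds

    order = folded_ring(5)
    print(" -> ".join(str(x) for x in order + [order[0]]))
    # prints: 0 -> 2 -> 4 -> 3 -> 1 -> 0

which reproduces the 5-node example quoted above.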


When discussing topologies, don't forget the hypercube. Many systems have used 
it, most notably the SGI Origin 2000 and Origin 3000. Their successor, the SGI 
Altix (a Linux cluster), uses a switched fat-tree instead, although small 
Altix 350 configurations use a 1d ring to link the compute nodes.
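For anyone who hasn't met it: in a d-dimensional hypercube of 2^d nodes, each 
node links to the d nodes whose numbers differ from its own in exactly one 
bit, so the worst case is only d hops. A quick illustrative sketch in Python:

    # Neighbours of a node in a d-dimensional hypercube:
    # flip each of the d address bits in turn.
    def hypercube_neighbours(node, d):
        return [node ^ (1 << bit) for bit in range(d)]

    # 8-node (d=3) hypercube: node 5 is binary 101
    print(hypercube_neighbours(5, 3))   # [4, 7, 1]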

Finally, when comparing interconnect topologies, consider performance tuning. 
With a fat-tree or a full crossbar, MPI performance should be identical no 
matter which set of nodes in your cluster your application runs on. With a 
hypercube or a 2d/3d torus, performance varies depending on which nodes a 
particular run was given, which makes it very hard to get repeatable timings 
and hence to optimise your code.
   Likewise, on a full fat-tree your application's performance is not affected 
by other jobs running on the same cluster (since there are independent routes 
through the network). With the other two topologies, application runs can be 
slowed down by other people's parallel jobs. I remember this annoying users 
who paid for their HPC usage by wallclock.
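As a toy illustration of why placement matters on a torus but not on a 
fat-tree, here is a small Python sketch (my own, not from any scheduler) that 
compares the worst-case hop count for two 4-node allocations on a 16-node 1d 
ring:

    # Worst-case hop count between any pair of allocated nodes
    # on a 1d ring (torus) of 'size' nodes.
    def ring_hops(a, b, size):
        d = abs(a - b)
        return min(d, size - d)

    def max_hops(alloc, size):
        return max(ring_hops(a, b, size) for a in alloc for b in alloc)

    size = 16
    compact = [0, 1, 2, 3]        # contiguous allocation
    scattered = [0, 5, 9, 14]     # whatever nodes were free
    print(max_hops(compact, size))    # 3
    print(max_hops(scattered, size))  # 7

On a fat-tree both allocations see the same latency; on the ring the 
scattered job pays more than twice the worst-case hop count, and its traffic 
also crosses links that other jobs may be using.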


Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------






