large clusters and topologies

Patrick GEOFFRAY pgeoffra at lhpca.univ-lyon1.fr
Sat Jul 29 06:53:14 PDT 2000


Steffen Persvold wrote:

> First of all you should know that scalable SCI clusters don't use
> switches, but are connected in a torus topology. This is possible because
> the adapters can switch traffic themselves between several link
> controllers (LC). In fact the 6-port SCI-switch is basically 6 LC's
> connected together with the B-Link (Backside Link for SCI link
> controllers). Thus you don't need more than one adapter on the PCI bus,
> just plug on an addoncard (mezzanine) to the adapter, and you have a
> torus topology instead of a single ringlet. Up to two mezzanines can be
> connected (3D).

That's interesting. So, if I understand correctly, the switch is
onboard, and the B-link plays this role. It's similar to the model
of the ATOLL network, or of another network developed by a
university in Paris (the MPC interconnect), where there is a small
crossbar on the NIC itself.
Basically, with this model, you don't need an external switch,
just a lot of cables :-)

The bottleneck in this case is the bandwidth of the B-link. The
B-link is a 64-bit/50 MHz bus (400 MB/s) with a very efficient
arbitration loop (1 cycle). That's fine with a 32-bit/33 MHz PCI
bus, as the B-link can sustain at least 3 times the PCI traffic.
But with 64-bit/66 MHz PCI, the B-link is not able to sustain the
PCI bandwidth. (We have measured 500 MB/s on 64/66 PCI on a
Pentium-based motherboard.)
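
As a rough check of the figures above, here is a minimal Python
sketch of the arithmetic. It assumes peak bandwidth is simply bus
width times clock rate (one full-width transfer per cycle, no
protocol overhead); the function and constant names are made up for
illustration.

```python
def bus_peak_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth in MB/s, assuming one full-width transfer per clock."""
    return width_bits / 8 * clock_mhz


def headroom(blink_mb_s: float, pci_mb_s: float) -> float:
    """How many times the PCI rate the B-link could carry."""
    return blink_mb_s / pci_mb_s


BLINK_MB_S = bus_peak_mb_s(64, 50)       # B-link: 64-bit, 50 MHz
PCI_32_33_MB_S = bus_peak_mb_s(32, 33)   # 32-bit/33 MHz PCI, theoretical peak
PCI_64_66_MB_S = bus_peak_mb_s(64, 66)   # 64-bit/66 MHz PCI, theoretical peak
PCI_64_66_MEASURED_MB_S = 500.0          # measured figure quoted above

for name, pci in [("32/33 PCI (peak)", PCI_32_33_MB_S),
                  ("64/66 PCI (peak)", PCI_64_66_MB_S),
                  ("64/66 PCI (measured)", PCI_64_66_MEASURED_MB_S)]:
    print(f"{name}: {pci:.0f} MB/s, B-link headroom "
          f"{headroom(BLINK_MB_S, pci):.2f}x")
```

With these assumptions the B-link has a little over 3x headroom
against 32/33 PCI, and less than 1x against the measured 64/66
figure.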

> CONCLUSION: When the bandwidth provided by the SCI interconnect is
> higher than one provided on PCI, the scalability in terms of bandwidth
> is linear up to 1700 nodes (assuming a 3D-torus).

And only if the bandwidth of the B-link is large enough to sustain
3 times the PCI bandwidth for a 3D torus. With a 64/66 PCI bus,
you can only do 1D.
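
The condition can be written down as a small Python sketch. It
assumes the simple model used in this message, namely that the
B-link must carry one PCI-rate stream per torus dimension; the
function names are hypothetical, and the 132 MB/s figure is the
theoretical 32-bit/33 MHz PCI peak rather than a measurement.

```python
def required_blink_mb_s(dims: int, pci_mb_s: float) -> float:
    """B-link bandwidth needed to carry one PCI-rate stream per dimension."""
    return dims * pci_mb_s


def blink_sustains(blink_mb_s: float, dims: int, pci_mb_s: float) -> bool:
    """True if the B-link meets the requirement under this simple model."""
    return blink_mb_s >= required_blink_mb_s(dims, pci_mb_s)


BLINK_MB_S = 400.0            # 64-bit/50 MHz B-link, as quoted above
PCI_32_33_PEAK_MB_S = 132.0   # 4 bytes x 33 MHz (assumed theoretical peak)
PCI_64_66_MEASURED = 500.0    # measured 64/66 PCI figure from this message

print(blink_sustains(BLINK_MB_S, 3, PCI_32_33_PEAK_MB_S))
print(blink_sustains(BLINK_MB_S, 3, PCI_64_66_MEASURED))
print(required_blink_mb_s(3, PCI_64_66_MEASURED))
```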

Anyway, the paper is well written; it's a good reference.
Do you have, or plan, similar studies about the scalability in
terms of latency? In a 3D torus, the number of "hops" to reach a
node at the other end of the torus can be large, so the cost of
crossing the intermediate B-links will increase linearly with the
number of hops.
Do you know if Dolphin plans to increase the bandwidth of the
B-link, to provide full crossbar performance for a 3D-torus
topology with 64/66 PCI? That means at least 3 x 500 MB/s =
1.5 GB/s.
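
To give a feel for the hop counts involved, here is a minimal
Python sketch that counts link hops between two nodes of a torus
using dimension-order routing. Everything in it is an assumption
for illustration: the 12x12x12 shape (picked only because 12^3 =
1728 is close to the 1700 nodes quoted), the routing scheme, and the
choice between one-way and two-way rings. How hops translate into
B-link crossings depends on the hardware and is not modeled.

```python
from typing import Sequence


def ring_hops(src: int, dst: int, size: int, bidirectional: bool) -> int:
    """Hops between two positions on one ring of the torus."""
    forward = (dst - src) % size
    if bidirectional:
        return min(forward, size - forward)
    return forward


def torus_hops(src: Sequence[int], dst: Sequence[int],
               dims: Sequence[int], bidirectional: bool = False) -> int:
    """Total hops with dimension-order routing (sum over dimensions)."""
    return sum(ring_hops(s, d, k, bidirectional)
               for s, d, k in zip(src, dst, dims))


def worst_case_hops(dims: Sequence[int], bidirectional: bool = False) -> int:
    """Largest hop count between any two nodes of the torus."""
    if bidirectional:
        return sum(k // 2 for k in dims)
    return sum(k - 1 for k in dims)


DIMS = (12, 12, 12)  # hypothetical shape, 1728 nodes
print(worst_case_hops(DIMS, bidirectional=False))
print(worst_case_hops(DIMS, bidirectional=True))
```

Under these assumptions the worst case grows linearly with the ring
size in each dimension, which is the latency concern raised above.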

> Finally; if anyone feels offended by getting this information, I am
> sorry.

This is a mailing list, we are here to share information. Don't be
sorry :-)
Nobody will be offended if we talk about technical points (at
least not me).


Patrick Geoffray
---
Aerospatiale Matra - Sycomore
Universite Lyon I - RESAM
http://lhpca.univ-lyon1.fr
