Don Holmgren djholm at fnal.gov
Thu Jun 12 08:08:21 PDT 2008

Ramiro -

You might want to also consider buying just a single 24-port switch for your 22 
nodes, and then when you expand either replace with a larger switch, or build a 
distributed switch fabric with a number of leaf switches connecting into a 
central spine switch (or switches).  By the time you expand to the larger 
cluster, switches based on the announced 36-port Mellanox crossbar silicon will 
be available and perhaps per-port prices will have dropped sufficiently to 
justify the purchase delay and the disruption at the time of expansion.

If your applications can tolerate some oversubscription (less than a 1:1 ratio 
of leaf-to-spine uplinks to leaf-to-node connections), a distributed switch 
fabric (leaf and spine) has the advantage of shorter (and cheaper) cables 
between the leaf switches and your nodes, and relatively fewer longer cables 
from the leaves back to the spine, compared with a single central switch.
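
As a rough illustration of that tradeoff, here is a small Python sketch that
counts switches and uplink cables for a two-level leaf/spine fabric. The
function name and its parameters are made up for this example; the 24-port
switch size and the 120-node target are taken from this thread. It is only a
port count under simple assumptions (identical leaves, every spine port usable
for an uplink), not a wiring plan: it does not check that each leaf's uplinks
divide evenly across the spines, which a real design would want.

```python
import math


def leaf_spine_plan(nodes, leaf_ports=24, spine_ports=24, uplinks_per_leaf=8):
    """Rough port count for a two-level leaf/spine fabric (illustrative only)."""
    if not 0 < uplinks_per_leaf < leaf_ports:
        raise ValueError("uplinks_per_leaf must be between 1 and leaf_ports - 1")
    # Ports on each leaf that remain for nodes after reserving the uplinks.
    node_ports_per_leaf = leaf_ports - uplinks_per_leaf
    # Enough leaves to give every node a port.
    leaves = math.ceil(nodes / node_ports_per_leaf)
    # Every uplink is one (longer) cable from a leaf back to the spine.
    uplink_cables = leaves * uplinks_per_leaf
    # Enough spine switches to terminate all of the uplinks.
    spines = math.ceil(uplink_cables / spine_ports)
    return {
        "node_ports_per_leaf": node_ports_per_leaf,
        "leaves": leaves,
        "spines": spines,
        "uplink_cables": uplink_cables,
        # Node ports per uplink: 1.0 means no oversubscription, 2.0 means 2:1.
        "oversubscription": node_ports_per_leaf / uplinks_per_leaf,
    }


if __name__ == "__main__":
    # Compare a non-oversubscribed layout with a 2:1 oversubscribed one.
    for uplinks in (12, 8):
        plan = leaf_spine_plan(120, uplinks_per_leaf=uplinks)
        print(f"{uplinks} uplinks per leaf: {plan}")
```

Comparing the two cases shows the effect described above: allowing some
oversubscription reduces both the number of switches and the number of long
leaf-to-spine cables, at the cost of less bandwidth between leaves.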

We have many Flextronics switches - SDR and DDR, 24-port and 144-port - on a 
pair of large clusters (520 nodes, and 600 nodes) built in 2005 and 2006. No 
complaints.  But we have been self-supporting, and I would guess you would have 
very different support structures with Voltaire or Qlogic.  With the Flextronics
switches you will definitely be using the OFED stack, and you will have to run
a subnet manager on one of your nodes (dedicated is probably best).  You could
optionally buy an embedded subnet manager on the Voltaire or Qlogic switches,
depending upon model, though I believe for a large fabric an external subnet
manager is still recommended.

On Tue, 10 Jun 2008, Ramiro Alba Queipo wrote:

> Hello everybody:
> We are about to build an HPC cluster with infiniband network starting
> from 22 dual socket nodes with AMD QUAD core processors and in a year or
> so we will have about 120 nodes. We will be using infiniband both
> for computation and for storage.
> The question is that we need a modular solution and we are having 3
> candidates:
> a) Voltaire Grid Director SDR or DDR 288 ports (9988 or 2012 models)->
> seems very good and well supported, but very expensive.
> b) Qlogic SilverStorm 9120 (144 ports) -> no price and support
> information yet
> c) Flextronics 10U 144 Port Modular-> very good at price but little
> support => risky option?.
> I am in a mess. What is your opinion about this matter? Are you using
> any of these products?
> Regards

