[Beowulf] many cores and ib

Tom Elken tom.elken at qlogic.com
Mon May 5 15:07:34 PDT 2008


>> Since we have some users that need
>> shared memory but also we want to build a normal cluster for
>> mpi apps, we think that this could be a solution. Let's say about
>> 8 machines (96 processors) plus InfiniBand. Does it sound correct?
>> I'm aware of the bottleneck that means having one ib interface for 
>> the mpi cores, is there any possibility of bonding?

> Bonding (or multi-rail) does not make sense with "standard IB" in PCIe
> x8 since the PCIe connection limits the transfer rate of a single
> IB-Link already. 

PCIe x8 Gen2 provides additional bandwidth, as Gilad said.  On Opteron
systems Gen2 is not available yet (and won't be for some time), so you
may want to look for AMD- or Intel-based boards that have PCIe x16
slots.
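
For rough numbers: a 4x DDR IB link signals at 20 Gbit/s, i.e. 16 Gbit/s
(~2 GB/s) of data per direction after 8b/10b encoding, while a PCIe Gen1
x8 slot also carries 20 Gbit/s of raw signalling (8 lanes x 2.5 Gbit/s),
which works out to roughly 1.5-1.6 GB/s usable after encoding and
protocol overhead.  So a Gen1 x8 slot already caps a single DDR link,
and you need Gen2 (5 GT/s per lane) or an x16 slot to feed it fully,
which matches the bandwidth figures below.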

> My hint would be to go for Infinipath from QLogic or the new ConnectX
> from Mellanox since message rate is probably your limiting factor and
> those technologies have a huge advantage over standard Infiniband
> SDR/DDR. 

I agree that message rate may be your limiting factor.
Results with QLogic (aka InfiniPath) DDR adapters:

DDR Adapter            Peak MPI Bandwidth    Peak Message Rate (no message coalescing)
QLE7280   PCIe x16     1950 MB/s             20-26* Million/sec (8 ppn)
QLE7240   PCIe x8      1500 MB/s             19     Million/sec (8 ppn)

Test details:  All runs were on two nodes, each with 2x Intel Xeon 5410
(Harpertown, quad-core, 2.33 GHz), 8 cores per node, SLES 10.
* The 26 Million messages/sec figure requires faster CPUs, 3 to 3.2 GHz.

8 ppn means 8 MPI processes per node.  The non-coalesced message rate
performance of these adapters scales pretty linearly from 1 to 8 cores.
That is not the case with all modern DDR adapters.

Benchmark = OSU Multiple Bandwidth, Message Rate benchmark (osu_mbw_mr.c).
The above performance results can be had with either MVAPICH 1.0 or
QLogic MPI 2.2 (other MPIs are in the same ballpark with these
adapters).
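
For anyone who has not looked at it, the pattern osu_mbw_mr measures is
roughly the following: pairs of ranks keep a window of small non-blocking
sends in flight, and the aggregate rate is messages completed divided by
wall time.  The sketch below is not the OSU source; WINDOW, ITERS and
MSGSIZE are made-up values, just to show the shape of the measurement.

/* Sketch of the windowed, multi-pair message-rate pattern used by
 * benchmarks like osu_mbw_mr (illustrative values, not the OSU code). */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define WINDOW  64      /* non-blocking messages in flight per pair */
#define ITERS   1000
#define MSGSIZE 8       /* small messages stress message rate, not bandwidth */

int main(int argc, char **argv)
{
    int rank, size;
    char sbuf[MSGSIZE], rbuf[WINDOW][MSGSIZE];
    MPI_Request req[WINDOW];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2 || size % 2) {
        if (!rank) fprintf(stderr, "need an even number of ranks\n");
        MPI_Finalize();
        return 1;
    }
    memset(sbuf, 0, sizeof(sbuf));

    /* First half of the ranks send to the second half: with 8 ppn on two
     * nodes that is 8 sender/receiver pairs across the wire. */
    int pairs = size / 2;
    int is_sender = (rank < pairs);
    int peer = is_sender ? rank + pairs : rank - pairs;

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < ITERS; i++) {
        if (is_sender) {
            for (int w = 0; w < WINDOW; w++)
                MPI_Isend(sbuf, MSGSIZE, MPI_CHAR, peer, 0,
                          MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            MPI_Recv(rbuf[0], 1, MPI_CHAR, peer, 1, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);        /* receiver's ack */
        } else {
            for (int w = 0; w < WINDOW; w++)
                MPI_Irecv(rbuf[w], MSGSIZE, MPI_CHAR, peer, 0,
                          MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            MPI_Send(sbuf, 1, MPI_CHAR, peer, 1, MPI_COMM_WORLD);
        }
    }

    double t1 = MPI_Wtime();
    if (rank == 0)
        printf("aggregate rate: %.2f million msgs/sec\n",
               (double)pairs * ITERS * WINDOW / (t1 - t0) / 1e6);

    MPI_Finalize();
    return 0;
}

Coalescing, where an MPI packs several such small sends into one wire
packet, inflates this number, which is why the table above reports
non-coalesced rates.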

Note that MVAPICH 0.9.9 had message coalescing on by default, and
MVAPICH 1.0 has it off by default.  There must be a reason.

Revisiting:
>
> Bonding (or multi-rail) does not make sense with "standard IB" in PCIe
> x8 since the PCIe connection limits the transfer rate of a single
> IB-Link already. 

Some 4-socket motherboards have independent PCIe buses to x8 or x16
slots.  In this case, multi-rail does make sense.  You can run the
QLogic adapters as dual-rail without bonding.  On MPI applications, half
of the cores will use one adapter and half will use the other.   Whether
the more expensive dual-rail arrangement is necessary and/or
cost-effective would be very application-specific.
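
If you do go dual-rail, the rank-to-adapter split is typically driven by
the process's node-local rank.  The fragment below only illustrates that
split (local ranks 0-3 to one HCA, 4-7 to the other); the actual adapter
selection happens at or before MPI_Init through whatever mechanism your
MPI stack or driver provides (an environment variable set in a wrapper
script, for example), so treat the unit numbering here as a placeholder.

/* Illustration only: compute a node-local rank and split it across two
 * HCAs.  The "unit" number is a placeholder; how a process is actually
 * bound to an adapter depends on your MPI stack. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size, len, local_rank = 0;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    memset(name, 0, sizeof(name));
    MPI_Get_processor_name(name, &len);

    /* Gather all hostnames; my local rank is the number of lower-numbered
     * ranks that share my hostname. */
    char *all = malloc((size_t)size * MPI_MAX_PROCESSOR_NAME);
    MPI_Allgather(name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR,
                  all, MPI_MAX_PROCESSOR_NAME, MPI_CHAR, MPI_COMM_WORLD);
    for (int i = 0; i < rank; i++)
        if (strcmp(all + (size_t)i * MPI_MAX_PROCESSOR_NAME, name) == 0)
            local_rank++;

    /* 8 cores per node, two rails: local ranks 0-3 -> unit 0, 4-7 -> unit 1 */
    int unit = (local_rank < 4) ? 0 : 1;
    printf("rank %d (local rank %d on %s) -> HCA unit %d\n",
           rank, local_rank, name, unit);

    free(all);
    MPI_Finalize();
    return 0;
}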

> Infinipath and ConnectX are available as DDR Infiniband and provide a
> bandwidth of more than 1800 MB/s

Good suggestion.

Regards,
-Tom Elken