[Beowulf] evaluating FLOPS capacity of our cluster

Rahul Nabar rpnabar at gmail.com
Mon May 11 17:34:20 PDT 2009


On Mon, May 11, 2009 at 6:22 PM, Gus Correa <gus at ldeo.columbia.edu> wrote:
> Oops, I misunderstood what you said.
> I see now.  You are bonding channels on your nodes' dual GigE
> ports to double your bandwidth, particularly for MPI, right?

Yes. Each node has dual gigabit eth cards.

> I am curious about your results with channel bonding.
> OpenMPI claims to work across two or more networks without the need
> for channel bonding.
> What MPI do you use?

We use OpenMPI. I've never really found a good way to measure the
performance gain. I've tested raw bandwidth and it definitely improves,
as do file transfer times, but I have not tried any computation-relevant
benchmarks. It also seems to depend very much on which channel bonding
mode is used. The relevant modes seem to split traffic only when a node
is talking to two different hosts at the same time, so a strictly
point-to-point communication is not affected at all. At least that's how
I understood those intricacies.
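
For what it's worth, the mode is picked when the bonding driver is
loaded. I won't claim this is exactly our setup, but the usual knobs
look roughly like this (the bond/interface names are just placeholders):

    # /etc/modprobe.conf (or a file under /etc/modprobe.d/)
    alias bond0 bonding
    options bonding mode=802.3ad miimon=100 xmit_hash_policy=layer3+4

mode=balance-rr (0) will round-robin even a single flow across both
links, while mode=802.3ad (4) or balance-xor (2) hash each flow onto one
link, which is why a single point-to-point stream never sees more than
one GigE's worth of bandwidth.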

Whether or not to set up LAG groups on the switch was another factor. A
lot of that configuration seemed to be specific to the switch
manufacturer.
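
The quickest sanity check I've found is to ask the kernel what the bond
actually negotiated:

    cat /proc/net/bonding/bond0

That reports the bonding mode, the link state of each slave NIC, and
(for 802.3ad) whether the LACP aggregation really came up against the
switch. bond0 is just the usual default name; adjust as needed.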

> In any case, the single 24-48 port GigE switches (if of good brand)
> should have a single flat latency time between any pair of ports, right?

I think bonding improves bandwidth but does not touch latency. My switch
is a Dell PowerConnect; not sure whether that fits the "good brand"
definition.
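
If I find time I'll try to measure this at the MPI level rather than
with raw file copies. Something like the OSU micro-benchmarks run
pairwise between two nodes should show whether bonding buys MPI
anything, roughly:

    mpirun -np 2 --host node01,node02 ./osu_bw
    mpirun -np 2 --host node01,node02 ./osu_latency

(node01/node02 are placeholders for two of our compute nodes.) Comparing
that against simply pointing OpenMPI at both NICs without any bonding,
e.g. --mca btl_tcp_if_include eth0,eth1 (with whatever our two
interfaces are actually called), would also answer your point about
OpenMPI driving two networks on its own.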



-- 
Rahul