[Beowulf] ethernet bonding performance comparison "802.3ad" vs Adaptive Load Balancing
kyron at neuralbs.com
Wed Sep 17 17:23:48 PDT 2008
Rahul Nabar wrote:
> On Wed, Sep 17, 2008 at 4:05 PM, Eric Thibodeau <kyron at neuralbs.com> wrote:
>> Well, apart from the fact that ssh is compressed, as Digo pointed out, and that 47 MB/sec is probably your HDD's transfer capacity, as Shannon pointed out, also keep in mind your bus's capacity ( http://en.wikipedia.org/wiki/List_of_device_bandwidths is a nice list). So, unless you've got both NICs on PCI-E (or independent PCI channels, which I've only heard of in high-end Compaq servers with hotswap PCI interfaces), you're saturating your bus.
> Thanks for all those responses guys! Eric; I'll check my bus speed; my
> server is not very high end. These are Dell Power Edge 1435's. But
> after I first posted I did a couple more debugs and diagnostics:
> (1) As Shannon pointed out earlier, I gave netperf a shot. The funny
> result is this:
> If I netperf from Machine A to B I get only 1 Gbps.
> If I start two netperfs on A and try to talk to B, each gets 0.5 Gbps.
> Thus still an aggregate of 1 Gbps.
> BUT if I start two netperfs on A and one talks to B and another to C
> each gets 1 Gbps. Thus I got an aggregate of 2 Gbps out [desired
> In the last situation if I disable one link then I fall back to 0.5
> Gbps each. So this is my (almost) perfect situation.
> This forces me to conclude that I am _not_ disk, bus, or I/O limited. What
> do you think?
> The sad thing though is this: I could never get a peer-to-peer (A
> talks to B alone) mode that would give me a 2 Gbps aggregated. This is
> frustrating. These are 8 cpu/node servers and frequently even a 16 cpu
> job will span across only 2 compute-nodes. Then if I cannot use both
> the eth cards it seems an awful waste of capacity. Just think about
> this: If two-processes talk from A-to-B I get 1 Gbps aggregate. But if
> I have two processes and just route one through a
> passive-forwarding-machine C (thus A-to-B and A-to-C-to-B) then I will
> end up with an aggregate of 2 Gbps. This seems a very strange,
> non-intuitive and undesirable outcome of the current bonding setup. I
> might have to actually _force_ jobs to span more than two servers
> just to be able to use both my eth cards! Feels very strange to me.
> I tried both modes 4 and 6. Rick Jones, the netperf maintainer gave me
> a very promising suggestion that I might be able to modify my bonding
> hash algorithm so that it bonds traffic coming from two different
> processes originating on the same node. Currently I cannot. Has anybody
> else given this a shot?
> I'm eager to hear any other comments people might have.
Well, I don't have "bondable" hardware, so I'm really interested in how
you technically manage this one in the end.
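
For what it's worth, the netperf numbers quoted above look like what a per-peer transmit hash would produce. Below is a rough Python sketch of the two hash formulas as I remember them from the Linux bonding driver documentation of this era ("layer2" and "layer3+4" values of the xmit_hash_policy option). The MAC addresses, IP addresses and port numbers are made up for illustration, the real driver may differ in detail, and as far as I know this option applies to mode 4 (and balance-xor), not to mode 6, which balances per peer by other means.

```python
import ipaddress


def mac_to_int(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' to an integer."""
    return int(mac.replace(":", ""), 16)


def layer2_hash(src_mac: str, dst_mac: str, slave_count: int) -> int:
    """Documented layer2 policy: (src MAC XOR dst MAC) modulo slave count."""
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % slave_count


def layer34_hash(src_ip: str, src_port: int,
                 dst_ip: str, dst_port: int, slave_count: int) -> int:
    """Documented layer3+4 policy:
    ((src port XOR dst port) XOR ((src IP XOR dst IP) AND 0xffff))
    modulo slave count.
    """
    ip_bits = (int(ipaddress.IPv4Address(src_ip))
               ^ int(ipaddress.IPv4Address(dst_ip))) & 0xFFFF
    return ((src_port ^ dst_port) ^ ip_bits) % slave_count


# Hypothetical addresses for machines A, B and C (not from the thread).
MAC_A, MAC_B, MAC_C = ("00:1e:c9:00:00:0a",
                       "00:1e:c9:00:00:0b",
                       "00:1e:c9:00:00:0c")
SLAVES = 2  # two bonded NICs

if __name__ == "__main__":
    # layer2: every frame from A to B hashes to the same slave,
    # no matter how many processes are sending.
    print("A->B slave:", layer2_hash(MAC_A, MAC_B, SLAVES))
    # A different peer can land on the other slave.
    print("A->C slave:", layer2_hash(MAC_A, MAC_C, SLAVES))

    # layer3+4: two flows between the same hosts, differing only in
    # source port, can be spread over both slaves.
    for sport in (40000, 40001):
        slave = layer34_hash("10.0.0.1", sport, "10.0.0.2", 5001, SLAVES)
        print("A:%d -> B:5001 slave: %d" % (sport, slave))
```

Two caveats, as far as I understand it: a single TCP stream still stays on one link under either policy, and this hash only governs what the sending host transmits. How traffic arriving at B is spread across its links is up to the switch's own hashing.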