[Beowulf] How to Diagnose Cause of Cluster Ethernet Errors?
ballen at gravity.phys.uwm.edu
Mon Apr 2 13:58:06 PDT 2007
Fun to see you here! I was just looking through some old Goleta pictures.
Just for kicks have a look at these figures:
This was part of a study that we did to select edge switches for the NEMO
cluster. We were able to find sub-$100 switches that ran at wire speed
with MTUs up to about 6k.
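
For a sense of what "wire speed" means at a given MTU, here is a rough back-of-the-envelope sketch. It assumes gigabit links (the link speed is not stated above) and standard untagged Ethernet framing, which adds 38 bytes per frame on the wire. The function name wire_speed is just illustrative, and this is only the theoretical ceiling, not a description of how the switches were actually tested.

```python
# Per-frame overhead on the wire for standard, untagged Ethernet:
# preamble 7 + start-of-frame delimiter 1 + MAC header 14 + FCS 4
# + inter-frame gap 12 = 38 bytes.
WIRE_OVERHEAD_BYTES = 7 + 1 + 14 + 4 + 12


def wire_speed(mtu_bytes, link_bps=1_000_000_000):
    """Return (frames per second, payload bits per second) at full line rate.

    link_bps defaults to gigabit Ethernet, which is an assumption here.
    """
    bytes_on_wire = mtu_bytes + WIRE_OVERHEAD_BYTES
    frames_per_sec = link_bps / (8 * bytes_on_wire)
    payload_bps = frames_per_sec * mtu_bytes * 8
    return frames_per_sec, payload_bps


for mtu in (1500, 6000, 9000):
    fps, payload = wire_speed(mtu)
    print(f"MTU {mtu}: {fps:,.0f} frames/s, {payload / 1e6:.1f} Mbit/s payload")
```

A switch that forwards at these rates on every port at once, without dropping frames, is running at wire speed for that MTU. Larger MTUs mean fewer frames per second for the same payload, but each frame is bigger, which is where cheap switches with small buffers tend to fall over.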
There was a big difference between similar-looking cheap switches from
various companies, even though 'under the hood' they all used integrated
chip sets from a handful of chip vendors.
Here are some more testing results from different edge switches:
(Note: our processing is embarrassingly parallel, so we are primarily
building compute farms. We don't need very high-bandwidth, very low-latency
connections, e.g. InfiniBand or Myrinet performance.)
On Sun, 1 Apr 2007, Jon Forrest wrote:
> Douglas Eadline wrote:
>> I am constantly amazed at how many people buy the
>> latest and greatest node hardware and then connect
>> them with a sub-optimal switch (or cheap cables), thus reducing
>> the effective performance of the nodes (for parallel
>> applications). Kind of "penny wise and pound foolish" as they say.
> I sincerely appreciate all the comments about my problem. I will reply
> to them in due time. However, I'd like to comment on this, which
> admittedly is off-topic from my original posting.
> I don't disagree with what you're saying. The problem is how
> to recognize "sub-optimal" equipment. For example, I see
> three tiers in ethernet switching hardware:
> 1) The low-end, e.g. Netgear, Linksys, D-link, ...
> 2) The mid-end, e.g. HP Procurve, Dell, SMC, ...
> 3) The high-end, e.g. Cisco, Foundry, ...
> What I, as a system manager, not as an Electrical Engineer,
> have trouble understanding, is what the true differences
> are between these levels, and, at one level, between
> the various vendors.
> These days I suspect that many of the vendors are using
> ASICs made by other chip companies, and that many vendors
> use the same ASICs. Assuming that's true, where's the
> added value that justifies the cost differences? Sometimes
> the value is in the "management" abilities of a device.
> I don't deny this can be a major selling point in a
> large enterprise environment, but in a 30-node cluster,
> or a small LAN, it's hard to justify paying for this.
> In terms of ethernet performance, once a device
> can handle wirespeed communication on all ports,
> where's the added value that justifies the added
> cost? I'm looking for empirical answers, which
> aren't always easy to find, and sometimes aren't easy to understand.
> In the case of my cluster, it was configured and purchased
> before I got here, so I had nothing to do with choosing
> its components but I have to admit that I'm not
> sure what I would have done differently.
> Jon Forrest
> Unix Computing Support
> College of Chemistry
> 173 Tan Hall
> University of California Berkeley
> Berkeley, CA
> jlforrest at berkeley.edu