Network adapter.
Robert G. Brown rgb at phy.duke.edu
Wed Nov 8 08:35:54 PST 2000
- Previous message: Network adapter.
- Next message: Cluster Monitoring software?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Wed, 8 Nov 2000, Jon Tegner wrote:

> Thanks a lot for comments and suggestions!
>
> > I think that you have a problem with your switch:
> > The maximum aggregate bandwidth available is too low to sustain high
> > throughput transfer between more than 2 nodes. Which switch brand and
> > model do you use for your cluster?
>
> I have done some further testing using netpipe
> (http://www.scl.ameslab.gov/Projects/ClusterCookbook/nprun.html). That
> application sends messages of increasing size between two nodes, and
> there is a drastic reduction in throughput when the block size
> approaches about 5800 bytes: at 5780 bytes the speed is 67.45 Mbps, and
> at 5800 bytes it is 1.34 Mbps. For larger sizes it stays low (at
> least up to 1e6 bytes, where I stopped).
>
> This is just for communication between two nodes, but since the
> communication passes through the switch (D-Link DES-3225G), it could
> still be the result of a poorly configured switch (I'm no expert in
> that area), or it could still be a problem with the card.
>
> I'll change to different cards on two of the nodes to check it out.

I've seen somewhat similar behavior on cheap Netgear switches (FS108). See the graphs in the ALSC paper/talk on www.phy.duke.edu/brahma, and the discussion of latency-bounded vs. bandwidth-bounded communications. Note also the interesting differences between the netperf-based figure (in the paper) and the bw-tcp-based figure (in the talk). The latter is more easily understood in terms of some sort of boundary, e.g. a buffer size boundary, but I haven't yet had time to go into the bw-tcp code to add buffer size as a parameter. (It is a parameter in netperf, but netperf frankly looks "broken" in some way and apparently is no longer being actively maintained.) I'll try to crank out a similar sweep with netpipe wrapped in my perl sweep script and generate a related figure.
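A netpipe-style sweep (messages of increasing size bounced between two endpoints, with socket buffer size as a parameter) can be sketched roughly as below. This is not netpipe, netperf or bw-tcp; it is a minimal Python illustration under stated assumptions: the function names (`recv_exact`, `echo_side`, `pingpong_sweep`) are hypothetical, both ends run in one process over loopback, throughput is computed from half the average round-trip time, and `bufsize` is applied through the standard `SO_SNDBUF`/`SO_RCVBUF` socket options. On loopback it exercises only the local TCP stack and kernel, so to probe a switch or NIC the echo side would have to run on a second node.

```python
import socket
import threading
import time


def recv_exact(sock, n):
    """Read exactly n bytes; a single recv() on TCP may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf.extend(chunk)
    return bytes(buf)


def echo_side(listener, sizes, reps):
    """Hypothetical echo peer: bounce every message straight back."""
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for size in sizes:
            for _ in range(reps):
                conn.sendall(recv_exact(conn, size))


def pingpong_sweep(sizes, reps=10, bufsize=None, host="127.0.0.1"):
    """Return [(size_bytes, throughput_mbps), ...] for each message size.

    Throughput uses half the average round-trip time (ping-pong style).
    If bufsize is given, it is set as SO_SNDBUF/SO_RCVBUF on both
    sockets before the connection is established.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if bufsize is not None:
        # Assumption: as on Linux, accepted sockets inherit the listener's buffers.
        for s in (listener, client):
            s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
            s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bufsize)
    listener.bind((host, 0))  # port 0: let the OS choose a free port
    listener.listen(1)
    peer = threading.Thread(target=echo_side, args=(listener, sizes, reps),
                            daemon=True)
    peer.start()

    results = []
    with client:
        client.connect(listener.getsockname())
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for size in sizes:
            payload = b"x" * size
            start = time.perf_counter()
            for _ in range(reps):
                client.sendall(payload)
                recv_exact(client, size)
            one_way = (time.perf_counter() - start) / (2 * reps)
            results.append((size, size * 8 / one_way / 1e6))
    peer.join()
    listener.close()
    return results


if __name__ == "__main__":
    for size, mbps in pingpong_sweep([2 ** k for k in range(6, 21)]):
        print(f"{size:>8} bytes  {mbps:10.2f} Mbps")
```

Running the same size list with several `bufsize` values is one way to look for the kind of buffer-size boundary discussed above; note that the kernel may round or double the requested buffer size.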
I too don't know what part of the clearly revealed problem is due to the switch, what part to the particular NIC, and what part to the TCP stack or the kernel itself. So much complexity, so little time... If you figure it out, let me know too. I'd really like to be able to recover a smooth and understandable transition from latency-bounded to bandwidth-bounded behavior (see the figure on www.phy.duke.edu/brahma/brahma.html). I haven't gotten smooth and predictable behavior like this since the 2.2.x kernels were introduced, and my peak network performance (on the same hardware) has never been the same either. I have no idea why, but would love to know.

     rgb

>
> /jon
>
> _______________________________________________
> Beowulf mailing list
> Beowulf at beowulf.org
> http://www.beowulf.org/mailman/listinfo/beowulf

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
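The smooth transition from latency-bounded to bandwidth-bounded behavior mentioned in this thread is commonly described by a simple model (often attributed to Hockney): sending n bytes takes a fixed latency t0 plus 8n/B seconds for a link of B bits per second. Throughput 8n/(t0 + 8n/B) then rises monotonically toward B, reaching B/2 when 8n = t0·B. A drop from 67.45 Mbps at 5780 bytes to 1.34 Mbps at 5800 bytes therefore cannot come from latency and bandwidth alone. The sketch below is illustrative only: the function names are hypothetical, the 100 µs latency and 100 Mbps link are assumed values, and the 0.5 drop ratio used to flag a "cliff" is an arbitrary choice.

```python
def model_throughput_mbps(size_bytes, latency_s, bandwidth_mbps):
    """Latency + size/bandwidth model: T(n) = t0 + 8n / B."""
    bits = 8 * size_bytes
    seconds = latency_s + bits / (bandwidth_mbps * 1e6)
    return bits / seconds / 1e6


def find_cliffs(sweep, ratio=0.5):
    """Return (size_before, size_after) pairs where measured throughput
    falls below `ratio` times the previous point in a size-ordered sweep."""
    cliffs = []
    for (s0, r0), (s1, r1) in zip(sweep, sweep[1:]):
        if r1 < ratio * r0:
            cliffs.append((s0, s1))
    return cliffs


if __name__ == "__main__":
    measured = [(5780, 67.45), (5800, 1.34)]  # figures reported in the thread
    print(find_cliffs(measured))  # [(5780, 5800)]
    # Assumed link: 100 us latency, 100 Mbps; half of B is reached at 1250 bytes.
    for n in (64, 1250, 5780, 5800, 1_000_000):
        print(n, round(model_throughput_mbps(n, 1e-4, 100.0), 2))
```

With these assumed parameters the model changes by only a few hundredths of a Mbps between 5780 and 5800 bytes, which is why a measured collapse at that point points to something outside the model, such as a buffer, window, or switch effect.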
