newbie: 16-node 500Mbps design

Mark Hahn hahn at coffee.psychology.mcmaster.ca
Mon Aug 28 13:34:21 PDT 2000


> > there seems to be quite a lot of urban legendry here.  I certainly
> > don't see any "TCP stalls" or know anyone who does.  perhaps on crappy
> 
> Go read http://www.icase.edu/coral/LinuxTCP2.html, then you'll know about
> someone who does.  I've also had problems with various 2.2.x kernels.  Getting

no.  Josip's (fine) work is a specific tuning for small-packet performance;
it violates the standards, or at least accepted practice for TCP.
that's fine for tweaking your cluster, but it does NOT show a general problem
with stalls.  it's a little unclear to me why he calls these events
"deadlocks", since afaict, they're simply retransmit timeouts in TCP
terminology, part of TCP's congestion-avoidance heuristics.
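
to make "retransmit timeout" concrete, here is a minimal sketch of the
textbook estimator (Jacobson/Karels, the form later written down in RFC 6298).
it is not taken from any kernel or from Josip's patch: the class name, the
200 ms floor, the 1 s initial value and the 60 s cap are illustrative
assumptions.  the point is only that on a LAN with sub-millisecond round
trips, a single lost packet costs a wait set by the floor, not by the RTT.

```python
class RtoEstimator:
    """Sketch of the textbook TCP retransmit-timeout (RTO) estimator.

    All default values are illustrative assumptions, not kernel constants.
    """

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variance estimate
    K = 4          # variance multiplier in the RTO formula

    def __init__(self, min_rto=0.2, max_rto=60.0, initial_rto=1.0):
        self.min_rto = min_rto          # assumed floor, in seconds
        self.max_rto = max_rto          # assumed cap, in seconds
        self.initial_rto = initial_rto  # used before any RTT sample exists
        self.srtt = None                # smoothed round-trip time
        self.rttvar = None              # round-trip time variation
        self.backoff = 1                # doubles on every timeout

    def on_rtt_sample(self, r):
        """Fold one measured round-trip time (seconds) into the estimate."""
        if self.srtt is None:
            self.srtt = r
            self.rttvar = r / 2
        else:
            # Update the variance first, using the old smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.backoff = 1  # a fresh sample ends any exponential backoff

    def on_timeout(self):
        """A retransmit timer fired: back off exponentially."""
        self.backoff *= 2

    @property
    def rto(self):
        """Current retransmit timeout in seconds."""
        if self.srtt is None:
            base = self.initial_rto
        else:
            base = self.srtt + self.K * self.rttvar
        return min(self.max_rto, max(self.min_rto, base) * self.backoff)


if __name__ == "__main__":
    est = RtoEstimator()
    for _ in range(10):
        est.on_rtt_sample(0.0002)  # assumed 200 microsecond LAN round trip
    print("rto after samples:", est.rto)
    est.on_timeout()
    print("rto after one timeout:", est.rto)
```

with these assumed numbers the computed timeout sits at the floor, roughly a
thousand round trips, which is why a lost small packet looks like a "stall"
even though TCP is behaving as designed.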

there's nothing wrong with breaking TCP for intra-cluster performance;
TCP might even be the wrong basic design for a switched, never-congested,
single-hop network.

> the right driver for our tulip cards has been a pain too.  Each version will
> work with a different set of cards.  Go check the archives of this list and

I don't know of any card-specific tulip problems in modern (2.4) kernels.

> > old 2.2 kernels, but anyone who runs them deserves what they get.
> 
> I could also say anyone who runs experimental kernels on a production system
> deserves what they get.

oh, you mean excellent performance and improved stability?  
yes, you're right.  if you're satisfied with 2.2, good for you!  

but it's just plain dishonest to pretend that 2.2 is better than 2.4
because its code is, uh, "more mature".  2.4 has several improvements
that are relevant to cluster computing, as well as fixes that result
in better stability.

regards, mark hahn.




