Switch recommendations?

Tue Nov 28 09:40:36 PST 2000

On Tue, 28 Nov 2000, Petr Ladislav Kodym wrote:

> 	Hi,
>
> >Cutthrough switches are much more expensive but also have much lower
> >latency.
>
> A year ago I spent a lot of time shopping for a Fast-Ethernet
> cut-through switch. The result was --- there is none! All switches with
> cut-through capability in their feature list did cut-through only at
> 10Mbps, but they always fell back to store-and-forward at 100Mbps.
>
> It is quite logical, as Ethernet packets are ten times shorter at 100Mbps
> (therefore the time saving is much less significant) and cut-through at
> 100Mbps is much more demanding for switching fabric than at the lower
> speed.  It was pretty difficult to find this out; no one really seemed to
> know it, especially not sales representatives and customer support
> personnel.
>
> So, is there some 100Mbps cut-through switch now? Which one?
> Or did I miss something a year ago?

I think you missed some a year ago.  Or maybe I found some that weren't
really there.  Anyway, as of today, see e.g.

http://www.cisco.com/warp/public/cc/pd/si/casi/ca1900/prodlit/s1928_ov.htm

"...gives network administrators a choice between lowest latency (..7
microseconds to 100 BaseT ports) and maximum error checking..."

The 1900 or 2820 even automatically uses store and forward for
broadcasts (to give you hub-like efficiency on broadcasts) while still
giving cut through performance on unicasts.

The Catalyst 1900 series is just one of many of their switches, of
course.  I think you'll find that many of the really high end switches
from just about any vendor are cut through (or use some arcane
technology mix that delivers the equivalent <~10 microsecond latencies
but isn't properly either one) yet say nothing at all about their
forwarding mechanism, while the lower end cheap switches are store and
forward and often (but not always) say so.

Let's see:  When I bought my Netgear FS108 a year or so ago I looked
hard at (IIRC) the FS508(?), which was cut through and cost 3x as much
for the same 8 ports (but claimed 15-20 microsecond latencies).  Didn't
buy it -- this was my money for a home beowulf.  I think the list
discussion from that time is in the list archives, though.  The Allied
Telesyn AT-8118 offers "fragment free cut through or store-and-forward
switching mode".  Intel's 5x0 switches offer "Forwarding Mode:
Cut-through, fragment free, store-and-forward; each port self-tunes for
optimal performance" and a latency of 7.5 microseconds.
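
For anyone wondering why the forwarding mode shows up in the latency
figure at all, here is a minimal sketch of the wire-time arithmetic.  It
assumes the usual Ethernet byte counts (6-byte destination address,
64-byte minimum frame, 1518-byte maximum frame) and ignores the preamble,
the address lookup and whatever the switching fabric adds, so it is a
lower bound on the waiting time and not a prediction for any real switch.
The function names are made up for this illustration.

```python
# Lower bound on the delay a switch adds just by waiting for bytes to
# arrive before it starts forwarding.  Preamble, address lookup and
# fabric delays are ignored, so real switches will be slower than this.

DEST_ADDR_BYTES = 6      # cut-through: forward once the destination is known
FRAGMENT_BYTES = 64      # fragment-free: wait for the minimum frame size
MAX_FRAME_BYTES = 1518   # largest standard frame a store-and-forward
                         # switch may have to buffer completely


def wait_time_us(bytes_needed, mbps):
    """Microseconds needed to receive bytes_needed bytes at mbps Mbit/s."""
    return bytes_needed * 8 / mbps


def forwarding_delays_us(frame_bytes, mbps):
    """Waiting time per forwarding mode for one frame of frame_bytes bytes."""
    return {
        "cut-through": wait_time_us(DEST_ADDR_BYTES, mbps),
        "fragment-free": wait_time_us(min(FRAGMENT_BYTES, frame_bytes), mbps),
        "store-and-forward": wait_time_us(frame_bytes, mbps),
    }


if __name__ == "__main__":
    for mbps in (10, 100):
        for frame in (64, MAX_FRAME_BYTES):
            print(mbps, "Mbps,", frame, "bytes:",
                  forwarding_delays_us(frame, mbps))
```

The absolute time saved by cutting through is ten times smaller at
100Mbps than at 10Mbps for the same frame, which is the point Petr makes
above.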

HP's 212 doesn't say what it uses but at 15 microsecond latency I
suspect a bad cut through or good S&F (since store and forward typically
advertises 75-80 microsecond latency and, in MY measurements at least,
delivers twice that if you're lucky).  HP's ProCurve also
doesn't say what it uses but claims <10 microsecond latency.  So does it
matter?  Hmmm.

It may be that the distinction doesn't matter as much anymore -- with
faster memory and faster underlying switching hardware, perhaps high end
switches can store and forward and still deliver cut through latency
while also delivering better QoS, so the vendors may no longer talk
about it.  As I understand, HP doesn't make their own high end switches --
Foundry Networks does -- so I should probably go there to see what their
switching technology is (if they say anything about it at all).  But I'm
tired of web-browsing so I'll just conclude with...

...and so forth.

So I'd have to say that either you didn't shop hard enough a year ago (I
myself was shopping-impaired many years ago -- but then I got married
and learned from a pro) or that you didn't use a decent web search
engine or go to the right vendor sites.  Don't worry, you too can be
taught.  Christmas is a good time to practice;-)

However, you're right -- the HP example alone indicates that I should
stop worrying about what kind of switching technology a given switch is
based on and focus on just the latency issue itself, since that is what
matters.  Here is the corrected "rule":

Switches that advertise latency in the 10 microsecond range or less are
"good" (or at any rate "better":-) for fine grain packet traffic,
whatever the mechanism they use for switching (which might be some
proprietary mix of SnF and CT and black magic for all I really know
about the internal hardware of a switch -- I just plug them in and they
work, on a good day;-).  They also cost "more" (per port) and often
come with other possibly desirable features, e.g. manageability, VLAN
capabilities and so forth.

Switches that advertise 70-80 microsecond latencies are "bad" (or worse)
for fine grain traffic but probably work just perfectly for data
intensive big packet traffic where throughput is not interpacket latency
bound.  They cost "less" (per port) and quite often don't even feature
an on/off switch -- you plug them in, plug in the NICs, and Inshallah
they just work.
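
A toy version of the rule, applied to the advertised figures quoted
earlier in this message.  The two cutoffs (10 and 70 microseconds) are
my reading of "the 10 microsecond range or less" and "70-80"; the rule
says nothing about what lies in between, so the sketch says so too.
Where a range or an upper bound was quoted, the upper end is used.

```python
# Advertised latencies in microseconds, as quoted in this message.
ADVERTISED_US = {
    "Cisco Catalyst 1900": 7.0,
    "Intel 5x0": 7.5,
    "HP ProCurve": 10.0,      # quoted as "<10"
    "HP 212": 15.0,
    "Netgear FS508": 20.0,    # quoted as "15-20"; model number from memory
    "Netgear FS108": 80.0,    # the 80 the existing switch claims
}

FINE_GRAIN_MAX_US = 10.0      # assumed cutoff for "good"
BULK_MIN_US = 70.0            # assumed cutoff for "bad (or worse)"


def classify(latency_us):
    """Apply the rule of thumb to one advertised latency."""
    if latency_us <= FINE_GRAIN_MAX_US:
        return "better for fine grain traffic"
    if latency_us >= BULK_MIN_US:
        return "fine for big packet traffic only"
    return "in between (the rule does not say)"


if __name__ == "__main__":
    for name, us in sorted(ADVERTISED_US.items(), key=lambda kv: kv[1]):
        print(f"{name:22s} {us:5.1f} us  {classify(us)}")
```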

Both kinds of switch PROBABLY deliver considerably worse latencies than
they advertise in practice, but it is damn hard to get anything like a
reliable and consistent benchmark of a switch.  Partly this is because
point-to-point latency (which is what you measure with e.g. netperf or
lmbench or tcppipes) is a multifactorial measure that includes
>>nontrivial<< contributions from CPU, kernel, memory architecture, NIC
architecture, switch architecture, and system load on both ends.

It might well be that if I had sprung for the Netgear FS508, for
example, I would have measured TCP latencies of only 120 microseconds
instead of 180, since I measure 180 with my existing setup and the
switch CLAIMS to be responsible for only 80 of that (versus the 15-20
the FS508 claimed).  That is a factor of 3 greater cost for cutting
end-to-end latency by 1/3, rather than a factor of 3 greater cost for a
4x improvement -- a pretty serious difference.  I suppose I could
connect the same two systems with a crossover cable, run the same
latency test, and subtract to see if I do get 100 microseconds as my
SYSTEMs' contribution to the overall latency...
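
The arithmetic behind that estimate, spelled out as a small sketch.  It
takes the vendors' claimed switch latencies at face value (which, as
noted above, is probably optimistic) and uses the upper end of the 15-20
microsecond claim for the faster switch; the variable and function names
are just for illustration.

```python
def latency_budget(measured_total_us, claimed_switch_us, faster_switch_us):
    """Split a measured latency into host and switch parts and predict
    the total if only the switch were replaced by a faster one."""
    host_us = measured_total_us - claimed_switch_us
    predicted_total_us = host_us + faster_switch_us
    return {
        "host_us": host_us,
        "predicted_total_us": predicted_total_us,
        # fraction of the end-to-end latency that goes away
        "latency_reduction": 1 - predicted_total_us / measured_total_us,
        # improvement if you look at the switch figures alone
        "switch_only_ratio": claimed_switch_us / faster_switch_us,
    }


if __name__ == "__main__":
    # 180 us measured, 80 us claimed by the current switch,
    # 20 us claimed by the 3x more expensive one.
    for key, value in latency_budget(180.0, 80.0, 20.0).items():
        print(f"{key}: {value:.3g}")
```

With those inputs the host side accounts for 100 microseconds, the
predicted total is 120, and the end-to-end latency drops by a third even
though the switch figures alone differ by a factor of 4.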

To All Folks out there with too much time and hardware to play with and
too little to do! (yeah, right...;-) please take note:

  One thing I'd really love to see is a reliable protocol (a set of
  tools and methods) for testing switch performance in the context of
  beowulfery developed and results accumulated and published online.
  If/when I ever have time I might try to tackle the former (if somebody
  hasn't already done so), and I will guarantee a home for the latter,
  both on the brahma site and in the online beowulf book.
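
As a starting point only, here is a rough sketch of the kind of ping-pong
test such a protocol might be built around.  It is NOT netperf or
lmbench: it uses UDP instead of TCP, it is written in Python (so the
interpreter adds its own overhead to every sample), and the demo runs
over loopback just to show that it works.  All function names are
invented for this example.

```python
import socket
import statistics
import threading
import time


def echo_server(sock, count):
    """Echo count datagrams back to whoever sent them."""
    for _ in range(count):
        data, addr = sock.recvfrom(2048)
        sock.sendto(data, addr)


def ping_pong(host, port, count=1000, size=64, timeout=2.0):
    """Return one-way latency estimates (round trip / 2) in microseconds."""
    payload = b"x" * size
    samples = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        for _ in range(count):
            t0 = time.perf_counter()
            s.sendto(payload, (host, port))
            s.recvfrom(2048)
            samples.append((time.perf_counter() - t0) / 2 * 1e6)
    return samples


def summarize(samples):
    """Min, median and max of a list of latency samples."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def loopback_demo(count=200):
    """Run server and client in one process over 127.0.0.1."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))          # let the OS pick a free port
    port = server.getsockname()[1]
    worker = threading.Thread(target=echo_server, args=(server, count),
                              daemon=True)
    worker.start()
    try:
        return summarize(ping_pong("127.0.0.1", port, count=count))
    finally:
        worker.join(timeout=5)
        server.close()


if __name__ == "__main__":
    print(loopback_demo())
```

To say anything about a switch one would run echo_server on one node and
ping_pong on another, once through the switch and once over a crossover
cable, and compare the medians; the difference is then an estimate of the
switch's share under that particular load and packet size.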

    rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu