[Fwd: Re: [Fwd: Re: 32-port gigabit switch]]

Dave Vehrs davidv at aspsys.com
Fri Mar 7 10:18:00 PST 2003

Oops, sent the reply just to Jeff.  Here's a copy for the list.

Dave V.

-----Forwarded Message-----

From: Dave Vehrs <davidv at aspsys.com>
To: jeffrey.b.layton at lmco.com
Subject: Re: [Fwd: Re: 32-port gigabit switch]
Date: 07 Mar 2003 10:54:24 -0700

On Fri, 2003-03-07 at 10:46, Jeff Layton wrote:
> > I'm not trying to start a flame war, and I'm really curious.  I suggest
> > that you're starting the flame war with your attacking tone and lack of
> > any facts (or even one example) backing up your statements.  Just saying
> > "it depends" doesn't help the rest of us learn.  When is Gigabit better?
>    Where's RGB when you need him? :) I think enough people have
> pointed out that your statement is wrong. Have you looked in the
> beowulf archives? How about a googling?

I'm still trying to work through the links RGB posted this morning; I'll
have to find some time this weekend.

> > In my experience the computation portion of a Beowulf will always
> > require low latencies for optimal performance.
>    OK. We have 3 MPI applications. Two are internally written and
> one is from NASA. We have extensively tested these 3 codes with
> many varying data sets on all kinds of HPC equipment (Crays,
> SGI Origins, SPs, clusters, etc.). However, I'll focus on clusters
> (beowulfs in particular).
>    We have tested on equipment with Myrinet, GigE, and FastE. The
> nodes were the same and only the network changed along with
> some tuning to get the best performance out of each. Here's what
> we have found:
> Code 1 - First internal code. Running on Myrinet compared to GigE
> only gives you about 20% better wall-clock time for some cases. For
> other cases, Myrinet is slower than GigE (still trying to explain that
> one :). Myrinet is about twice as fast as FastE.
> Observations - We think this code is more constrained by latency
> than bandwidth when you compare Myrinet and GigE. We have
> looked at the message sizes and they are fairly small (tiny). This
> pushes this code down the bandwidth/message size curve almost
> to the point where you measure latency. So latency appears to be
> a driver for this code. Also, not much overlapping communication/
> computation in this code.
> Code 2 - Second internal code. Running on Myrinet compared to
> GigE is only about 3% faster for just about all cases. Myrinet is
> about twice as fast as FastE.
> Observations - Although we should see better performance with
> Myrinet compared to GigE due to better bandwidth, we think this
> code is limited by bandwidth instead of latency. The message sizes
> for this code are very large, pushing the code way up the bandwidth/
> message size curve. We're still working on identifying all of the
> bottlenecks, but from a networking standpoint, this is what we
> have concluded so far. Also, not much overlapping communication/
> computation in this code.
> Code 3 - NASA code. This code only runs about 2-3% faster on
> GigE and Myrinet compared to FastE. The code appears to be well
> thought out with respect to overlapping communication/computation.
> Observations - This code appears not to be constrained by either
> latency or bandwidth.
> Disclaimer - There are lots of things I ignored in this simple analysis
> such as memory bandwidth, etc. The data to support these observations
> also came from testing on other systems and on testing with other
> types of networking (Quadrics, Scali, etc.). All of the numbers are
> wall-clock times.

OK. So to put a slightly different interpretation on it, for certain
applications, Gigabit and Myrinet can have equal performance.  So it
becomes a price issue.
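
A rough way to see why message size decides whether latency or bandwidth
dominates is the usual first-order model, time = latency + size / bandwidth.
The sketch below only illustrates that model. The function names and the
latency and bandwidth figures are placeholder assumptions, not measurements
from Jeff's tests or from any particular hardware.

```python
# First-order point-to-point model: time = latency + size / bandwidth.
# All network figures below are hypothetical placeholders for illustration.

def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time in seconds to move one message of size_bytes."""
    return latency_s + size_bytes / bandwidth_bytes_per_s

def effective_bandwidth(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Bytes per second actually achieved for one message of size_bytes."""
    return size_bytes / transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s)

def half_performance_size(latency_s, bandwidth_bytes_per_s):
    """Message size at which effective bandwidth reaches half of peak."""
    return latency_s * bandwidth_bytes_per_s

# Hypothetical networks: (latency in seconds, peak bandwidth in bytes/s).
NETWORKS = {
    "low-latency net": (10e-6, 200e6),
    "commodity net": (50e-6, 100e6),
}

if __name__ == "__main__":
    for name, (lat, bw) in NETWORKS.items():
        for size in (64, 4096, 1 << 20):
            t = transfer_time(size, lat, bw)
            eff = effective_bandwidth(size, lat, bw)
            print(f"{name:16s} {size:8d} B  {t * 1e6:10.1f} us  {eff / 1e6:8.2f} MB/s")
```

With these placeholder numbers, a 64-byte message costs little more than the
latency itself on either network, while a 1 MB message is dominated by the
size / bandwidth term. That matches the split Jeff describes between a code
with tiny messages and a code with very large ones.
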
>    With these general rules of thumb (we always test before we
> buy) and knowing the mix between the codes, we do a price/performance
> to configure the best system. Right now (and this is subject to change),
> GigE provides better price/performance for our code mix.

I'm 100% with you on testing before you buy.  We've seen some big
performance differences after simple changes.
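
The price/performance comparison over a code mix can be sketched roughly as
below. The relative runtimes only loosely follow the ratios quoted above
(Myrinet about 20% faster than GigE for code 1 in some cases, about 3% for
code 2, roughly equal for code 3). The workload mix, the cost figures, and
all names are hypothetical placeholders, so treat the result as an
illustration of the arithmetic and not as a recommendation.

```python
# Price/performance for a mix of codes. Runtimes are relative to GigE = 1.0
# and only loosely follow the ratios quoted in this thread. The workload mix
# and per-node costs are hypothetical placeholders.

RELATIVE_RUNTIME = {
    "GigE":    {"code1": 1.00, "code2": 1.00, "code3": 1.00},
    "Myrinet": {"code1": 0.80, "code2": 0.97, "code3": 1.00},
}

# Fraction of total work spent in each code (assumed).
WORKLOAD_MIX = {"code1": 0.2, "code2": 0.5, "code3": 0.3}

# Cost per node in arbitrary units, network included (assumed).
COST_PER_NODE = {"GigE": 1.0, "Myrinet": 1.5}

def mix_runtime(network):
    """Workload-weighted relative runtime on the given network."""
    return sum(WORKLOAD_MIX[code] * RELATIVE_RUNTIME[network][code]
               for code in WORKLOAD_MIX)

def price_performance(network):
    """Cost multiplied by runtime; lower is better."""
    return COST_PER_NODE[network] * mix_runtime(network)

def best_network():
    """Network with the lowest cost-times-runtime figure."""
    return min(COST_PER_NODE, key=price_performance)

if __name__ == "__main__":
    for net in COST_PER_NODE:
        print(f"{net:8s} runtime {mix_runtime(net):.3f}  "
              f"cost x runtime {price_performance(net):.3f}")
    print("best:", best_network())
```

Changing the mix toward the latency-bound code, or narrowing the cost gap,
can flip the answer, which is the reason to test with the real codes first.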

>    Of course, this also depends on what GigE equipment we're talking
> about. I think Mark has pointed out in the past, as well as others, that
> not all GigE equipment is created equal (this is generally true for
> FastE as well). However, for the GigE equipment we have tested on
> and also have in production we have found GigE is the way to go for
> us for our mix of codes.

FYI, Aspen Systems is an authorized Myrinet reseller, so my experience
may be heavily weighted in that direction.  Additionally, the only large
clusters (up to 768 nodes) that I have worked on have been Myrinet.

> > On the other hand, when I have applications that need to transfer a lot
> > of data as well, I find that having two networks is the way to go.  One
> > for control and messaging traffic (low latency - Myrinet) and one for
> > data traffic (high throughput - Gigabit).
>    What kinds of applications?
>    So you run control and MPI message traffic over Myrinet and
> NFS over GigE? Myrinet has better bandwidth than GigE, so
> it appears that if data transfer is important I would switch NFS
> to Myrinet and MPI traffic to GigE (unless of course you see a
> big difference in performance). If you do see a big difference in
> performance, what about using two Myrinet networks (trying to
> get you some sales Patrick! :)?

Never seen two Myrinet networks in one cluster...

>    If latency is that important, have you tried Quadrics? In our
> experience it has lower latencies than Myrinet. What MPI
> implementations have you tried? Do you run 1 ppn with single
> CPUs, or 1 ppn with SMP nodes, or 2 ppn with SMP nodes,
> or something else? All of these things can have a large impact on
> performance.

Personally, I haven't gotten much time with Quadrics yet.  I hope to in
the near future.

> > If you would rather take it off list, then feel free to email me
> > directly, but I would really like to know because I can't think of one
> > example that works.
>    I hope my response answered your question. Anybody care to
> present another example where bandwidth is more important than
> latency? Greg? Mark? RGB? Doug? Don?

Yes, I think I understand better.  Still need to work through RGB's


Dave V.

> Jeff
> --
> Dr. Jeff Layton
> Senior Engineer
> Lockheed-Martin Aeronautical Company - Marietta
> Aerodynamics & CFD
> "Is it possible to overclock a cattle prod?" - Irv Mullins
> This email may contain confidential information. If you have received this
> email in error, please delete it immediately, and inform me of the mistake by
> return email. Any form of reproduction, or further dissemination of this
> email is strictly prohibited. Also, please note that opinions expressed in
> this email are those of the author, and are not necessarily those of the
> Lockheed-Martin Corporation.

David E Vehrs, System Engineer		Aspen Systems
davidv at aspsys.com			3900 Youngfield Street
Tel: +01 303 431 4606			Wheat Ridge CO 80033, USA
Fax: +01 303 431 7196			http://www.aspsys.com
