[Fwd: Re: 32-port gigabit switch]
Jeff Layton jeffrey.b.layton at lmco.com
Fri Mar 7 09:46:42 PST 2003
> I'm not trying to start a flame war, and I'm really curious. I suggest
> that you're starting the flame war with your attacking tone and lack of
> any facts (or even one example) backing up your statements. Just saying
> "it depends" doesn't help the rest of us learn. When is Gigabit better?

Where's RGB when you need him? :) I think enough people have pointed out that your statement is wrong. Have you looked in the Beowulf archives? How about googling?

> In my experience the computation portion of a Beowulf will always
> require low latencies for optimal performance.

OK. We have 3 MPI applications. Two are internally written and one is from NASA. We have extensively tested these 3 codes with many varying data sets on all kinds of HPC equipment (Crays, SGI Origins, SPs, clusters, etc.). However, I'll focus on clusters (Beowulfs in particular). We have tested on equipment with Myrinet, GigE, and FastE. The nodes were the same and only the network changed, along with some tuning to get the best performance out of each. Here's what we have found:

Code 1 - First internal code. Running on Myrinet compared to GigE only gives you about 20% better wall-clock time for some cases. For other cases, Myrinet is slower than GigE (still trying to explain that one :). Myrinet is about twice as fast as FastE.
Observations - We think this code is more constrained by latency than bandwidth when you compare Myrinet and GigE. We have looked at the message sizes and they are fairly small (tiny). This pushes this code down the bandwidth/message size curve almost to the point where you measure latency. So latency appears to be a driver for this code. Also, there is not much overlapping of communication and computation in this code.

Code 2 - Second internal code. Running on Myrinet compared to GigE is only about 3% faster for just about all cases. Myrinet is about twice as fast as FastE.
Observations - Although we should see better performance with Myrinet compared to GigE due to better bandwidth, we think this code is limited by bandwidth instead of latency. The message sizes for this code are very large, pushing the code way up the bandwidth/message size curve. We're still working on identifying all of the bottlenecks, but from a networking standpoint, this is what we have concluded so far. Also, there is not much overlapping of communication and computation in this code.

Code 3 - NASA code. This code only runs about 2-3% faster on GigE and Myrinet compared to FastE. The code appears to be well thought out with respect to overlapping communication and computation.
Observations - This code appears to be constrained by neither latency nor bandwidth.

Disclaimer - There are lots of things I ignored in this simple analysis, such as memory bandwidth, etc. The data to support these observations also came from testing on other systems and from testing with other types of networking (Quadrics, Scali, etc.). All of the numbers are wall-clock times.

With these general rules of thumb (we always test before we buy) and knowing the mix between the codes, we do a price/performance analysis to configure the best system. Right now (and this is subject to change), GigE provides better price/performance for our code mix. Of course, this also depends on what GigE equipment we're talking about. I think Mark, as well as others, has pointed out in the past that not all GigE equipment is created equal (this is generally true for FastE as well). However, for the GigE equipment we have tested on and also have in production, we have found GigE is the way to go for our mix of codes.

> On the other hand, when I have applications that need to transfer a lot
> of data as well, I find that having two networks is the way to go. One
> for control and messaging traffic (low latency - Myrinet) and one for
> data traffic (high throughput - Gigabit).

What kinds of applications?
So you run control and MPI message traffic over Myrinet and NFS over GigE? Myrinet has better bandwidth than GigE, so it appears that if data transfer is important I would switch NFS to Myrinet and MPI traffic to GigE (unless of course you see a big difference in performance). If you do see a big difference in performance, what about using two Myrinet networks (trying to get you some sales, Patrick! :)? If latency is that important, have you tried Quadrics? In our experience it has lower latencies than Myrinet.

What MPI implementations have you tried? Do you run 1 ppn with single CPUs, or 1 ppn with SMP nodes, or 2 ppn with SMP nodes, or something else? All of these things can have a large impact on performance.

> If you would rather take it off list, then feel free to email me
> directly, but I would really like to know because I can't think of one
> example that works.

I hope my response answered your question. Anybody care to present another example where bandwidth is more important than latency? Greg? Mark? RGB? Doug? Don?

Jeff

--
Dr. Jeff Layton
Senior Engineer
Lockheed-Martin Aeronautical Company - Marietta
Aerodynamics & CFD

"Is it possible to overclock a cattle prod?" - Irv Mullins

This email may contain confidential information. If you have received this email in error, please delete it immediately, and inform me of the mistake by return email. Any form of reproduction, or further dissemination of this email is strictly prohibited. Also, please note that opinions expressed in this email are those of the author, and are not necessarily those of the Lockheed-Martin Corporation.
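
The bandwidth/message size curve mentioned in the observations for Codes 1 and 2 can be illustrated with a minimal sketch, assuming the common first-order model in which sending one message costs a fixed latency plus size divided by peak bandwidth. The two sets of network parameters below are hypothetical placeholders, not measurements from the tests described in this message, and the function names are made up for illustration.

```python
# First-order cost model: time = latency + size / bandwidth.
# All network figures are HYPOTHETICAL, chosen only to show the shape
# of the curve; they are not the Myrinet/GigE/FastE results in the post.

def transfer_time(msg_bytes, latency_s, bandwidth_bytes_per_s):
    """Seconds to deliver one message under the simple model."""
    return latency_s + msg_bytes / bandwidth_bytes_per_s


def effective_bandwidth(msg_bytes, latency_s, bandwidth_bytes_per_s):
    """Bytes per second actually achieved for a message of this size."""
    return msg_bytes / transfer_time(msg_bytes, latency_s, bandwidth_bytes_per_s)


def half_bandwidth_size(latency_s, bandwidth_bytes_per_s):
    """Message size at which effective bandwidth is half of peak."""
    return latency_s * bandwidth_bytes_per_s


# (latency in seconds, peak bandwidth in bytes/second) -- assumed values.
NETWORKS = {
    "low-latency net": (10e-6, 200e6),
    "high-latency net": (60e-6, 100e6),
}

if __name__ == "__main__":
    for size in (64, 1024, 64 * 1024, 4 * 1024 * 1024):
        for name, (lat, bw) in NETWORKS.items():
            t = transfer_time(size, lat, bw)
            eff = effective_bandwidth(size, lat, bw)
            print(f"{size:>8} B  {name:<16}  {t * 1e6:10.2f} us  {eff / 1e6:8.2f} MB/s")
```

In this model, tiny messages (the Code 1 regime) cost roughly one latency each, so the ratio of run times tracks the ratio of latencies; very large messages (the Code 2 regime) track the ratio of peak bandwidths instead. It ignores overlap of communication and computation, memory bandwidth, and everything else the disclaimer above sets aside.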
