Beowulfs can compete with Supercomputers [was Beowulf: A theorical approach]
Greg Lindahl glindahl at hpti.com
Sat Jun 24 20:24:49 PDT 2000
> Interesting, but, the SC667 is no beowulf.

The SC667 is extremely similar to the IBM SP. The question was asking
about a comparison between an AlphaLinux/Myrinet cluster and an IBM SP.

> First, a plot of linear speed up would put into perspective
> exactly how this code scales. That tail off tells me why
> you're interested in high speed interconnects, e.g., myrinet.

Not really; mm5's scaling is hurt by load imbalance more than
interconnect.

> This leads us into comparing system interconnects. As an
> example of SPEC in action, vs this chart, the SGI O2 400
> manages to outperform the SP WH2 and matches the ACL/667,
> even though SPEC says it shouldn't.

That could be because it's extremely unlike SPEC. It may be scaling
like one of the component benchmarks in SPEC, which are pretty wildly
different. I assure you that the ACL/667 beat the snot out of the
O2 400 MHz on the overall FSL benchmarks.

> That is most likely
> due to this code not being able to keep the cpus busy doing
> useful work.

No, I measured for that, and it's a load imbalance. Yes, pretty much
everyone's high-speed interconnects use busy-wait loops when blocking
for messages, so CPU utilization %'s aren't useful for figuring out
when someone's hung. The mpich ch_p4 device does that on the sending
side, for example, if it can't get all the data into the kernel
buffer. So I used an mpi profiling gizmo, and compared the application
cpu time for various nodes.

> One last comment. In the notes section, the SC667 is actually
> using only one or two cpus per 4 cpu node. That would indicate
> the SC667 nodes run out of bandwidth somewhere since they chose not
> to post the 4 processors per node run.

It is not considered correct by most folks in the industry to run a
benchmark that way. Most RFPs and formal benchmark situations prohibit
such runs, unless the actual product is a 4 cpu chassis with 2 cpus in
it and 2 empty slots.

-- g
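
The comparison of application cpu time across nodes described above can be sketched roughly as follows. This is only an illustration, not the profiling tool actually used: the function name `load_imbalance`, the metric definitions (slowest rank over the average, and its inverse as an upper bound on parallel efficiency), and the sample timings are all assumptions made up for the example.

```python
from statistics import mean


def load_imbalance(app_cpu_seconds):
    """Summarize load imbalance from per-rank application CPU times.

    app_cpu_seconds: one number per rank, the time spent in application
    code only (time inside MPI calls, including busy-wait, excluded).
    Returns (imbalance, efficiency_bound), where
      imbalance        = slowest / average - 1   (0.0 means balanced)
      efficiency_bound = average / slowest       (1.0 means balanced)
    """
    if not app_cpu_seconds:
        raise ValueError("need at least one rank")
    slowest = max(app_cpu_seconds)
    average = mean(app_cpu_seconds)
    if slowest <= 0:
        raise ValueError("application times must be positive")
    return slowest / average - 1.0, average / slowest


if __name__ == "__main__":
    # Hypothetical per-rank application CPU times in seconds.
    times = [40.0, 50.0, 60.0, 50.0]
    imbalance, bound = load_imbalance(times)
    print(f"imbalance: {imbalance:.1%}, efficiency bound: {bound:.1%}")
```

The point of using application time rather than overall CPU utilization is that a rank spinning in a busy-wait loop looks fully busy to the operating system, so only the time spent outside the message-passing library shows which ranks have more work than the others.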
