Beowulfs can compete with Supercomputers [was Beowulf: A theorical approach]

Sat Jun 24 20:24:49 PDT 2000

> Interesting, but, the SC667 is no beowulf.

The SC667 is extremely similar to the IBM SP. The question was asking about
a comparison between an AlphaLinux/Myrinet cluster and an IBM SP.

> First, a plot of linear speed up would put into perspective
> exactly how this code scales. That tail off tells me why
> you're interested in high-speed interconnects, e.g., Myrinet.

Not really; MM5's scaling is hurt more by load imbalance than by the
interconnect.
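
For anyone who wants to put a scaling curve next to the ideal linear one, here
is a minimal sketch of the arithmetic. The function name and the timings in
the usage note are hypothetical, made up for illustration; they are not
numbers from the FSL chart.

```python
def speedup_table(timings):
    """Compute speedup and parallel efficiency from wall-clock timings.

    timings maps processor count -> elapsed seconds (hypothetical input).
    The smallest processor count is used as the baseline, so ideal linear
    speedup at p processors is p / baseline_procs.
    Returns a list of (procs, speedup, efficiency) tuples sorted by procs.
    """
    if not timings:
        raise ValueError("need at least one timing")
    base_procs = min(timings)
    base_time = timings[base_procs]
    rows = []
    for procs in sorted(timings):
        speedup = base_time / timings[procs]
        ideal = procs / base_procs
        rows.append((procs, speedup, speedup / ideal))
    return rows
```

Calling speedup_table({1: 100.0, 2: 50.0, 4: 40.0}) gives an efficiency of 1.0
at 2 processors and 0.625 at 4. The tail-off shows up as efficiency dropping
below 1.0, but this alone does not say whether the cause is the interconnect
or load imbalance.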

> This leads us into comparing system interconnects. As an
> example of SPEC in action, vs this chart, the SGI O2 400
> manages to outperform the SP WH2 and matches the ACL/667,
> even though SPEC says it shouldn't.

That could be because this code is extremely unlike SPEC as a whole. It may
be scaling like one of the component benchmarks in SPEC, which differ pretty
wildly from one another. I assure you that the ACL/667 beat the snot out of
the 400 MHz O2 on the overall FSL benchmarks.

> That is most likely
> due to this code not being able to keep the cpus busy doing
> useful work.

No, I measured that, and it's load imbalance. It's true that pretty much
everyone's high-speed interconnects use busy-wait loops when blocking for
messages, so CPU utilization percentages aren't useful for figuring out when
a process is stuck waiting. The MPICH ch_p4 device does that on the sending
side, for example, if it can't get all the data into the kernel buffer. So I
used an MPI profiling gizmo and compared the application CPU time across
nodes.
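
The comparison itself is simple once you have per-rank application CPU time
(time spent in the application, excluding time inside MPI calls). A rough
sketch, assuming the profiler gives you one such number per rank; the function
names and the sample values in the usage note are hypothetical, not my actual
measurements:

```python
def load_imbalance(app_cpu_seconds):
    """Return max/mean - 1 over per-rank application CPU times.

    0.0 means perfectly balanced; 1.0 means the busiest rank did twice
    the average amount of work.
    """
    if not app_cpu_seconds:
        raise ValueError("need at least one rank")
    mean = sum(app_cpu_seconds) / len(app_cpu_seconds)
    if mean <= 0:
        raise ValueError("mean CPU time must be positive")
    return max(app_cpu_seconds) / mean - 1.0


def idle_fraction(app_cpu_seconds):
    """Fraction of total processor time spent waiting on the busiest rank.

    Assumes every rank must wait until the busiest rank finishes its work.
    """
    if not app_cpu_seconds:
        raise ValueError("need at least one rank")
    busiest = max(app_cpu_seconds)
    if busiest <= 0:
        raise ValueError("busiest rank must have positive CPU time")
    return 1.0 - sum(app_cpu_seconds) / (len(app_cpu_seconds) * busiest)
```

With four ranks at [10, 10, 10, 30] seconds, load_imbalance returns 1.0 and
idle_fraction returns 0.5: half the machine is waiting on one rank, and a
faster interconnect would not change that.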

> One last comment. In the notes section, the SC667 is actually
> using only one or two cpus per 4 cpu node. That would indicate
> the SC667 nodes run out of bandwidth somewhere since they chose not
> to post the 4 processors per node run.

It is not considered correct by most folks in the industry to run a
benchmark that way. Most RFPs and formal benchmark situations prohibit such
runs, unless the actual product is a 4 cpu chassis with 2 cpus in it and 2
empty slots.

-- g