[Beowulf] Performance characterising a HPC application
patrick at myri.com
Thu Mar 22 04:43:12 PDT 2007
Greg Lindahl wrote:
> On Wed, Mar 21, 2007 at 06:41:07AM -0400, Scott Atchley wrote:
>> I have not benchmarked any applications that need more than 250 MB/s
>> during computation,
> There is a large class of computations which alternate non-overlapped
> compute and communicate cycles. The average over the lifetime of the
> process can be ~ 100 MB/s, but that could easily mean it is sending
> at 1 GByte/s for 10% of the runtime. If you slow down communications
> by a factor of 4, the app will run a lot slower.
In codes that alternate computation and communication (naive, but the most
common design), the communication phase is usually much smaller than the
computation phase, unless the problem is imbalanced or too small. In my
experience, 10% of the runtime is realistic, but at 250 MB/s. So when
you bump the network to 1 GB/s, communication drops from 10% to 2.5% of
the original runtime, and you gain only 7.5%. That's why 1G (GigE) is
fine for a lot of cases, 2G is interesting for a subset, and 10G for
even fewer apps.
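To make the arithmetic concrete, here is a minimal sketch. It assumes
strictly non-overlapped compute and communicate phases, and that
communication time scales inversely with bandwidth (no latency, no
contention). The function name and parameters are made up for
illustration; the two cases are the ones discussed in this thread.

```python
def runtime_gain(comm_fraction, old_bw, new_bw):
    """Fraction of the original runtime saved when only bandwidth changes.

    Assumption: compute time is unchanged and communication time
    scales as 1/bandwidth. A negative result means the app got slower.
    """
    new_comm_fraction = comm_fraction * old_bw / new_bw
    return comm_fraction - new_comm_fraction


# 10% of runtime spent communicating at 250 MB/s, network bumped to 1 GB/s.
print(f"250 -> 1000 MB/s: {runtime_gain(0.10, 250.0, 1000.0):+.1%}")

# Greg's case: 10% of runtime at 1 GB/s, network slowed down by a factor of 4.
print(f"1000 -> 250 MB/s: {runtime_gain(0.10, 1000.0, 250.0):+.1%}")
```

Under these assumptions the first case saves 7.5% of the runtime, while
the second costs 30%. Which one applies depends on the bandwidth the
code actually sustains during its communication phase.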
As everybody is communicating at the same time, you need to increase the
network bandwidth when you increase the number of cores, to keep
constant resources per core. However, these codes are often bound by
contention rather than bandwidth. The communication pattern is
all-to-all or, worse, a careless hand-made exchange. Even with a clean
all-to-all, slight imbalances easily create contention in the network
(N->1 contention), and the backpressure flow control propagates it
everywhere, since everybody is communicating at the same time. Bigger
pipes help with contention a bit, but not much.
People doing their homework are still buying more 2G than 10G today,
because of better price/performance for their codes (and thin cables).