[Beowulf] Q: IB message rate & large core counts (per node) ?
richard.walsh at comcast.net
Fri Feb 26 11:16:07 PST 2010
Larry Stewart wrote:
>Designing the communications network for this worst-case pattern has a
>number of benefits:
> * it makes the machine less sensitive to the actual communications pattern
> * it makes performance less variable run-to-run, when the job controller
> chooses different subsets of the system
I agree with this and pretty much all of your other comments, but wanted to
make the point that a worst-case, hardware-only solution is not required
or necessarily where all of the research and development effort should
be placed for HPC as a whole. And let's not forget that unless such designs are supported
by some coincidental volume requirement in another non-HPC market,
they will cost more (sometimes a lot). If worst-case hardware solutions were required, then clusters
would not have pushed out their HPC predecessors, and novel high-end designs
would not find it so hard to break into the market. Lower-cost hardware solutions often
stimulate the more software-intelligent use of the additional resources that come along
for the ride. With clusters you paid less for interconnects, memory interfaces,
and packaged software, and got to spend the savings on more memory, more
memory bandwidth (aggregate), and more processing power. This in turn
had an effect on the problems tackled: weak scaling an application was an
approach to use the memory while managing the impact of a cheaper interconnect.
So, yes, let's try to banish latency with cool state-of-the-art interconnects engineered
for worst-case, not common-case, scenarios (we have been hearing about the benefits of
high-radix switches), but remember that interconnect cost, data locality, and partitioning
will always matter and may make the worst-case interconnect unnecessary.
>There's a paper in the IBM Journal of Research and Development about this,
>they wound up using simulated annealing to find good placement on the most
>regular machine around, because the "obvious" assignments weren't optimal.
Can you point me at this paper ... sounds very interesting ... ??
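For anyone who has not seen annealing applied to placement, here is a minimal
sketch of the general idea in Python. It is not the method from the paper
mentioned above. The torus topology, the cost function (traffic volume times
hop count), the cooling schedule, and every name in it are assumptions made
purely for illustration.

```python
import math
import random


def hop_distance(a, b, dims):
    """Minimal hop count between two nodes on a torus with the given dimensions."""
    total = 0
    for x, y, n in zip(a, b, dims):
        d = abs(x - y)
        total += min(d, n - d)  # wrap-around links can shorten the path
    return total


def placement_cost(placement, traffic, dims):
    """Sum of traffic volume times hop distance over all communicating rank pairs.

    placement[r] is the node coordinate assigned to rank r (hypothetical layout).
    traffic maps (rank_i, rank_j) to a message volume (hypothetical units).
    """
    return sum(w * hop_distance(placement[i], placement[j], dims)
               for (i, j), w in traffic.items())


def anneal_placement(placement, traffic, dims, steps=20000,
                     t_start=2.0, t_end=0.01, seed=0):
    """Search for a lower-cost placement by swapping pairs of ranks.

    Uses a geometric cooling schedule from t_start down to t_end (assumed values).
    Returns the best placement seen and its cost.
    """
    rng = random.Random(seed)
    current = list(placement)
    cost = placement_cost(current, traffic, dims)
    best, best_cost = list(current), cost
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        new_cost = placement_cost(current, traffic, dims)
        delta = new_cost - cost
        # Always accept improvements; accept a worse swap with probability exp(-delta/t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = list(current), cost
        else:
            current[i], current[j] = current[j], current[i]  # undo the swap
    return best, best_cost
```

To try it, build a list of node coordinates for, say, a 4x4 torus, shuffle it
to get a starting placement, describe the traffic as a dict such as
{(r, (r + 1) % 16): 1.0 for r in range(16)}, and compare placement_cost before
and after calling anneal_placement. Recomputing the full cost on every swap is
wasteful, and a real tool would update it incrementally, but it keeps the
sketch short.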
>Personally, I believe our thinking about interconnects has been poisoned by thinking
>that NICs are I/O devices. We would be better off if they were coprocessors. Threads
>should be able to send messages by writing to registers, and arriving packets should
>activate a hyperthread that has full core capabilities for acting on them, and with the
>ability to interact coherently with the memory hierarchy from the same end as other
>processors. We had started kicking this around for the SiCortex gen-3 chip, but were
>overtaken by events.
Yes to all this ... now that everyone has made the memory controller an integral
part of the processor, we can move on to the NIC ... ;-) ...