<html><head><style type='text/css'>p { margin: 0; }</style></head><body><div style='font-family: Arial; font-size: 12pt; color: #000000'><br>Larry Stewart wrote:<div><br><div><div>>Designing the communications network for this worst-case pattern has a</div><div>>number of benefits:  </div><div>></div><div>>   * it makes the machine less sensitive to the actual communications pattern</div><div>>   * it makes performance less variable run-to-run, when the job controller</div><div>>     chooses different subsets of the system</div><div><br></div><div>I agree with this and pretty much all of your other comments, but wanted to </div><div>make the point that a worst-case, hardware-only solution is not required,</div><div>nor is it necessarily where all of the research and development effort should</div><div>be placed for HPC as a whole.  And let's not forget that unless such solutions are supported</div><div>by some coincidental volume requirement in another non-HPC market,</div><div>they will cost more (sometimes a lot).  If worst-case hardware solutions were required, then clusters</div><div>would not have pushed out their HPC predecessors, and novel high-end designs</div><div>would not find it so hard to break into the market. Lower-cost hardware solutions often</div><div>stimulate more software-intelligent use of the additional resources that come along</div><div>for the ride.  With clusters you paid less for interconnects, memory interfaces,</div><div>and packaged software, and got to spend the savings on more memory, more</div><div>memory bandwidth (aggregate), and more processing power.  This in turn</div><div>had an effect on the problems tackled: weak scaling an application was a</div><div>way to use the memory while managing the impact of a cheaper</div><div>interconnect.  
</div><div><br></div><div>So yes, let's try to banish latency with cool state-of-the-art interconnects engineered</div><div>for worst-case, not common-case, scenarios (we have been hearing about the benefits of</div><div>high-radix switches), but remember that interconnect cost, data locality, and partitioning</div><div>will always matter and may make the worst-case interconnect unnecessary.</div><div><br></div><div>>There's a paper in the IBM Journal of Research and Development about this,</div><div>>they wound up using simulated annealing to find good placement on the most</div><div>>regular machine around, because the "obvious" assignments weren't optimal.</div><div><br></div><div>Can you point me at this paper? It sounds very interesting.</div><div><br></div><div>>Personally, I believe our thinking about interconnects has been poisoned by thinking</div><div>>that NICs are I/O devices.  We would be better off if they were coprocessors.  Threads</div><div>>should be able to send messages by writing to registers, and arriving packets should</div><div>>activate a hyperthread that has full core capabilities for acting on them, and with the</div><div>>ability to interact coherently with the memory hierarchy from the same end as other</div><div>>processors.  We had started kicking this around for the SiCortex gen-3 chip, but were</div><div>>overtaken by events.</div><div><br></div><div>Yes to all this ... now that everyone has made the memory controller an integral</div><div>part of the processor, we can move on to the NIC ... ;-)</div><div><br></div><div>rbw</div><div><br></div></div></div></div></body></html>