[Beowulf] Infiniband modular switches

Mon Jul 28 01:54:03 PDT 2008

On Sat, 2008-07-26 at 20:54 -0700, Nifty niftyompi Mitch wrote:
> On Thu, Jul 24, 2008 at 05:22:24PM -0700, Greg Lindahl wrote:
> > On Mon, Jul 14, 2008 at 01:42:07PM -0400, Patrick Geoffray wrote:
> > 
> > > AlltoAll of large messages is not a useless synthetic benchmark IMHO.
> > 
> > AlltoAll is a real thing used by real codes, but do keep in mind that
> > there are many algorithms for AlltoAll with various message sizes and
> > network topologies, so it's testing both the raw interconnect and the
> > AlltoAll implementation. I don't know if the results you mention were
> > run with an optimal AlltoAll... do you?
> 
> Is there a single "optimal AlltoAll"?
> 
> I can imagine a handful of ways to build an AlltoAll but I suspect that
> various cards, systems, transports, switches, topologies ... each will
> act differently on different processors and memory systems.  Is there
> a collection of coded algorithms that can be built into the likes of
> OpenMPI?  If so, a simple site hook to benchmark and then pick/link to
> one over another could follow.

If only it were that simple.  A basic AlltoAll is easy to implement, but
getting it to work well is difficult, and getting a single algorithm
that works well across a number of different topologies is extremely
difficult.  You forget two other variables: the message size and the
size of the communicator.  Both can vary within the same job, which
effectively prevents there being a single optimum "site" algorithm.
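
To make the point concrete, here is a minimal Python sketch using the
simple latency/bandwidth (Hockney) cost model often used to compare
collective algorithms, with the usual textbook estimates for pairwise
exchange and Bruck's algorithm.  The ALPHA and BETA values and the
function names are hypothetical, chosen only for illustration; a real
MPI library's selection logic is more involved and tuned per platform.

```python
import math

# Hypothetical Hockney-model parameters; real values depend on the fabric.
ALPHA = 2e-6   # assumed per-message latency (s)
BETA = 1e-9    # assumed per-byte transfer time (s), roughly 1 GB/s


def pairwise_cost(p, m, alpha=ALPHA, beta=BETA):
    """Pairwise exchange: p-1 rounds, one m-byte block sent per round."""
    return (p - 1) * (alpha + m * beta)


def bruck_cost(p, m, alpha=ALPHA, beta=BETA):
    """Bruck: ceil(log2 p) rounds, each moving about p/2 blocks of m bytes.
    Local data rotations are ignored in this estimate."""
    rounds = math.ceil(math.log2(p))
    return rounds * (alpha + (p * m / 2) * beta)


def choose_alltoall(p, m):
    """Pick the cheaper algorithm for communicator size p and block size m."""
    costs = {"bruck": bruck_cost(p, m), "pairwise": pairwise_cost(p, m)}
    return min(costs, key=costs.get)


if __name__ == "__main__":
    for p in (16, 256):
        for m in (8, 1024, 65536):
            print(p, m, choose_alltoall(p, m))
```

With these made-up constants a 1 KB block favors Bruck at 16 ranks but
pairwise exchange at 256 ranks, so the choice has to be made per call,
and the crossover points themselves move whenever alpha and beta change.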

AlltoAll *is* the hardest MPI function to implement well, and in my view
it makes a good benchmark not just of the network but also of the MPI
stack itself.  There is a good chance that if AlltoAll works well on a
given machine for a given job size, then most other things will as well.
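
Part of why it is such a demanding test is the traffic pattern: every
rank exchanges a block with every other rank.  A minimal Python sketch
of one common schedule, pairwise exchange with XOR pairing, checks that
each round is a perfect matching and that all p(p-1) messages are
covered.  The function names are hypothetical, the schedule assumes a
power-of-two number of ranks, and it is only one textbook option, not
necessarily what any given MPI implementation uses.

```python
def xor_schedule(p):
    """Round r (1..p-1): rank i sends its block to rank i XOR r.
    Assumes p is a power of two so every round is a perfect matching."""
    if p < 1 or p & (p - 1):
        raise ValueError("p must be a power of two")
    return [[(i, i ^ r) for i in range(p)] for r in range(1, p)]


def check_schedule(p):
    """Verify each rank sends and receives once per round and that every
    ordered pair (src, dst) with src != dst appears exactly once."""
    seen = set()
    for rnd in xor_schedule(p):
        assert sorted(s for s, _ in rnd) == list(range(p))
        assert sorted(d for _, d in rnd) == list(range(p))
        for pair in rnd:
            assert pair not in seen
            seen.add(pair)
    assert len(seen) == p * (p - 1)
    return len(seen)


def total_bytes(p, m):
    """Bytes injected into the network by one AlltoAll of m-byte blocks."""
    return p * (p - 1) * m
```

For example, a 256-rank AlltoAll with 64 KiB blocks injects
total_bytes(256, 65536), about 4.3 GB, with every link busy in every
round, which exercises both the fabric's bisection bandwidth and the
MPI stack's progress engine at the same time.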

Ashley Pittman.