[Beowulf] Re: TOE on Linux
landman at scalableinformatics.com
Tue May 20 12:10:26 PDT 2008
Greg Lindahl wrote:
> Joe Landman wrote:
>> Contrary to the comments of the technology's detractors, the
>> TOE/RDMA card *did* provide a fairly significant performance delta for
>> real apps running MPI over gigabit ethernet.
> As a detractor of TOEs, I should point out that one data point does
> not prove that it's common that apps get a benefit.
True; however, it does show that it is possible to get better
performance.
> I'd be willing to bet that this app was doing extremely large
> transfers,
StarCD. Not big transfers; it doesn't move GB to its nodes.
> and maybe even managed to get more concurrency with the
> TOE... which could easily be a flaw in the MPI implementation's TCP
> driver, a pretty common thing to be wrong.
Yes, this could be possible.
> For example, LAM was always
> much better than MPICH over TCP, and I wouldn't be surprised if
> OpenMPI continues this superiority over MPICH-2.
There are minor issues with OpenMPI and things like Overflow, but other
than that, it does work extremely well.
> The most interesting thing, to me, is that the various people selling
> TOEs in the HPC arena publish almost no benchmarks. What's the message
> rate and N1/2? The only N1/2 I've ever seen published was 100 kbytes.
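For readers unfamiliar with the metric: N1/2 is the message size at which a link delivers half of its asymptotic bandwidth. A minimal sketch, assuming the simple Hockney model T(n) = t0 + n/r_inf (the model and the parameter values below are illustrative assumptions, not measurements from this thread), shows that N1/2 works out to t0 * r_inf:

```python
# Sketch of the Hockney latency/bandwidth model (an illustrative assumption):
#   T(n) = t0 + n / r_inf
# t0    : zero-byte (startup) latency in seconds
# r_inf : asymptotic bandwidth in bytes per second


def transfer_time(n_bytes: float, t0: float, r_inf: float) -> float:
    """Time to move one n_bytes message under the model."""
    return t0 + n_bytes / r_inf


def achieved_bandwidth(n_bytes: float, t0: float, r_inf: float) -> float:
    """Effective bandwidth n / T(n) for a single message."""
    return n_bytes / transfer_time(n_bytes, t0, r_inf)


def n_half(t0: float, r_inf: float) -> float:
    """Message size at which achieved bandwidth equals r_inf / 2.

    Solving n / (t0 + n / r_inf) = r_inf / 2 gives n = t0 * r_inf.
    """
    return t0 * r_inf


if __name__ == "__main__":
    # Hypothetical values: 50 microseconds latency, 125 MB/s (about gigabit line rate).
    t0, r_inf = 50e-6, 125e6
    n = n_half(t0, r_inf)
    print(f"N1/2 = {n:.0f} bytes")
    print(f"fraction of peak at N1/2 = {achieved_bandwidth(n, t0, r_inf) / r_inf:.2f}")
```

Inverting the formula gives t0 = N1/2 / r_inf, so a 100 kbyte N1/2 on a link with the same hypothetical 125 MB/s rate would correspond to roughly 800 microseconds of effective per-message overhead under this model.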
What concerns me more than microbenchmarks are real application
wallclock differences. Frankly, we have seen far too many
microbenchmarks pushed where real application results are avoided.
For this test, on 16 machines with 2 processors per machine, the StarCD
run was about 4x faster in wallclock time on the TOE/RDMA Ammasso card
than it was over the exact same infrastructure without TOE/RDMA. Every
MPI application we ran showed similar behavior (Fluent, etc.).
As Ammasso is out of business, this is sadly nothing we could really use.
Mark Hahn and others pointed out that the cost-benefit analysis (CBA)
for this may not work out well, and I agree. Given its cost, TOE/RDMA
honestly does not look like it provides significant benefits in HPC
relative to other technologies. There may be some specific corner cases
where it does, but I think the hardware has improved enough, and
baseline SDR IB is competitive enough with TOE, that using TOE may not
make much sense in many situations.
> (Obviously I'm not including Myricom in this bucket: they do publish
> benchmarks.)
> -- greg
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 866 888 3112
cell : +1 734 612 4615