[Beowulf] Performance characterising a HPC application

Christian Bell christian.bell at qlogic.com
Mon Mar 26 10:42:48 PDT 2007

On Mon, 26 Mar 2007, Gilad Shainer wrote:

> > Offload, usually implemented by RDMA offload, or the ability 
> > for a NIC to autonomously send and/or receive data from/to 
> > memory is certainly a nice feature to tout.  If one considers 
> > RDMA at an interface level (without looking at the 
> > registration calls required on some interconnects), it's the 
> > purest and most flexible form of interconnect data transfer.  
> > Unfortunately, this pure form of data transfer has a few caveats...
> When Mellanox refers to transport offload, it means full transport
> offload - for all transport semantics. InfiniBand, as you probably 
> know, provides RDMA AND Send/Receive semantics, and in both cases 
> you can do zero-copy operations. 
Zero-copy at the transport level doesn't translate into zero-copy in
the MPI application.  It would be disingenuous to lead people into
believing that zero-copy means "no copy at all" through the entire
software stack.
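To make the point concrete, here is a rough sketch (illustrative only, not any real MPI implementation; the names and the 16 KB threshold are assumptions) of why transport-level zero-copy still ends up copying inside the MPI library: small messages typically take an "eager" path through an internal bounce buffer, and only large "rendezvous" transfers land directly in the user's buffer.

```python
# Hypothetical model of eager vs. rendezvous delivery inside an MPI
# library.  The EAGER_LIMIT value and function names are illustrative.
EAGER_LIMIT = 16 * 1024  # bytes; real thresholds are implementation-specific

def deliver(payload: bytes, user_buffer: bytearray) -> int:
    """Return how many times the payload is copied inside the stack."""
    if len(payload) <= EAGER_LIMIT:
        # Eager path: the NIC deposits the message into a pre-registered
        # internal buffer ("zero-copy" at the transport), then the MPI
        # library copies it into the application's buffer once the
        # matching receive is posted.
        bounce = bytes(payload)             # copy 1: into the bounce buffer
        user_buffer[:len(bounce)] = bounce  # copy 2: into the user buffer
        return 2
    # Rendezvous path: sender and receiver handshake first, then the data
    # is placed directly into the (registered) user buffer -- the only
    # truly copy-free case end to end.
    user_buffer[:len(payload)] = payload    # models direct placement
    return 0

buf = bytearray(64 * 1024)
print(deliver(bytes(1024), buf))       # small message: copied twice
print(deliver(bytes(64 * 1024), buf))  # large message: placed directly
```

So "zero-copy" holds only on the rendezvous path; every eager message is still copied at least once above the transport.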

> This full flexibility provides the programmer with the ability to choose
> the best semantics for his use. Some programmers choose
> Send/Receive and some choose RDMA. It all depends on their application. 

Vanilla send/receive and RDMA are arguably not the best semantics for
MPI, since MPI is a receiver-driven model: the receiver, not the
sender, decides where each message lands.

Buying the screwdriver set with 288 bits doesn't mean it will include
the 5-point Torx bit you need to solve your problem (which is why my
Seagate hard drive enclosure is still sealed tight!)

    . . christian

christian.bell at qlogic.com
(QLogic SIG, formerly Pathscale)
