Basic channel bonding question/problem

Martin Siegert siegert at sfu.ca
Mon Feb 18 14:05:34 PST 2002


On Mon, Feb 18, 2002 at 04:22:36PM -0500, rhixon wrote:
> 
> I'm trying to compare 3x100T channel bonding vs. a single 1000T connection.  The hardware/software for the channel bonding is:  6 Dell P4 1700/1800 boxes, all running Red Hat 7.2.  Each box has three 3Com 3C905B NICs, bonded together.  I have them connected via three DLink 16-port 100T switches.  The program I'm running is a Fortran 90 code using MPI; the MPI I'm using is LAM-6.5.6.  The compiler is the Intel IFC 6.0 beta.
> 
> Here's the problem:  Using blocking sends/recvs (MPI_SEND), the code works fine, and performance is pretty good.  However, when using nonblocking sends/recvs (MPI_ISEND), the code runs for a bit and appears to lock up semi-randomly (not at the same place every time).  
> 
> The code works fine on a single channel 100T or 1000T.  It duplexes fine on a single channel, but with channel bonding, when it does run, the performance is no better than with the blocking calls (it appears not to be duplexing).
> 
> OK, so is this normal?  Is there anything in the hardware/software I can fix?  If I can't use nonblocking communications, my code is in serious trouble -- better to find out now with only 6 machines before I buy the next 64.
> 

1) If you just want to benchmark 3x100T vs. 1000T, the best tool is
   NetPIPE:

   www.scl.ameslab.gov/netpipe/

2) If you are more interested in reliable MPI performance: you may have
   run into a problem that I have not been able to solve, although I have
   spent a huge amount of time debugging it. Some programs that do
   nonblocking send/recv just hang under LAM. The hang appears randomly
   (as in your case); however, just by running the same code repeatedly
   you can make the program hang with probability close to 1.
   The workaround is actually quite simple: use MPICH (version 1.2.2 or
   later). At least in the case I looked at, the hang never happened
   with MPICH.
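
To make the "run it repeatedly" point concrete, here is a minimal Python
sketch of a harness that launches a command many times and counts the runs
that exceed a timeout. It is only an illustration under stated assumptions:
the function names, the timeout value, and the stand-in command are
hypothetical and do not come from the original test. It also shows the
arithmetic behind "probability close to 1": if each run hangs independently
with probability p, the chance of at least one hang in n runs is
1 - (1 - p)^n.

```python
import subprocess
import sys


def count_hangs(cmd, runs, timeout_s):
    """Run cmd `runs` times; return how many runs exceeded timeout_s seconds."""
    hangs = 0
    for _ in range(runs):
        try:
            # On timeout, subprocess.run kills the child it started and raises
            # TimeoutExpired. Only that launcher process is killed; ranks on
            # remote nodes may need separate cleanup.
            subprocess.run(cmd, timeout=timeout_s,
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)
        except subprocess.TimeoutExpired:
            hangs += 1
    return hangs


def prob_at_least_one_hang(p, runs):
    """Chance of at least one hang in `runs` independent runs,
    assuming each run hangs with probability p."""
    return 1.0 - (1.0 - p) ** runs


if __name__ == "__main__":
    # Hypothetical stand-in command; replace it with the real MPI launch
    # line for the program under test.
    cmd = [sys.executable, "-c", "pass"]
    print("hangs:", count_hangs(cmd, runs=5, timeout_s=10.0))
    print("P(at least one hang):", prob_at_least_one_hang(0.05, 100))
```

The timeout has to be chosen well above the normal run time, otherwise slow
runs are miscounted as hangs. With an assumed per-run hang chance of 5%,
100 runs already give about a 99% chance of seeing at least one hang.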

Cheers,
Martin

========================================================================
Martin Siegert
Academic Computing Services                        phone: (604) 291-4691
Simon Fraser University                            fax:   (604) 291-4242
Burnaby, British Columbia                          email: siegert at sfu.ca
Canada  V5A 1S6
========================================================================



More information about the Beowulf mailing list