Basic channel bonding question/problem
siegert at sfu.ca
Mon Feb 18 14:05:34 PST 2002
On Mon, Feb 18, 2002 at 04:22:36PM -0500, rhixon wrote:
> I'm trying to compare a 3x100T channel bonding vs. a single 1000T connection. The hardware/software for the channel bonding is: 6 Dell P4 1700/1800 boxes, all with Red Hat 7.2. Each box has 3 3Com 3C905B NIC cards, bonded together. I have them connected via 3 DLink 16-port 100T switches. The program I'm running is a Fortran 90 code using MPI; the MPI I'm using is LAM-6.5.6. The compiler is the Intel IFC 6.0 beta.
> Here's the problem: Using blocking sends/recvs (MPI_SEND), the code works fine, and performance is pretty good. However, when using nonblocking sends/recvs (MPI_ISEND), the code runs for a bit and appears to lock up semi-randomly (not at the same place every time).
> The code works fine on a single channel 100T or 1000T. It duplexes fine on a single channel, but on the channel bonding, when it does run, the performance is no better than the blocking calls (appears not to be duplexing).
> OK, so is this normal? Is there anything in the hardware/software I can fix? If I can't use nonblocking communications, my code is in serious trouble -- better to find out now with only 6 machines before I buy the next 64.
1) If you just want to benchmark 3x100T vs. 1000T, the best thing is
2) If you are more interested in reliable MPI performance: you may have
run into a problem that I have not been able to solve, although I have
spent a huge amount of time debugging it: some programs that do
nonblocking send/recv just hang under LAM. The problem appears randomly
(as in your case); however, just by running the same code repeatedly,
you can make a program hang with probability close to 1.
The fix is actually quite simple: use MPICH (version 1.2.2 or later).
At least in the case I've looked at, the hang never happened with MPICH.
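
To compare the two MPI implementations on the "run it repeatedly until it
hangs" test described above, something like the following could be used.
This is only a rough sketch: the helper name count_hangs, the default of
20 runs, and the 60 second timeout are all assumptions for illustration,
and a run is simply counted as hung if it does not finish within the
timeout.

```python
import subprocess
import sys


def count_hangs(cmd, runs=20, timeout_s=60.0):
    """Run `cmd` `runs` times; return how many runs exceeded `timeout_s`.

    A run that does not finish within the timeout is counted as a hang.
    subprocess.run() kills the direct child on timeout; processes that
    the launcher started on other nodes may need separate cleanup.
    """
    hangs = 0
    for _ in range(runs):
        try:
            subprocess.run(
                cmd,
                timeout=timeout_s,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            hangs += 1
    return hangs


if __name__ == "__main__" and len(sys.argv) > 1:
    # The command to repeat is taken from the command line.
    n = count_hangs(sys.argv[1:])
    print(f"{n} of 20 runs timed out")
```

A hypothetical invocation would be count_hangs(["mpirun", "-np", "6",
"./a.out"], runs=50, timeout_s=120), once with the binary built against
LAM and once against MPICH. The timeout should be chosen well above the
normal run time of the code so that slow runs are not miscounted as hangs.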
Academic Computing Services phone: (604) 291-4691
Simon Fraser University fax: (604) 291-4242
Burnaby, British Columbia email: siegert at sfu.ca
Canada V5A 1S6