Basic channel bonding question/problem
rhixon at n2netmail.com
Mon Feb 18 13:22:36 PST 2002
I'm trying to compare 3x100BaseT channel bonding against a single 1000BaseT connection. The hardware/software for the channel bonding is: 6 Dell P4 1700/1800 boxes, all running Red Hat 7.2. Each box has three 3Com 3C905B NICs, bonded together, and they're connected via three D-Link 16-port 100BaseT switches. The program I'm running is a Fortran 90 code using MPI; the MPI implementation is LAM-6.5.6, and the compiler is the Intel IFC 6.0 beta.
Here's the problem: Using blocking sends/recvs (MPI_SEND), the code works fine and performance is pretty good. However, when using nonblocking sends/recvs (MPI_ISEND), the code runs for a bit and then appears to lock up semi-randomly (not at the same place every time).
The code works fine on a single-channel 100BaseT or 1000BaseT link, and it duplexes fine on a single channel. But over the bonded channels, when the nonblocking version does run to completion, performance is no better than with the blocking calls (it appears not to be duplexing).
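For reference, the nonblocking communication pattern I'm using looks roughly like this (a simplified sketch, not the actual production code; buffer sizes, tags, and the ring-exchange partner choice are illustrative):

```fortran
! Sketch of the nonblocking exchange: each rank posts MPI_IRECV and
! MPI_ISEND, then completes both with MPI_WAITALL before touching the
! buffers again. This is the MPI_ISEND pattern that hangs under bonding.
program isend_sketch
  implicit none
  include 'mpif.h'
  integer :: ierr, rank, nprocs, src, dst
  integer :: reqs(2), stats(MPI_STATUS_SIZE, 2)
  double precision :: sendbuf(1000), recvbuf(1000)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  dst = mod(rank + 1, nprocs)            ! send to the next rank
  src = mod(rank - 1 + nprocs, nprocs)   ! receive from the previous rank
  sendbuf = dble(rank)

  call MPI_IRECV(recvbuf, 1000, MPI_DOUBLE_PRECISION, src, 0, &
                 MPI_COMM_WORLD, reqs(1), ierr)
  call MPI_ISEND(sendbuf, 1000, MPI_DOUBLE_PRECISION, dst, 0, &
                 MPI_COMM_WORLD, reqs(2), ierr)

  ! Neither buffer may be reused until both requests complete.
  call MPI_WAITALL(2, reqs, stats, ierr)

  call MPI_FINALIZE(ierr)
end program isend_sketch
```

With blocking MPI_SEND/MPI_RECV substituted for the MPI_IRECV/MPI_ISEND/MPI_WAITALL sequence, the same exchange runs reliably on the bonded setup.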
OK, so is this normal? Is there anything in the hardware/software I can fix? If I can't use nonblocking communications, my code is in serious trouble -- better to find out now with only 6 machines than after I buy the next 64.
Thanks in advance!