Running two MPI jobs simultaneously

Miska Le Louarn lelouarn at eso.org
Tue Dec 10 05:56:19 PST 2002


Dear all,

I am facing a strange problem (or "feature") related to either Linux or 
MPI or maybe their interaction.

I have written two programs, in C, which both use MPI.

I run these programs on a 6 node cluster of PCs, each PC running Linux. 
More precise hardware / software description at the end of this mail.

When I run one program (either of the two, with the mpirun command), 
everything goes fine: the program doesn't crash and gives the right 
result. All PCs work happily and everything seems to be ok.

I should say the two programs are completely independent (different 
executables and so on, and there is no communication between the two).

BUT when I try to run these two programs at the same time, one of them 
hangs. It just stops doing anything and sits there without crashing 
until the other program is completed. Then it starts to work again.

I am surprised by this behavior. I would have expected both programs 
to run independently, slower (because they share resources like 
network and CPU) but still run. Instead, one program hogs all resources 
and the other one just sits there doing nothing.

I have also tried to run two copies of the first ("hog") process. Now 
one of the copies also freezes completely (but doesn't seem to restart 
once the hog process is finished).

What I am doing now to avoid the problem is to run the programs 
sequentially. It would just be convenient sometimes to have the programs 
run at the same time, although slower.

I haven't tried running the two programs as two different users. I 
should maybe try that.

So does anybody have any idea why this happens? Is it a Linux scheduler 
"feature" related to the network communication between the nodes (if I 
launch 2 non-MPI jobs, I get the standard slow-down)? Or maybe 
interference inside MPI between the two processes?

Are there any tests I could do to see what is going on?

Thanks in advance,

Miska

Cluster:

5 nodes: Pentium IV, 1.8 GHz, with 1 GB of RAM, running Linux RH 7.3 
(stock kernel)
Master: dual-CPU Pentium Xeon, 1 GB of RAM, RH 7.3 (stock SMP kernel)
All machines have a Gigabit network card, and we have a Gigabit switch.

Software:
MPICH 1.2.3
Programs written in C, compiled with gcc 2.96-112 (have also tried gcc 
3.2 without any change).
The programs perform quite a lot of different operations: various 
computations, MPI communications, disk access on the local disk, etc.



