[Beowulf] WRF model on linux cluster: Mpi problem

John Hearns john.hearns at streamline-computing.com
Mon Jul 4 00:48:47 PDT 2005


On Fri, 2005-07-01 at 09:38 +0200, Federico Ceccarelli wrote:
> yes,
> 
> I will remove openmosix. 
> I patched the kernel with openmosix because I used the cluster also for
> other smaller applications, so the load balance was useful to me.
> 
> I already tried to switch off openmosix with
> 
> > service openmosix stop
From my small amount of openMosix experience, that should work.

Have you used the little graphical tool to display the loads on each
node? (can't remember the name).

Anyway, I go along with the earlier advice to look at the network card
performance.
Do an lspci -vv on all nodes to check that your riser cards are running
at full speed.
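
A rough sketch of how that check could be scripted, assuming passwordless
ssh to each node. The node names are hypothetical, and the filter simply
keeps the Ethernet controller entries plus any Status/LnkCap/LnkSta lines
(which is where lspci reports things like 66MHz+ or 133MHz+), so expect a
few unrelated Status lines as well:

```shell
# Hypothetical node names; replace with your own host list.
NODES="${NODES:-node01 node02 node03 node04}"

# Reads `lspci -vv` output on stdin. Prints the header line of every
# Ethernet controller and, within that device's block, the lines that
# report bus capability or link speed.
filter_nic_lines() {
    awk '
        # A line starting with a hex digit begins a new device block.
        /^[0-9a-fA-F]/ { show = ($0 ~ /Ethernet controller/) }
        show && (/^[0-9a-fA-F]/ || /Status:/ || /LnkCap:/ || /LnkSta:/) { print }
    '
}

# Run the check on every node over ssh (assumes passwordless login).
check_all_nodes() {
    for n in $NODES; do
        echo "== $n =="
        ssh "$n" lspci -vv 2>/dev/null | filter_nic_lines
    done
}
```

Call check_all_nodes and compare the output between nodes; a card that
reports a lower speed than its neighbours is the one to look at first.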

What I would do is break this problem down.
Start by running the Pallas benchmark on one node, then two, then four,
etc. See if a pattern develops.
Do the same with your model, if it is possible to cut down the problem
size: run on one node (two processors), then two, then four.
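
Something along these lines could drive the runs. This is only a sketch:
it assumes two processors per node, an MPICH-style mpirun that accepts
-np and -machinefile, and a machine file that places two processes on
each node. The paths and the benchmark binary name are placeholders.

```shell
# Assumptions: 2 processors per node, MPICH-style mpirun options,
# hypothetical machine file path.
PROCS_PER_NODE="${PROCS_PER_NODE:-2}"
MACHINEFILE="${MACHINEFILE:-./machines}"
# Dry run by default: commands are printed, not executed.
# Set RUN to an empty string to actually launch the jobs.
RUN="${RUN-echo}"

# usage: scaling_runs <program> <node count> [<node count> ...]
scaling_runs() {
    prog="$1"; shift
    for nodes in "$@"; do
        np=$((nodes * PROCS_PER_NODE))
        $RUN mpirun -np "$np" -machinefile "$MACHINEFILE" "$prog"
    done
}
```

For example, scaling_runs ./PMB-MPI1 1 2 4 prints the three mpirun
commands for one, two and four nodes. Once they look right, rerun with
RUN= set to empty and keep the output of each run so the timings can be
compared.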



