[Beowulf] running MPICH on AMD Opteron Dual Core Processor Cluster( 72 Cpu's)

Chris Samuel csamuel at vpac.org
Wed Jan 3 14:59:58 PST 2007


On Thursday 04 January 2007 02:53, Mark Hahn wrote:

> personally, I'm pretty convinced that MPI implementations should stay
> out of the jobstarter business, and go with straight agentless (ssh-based)
> job spawning.

Noooooo...  please not ssh again, make the pain go away!

Seriously though, this is what the PBS TM interface is for. (I've not used SGE,
so I don't know if it has a similar interface, but I'd be surprised if it
didn't.)

The TM interface is important as it means Torque can keep a close beady eye on 
the MPI processes spawned and kill them off when needed. Without it, stray 
processes all too often get left behind and need hacks like epilogue scripts 
to clean up.

It also stops users changing their previous 32 CPU job script to ask for 4 
CPUs in the queue and then forgetting to change the -np parameter for mpirun 
as well, which puts 32 processes on 4 CPUs. Nodes don't like that sort of load.
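
For launchers that aren't TM-aware, one way round the stale -np problem is to
derive the process count from the allocation rather than hard-coding it. The
following is only a rough sketch, not the TM interface itself: it assumes the
job runs under PBS/Torque, which sets $PBS_NODEFILE to a file listing one
hostname per allocated slot, and that your mpirun accepts -np and
-machinefile (check your MPI's documentation). The function names and
./my_mpi_app are made up for illustration.

```python
import os
import shlex
from collections import Counter


def slots_from_nodefile(path):
    """Count allocated slots per host; the nodefile has one hostname per slot."""
    with open(path) as fh:
        return Counter(line.strip() for line in fh if line.strip())


def build_mpirun_command(nodefile, program, args=()):
    """Build an mpirun command whose -np matches what the queue actually granted."""
    nprocs = sum(slots_from_nodefile(nodefile).values())
    if nprocs == 0:
        raise ValueError(f"no hosts listed in {nodefile}")
    # Assumption: this mpirun understands -np and -machinefile.
    return ["mpirun", "-np", str(nprocs), "-machinefile", nodefile, program, *args]


if __name__ == "__main__":
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile:
        # ./my_mpi_app is a placeholder for the real binary.
        print(shlex.join(build_mpirun_command(nodefile, "./my_mpi_app")))
    else:
        print("PBS_NODEFILE is not set; run this inside a PBS/Torque job")
```

With this, a job resubmitted for 4 CPUs gets -np 4 without anyone editing the
script. It does nothing for cleanup of stray processes, though, which still
needs TM or an epilogue.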

cheers!
Chris
-- 
 Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
