[Beowulf] picking out a job scheduler

Nathan Moore ntmoore at gmail.com
Tue Jan 2 20:55:10 PST 2007


Torque was really easy to install, but it seems like my /etc/hosts  
file must be screwed up, as I can't get the cluster nodes to  
respond.  Specifically, within a cluster of 3 machines, each having  
an /etc/hosts file of:

	127.0.0.1       localhost.localdomain   localhost
	199.17.152.17   runner
	199.17.152.135  muscovey
	199.17.152.13   pekin
	(( other workstations follow ))

Now, when I have pbs_server running on runner and the pbs_mom
daemons running on muscovey, pekin, and runner, I get the following
status message:

	[root@runner torque-2.1.6]# pbsnodes -a
	pekin
	     state = down
	     np = 1
	     ntype = cluster

	muscovey
	     state = down
	     np = 1
	     ntype = cluster

	runner
	     state = down	
	     np = 1
	     ntype = cluster

I realize this is a pretty low-level question, but what the heck is  
wrong with my /etc/hosts file?
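
One quick way to rule the hosts file in or out is to check that every cluster node has an entry and that none of them maps to a loopback address. The following is a rough sketch of that check against the file as posted above; `parse_hosts` and `check_nodes` are hypothetical helper names, and the check only looks at the file contents, not at what the resolver actually returns on each machine.

```python
import ipaddress


def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to its first listed address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        for name in names:
            table.setdefault(name, address)
    return table


def check_nodes(table, node_names):
    """Return (name, problem) pairs for the given cluster node names."""
    problems = []
    for name in node_names:
        address = table.get(name)
        if address is None:
            problems.append((name, "no entry"))
        elif ipaddress.ip_address(address).is_loopback:
            problems.append((name, "maps to loopback"))
    return problems


# Entries copied from the hosts file shown earlier in this message.
HOSTS = """\
127.0.0.1       localhost.localdomain   localhost
199.17.152.17   runner
199.17.152.135  muscovey
199.17.152.13   pekin
"""

if __name__ == "__main__":
    print(check_nodes(parse_hosts(HOSTS), ["runner", "muscovey", "pekin"]))
```

An empty result means this particular check found nothing wrong with the entries; it does not show that name resolution works end to end.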

regards,

NT


P.S. The troubleshooting message given by Torque is:

	[root@runner torque-2.1.6]# momctl -d 3

	Host: runner/runner   Version: 2.1.6
	WARNING:  server not specified (set $pbsserver)
	PID:                    30531
	HomeDirectory:          /var/spool/torque/mom_priv
	MOM active:             2518 seconds
	Server Update Interval: 45 seconds
	LOGLEVEL:               0 (use SIGUSR1/SIGUSR2 to adjust)
	Communication Model:    RPP
	TCP Timeout:            20 seconds
	NOTE:  no prolog configured
	Alarm Time:             0 of 10 seconds
	Trusted Client List:    199.17.152.17,127.0.0.1
	Configured to use /usr/bin/scp -rpB
	NOTE:  no local jobs detected

	diagnostics complete
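
The line that stands out in the diagnostics above is the WARNING about `$pbsserver`, which suggests the MOM on runner has not been told which host runs pbs_server. Where that value is set (plausibly a config file under the HomeDirectory shown) is an assumption to confirm against the Torque documentation. As a small sketch, the flagged lines can be pulled out of the report like this; `flagged_lines` is a hypothetical helper name.

```python
def flagged_lines(text):
    """Return (level, message) pairs for WARNING and NOTE lines in diagnostic text."""
    found = []
    for line in text.splitlines():
        stripped = line.strip()
        for level in ("WARNING", "NOTE"):
            prefix = level + ":"
            if stripped.startswith(prefix):
                found.append((level, stripped[len(prefix):].strip()))
    return found


# Excerpt copied from the momctl output in this message.
SAMPLE = """\
Host: runner/runner   Version: 2.1.6
WARNING:  server not specified (set $pbsserver)
PID:                    30531
NOTE:  no prolog configured
Alarm Time:             0 of 10 seconds
NOTE:  no local jobs detected

diagnostics complete
"""

if __name__ == "__main__":
    for level, message in flagged_lines(SAMPLE):
        print(level, "-", message)
```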



- - - - - - - - - - - - - - - - - - - - - - -

Nathan Moore
Physics
Winona State University
nmoore at winona.edu
AIM:nmoorewsu

- - - - - - - - - - - - - - - - - - - - - - -


On Jan 2, 2007, at 7:23 PM, Chris Samuel wrote:

On Wednesday 03 January 2007 08:06, Chris Dagdigian wrote:

> Both should be fine although if you are considering *PBS you should
> look at both Torque (a fork of OpenPBS I think)

That's correct, it (and ANU-PBS, another fork) seem to be the de facto
queuing systems in the state and national HPC centers down here.

Torque is just *so* much better than OpenPBS used to be (not that it was
particularly hard).

cheers,
Chris
-- 
  Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
  Victorian Partnership for Advanced Computing http://www.vpac.org/
  Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf



