[Beowulf] TORQUE issues

Reuti reuti at Staff.Uni-Marburg.DE
Sun Apr 13 10:53:53 PDT 2008


Hi,

On 13.04.2008 at 04:52, Lance S. Jacobsen wrote:
> I recently put together a small cluster of Xeons using CentOS 5.1
> x86_64. This cluster is my first real experience with Linux and
> system administration. It took some learning to install NIS, NFS,
> etc., but now the machines seem to be working well, so I am working
> on the next step: installing a queue scheduler. I decided on
> TORQUE 2.3.0 since it's free and I don't know any better. I have
> installed it and am having trouble getting it to detect my nodes.
>
> I think the problem is that I named them starting with numbers in
> my /etc/hosts file: 1of12, 2of12, ..., 12of12, instead of something
> like node01, node02, ...
>
> After the installation, TORQUE did not create a file called 'nodes',
> which it told me I needed, so after searching the web I found the
> command to create it:
>
> # qmgr -c "create node 2of12"
>
> When I do this it gives me the following reply:
>
> qmgr: syntax error - checklist failed
> create node 2of12
>                   /\
>
> If I instead name the node with a letter in front (n2of12), it
> seems to work and generates the nodes file.
>
> If I then run the "pbsnodes -a" command, it tells me:
>
> n2of12
>      state = down
>      np = 1
>      ntype = cluster
>
> That seems fine: it should be down, since there is no n2of12 in my
> hosts file.
>
> If I then rename the node in the nodes file back to 2of12 and type
> the following to stop and restart the server:
>
> # qterm
> # pbs_server
>
> I get the following reply:
>
> PBS_Server: pbsd_init(setup_nodes), token "2of12" doesn't start  
> with alpha on line 1.
>
> PBS_Server: PBS_Server, pbsd_init failed
>
> I am reluctant to change all of my node names (IP aliases), since
> everything else about my cluster is finally working well, so I have
> been trying to find out why pbsd_init will not accept hostnames that
> start with digits. I would also hate to change them if that is not
> the problem.
>
> Does anyone know whether I could edit the setup files associated
> with pbsd_init to get this to work, or whether there is some other
> way to do it?

In general, I wouldn't use a digit as the first character, as
outlined here:

http://rfc.net/rfc1178.html page 4.
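For illustration, a minimal Python sketch of the two rules involved, under stated assumptions: RFC 1123 (section 2.1) allows a hostname label to begin with a digit, so 2of12 is syntactically legal, whereas RFC 1178 advises against it, and the pbs_server message quoted above ("doesn't start with alpha") suggests that TORQUE 2.3 enforces a leading-letter rule for node names. The helper names are hypothetical, and this is not TORQUE's actual parsing code.

```python
import re
import string

# RFC 1123 label syntax: 1-63 letters, digits or hyphens, not beginning or
# ending with a hyphen. A leading digit is permitted.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_rfc1123_hostname(name: str) -> bool:
    """Return True if every dot-separated label of `name` is RFC 1123 valid."""
    if not name or len(name) > 253:
        return False
    labels = name.rstrip(".").split(".")
    return all(LABEL.fullmatch(label) for label in labels)


def starts_with_letter(name: str) -> bool:
    """Stricter, hypothetical rule modelled on the 'doesn't start with alpha'
    error: a valid hostname that also begins with an ASCII letter."""
    return is_rfc1123_hostname(name) and name[0] in string.ascii_letters


for host in ["2of12", "n2of12", "node02"]:
    print(host, is_rfc1123_hostname(host), starts_with_letter(host))
```

All three names pass the RFC 1123 check, but only 2of12 fails the leading-letter check, which matches the behaviour described in the question; a name like node02 satisfies both rules.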

Some programs might simply check the first character to decide
whether a string is a hostname or an IP address. Thinking long term,
and about additional software in your cluster (maybe even parallel
applications), I would suggest changing the names of the machines.
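A minimal Python sketch of the failure mode described above (the function names are hypothetical, not taken from any particular program): a classifier that inspects only the first character mistakes 2of12 for an address, while actually parsing the string with the standard-library ipaddress module does not.

```python
import ipaddress


def naive_is_ip(s: str) -> bool:
    """Fragile heuristic: assume anything starting with a digit is an address."""
    return s[:1].isdigit()


def proper_is_ip(s: str) -> bool:
    """Robust check: try to parse the whole string as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(s)
        return True
    except ValueError:
        return False


for candidate in ["192.168.1.2", "2of12", "node02"]:
    print(candidate, naive_is_ip(candidate), proper_is_ip(candidate))
```

For 2of12 the naive check returns True while the parsing check returns False. Any tool added to the cluster later could contain the naive version, which is the long-term risk of keeping digit-first names.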

-- Reuti

BTW: Torque has its own mailing list at: http://www.clusterresources.com


> Thanks,
>
> Lance
>
> -- 
> Lance S. Jacobsen, Ph.D.
> President
> GoHypersonic Incorporated
> 714 E. Monument Ave., Suite 201
> Dayton, OH 45402-1382
> Tel: 937-531-6678
> Fax: 937-531-6679