[Beowulf] Re: TORQUE issues

Marc Noguera Julian marc at klingon.uab.cat
Sun Apr 13 12:38:00 PDT 2008


Hi,

I have TORQUE installed on my cluster (40 AMD Opteron nodes) and it is
running fine. My advice is to rename the nodes as you suggest, so that each
name begins with a letter, which is good general practice, as Reuti says. In
my experience TORQUE will not accept a job whose name (qsub -N) begins with a
number, so I expect similar behaviour with node names.
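
A quick way to check a list of names before touching /etc/hosts might look
like the sketch below. It only tests the one rule that the pbsd_init error
message reports (the name must start with a letter); it is not TORQUE's own
parser, and the helper names and the "n" prefix are just assumptions for
illustration.

```python
import re


def starts_with_letter(name: str) -> bool:
    """Return True if the name begins with an ASCII letter."""
    return bool(re.match(r"[A-Za-z]", name))


def suggest_name(name: str, prefix: str = "n") -> str:
    """Prepend a letter prefix (hypothetical choice) to names that need one."""
    return name if starts_with_letter(name) else prefix + name


if __name__ == "__main__":
    # Sample names taken from the thread; the full host list is an assumption.
    for host in ["1of12", "2of12", "n2of12", "node01"]:
        print(host, "->", suggest_name(host))
```

Names that already start with a letter are left alone, so it is safe to run
over the whole host list.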

Hope it helps
Marc

------------------------------------------------------
Marc Noguera i Julian, PhD
System Manager / Researcher
Despatx C7-149. Edifici Cn.
Campus UAB. Bellaterra
08193. Barcelona
email: marc_at_klingon.uab.es
web: http://klingon.uab.es/marc
Tlf/Phone: 00 34 935812173
-------------------------------------------------------
> 
> Message: 2
> Date: Sun, 13 Apr 2008 19:53:53 +0200
> From: Reuti <reuti at Staff.Uni-Marburg.DE>
> Subject: Re: [Beowulf] TORQUE issues
> To: "Lance S. Jacobsen" <lance at gohypersonic.com>
> Cc: Beowulf at beowulf.org
> 
> Hi,
> 
> On 13.04.2008 at 04:52, Lance S. Jacobsen wrote:
> > I recently put together a small cluster of Xeons using CentOS 5.1  
> > x86_64.  This cluster is my first real big experience with Linux  
> > and administration. It took some learning and such to install NIS,  
> > NFS, etc., but now the machines seem to be working well, and so I  
> > am working on the next step: installing a queue scheduler. I decided  
> > on TORQUE 2.3.0 since it's free and I don't know any better. I have  
> > installed this and am having trouble getting it to detect my nodes.
> >
> > I think the problem is that I named them starting with numbers in  
> > my /etc/hosts file: 1of12 , 2of12, ... 12of12. Instead of something  
> > like node01, node02, ...
> >
> > After the installation, TORQUE did not create a file called 'nodes'  
> > which it told me that I needed, and so after searching the web I  
> > found the command to create it:
> >
> > # qmgr -c "create node 2of12"
> >
> > When I do this it gives me the following reply:
> >
> > qmgr: syntax error - checklist failed
> > create node 2of12
> >                   /\
> >
> > If I do this naming my node with a letter in front (n2of12) then it  
> > seems to work and generate the nodes file.
> >
> > Now if I then go and do the "pbsnodes -a" command it tells me:
> >
> > n2of12
> >
> > state = down
> > np = 1
> > ntype = cluster
> >
> > seems fine... should be down since there is no n2of12 in my hosts  
> > file.
> >
> > Now if I then go and rename the node in the node file back to 2of12  
> > and type the following to kill and restart the server:
> >
> > # qterm
> > # pbs_server
> >
> > I get the following reply:
> >
> > PBS_Server: pbsd_init(setup_nodes), token "2of12" doesn't start  
> > with alpha on line 1.
> >
> > PBS_Server: PBS_Server, pbsd_init failed
> >
> > Now I am reluctant to go and change all of my node names (IP  
> > aliases) since everything else about my cluster is finally working  
> > well and so I have been trying to find out why pbsd_init will not  
> > accept host names that start with numbers. Also, I would hate to go  
> > and change this if it is not the problem.
> >
> > Does anyone know if I might be able to edit the setup files  
> > associated with pbsd_init to get this to work (or any other ways to  
> > do this)?
> 
> In general I wouldn't use a digit as the first character, as
> outlined here:
> 
> http://rfc.net/rfc1178.html page 4.
> 
> Some programs might simply check the first character to decide
> whether it's a hostname or a TCP/IP address. Thinking long term, and
> about additional software in your cluster (maybe even parallel apps),
> I would suggest changing the names of the machines.
> 
> -- Reuti
> 
> BTW: TORQUE has its own mailing list at: http://www.clusterresources.com
> 
> > Thanks,
> >
> > Lance
> >
> > -- 
> > Lance S. Jacobsen, Ph.D.
> > President
> > GoHypersonic Incorporated
> > 714 E. Monument Ave., Suite 201
> > Dayton, OH 45402-1382
> > Tel: 937-531-6678
> > Fax: 937-531-6679
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit  
> > http://www.beowulf.org/mailman/listinfo/beowulf
> 
> End of Beowulf Digest, Vol 50, Issue 24
> ***************************************