[Beowulf] Re: TORQUE issues


Marc Noguera Julian marc at klingon.uab.cat
Sun Apr 13 12:38:00 PDT 2008


Hi,

I have TORQUE installed on my cluster (40 AMD Opteron nodes) and it is running
fine. My advice is to rename the nodes as you suggest, so that every name
begins with a letter, which is good general practice, as Reuti says. In my
experience, TORQUE won't let you submit a job whose name (qsub -N) begins
with a number, so I would expect similar behaviour with node names.
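
If it helps with the rename, here is a rough Python sketch. It assumes all the old names follow the KofN pattern from your /etc/hosts and that a nodeNN scheme suits you; the "node" prefix and the helper names are my own hypothetical choices, not anything TORQUE provides. It only prints the qmgr commands for the renamed nodes. It does not run them or edit /etc/hosts, so you would still update the hosts file (and NIS, if it serves your hosts map) yourself. The same letter-first check applies to job names passed to qsub -N.

```python
import re

# Hypothetical pattern for the existing names, e.g. "2of12" -> (2, 12).
OLD_NAME = re.compile(r"^(\d+)of(\d+)$")


def starts_with_letter(name: str) -> bool:
    """True if the name begins with an ASCII letter, which is what
    pbs_server and qmgr insisted on in this thread."""
    return bool(name) and name[0].isascii() and name[0].isalpha()


def new_node_name(old: str, prefix: str = "node") -> str:
    """Map an old "KofN" name to a zero-padded, letter-first name."""
    m = OLD_NAME.match(old)
    if m is None:
        raise ValueError(f"unexpected node name: {old!r}")
    index, total = int(m.group(1)), int(m.group(2))
    width = len(str(total))  # pad to the width of the node count
    return f"{prefix}{index:0{width}d}"


def qmgr_commands(old_names, prefix: str = "node"):
    """Yield one 'qmgr -c "create node ..."' line per renamed node."""
    for old in old_names:
        new = new_node_name(old, prefix)
        if not starts_with_letter(new):
            raise ValueError(f"new name must start with a letter: {new!r}")
        yield f'qmgr -c "create node {new}"'


if __name__ == "__main__":
    old = [f"{i}of12" for i in range(1, 13)]
    for cmd in qmgr_commands(old):
        print(cmd)
```

Zero-padding to the width of the node count keeps the names sorting in order (node02 before node10), which is handy in pbsnodes output and scripts.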

Hope it helps
Marc

------------------------------------------------------
Marc Noguera i Julian, PhD
System Manager / Researcher
Despatx C7-149. Edifici Cn.
Campus UAB. Bellaterra
08193. Barcelona
email: marc_at_klingon.uab.es
web: http://klingon.uab.es/marc
Tlf/Phone: 00 34 935812173
-------------------------------------------------------
> 
> Message: 2
> Date: Sun, 13 Apr 2008 19:53:53 +0200
> From: Reuti <reuti at Staff.Uni-Marburg.DE>
> Subject: Re: [Beowulf] TORQUE issues
> To: "Lance S. Jacobsen" <lance at gohypersonic.com>
> Cc: Beowulf at beowulf.org
> Message-ID:
> 	<22C3BC2E-BFF3-41AE-9477-196A02958541 at staff.uni-marburg.de>
> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
> 
> Hi,
> 
> Am 13.04.2008 um 04:52 schrieb Lance S. Jacobsen:
> > I recently put together a small cluster of Xeons using CentOS 5.1  
> > x86_64.  This cluster is my first real big experience with Linux  
> > and administration. It took some learning and such to install NIS,  
> > NFS, etc., but now the machines seem to be working well, and so I  
> > am working on the next step: installing a queue scheduler. I decided
> > on TORQUE 2.3.0 since it's free and I don't know any better. I have
> > installed this and am having trouble getting it to detect my nodes.
> >
> > I think the problem is that I named them starting with numbers in  
> > my /etc/hosts file: 1of12 , 2of12, ... 12of12. Instead of something  
> > like node01, node02, ...
> >
> > After the installation, TORQUE did not create a file called 'nodes'  
> > which it told me that I needed, and so after searching the web I  
> > found the command to create it:
> >
> > # qmgr -c "create node 2of12"
> >
> > When I do this it gives me the following reply:
> >
> > qmgr: syntax error - checklist failed
> > create node 2of12
> >                   /\
> >
> > If I do this naming my node with a letter in front (n2of12) then it  
> > seems to work and generate the nodes file.
> >
> > Now if I then go and do the "pbsnodes -a" command it tells me:
> >
> > n2of12
> >
> > state = down
> > np =1
> > ntype = cluster
> >
> > seems fine... should be down since there is no n2of12 in my hosts  
> > file.
> >
> > Now if I then go and rename the node in the node file back to 2of12  
> > and type the following to kill and restart the server:
> >
> > # qterm
> > # pbs_server
> >
> > I get the following reply:
> >
> > PBS_Server: pbsd_init(setup_nodes), token "2of12" doesn't start  
> > with alpha on line 1.
> >
> > PBS_Server: PBS_Server, pbsd_init failed
> >
> > Now I am reluctant to go and change all of my node names (IP  
> > aliases) since everything else about my cluster is finally working  
> > well and so I have been trying to find out why pbsd_init will not  
> > accept host names that start with numbers. Also, I would hate to go  
> > and change this if it is not the problem.
> >
> > Does anyone know if I might be able to edit the setup files  
> > associated with pbsd_init to get this to work (or any other ways to  
> > do this)?
> 
> In general, I wouldn't use a digit as the first character, as
> outlined here:
> 
> http://rfc.net/rfc1178.html page 4.
> 
> Some programs might simply check the first character to decide
> whether it's a hostname or a TCP/IP address. Thinking long term,
> and of additional software in your cluster (maybe even parallel apps),
> I would suggest changing the names of the machines.
> 
> -- Reuti
> 
> BTW: Torque has its own mailing list at: http://www.clusterresources.com
> 
> > Thanks,
> >
> > Lance
> >
> > -- 
> > Lance S. Jacobsen, Ph.D.
> > President
> > GoHypersonic Incorporated
> > 714 E. Monument Ave., Suite 201
> > Dayton, OH 45402-1382
> > Tel: 937-531-6678
> > Fax: 937-531-6679
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit  
> > http://www.beowulf.org/mailman/listinfo/beowulf
> 
> ------------------------------
> 
> End of Beowulf Digest, Vol 50, Issue 24
> ***************************************
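
For anyone hitting the same pbs_server error, here is a small Python sketch that lints a nodes file before restarting pbs_server. It flags entries whose first token doesn't start with a letter, loosely in the spirit of the "doesn't start with alpha" message above; it is not TORQUE's actual code, and skipping blank lines and # comments is my assumption about the file format. It also contrasts the naive first-character test Reuti warns about (a hypothetical helper) with a real address parse using Python's standard ipaddress module.

```python
import ipaddress


def naive_is_ip(token: str) -> bool:
    """The shortcut Reuti warns about: treat anything that starts with a
    digit as an IP address. Hypothetical, for illustration only."""
    return token[:1].isdigit()


def is_ip(token: str) -> bool:
    """Stricter test: ask the standard library to parse it as an address."""
    try:
        ipaddress.ip_address(token)
    except ValueError:
        return False
    return True


def check_nodes_file(text: str):
    """Return (line_number, token) for each entry whose first token does not
    start with a letter. Blank lines and '#' comments are skipped (assumed)."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        token = stripped.split()[0]  # node name comes before np=... etc.
        if not token[0].isalpha():
            problems.append((lineno, token))
    return problems


if __name__ == "__main__":
    sample = "2of12 np=1\nnode03 np=1\n"
    for lineno, token in check_nodes_file(sample):
        print(f'token "{token}" doesn\'t start with alpha on line {lineno}')
    for name in ("2of12", "192.168.1.2", "node02"):
        print(name, naive_is_ip(name), is_ip(name))
```

The naive test calls "2of12" an address while ipaddress correctly rejects it, which is exactly the kind of misclassification that makes digit-leading hostnames risky. The nodes file usually lives in server_priv under the TORQUE home directory; pointing check_nodes_file at its contents before running pbs_server catches the problem early.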






