
[Beowulf] PBS scheduler error


Onur Destanoğlu odestanoglu at gmail.com
Thu Aug 18 06:16:59 PDT 2005


Hi all, 

This is my script file:

#PBS -N firstscp
#PBS -l nodes=1:ppn=2
#PBS -l mem=4mb
#PBS -l walltime=1:00:00
#PBS -V
#PBS -m bea
#PBS -o /home/niyazi/cikislog
cd /home/niyazi
mpirun -np 2 first
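As background for the script above: qsub takes its options from #PBS comment lines placed before the first executable command of the job script. The sketch below is a hypothetical helper (list_pbs_directives is not part of PBS or TORQUE). It prints the directives such a script carries and stops scanning at the first command line. It assumes a POSIX shell and awk, and it skips blank lines and ordinary comments.

```shell
# Hypothetical helper (not a PBS/TORQUE tool): print the #PBS directive
# lines of a job script, stopping at the first executable line, which is
# roughly where qsub stops reading directives.
list_pbs_directives() {
    awk '
        /^#PBS/      { print; next }   # a directive: print it
        /^[ \t]*$/   { next }          # blank line: skip
        /^#/         { next }          # ordinary comment or shebang: skip
        { exit }                       # first command: stop scanning
    ' "$1"
}
```

For example, with the script above saved as firstscp.sh (an assumed file name), list_pbs_directives firstscp.sh would print its seven #PBS lines and nothing from the cd or mpirun lines.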

This is the configuration of the server and the queue:

# Create queues and set their attributes.
#
#
# Create and define queue startup
#
create queue startup
set queue startup queue_type = Execution
set queue startup acl_user_enable = True
set queue startup acl_users = niyazi@bee00.bee-hive
set queue startup resources_default.mem = 400mb
set queue startup resources_default.ncpus = 2
set queue startup enabled = True
set queue startup started = True
#
# Set server attributes.
#
set server scheduling = True
set server managers = root@bee00.bee-hive
set server operators = niyazi@bee00.bee-hive
set server operators += root@bee00.bee-hive
set server default_queue = startup
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_available.mem = 9gb
set server resources_available.ncpus = 12
set server resources_max.mem = 9gb
set server resources_max.ncpus = 12
set server scheduler_iteration = 600
set server node_ping_rate = 300
set server node_check_rate = 150
set server tcp_timeout = 6
set server default_node = bee01
set server node_pack = True
set server job_stat_rate = 30

And these are the three mails that arrived after I submitted the job
(qsub firstscp):

1.
PBS Job Id: 31.bee00.bee-hive
Job Name:   firstscp
Begun execution

2.
PBS Job Id: 31.bee00.bee-hive
Job Name:   firstscp
Execution terminated
Exit_status=215
resources_used.cput=00:00:00
resources_used.mem=0kb
resources_used.vmem=0kb
resources_used.walltime=00:00:00

3.
PBS Job Id: 31.bee00.bee-hive
Job Name:   firstscp
File stage in failed, see below.
Job will be retried later, please investigate and correct problem.
Post job file processing error; job 31.bee00.bee-hive on host bee01/1+bee01/0

Unable to copy file 31.bee00.be.OU to bee00.bee-hive:/home/niyazi/cikislog
>>> error from copy
rcmd: getaddrinfo: Temporary failure in name resolution
>>> end error output
Output retained on that host in: /var/spool/torque/undelivered/31.bee00.be.OU

Unable to copy file 31.bee00.be.ER to bee00.bee-hive:/home/niyazi/firstscp.e31
>>> error from copy
rcmd: getaddrinfo: Temporary failure in name resolution
>>> end error output
Output retained on that host in: /var/spool/torque/undelivered/31.bee00.be.ER

So why can't my pretty system finish any job that I submit?
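The third mail shows that the output files could not be copied from bee01 back to bee00.bee-hive, because the lookup inside rcmd failed with "getaddrinfo: Temporary failure in name resolution". One way to repeat that lookup on a node is sketched below. check_resolves is a hypothetical helper, and it assumes a glibc system where getent ahosts resolves names through getaddrinfo(3), following the hosts order configured in /etc/nsswitch.conf.

```shell
# Hypothetical helper: report whether each host name resolves on this node.
# `getent ahosts` (glibc) performs the lookup with getaddrinfo(3), the same
# call that failed in the error mail, so /etc/hosts and DNS are consulted
# in the order set in /etc/nsswitch.conf.
check_resolves() {
    rc=0
    for host in "$@"; do
        if getent ahosts "$host" > /dev/null 2>&1; then
            echo "ok: $host"
        else
            echo "FAILED: $host does not resolve" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Run it on the execution host named in the mails, for example check_resolves bee00.bee-hive on bee01. A FAILED line there would suggest that the node cannot resolve the server's name the way the copy step needs to.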



