[Beowulf] PBS scheduler error

Onur Destanoğlu odestanoglu at gmail.com
Thu Aug 18 06:16:59 PDT 2005


Hi all,

This is my script file:

#PBS -N firstscp
#PBS -l nodes=1:ppn=2
#PBS -l mem=4mb
#PBS -l walltime=1:00:00
#PBS -V
#PBS -m bea
#PBS -o /home/niyazi/cikislog
cd /home/niyazi
mpirun -np 2 first

This is the configuration of the server and queue:

# Create queues and set their attributes.
#
#
# Create and define queue startup
#
create queue startup
set queue startup queue_type = Execution
set queue startup acl_user_enable = True
set queue startup acl_users = niyazi@bee00.bee-hive
set queue startup resources_default.mem = 400mb
set queue startup resources_default.ncpus = 2
set queue startup enabled = True
set queue startup started = True
#
# Set server attributes.
#
set server scheduling = True
set server managers = root@bee00.bee-hive
set server operators = niyazi@bee00.bee-hive
set server operators += root@bee00.bee-hive
set server default_queue = startup
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_available.mem = 9gb
set server resources_available.ncpus = 12
set server resources_max.mem = 9gb
set server resources_max.ncpus = 12
set server scheduler_iteration = 600
set server node_ping_rate = 300
set server node_check_rate = 150
set server tcp_timeout = 6
set server default_node = bee01
set server node_pack = True
set server job_stat_rate = 30

And these are the three mails that arrived after I submitted the job
(qsub firstscp):

1.
PBS Job Id: 31.bee00.bee-hive
Job Name:   firstscp
Begun execution

2.
PBS Job Id: 31.bee00.bee-hive
Job Name:   firstscp
Execution terminated
Exit_status=215
resources_used.cput=00:00:00
resources_used.mem=0kb
resources_used.vmem=0kb
resources_used.walltime=00:00:00

3.
PBS Job Id: 31.bee00.bee-hive
Job Name:   firstscp
File stage in failed, see below.
Job will be retried later, please investigate and correct problem.
Post job file processing error; job 31.bee00.bee-hive on host bee01/1+bee01/0

Unable to copy file 31.bee00.be.OU to bee00.bee-hive:/home/niyazi/cikislog
>>> error from copy
rcmd: getaddrinfo: Temporary failure in name resolution
>>> end error output
Output retained on that host in: /var/spool/torque/undelivered/31.bee00.be.OU

Unable to copy file 31.bee00.be.ER to bee00.bee-hive:/home/niyazi/firstscp.e31
>>> error from copy
rcmd: getaddrinfo: Temporary failure in name resolution
>>> end error output
Output retained on that host in: /var/spool/torque/undelivered/31.bee00.be.ER
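
Both copy failures end in the same getaddrinfo error, which suggests that
bee01 could not resolve the name bee00.bee-hive when it tried to send the
output files back. A minimal sketch of a check that could be run on bee01,
assuming getent is installed there; check_host is a hypothetical helper
name, not part of PBS or TORQUE:

```shell
#!/bin/sh
# Hypothetical helper (not a PBS/TORQUE command): report whether a host
# name can be resolved on the machine where this script runs.
check_host() {
    # getent consults the same name-service sources (hosts file, DNS, ...)
    # that the system resolver is configured to use.
    if getent hosts "$1" >/dev/null 2>&1; then
        echo "$1: resolves"
    else
        echo "$1: does NOT resolve"
    fi
}

# Host name taken from the error mail above; run this on bee01.
check_host bee00.bee-hive
```

If the name does not resolve on bee01, that would be consistent with the
error in the third mail, though it does not by itself prove the cause.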

So why can't my pretty system finish any job that I submit?



