
Processes get SIGKILL after about 15 secs (Scyld)


Thomas Clausen tclausen at wesleyan.edu
Wed Mar 26 14:23:42 PST 2003


Hi,

I have a problem I can't solve:

We are running Scyld with kernel

Linux version 2.4.17-0.18.18_Scyldsmp (support at builder.scyld.com) (gcc
version 2.96 20000731 (Red Hat Linux 7.1 2.96-98)) #1 SMP Thu Jul 11
18:54:56 EDT 2002

on 70 nodes. 20 of them are newly acquired dual Athlons on Tyan 2466 boards.
When I start a process on any of these nodes (e.g. bpsh 64 sleep 500), it
gets a SIGKILL after about 15 seconds. Tracing bpsh with strace shows:

[pid  5685] nanosleep({500, 0}, 0)      = -1 EINTR (Interrupted system call)
[pid  5684] <... select resumed> )      = 3 (in [4 5 6], left {286, 370000})
[pid  5685] +++ killed by SIGKILL +++
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
close(4)                                = 0
read(5, "", 4096)                       = 0
close(5)                                = 0
read(6, "", 4096)                       = 0
close(6)                                = 0
wait4(-1, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], 0, NULL) = 5685
write(2, "bpsh: Child process exited abnor"..., 39) = 39
bpsh: Child process exited abnormally.
wait4(-1, 0xbffff548, 0, NULL)          = -1 ECHILD (No child processes)
_exit(255)                              = ?
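Because SIGKILL cannot be caught or handled, the killed process has no chance to log who sent it; only its parent learns how it died, by decoding the wait status as bpsh's wait4() call does in the trace. A minimal sketch of that check, assuming a plain POSIX machine with Python available; run_and_report is a hypothetical helper, not part of bpsh or Scyld:

```python
import os
import signal


def run_and_report(argv):
    """Run argv as a child process and describe how it terminated.

    Hypothetical helper: decodes the raw wait status the same way the
    traced wait4() result is shown (WIFSIGNALED / WTERMSIG).
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace ourselves with the requested command.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed; never return into the parent's code.
            os._exit(127)

    # Parent: block until the child terminates and fetch its raw status.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        sig = os.WTERMSIG(status)
        return f"killed by {signal.Signals(sig).name}"
    if os.WIFEXITED(status):
        return f"exited with status {os.WEXITSTATUS(status)}"
    return f"unexpected wait status {status:#x}"
```

For example, run_and_report(["sleep", "500"]) run directly on an affected node would be expected to return "killed by SIGKILL" if the same kill happens there. Wrapping the command in bpsh (["bpsh", "64", "sleep", "500"]) would instead report bpsh's own exit, which the trace shows to be 255, rather than the signal that hit the remote sleep.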

I have tried to find out where the signal comes from, without success.
I can run memtest86 (booted from a floppy) on these machines, and the
hardware seems to be fine. I'm at a loss...

Thomas

-- 
   .^.    Thomas Clausen, graduate student
   /V\    Physics Department, Wesleyan University, CT
  // \\   Tel 860-685-2018, fax 860-685-2031
 /(   )\  
  ^^-^^   Use Linux


