Processes get SIGKILL after about 15 secs (Scyld)
Thomas Clausen tclausen at wesleyan.edu
Wed Mar 26 14:23:42 PST 2003
- Previous message: 1300 hosts in a GridEngine cluster
- Next message: sun grid engine?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi,

I have a problem I can't solve:

We are running Scyld with kernel

Linux version 2.4.17-0.18.18_Scyldsmp (support at builder.scyld.com) (gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-98)) #1 SMP Thu Jul 11 18:54:56 EDT 2002

on 70 nodes. 20 of them are newly acquired dual Athlons on Tyan 2466 boards.

When I start a process on any of these nodes (ex: bpsh 64 sleep 500) they get a SIGKILL after about 15 secs:

[pid 5685] nanosleep({500, 0}, 0) = -1 EINTR (Interrupted system call)
[pid 5684] <... select resumed> ) = 3 (in [4 5 6], left {286, 370000})
[pid 5685] +++ killed by SIGKILL +++
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
close(4) = 0
read(5, "", 4096) = 0
close(5) = 0
read(6, "", 4096) = 0
close(6) = 0
wait4(-1, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], 0, NULL) = 5685
write(2, "bpsh: Child process exited abnor"..., 39bpsh: Child process exited abnormally.
) = 39
wait4(-1, 0xbffff548, 0, NULL) = -1 ECHILD (No child processes)
_exit(255) = ?

I have tried to find out where the signal comes from but without success.

I can run memtest86 (booting from floppy) on the machines and the hardware seems to be running fine.

I'm at a loss...

Thomas

--
 .^.    Thomas Clausen, graduate student
 /V\    Physics Department, Wesleyan University, CT
// \\   Tel 860-685-2018, fax 860-685-2031
/( )\
^^-^^   Use Linux
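To check whether the "about 15 secs" is a fixed interval, it may help to time the command repeatedly and record how it ended. The following is only a minimal sketch and not part of the original report: the helper name run_and_report is hypothetical, and it relies on Python's subprocess module reporting death by signal as a negative return code. Note that, judging by the trace above, bpsh itself exits with status 255 when its remote child is killed, so wrapping bpsh would most likely show "exit 255" rather than SIGKILL; the elapsed time is the useful part in that case.

```python
import signal
import subprocess
import sys
import time


def run_and_report(cmd):
    """Run cmd to completion; return (seconds it lived, how it ended)."""
    start = time.monotonic()
    proc = subprocess.run(cmd)
    elapsed = time.monotonic() - start
    # subprocess encodes "killed by signal N" as return code -N.
    if proc.returncode < 0:
        ended = signal.Signals(-proc.returncode).name
    else:
        ended = "exit %d" % proc.returncode
    return elapsed, ended


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: python3 timekill.py bpsh 64 sleep 500
    elapsed, ended = run_and_report(sys.argv[1:])
    print("%.1f s, %s" % (elapsed, ended))
```

Running it several times against one of the new nodes and against an older node would show whether the lifetime is constant and whether only the new hardware is affected.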
More information about the Beowulf mailing list
