going nutty with permission errors etc.!

Eric Linenberg elinenbe at umich.edu
Mon Aug 6 14:35:06 PDT 2001


I am trying to run LS-DYNA on a 5-node cluster.  Each node is a dual-processor 
machine, and for some reason it does not work!  The program is compiled for 
both LAM and MPICH, and I have both of these installed on my system.  The 
example programs for both libraries work flawlessly, and all security features 
are turned off everywhere on the cluster.  I am running Red Hat 7.1.

I can 'ssh hostname command' or 'rsh hostname command' and both work as 
expected.

Every node mounts the ls-dyna dir, the mpich dir, the lam dir, and the 
/root and /home directories read-write over NFS at boot time.
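
A read-write NFS mount does not by itself mean that a non-root account may create files in it; ownership and mode bits on the exported directory still apply. A minimal sketch of a probe that could be run as the non-root user on each node is below. The paths in the loop are only examples (/usr/local/lstc is the run directory that shows up in the root run), and the function name is made up for illustration.

```python
import os
import tempfile


def can_create_file(directory):
    """Return True if the current user can create a file in `directory`."""
    try:
        # Creating (and auto-removing) a real file is a more direct test
        # over NFS than os.access(), which only inspects mode bits.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        # Covers permission errors as well as a missing directory.
        return False


if __name__ == "__main__":
    # Example paths only; substitute the directories the job writes to.
    for path in ["/usr/local/lstc", os.path.expanduser("~")]:
        status = "writable" if can_create_file(path) else "NOT writable"
        print(path, status)
```

Running it once as the guest account and once as root on the same node would show whether the two users see different permissions on the shared directories.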

I can't figure out what is wrong, and I have been stuck here for a while.  
Here is some output that the smarter ones out there may be able to figure 
something out from (this is from lam_mpirun):

[guest@kitkat lstc]$ mpirun -c 4 lam_mpp960
>>>>>  Process 0  <<<<<
>>>>> Signal 11 : Segmentation Violation <<<<<
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
 In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line 
number 2097
-----------------------------------------------------------------------------
 
One of the processes started by mpirun has exited with a nonzero exit
code.  This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.
 
PID 4331 failed on node n0 with exit status 1.
-----------------------------------------------------------------------------
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
 In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line 
number 2097
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
 In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line 
number 2097
bash: kill: (1245) - No such pid



And here is the output from mpich_mpirun:

[guest@kitkat lstc]$ mpich_mpirun -np 4 mpich_mpp960
>>>>>  Process 0  <<<<<
>>>>> Signal 11 : Segmentation Violation <<<<<
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
 In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line 
number 2097
rm_l_1_1257:  p4_error: net_recv read:  probable EOF on socket: 1
rm_l_2_1165:  p4_error: net_recv read:  probable EOF on socket: 1
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
 In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line 
number 2097
PGFIO/stdio: Permission denied
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
 In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line 
number 2097
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
 In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line 
number 2097
rm_l_3_1094:  p4_error: net_recv read:  probable EOF on socket: 1
bm_list_4667:  p4_error: net_recv read:  probable EOF on socket: 1
 
[guest@kitkat lstc]$

----------------------

But if I am root and run mpich_mpirun, I get the output below, which seems 
to work okay!  If I go above -np 5, though (remember, I have 10 processors 
here), it seems to hang for a long while!  Ugh!  Any help at all is 
appreciated!  Thanks, eric



[root@kitkat lstc]# mpich_mpirun -np 5 -v  mpich_mpp960
running /usr/local/lstc/mpich_mpp960 on 5 LINUX ch_p4 processors
Created /usr/local/lstc/PI5843
      Date: 08/06/2001      Time: 17:31:35
 Executing with local workstation license
 
     ___________________________________________________
     |                                                 |
     |  Livermore  Software  Technology  Corporation   |
     |                                                 |
     |  7374 Las Positas Road                          |
     |  Livermore, CA 94550                            |
     |  Tel: (925) 449-2500  Fax: (925) 449-2507       |
     |  www.lstc.com                                   |
     |_________________________________________________|
     |                                                 |
     |  LS-DYNA, A Program for Nonlinear Dynamic       |
     |  Analysis of Structures in Three Dimensions     |
     |  Version:  960          Date: 07/22/2001        |
     |  Revision: 447          Time: 14:04:43          |
     |                                                 |
     |  Licensed to: Exponent Failure Analysis         |
     |                                                 |
     |  Platform   : PC (MPICH-P4)                     |
     |  OS Level   : Linux 2.12                        |
     |  Hostname   : kitkat                            |
     |  Precision  : Single precision (I4R4)           |
     |                                                 |
     |  Unauthorized use infringes LSTC copyrights     |
     |_________________________________________________|
 
 
  please define input file names or change defaults :
 >
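
A note on reading the guest-user errors above: in "unit=13/error code returned by host stdio - 13", the first 13 is the Fortran I/O unit and the trailing 13 appears to be the operating system's errno value. The short sketch below simply decodes that number with Python's standard library. The root run creates /usr/local/lstc/PI5843, so one possible reading (an assumption, not something the output confirms) is that the guest account cannot create files in the directory the job runs in.

```python
import errno
import os

# 13 is the value printed after "error code returned by host stdio".
code = 13

name = errno.errorcode[code]   # symbolic errno name for this value
message = os.strerror(code)    # the system's text for the same value

print(name, "-", message)
```

On Linux the symbolic name for 13 is EACCES, which matches the "Permission denied" text that PGFIO prints alongside it.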




