going nutty with permission errors etc.!
Eric Linenberg elinenbe at umich.edu
Mon Aug 6 14:35:06 PDT 2001
I am trying to run LS-DYNA on a 5-node cluster. Each node is a dual-processor machine, and for some reason it does not work! The program is compiled for both LAM and MPICH, and I have both of these installed on my system. The example programs for both libraries work flawlessly, and all security is turned off everywhere on the cluster. I am running RedHat 7.1. I can 'ssh hostname command' or 'rsh hostname command' and both work as expected. Every node mounts the ls-dyna dir, the mpich dir, the lam dir, and the /root and /home directories rw over NFS at boot time.

I can't figure out what is wrong and I have been stuck here for a while. Here is output that the smarter ones out there may be able to figure something out from (this is from lam_mpirun):

[guest@kitkat lstc]$ mpirun -c 4 lam_mpp960
>>>>> Process 0 <<<<<
>>>>> Signal 11 : Segmentation Violation <<<<<
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
 In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line number 2097
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 4331 failed on node n0 with exit status 1.
-----------------------------------------------------------------------------
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
 In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line number 2097
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
 In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line number 2097
bash: kill: (1245) - No such pid

and here is the output from mpich_mpirun:

[guest@kitkat lstc]$ mpich_mpirun -np 4 mpich_mpp960
>>>>> Process 0 <<<<<
>>>>> Signal 11 : Segmentation Violation <<<<<
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
 In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line number 2097
rm_l_1_1257: p4_error: net_recv read: probable EOF on socket: 1
rm_l_2_1165: p4_error: net_recv read: probable EOF on socket: 1
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
 In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line number 2097
PGFIO/stdio: Permission denied
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
 In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line number 2097
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
 In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line number 2097
rm_l_3_1094: p4_error: net_recv read: probable EOF on socket: 1
bm_list_4667: p4_error: net_recv read: probable EOF on socket: 1
[guest@kitkat lstc]$

----------------------

But if I am root and run mpich_mpirun, I get the output below (this seems to work okay!!!!!). If I go above -np 5 (remember I have 10 processors here), then it seems to hang for a long while!!! UGH!

Any help at all is appreciated!

Thanks,
eric

[root@kitkat lstc]# mpich_mpirun -np 5 -v mpich_mpp960
running /usr/local/lstc/mpich_mpp960 on 5 LINUX ch_p4 processors
Created /usr/local/lstc/PI5843
 Date: 08/06/2001 Time: 17:31:35
 Executing with local workstation license
 _________________________________________________
|                                                 |
|  Livermore Software Technology Corporation      |
|                                                 |
|  7374 Las Positas Road                          |
|  Livermore, CA 94550                            |
|  Tel: (925) 449-2500 Fax: (925) 449-2507        |
|  www.lstc.com                                   |
|_________________________________________________|
|                                                 |
|  LS-DYNA, A Program for Nonlinear Dynamic       |
|  Analysis of Structures in Three Dimensions     |
|  Version: 960 Date: 07/22/2001                  |
|  Revision: 447 Time: 14:04:43                   |
|                                                 |
|  Licensed to: Exponent Failure Analysis         |
|                                                 |
|  Platform : PC (MPICH-P4)                       |
|  OS Level : Linux 2.12                          |
|  Hostname : kitkat                              |
|  Precision : Single precision (I4R4)            |
|                                                 |
|  Unauthorized use infringes LSTC copyrights     |
|_________________________________________________|

 please define input file names or change defaults :
 >
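
One thing that could be checked from the output above: the "13" reported by the host stdio is the same number as the Linux errno for "Permission denied" (EACCES), and the run only gets further as root, which printed "Created /usr/local/lstc/PI5843". That would be consistent with the guest account not being allowed to create files in the directory the job is started from, though this is only a guess from the logs. The Python sketch below is not part of LS-DYNA, LAM, or MPICH; the helper names describe_errno and can_create_files are made up for illustration. It decodes the error number and tries to create a scratch file in the current directory, so it could be run as the guest user on each node via rsh or ssh.

```python
import errno
import os
import tempfile


def describe_errno(code):
    """Return (symbolic name, OS message) for a numeric error code."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def can_create_files(directory):
    """Try to create (and automatically delete) a scratch file in `directory`.

    Returns (True, None) on success, or (False, errno) if the OS refuses.
    """
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True, None
    except OSError as exc:
        return False, exc.errno


if __name__ == "__main__":
    name, message = describe_errno(13)
    print(f"error code 13 = {name}: {message}")

    # Assumption: the job writes its output files into the directory it is
    # started from, so that is the directory tested here.
    run_dir = os.getcwd()
    ok, err = can_create_files(run_dir)
    if ok:
        print(f"{run_dir}: this user can create files here")
    else:
        err_name, err_message = describe_errno(err)
        print(f"{run_dir}: cannot create files ({err_name}: {err_message})")
```

If the check fails for guest but passes for root in the same directory, starting the job from a directory the guest account owns would be a cheap experiment before looking any deeper.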
More information about the Beowulf mailing list
