newbie: still pvm problems.

Georgia Southern Beowulf Cluster Project gscluster at hotmail.com
Mon Sep 18 07:03:08 PDT 2000


Hello,

I sent a previous e-mail entitled "newbie: rsh and pvm problems" that 
should be available in last week's mailings.  With your help I've been able 
to solve the rsh problem.  It turned out that the wrong permissions 
were set in a couple of PAM files.  However, pvm will not start on my nodes.  
These nodes use the etherboot package to boot from a floppy, and NFS mount 
their root filesystem from a server node.  Each node has a unique filesystem 
with the exception of the /home directory, which they share with the server, 
all other nodes, and our development workstations.  This helps make sure 
that user profiles and files are consistent within the cluster.  Also, the 
/tmp directory has permissions 1777 so anyone can write to it.  In testing 
I've set up my $PVM_TMP to point to /tmp/<username> so that I can avoid 
seeing other users' pvmd and pvml files.  That describes my setup.  
Now to the problem.
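
A minimal sketch of the $PVM_TMP part of this setup, as it might appear in 
a .bashrc.  It is an illustration, not a copy of the actual file: using 
"id -un" to obtain the username and mode 700 for the directory are both 
assumptions.

```shell
# Sketch of a ~/.bashrc fragment (assumed, not the actual file).
# Give each user a private PVM scratch directory under /tmp.
PVM_TMP="/tmp/$(id -un)"
export PVM_TMP

# Create the directory if it is missing; mode 700 is an assumption.
mkdir -p "$PVM_TMP" && chmod 700 "$PVM_TMP"
```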

When I log in to a node locally (with keyboard and monitor, not remotely) 
and try to start pvm by just typing "pvm", it gives the following lines:

libpvm [pid #] /tmp/<username>/pvmd.<uid>: No such file or directory
libpvm [pid #]: Console: can't start pvmd

My directory exists and is empty, which I know because I just created it.  
Furthermore, a pvml.<uid> file is created with the following messages:

pvmd[pid #] date time mksocs() socket loclsock: Invalid argument
pvmd[pid #] date time pvmbailout(0)

When starting pvm remotely by adding the node to an already running daemon 
(say adding a node from our server node) I receive the following message:

PVM Daemon Files Found on <node>!

And it proceeds to tell me to delete any present pvmd.<uid> files or socket 
files.  However, nothing is present in my /tmp/<username> directory or the 
/tmp directory (having been freshly deleted before trying this).  Could 
these errors be the result of NFS, or is there another file that is causing 
the problems?  I created the nodes from the /bin, /sbin, /lib, and other 
directories of our server node tar'ed into a template directory, the 
executables and libraries of which are hard linked by all other nodes.  
Additionally, I use the bash/bash2 shell and all of my pvm variables are 
declared in the .bashrc file in my $HOME directory.
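
The check for leftover files described above can be sketched as follows.  
This is only a rough diagnostic under stated assumptions: the helper name 
check_pvm_dir is hypothetical, the file patterns are the pvmd.<uid> and 
pvml.<uid> names mentioned in this message, and "-maxdepth" assumes GNU or 
BSD find.  It also prints which filesystem backs each directory, to show 
whether it lives on an NFS mount.

```shell
# Diagnostic sketch; the helper name check_pvm_dir is hypothetical.
check_pvm_dir() {
    dir="$1"
    # List leftover daemon/log files (pvmd.<uid>, pvml.<uid>) and any sockets.
    find "$dir" -maxdepth 1 \( -name 'pvmd.*' -o -name 'pvml.*' -o -type s \) -print
    # Show which filesystem backs the directory (e.g. an NFS export).
    df -P "$dir"
}

check_pvm_dir /tmp
check_pvm_dir "${PVM_TMP:-/tmp}"
```

If the find command prints nothing for either directory, no files matching 
those patterns are present there.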

Any help or guidance is very much appreciated.

Thank you,

Wes Wells

<><><><><><><><><><><><><><><><><><>
Georgia Southern University
Beowulf Cluster Project
gscluster at hotmail.com
