RH7.1, portmap, YP, NFS (again)

Georgia Southern Beowulf Cluster Project gscluster at hotmail.com
Thu Jun 21 15:01:14 PDT 2001


Hello,

I sent an earlier message asking whether anyone is having portmap problems 
with RH7.1.  Well, my machines just broke again today and I'm quite upset 
about it.  All of my machines are running a kickstarted install of RH7.1 
with firewalling turned off, YP for passwords and the auto.home files (and a 
few others), and NFS and AutoFS to mount user directories and other working 
directories.  The kernel is the stock Red Hat 2.4.2 kernel; I haven't 
recompiled it.  All machines are PII 350 or better.  I have four clusters 
configured in a tree pattern, where each cluster of 16 computers is one 
server and fifteen nodes.  Each server is tied to the other servers on a 
network, which also includes a sort of master server for the YP services 
that is not used for actual computing.  Here is the problem.

Saturday: Made a fresh kickstart install of all 64 servers and nodes using 
Red Hat 7.1, with the kickstart files exported over NFS.  Everything is 
cool.  I can immediately lamboot and run one of the example programs (the 
cpi program) on each cluster individually.  NIS works fine.  AutoFS works 
fine.

Monday: Everything goes to hell.  When I add users or change NIS maps and 
then run "cd /var/yp; make", every map reports an RPC portmap failure or 
says it can't export the maps.  ypserv and ypbind are running fine on the 
master server (where these events take place).  When I run ypinit -s 
<server> on each of my YP slave servers, they say that they cannot enumerate 
maps and to make sure ypserv is running on <server>.  It is; I've quadruple 
checked it, even putting the master ypserv process in debug mode.  Going to 
single user mode and then coming back to the console runlevel shows me that 
ypbind now cannot bind to the domain server, which IS running ypserv.  Even 
when all the other servers can bind to the master server, the nodes are not 
receiving the YP maps from their slave servers.  NFS also goes haywire, 
complaining about RPC and portmap and about not being able to register or 
get slots.  Solution: go home, eat dinner, nap, come back refreshed.  When I 
come back a few hours later, EVERYTHING WORKS.  It's like nothing happened.  
I can again lamboot and run the pi program on 64 machines.

Tuesday: Everything works.  New users are being added by the hour and I can 
lamboot and mpirun programs on all clusters.

Wednesday: Out sick and unable to monitor system, but I get no complaints 
from users.

Thursday: Everything has gone to hell again.  I first notice it when I add a 
new user and "cd /var/yp; make" again complains about RPC and portmap.  
Again, the slave servers actually get the maps (though the master server 
complains that they do not), but the nodes bound to these servers are not 
getting anything.  They complain about YP_DOMAINNOTBOUND, but the domain 
definitely is bound.

Now I'm at a complete loss for what to do next.  I'm trying different 
combinations of options in the yp.conf and ypserv.conf files, and taking 
machines to single user mode and bringing up only the network, portmap, 
ypserv, and ypbind.  I'm only getting error messages, even though the config 
files seem correct.  My guess is that portmap is croaking every couple of 
days, but how do I get real proof of this?  I'm also not seeing anything 
helpful in the system logs, except that a news program runs every day, and I 
don't think it should interfere with portmap/YP/NFS.  Any ideas?
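One way to get that proof is to probe the portmapper on a schedule and log each answer, so a failure shows up with a timestamp.  The sketch below is a hypothetical Python helper, not part of any Red Hat or NIS package: it sends an ONC RPC NULL call (procedure 0) to portmapper program 100000, version 2, on UDP port 111, and reports whether a successful reply comes back.  The names build_null_call, is_success_reply, and portmap_alive are made up for illustration.  From the command line, `rpcinfo -p <host>` gives a similar answer by listing the registered RPC programs.

```python
import random
import socket
import struct

PMAP_PROG = 100000  # portmapper RPC program number
PMAP_VERS = 2       # portmapper protocol version 2
PMAP_PORT = 111     # well-known portmapper port


def build_null_call(xid: int) -> bytes:
    """Encode an ONC RPC CALL for portmapper procedure 0 (NULL) with AUTH_NULL."""
    return struct.pack(
        ">10I",
        xid,        # transaction id, echoed back in the reply
        0,          # msg_type = CALL
        2,          # RPC protocol version
        PMAP_PROG,  # program
        PMAP_VERS,  # program version
        0,          # procedure = NULL (does nothing, just answers)
        0, 0,       # credential: AUTH_NULL flavor, zero-length body
        0, 0,       # verifier: AUTH_NULL flavor, zero-length body
    )


def is_success_reply(data: bytes, xid: int) -> bool:
    """True if data is an accepted RPC reply with accept_stat SUCCESS for xid."""
    if len(data) < 24:
        return False
    rxid, msg_type, reply_stat = struct.unpack_from(">3I", data, 0)
    if rxid != xid or msg_type != 1 or reply_stat != 0:  # 1 = REPLY, 0 = MSG_ACCEPTED
        return False
    _flavor, verf_len = struct.unpack_from(">2I", data, 12)
    offset = 20 + ((verf_len + 3) & ~3)  # skip verifier body, padded to 4 bytes
    if len(data) < offset + 4:
        return False
    (accept_stat,) = struct.unpack_from(">I", data, offset)
    return accept_stat == 0  # 0 = SUCCESS


def portmap_alive(host: str, timeout: float = 2.0) -> bool:
    """Send one NULL call to the portmapper on host; False if no valid reply."""
    xid = random.getrandbits(32)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_null_call(xid), (host, PMAP_PORT))
            data, _addr = sock.recvfrom(1024)
        except OSError:  # covers timeouts, unreachable hosts, refused ports
            return False
    return is_success_reply(data, xid)
```

For example, calling `portmap_alive("localhost")` from cron every minute on the master and on one slave server, and appending the result with a timestamp to a file, would show whether failures like the Monday and Thursday ones line up with portmap going silent.  A False only means no valid reply arrived within the timeout; by itself it does not distinguish a dead portmap from a dropped packet.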

Thanks,

Wes Wells

More information about the Beowulf mailing list