Good morning,

   We've got a Linux cluster with 64 nodes of dual PIII/850's connected
with Fast Ethernet. We've also installed OpenPBS 2.3pl2-1 throughout
the cluster.
   We have one node (penguin46) that only has the exechost RPM
installed. Jobs that include penguin46 work for a day or two, and then
all of a sudden any job using penguin46 hangs. The actual code starts
running, but then hangs after a period of time. The
PBS logs show nothing except job start, job end (or job kill if we do a
qdel). Also, I see nothing strange at all in /var/log/messages.
   I know this is a rather nebulous error description, but it is the most
we have been able to find out in over two weeks of checking. We have reinstalled
and reconfigured PBS on penguin46 several times with no change in
behavior. We have checked the NIC by flooding it with ping packets
(I know it's not an extensive test, but it's a start :) and we are starting to
look at the switch port as well.




