[Beowulf] Transient NFS Problems in New Cluster

Jon Forrest jlforrest at berkeley.edu
Tue Feb 2 14:00:37 PST 2010


I have a new cluster running CentOS 5.3.
The cluster uses a Sun 7310 storage server
that provides NFS service to the cluster over
a private 1 Gb/s Ethernet network with 9K
jumbo frames.

We've noticed that a number of the compute
nodes sometimes generate the

automount[15023]: umount_autofs_indirect: ask umount returned busy /home

message. When this happens, the program running on the
node dies. This has happened between 10 and 20 times.
We're not sure what the node is doing when this
happens. Most of the time everything is fine and
the home directories are automounted without problems.
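
To pin down which nodes and mount points are affected, the messages
could be tallied from each node's syslog. Below is a minimal Python
sketch of that idea. The function names are made up for illustration,
and the log path in the usage note is an assumption (the usual syslog
location on CentOS), not something verified on this cluster.

```python
import re
from collections import Counter

# Matches the automount message quoted above and captures the
# automount PID and the mount point that was reported busy.
BUSY_RE = re.compile(
    r"automount\[(?P<pid>\d+)\]: umount_autofs_indirect: "
    r"ask umount returned busy (?P<mount>\S+)"
)


def find_busy_umounts(lines):
    """Return a list of (pid, mount_point) for every matching log line."""
    hits = []
    for line in lines:
        match = BUSY_RE.search(line)
        if match:
            hits.append((int(match.group("pid")), match.group("mount")))
    return hits


def count_by_mount(lines):
    """Count busy-umount messages per mount point."""
    return Counter(mount for _pid, mount in find_busy_umounts(lines))
```

On a node this might be run as
count_by_mount(open("/var/log/messages")), assuming syslog writes
there. Comparing the timestamps of the matching lines with the times
the jobs died would be the next step.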

I've googled for this problem and found that other people
have run into it too, but I haven't found a resolution,
especially not for RHEL 5.

The auto.master line for this mount is

/home  /etc/auto.home  --timeout=1200 noatime,nodiratime,rw,noacl,rsize=32768,wsize=32768
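
(The entry is a single line in the file; the mailer may have wrapped it.)
To make the settings easier to read, here is a small Python sketch that
splits such an entry into its mount point, map file, and options. The
function name is hypothetical and the parsing is deliberately naive: it
only handles the simple whitespace- and comma-separated form shown here,
not everything the auto.master format allows.

```python
def parse_auto_master_entry(entry):
    """Split an auto.master entry into (mount_point, map_name, options).

    Options are returned as a dict; flags without a value map to True.
    """
    fields = entry.split()
    mount_point, map_name = fields[0], fields[1]
    options = {}
    for field in fields[2:]:
        for opt in field.split(","):
            opt = opt.lstrip("-")  # accept both "--timeout=..." and "-rw"
            if not opt:
                continue
            key, sep, value = opt.partition("=")
            options[key] = value if sep else True
    return mount_point, map_name, options


entry = ("/home  /etc/auto.home  --timeout=1200 "
         "noatime,nodiratime,rw,noacl,rsize=32768,wsize=32768")
mount_point, map_name, options = parse_auto_master_entry(entry)
# The timeout is given in seconds, so 1200 is 20 minutes of idle time
# before automount tries to unmount.
idle_minutes = int(options["timeout"]) / 60
```

So the mount is set to expire after 20 idle minutes, with 32 KiB NFS
read and write sizes.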

The network interface configuration is

eth0      Link encap:Ethernet  HWaddr 00:30:48:B9:F6:52
           inet addr:10.1.255.233  Bcast:10.1.255.255  Mask:255.255.0.0
           inet6 addr: fe80::230:48ff:feb9:f652/64 Scope:Link
           UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
           RX packets:32999308 errors:0 dropped:0 overruns:0 frame:0
           TX packets:27468315 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:24225053296 (22.5 GiB)  TX bytes:73313582546 (68.2 GiB)
           Interrupt:74 Base address:0x2000

Any advice on what to do?

Cordially,
-- 
Jon Forrest
Research Computing Support
College of Chemistry
173 Tan Hall
University of California Berkeley
Berkeley, CA
94720-1460
510-643-1032


