[Beowulf] NFSv3 client hangs - tcp v/s udp.
Amitoj G. Singh amitoj at cs.uh.edu
Wed May 3 15:21:07 PDT 2006
- Previous message: [Beowulf] 512 nodes Myrinet cluster Challanges
- Next message: [Beowulf] NFSv3 client hangs - tcp v/s udp.
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Cluster Details:
================
o 648 single-processor Intel P4 worker nodes
o Single head node, NFSv3 server
o OS: Red Hat EL 4, kernel 2.6.12
o Torque 2
o Maui 3.2
o All worker nodes NFS-mount /home and /usr/local

After upgrading from Red Hat 7.1 to Red Hat EL 4, we found that about 1 in 10 user jobs failed because an NFS mount point on a worker node stopped responding. The NFS mount points on the worker nodes would become unresponsive during heavy NFS I/O. A simple "netstat -t" on the head node showed thousands of open TCP NFS sockets there.

Worker nodes with frozen NFS mount points reported the following error message:

nfs_statfs: error no = 512

This error should be handled in kernel space, but somehow it was being reported in user space. The kernel should have handled the NFS timeout and reconnected transparently to the user.

We realized that NFS v3 defaults to TCP if the transport is not explicitly specified at mount time.

The only fix for a worker node with a frozen NFS mount point was to reboot the node. A "remount" works, but you first need to stop all services using the NFS mount points.

We recently switched all our NFS mounts to UDP and since then have had no worker nodes with failing or unresponsive NFS mount points.

Thought we would share this bit of experience with the list. Interestingly, while googling we did not find a lot of chatter about this issue.

- Amitoj.
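
For anyone wanting to check which transport their own worker nodes actually ended up with, here is a minimal sketch in Python. It is not something from our setup: the function name and the sample mount lines are made up for illustration. It assumes the six-column layout of /proc/mounts and that the transport shows up in the options column either as a bare "tcp"/"udp" or as "proto=tcp"/"proto=udp" (both spellings are accepted by the Linux NFS mount options); what a given kernel prints may differ.

```python
# Hypothetical sample in /proc/mounts format:
# device  mountpoint  fstype  options  dump  pass
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw 0 0
head:/home /home nfs rw,vers=3,rsize=8192,wsize=8192,hard,proto=udp 0 0
head:/usr/local /usr/local nfs rw,v3,hard,tcp 0 0
"""


def nfs_transports(mounts_text):
    """Map each NFS mount point to 'tcp', 'udp', or 'unknown'."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not an NFS client mount.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        mount_point = fields[1]
        transport = "unknown"
        for option in fields[3].split(","):
            if option in ("tcp", "udp"):
                transport = option
            elif option.startswith("proto="):
                transport = option.split("=", 1)[1]
        result[mount_point] = transport
    return result


if __name__ == "__main__":
    # On a real worker node, pass open("/proc/mounts").read() instead.
    for mount_point, transport in sorted(nfs_transports(SAMPLE_MOUNTS).items()):
        print(mount_point, transport)
```

Run on each worker node (for example from a Torque prologue or a parallel shell), something like this would flag mounts that silently came up on TCP because no transport was given at mount time.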
