[Beowulf] NFS Read Errors
Michael H. Frese
Michael.Frese at NumerEx.com
Wed Dec 5 08:55:50 PST 2007
This tale is at an end, I think, because I can't bear to tell it much
longer. As many have suggested, there is probably a hardware
problem, and since the hardware is old, I will do without the
services of the troublesome machines -- it turns out that there is
another acting up as well -- till they are replaced in a couple of weeks.
Many thanks to all who racked their brains for helpful suggestions.
I want to tell a little more of what I have learned, before I drop
the subject altogether.
First, I did swap the cable of the bad machine with that of a good
one with no effect on either machine. This eliminates the
possibility of the cable or the switch port being bad. Since I had
previously changed out the NIC and the switch, the only possibility is
something inside the machine itself, probably the motherboard, but
possibly a corrupted kernel module for handling udp -- more on that below.
Second, we could find no sign of this failure in any log. Nor did
/proc/net/dev show any errors. The suggestion is that older kernels
aren't going to detect and report such errors. I think that's
because they do nfs over udp. More about that in a moment.
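For anyone repeating the /proc/net/dev check, here is a minimal sketch that pulls out the error and drop counters per interface, plus the bytes-per-packet ratio discussed in the quoted reply. It assumes the usual layout of two header lines followed by 8 receive and 8 transmit columns per interface. Keep in mind that these counters are cumulative since boot, so they may reflect more than the tests in question.

```python
import os
from typing import Dict

# Assumed column order after "iface:" in /proc/net/dev (8 receive, then 8 transmit).
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed"]


def parse_net_dev(text: str) -> Dict[str, Dict[str, int]]:
    """Return {interface: {"rx_bytes": ..., "tx_errs": ..., ...}}."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        # Split on the colon, not whitespace: large counters can run into
        # the interface name on older kernels (e.g. "eth0:123456789").
        name, data = line.split(":", 1)
        values = [int(v) for v in data.split()]
        if len(values) < 16:
            continue  # unexpected layout; skip rather than guess
        row = {"rx_" + f: values[i] for i, f in enumerate(RX_FIELDS)}
        row.update({"tx_" + f: values[8 + i] for i, f in enumerate(TX_FIELDS)})
        stats[name.strip()] = row
    return stats


def avg_packet_size(row: Dict[str, int], direction: str) -> float:
    """Bytes per packet for direction "rx" or "tx" (0.0 if no packets)."""
    packets = row[direction + "_packets"]
    return row[direction + "_bytes"] / packets if packets else 0.0


if __name__ == "__main__" and os.path.exists("/proc/net/dev"):
    with open("/proc/net/dev") as f:
        for iface, row in parse_net_dev(f.read()).items():
            print(f"{iface}: rx errs={row['rx_errs']} drop={row['rx_drop']} "
                  f"avg={avg_packet_size(row, 'rx'):.1f}; "
                  f"tx errs={row['tx_errs']} drop={row['tx_drop']} "
                  f"avg={avg_packet_size(row, 'tx'):.1f}")
```

Comparing the server's transmit side with the client's receive side for the same interface pair is the useful comparison. A ratio far below the MTU on one end only is worth a closer look.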
Third, though netcat isn't on these systems, nc is. We didn't get
around to trying it, because we found ttcp.
Fourth, with ttcp over tcp, I found that the troubled machine could
send 800 MB in about 20 seconds (roughly 40 MB/s) -- the wire speed for those 32-bit
PCI slots as tested by netpipe. However, if I used ttcp over udp, I
couldn't reliably send even ten 8192-byte blocks! Successive sends
and receives would receive 3, or 1, or 5 blocks. Don't ask me how
these two facts are compatible. I don't know.
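Below is a minimal Python sketch of a UDP block test in the spirit of ttcp's UDP mode (it is not ttcp itself). It sends a fixed number of 8192-byte datagrams and counts how many arrive. UDP has no retransmission, so a lost datagram simply disappears from the count, whereas TCP would resend it. Also, on a link with a 1500-byte MTU, each 8192-byte datagram travels as several IP fragments and is discarded whole if any one fragment is lost. That is one way the two results above could coexist, though not a diagnosis. The sketch uses the loopback address so it runs on a single machine; a real test would bind the receiver on one node and send from another.

```python
import socket
from typing import Tuple


def udp_block_test(nblocks: int = 10, blocksize: int = 8192,
                   host: str = "127.0.0.1", timeout: float = 1.0) -> Tuple[int, int]:
    """Send nblocks UDP datagrams of blocksize bytes to a local receiver.

    Returns (received, intact): how many datagrams arrived before the
    receiver timed out, and how many of those matched the sent payload.
    """
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.bind((host, 0))        # port 0: let the kernel choose a free port
        rx.settimeout(timeout)    # stop waiting once datagrams stop arriving
        dest = rx.getsockname()
        # Repeating byte pattern so a damaged payload is easy to spot.
        payload = (bytes(range(256)) * (blocksize // 256 + 1))[:blocksize]
        for _ in range(nblocks):
            tx.sendto(payload, dest)  # no acknowledgement, no retransmission
        received = intact = 0
        while received < nblocks:
            try:
                data, _ = rx.recvfrom(65535)
            except socket.timeout:
                break             # anything not here by now is lost for good
            received += 1
            if data == payload:
                intact += 1
        return received, intact
    finally:
        rx.close()
        tx.close()


if __name__ == "__main__":
    print("received %d, intact %d of 10" % udp_block_test())
```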
Clearly, this puts a premium on using tcp for nfs. All our attempts
to do that failed. Well, both of them, anyway. In the first one, we
unmounted the offending disk, modified its fstab entry, and remounted
it. We were pretty careful in the second one, where we added tcp to
the mount options in the fstab entry, unmounted all the remote disks, restarted all the
nfsd's, and did 'mount -a'. We got an error message in both cases
that didn't obviously refer to the tcp argument, but the mount didn't
happen. As I write this, I see references to tcp mount requests in
the mountd man page, so maybe we need to do a bit more here.
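For reference, the client-side change amounts to adding a transport option to the fourth (options) field of the nfs line in /etc/fstab. Older nfs(5) man pages document a plain `tcp` option, and newer ones also spell it `proto=tcp`. The helper below is a hypothetical sketch of that edit, and the server path and mount point in it are placeholders. Editing the entry only changes what the client asks for; whether the mount then succeeds still depends on both the client kernel and the server supporting NFS over TCP.

```python
def add_nfs_option(fstab_line: str, option: str = "tcp") -> str:
    """Append `option` to the options field of an nfs line from /etc/fstab.

    fstab fields: device, mount point, fs type, options, dump, pass.
    Non-nfs lines and comments are returned unchanged.
    """
    stripped = fstab_line.strip()
    if not stripped or stripped.startswith("#"):
        return fstab_line
    fields = stripped.split()
    if len(fields) < 4 or fields[2] != "nfs":
        return fstab_line
    opts = fields[3].split(",")
    if option == "tcp":
        opts = [o for o in opts if o != "udp"]  # the two transports are mutually exclusive
    if option not in opts:
        opts.append(option)
    fields[3] = ",".join(opts)
    return "\t".join(fields)


if __name__ == "__main__":
    # Hypothetical entry; "server:/export/data" and "/mnt/data" are placeholders.
    print(add_nfs_option("server:/export/data  /mnt/data  nfs  rw,hard,intr  0 0"))
```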
The Wikipedia article on nfs says this: "At the time of introduction
of Version 3, vendor support for TCP as a transport-layer protocol
began increasing. While several vendors had already added support for
NFS Version 2 with TCP as a transport, Sun Microsystems added support
for TCP as a transport for NFS at the same time it added support for Version 3."
I'd like to know what version of nfs this server supports, but the
man page on nfsd doesn't say. The man page on rpc.mountd says that
it supports nfs version 2 and version 3, but that "If the NFS kernel
module was compiled without support for NFSv3, rpc.mountd must be
invoked with the option --no-nfs-version 3." Yet the
/proc/<pid>/cmdline for the running rpc.mountd doesn't show a
--no-nfs-version argument. Clearly, both the kernel and the server
need to support the use of tcp.
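One way to see which NFS versions and transports a server has actually registered is to ask its portmapper with `rpcinfo -p`. That command lists each RPC program number with its version, protocol (udp or tcp), and port; NFS is program 100003 and mountd is 100005. The sketch below parses that listing and also splits a /proc/<pid>/cmdline blob, whose arguments are separated by NUL bytes (which is why they run together when simply cat'ed). It assumes the common four-column `rpcinfo -p` layout, and the host-name argument is a hypothetical usage convention.

```python
import shutil
import subprocess
import sys
from typing import Dict, List, Set, Tuple

NFS_PROGRAM = 100003     # well-known RPC program number for NFS
MOUNTD_PROGRAM = 100005  # well-known RPC program number for mountd


def parse_rpcinfo(text: str) -> Dict[int, Set[Tuple[int, str]]]:
    """Map RPC program number -> {(version, protocol)} from `rpcinfo -p` output."""
    programs: Dict[int, Set[Tuple[int, str]]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Data lines start "program vers proto port"; skip the header and noise.
        if len(fields) < 4 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        programs.setdefault(int(fields[0]), set()).add((int(fields[1]), fields[2]))
    return programs


def cmdline_args(raw: bytes) -> List[str]:
    """Split the contents of /proc/<pid>/cmdline (NUL-separated) into arguments."""
    return [a.decode(errors="replace") for a in raw.split(b"\x00") if a]


if __name__ == "__main__" and shutil.which("rpcinfo"):
    # Hypothetical usage: pass the NFS server's host name; defaults to localhost.
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    try:
        out = subprocess.run(["rpcinfo", "-p", host], capture_output=True,
                             text=True, timeout=10).stdout
    except subprocess.TimeoutExpired:
        out = ""
    found = parse_rpcinfo(out)
    for prog, label in ((NFS_PROGRAM, "nfs"), (MOUNTD_PROGRAM, "mountd")):
        for vers, proto in sorted(found.get(prog, set())):
            print(f"{label} version {vers} over {proto}")
```

If program 100003 shows up only with `udp` in the protocol column, that would suggest the server is not offering NFS over TCP at all. In that case no client-side `tcp` option could succeed against it.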
I'd like to get any of our other machines with these older kernels at
other sites to use tcp for nfs where possible, in order to avoid
this in the future. We are already seeing signs of network problems
on them. If that's not possible, then in order to avoid a complete
rebuild of those systems -- there are 12 of them -- we are going to
put a testing script together using remote invocations of md5sum and
comparison of results to recorded local results.
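Here is a minimal sketch of the checking half of such a script. It assumes a manifest in the format md5sum itself prints (`<digest>  <path>`), recorded from a known-good local read. Because the failures were intermittent, it re-reads each file several times. The manifest file, the script name, and the idea of running it on each client over rsh/ssh are all hypothetical. One caveat: a repeat read may be served from the client's page cache rather than re-fetched from the server, so files larger than the client's memory make a better test.

```python
import hashlib
import sys
from typing import Dict, List


def md5_of(path: str, chunk: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in 1 MiB chunks (same digest md5sum prints)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def parse_md5sum_output(text: str) -> Dict[str, str]:
    """Parse md5sum-style lines "<digest>  <path>" into {path: digest}."""
    manifest: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, path = line.split(None, 1)
        manifest[path.lstrip("*")] = digest.lower()  # "*" marks binary mode
    return manifest


def check_manifest(expected: Dict[str, str], repeats: int = 3) -> List[str]:
    """Re-read every file `repeats` times; return paths that ever mismatched."""
    bad = []
    for path, want in expected.items():
        if any(md5_of(path) != want for _ in range(repeats)):
            bad.append(path)
    return bad


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python3 check_md5.py recorded.md5
    with open(sys.argv[1]) as f:
        failures = check_manifest(parse_md5sum_output(f.read()))
    for path in failures:
        print("MISMATCH", path)
    sys.exit(1 if failures else 0)
```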
At 08:54 AM 12/4/2007, you wrote:
>Thanks for your helpful comments.
>At 11:31 PM 12/3/2007, you wrote:
>>>I am guessing you are using TCP NFS mounts as well? TCP forces
>>>retries in the event of bad packets. UDP doesn't force this, but
>>>the NFS protocol will
>>UDP has a checksum as well, though it's only 16b. then again, the TCP
>>checksum isn't all that strong for today's data rates either.
> From reading the man page on nfs on the systems with the 2.4
> kernels, it looks like the default for an nfs mount is udp. It
> also looks like tcp is not really an option until nfs v4, so it may
> be something to try on the 2.6 kernels that I have on some of my
> newer machines at another site.
>>you should definitely examine /proc/net/dev on involved machines.
>I hadn't known about /proc/net/dev. When I check there, I see no
>transmit errors on the server side and no receive errors on the
>client side. That's odd, because the other thing I see is that the
>average packet size received (bytes received divided by packets
>received) on the client side is 3.9, while on the server side, the
>average packet size sent is 1430. In other words, there are many
>more packets received than there ought to be. That's very
>fishy. It's probably the result of the way the packet count is done
>and reported. I.e., it may be that all the received packets -- good
>and bad -- are counted, but only the bytes in the good ones are
>counted, with some similar problem on the server side. I think the
>statistics are aggregate since the last boot, so they may not be
>just from the troublesome tests I was performing, either.
>>I would attempt to reduce the complexity of your testing.
>>for instance, can a node write and verify to its local disk
>The local disk read seems rock solid in comparison to the NFS
>one. The local md5sum produces the same result time after time,
>which is just not the case for the remote.
>>can it stream data over tcp sockets (netcat or the like) without
>>corruption or obvious problems reflected
>netcat is not on my systems. Looks like I have to get someone to
>download and build it for me, and try the streaming tests you recommend.
>>does ethtool tell you anything about the config of the nic?
>Not on the 2.4 systems, though it seems to tell me a little on the 2.6's.
>>comparing tcp vs udp NFS would be sensible
>>as well - varying the packet size, too. switching client and/or
>>server to a modern 2.6 kernel may be instructive.
>Upgrading the kernel is probably the only way I'll get nfs over
>tcp. Given that these systems are headed out the door, I'm not sure
>that's a good use of our time. But it may be worth doing on our new
>and newer systems.
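The streaming test suggested in the quoted reply (netcat over TCP, checking for corruption) comes down to computing a checksum independently at each end of the connection. Here is a minimal Python sketch of that idea, standing in for nc plus md5sum. It runs sender and receiver on the loopback interface so it is self-contained; a real test would put the receiver on one node and the sender on the other. The function name, the 8 MB default size, and the block size are assumptions for illustration. An end-to-end digest like this would also catch corruption that slips past TCP's own 16-bit checksum, which was the concern raised in the quoted reply.

```python
import hashlib
import socket
import threading


def tcp_stream_check(total_bytes: int = 8 * 1024 * 1024, blocksize: int = 8192,
                     host: str = "127.0.0.1") -> bool:
    """Stream total_bytes over one TCP connection and compare MD5 digests
    computed independently on the sending and receiving sides."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))   # port 0: kernel picks a free port
    server.listen(1)
    server.settimeout(10)    # don't hang forever if the sender never connects
    result = {}

    def receive() -> None:
        conn, _ = server.accept()
        digest, nbytes = hashlib.md5(), 0
        with conn:
            while True:
                data = conn.recv(65536)
                if not data:         # sender closed the connection
                    break
                digest.update(data)
                nbytes += len(data)
        result["digest"], result["nbytes"] = digest.hexdigest(), nbytes

    receiver = threading.Thread(target=receive)
    receiver.start()
    sent = hashlib.md5()
    block = (bytes(range(256)) * (blocksize // 256 + 1))[:blocksize]
    with socket.create_connection(server.getsockname()) as conn:
        remaining = total_bytes
        while remaining > 0:
            piece = block[:min(blocksize, remaining)]
            conn.sendall(piece)
            sent.update(piece)
            remaining -= len(piece)
    receiver.join()
    server.close()
    # End-to-end check, independent of TCP's own 16-bit checksum.
    return result.get("digest") == sent.hexdigest() and result.get("nbytes") == total_bytes


if __name__ == "__main__":
    print("stream intact:", tcp_stream_check())
```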
>Beowulf mailing list, Beowulf at beowulf.org