[Beowulf] NFS Read Errors
hahn at mcmaster.ca
Mon Dec 3 22:31:05 PST 2007
> does the same thing with a slightly simpler syntax. There is mounting
> evidence that you should use sha1sum rather than md5sum.
for general integrity checking (i.e. nothing security-related), md5 is still fine.
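as a rough illustration of that kind of check, here is a minimal sketch that hashes a file in chunks with python's hashlib. the function name file_digest and its parameters are made up for this example; swapping "md5" for "sha1" is a one-argument change.

```python
import hashlib

def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`.

    The file is read in fixed-size chunks so that large files
    do not have to fit in memory. `file_digest` is a hypothetical
    helper name, not part of any existing tool.
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at EOF
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

running it on the server's copy and on the copy read through the NFS mount, then comparing the two hex strings, is the same comparison md5sum gives you.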
> I am guessing you are using TCP NFS mounts as well? TCP forces retries in
> the event of bad packets. UDP doesn't force this, but the NFS protocol will
UDP has a checksum as well, though it's only 16 bits. then again, the TCP
checksum is also only 16 bits, so it isn't all that strong for today's data rates either.
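to make the weakness concrete, here is a small sketch of the 16-bit ones' complement checksum described in RFC 1071, which is the sum both TCP and UDP use. the function name internet_checksum is just for this example, and real stacks also include a pseudo-header in the sum, which is left out here. because the sum is commutative, reordering 16-bit words leaves the checksum unchanged.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over `data` (RFC 1071 style).

    Simplified sketch: no TCP/UDP pseudo-header is included.
    """
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        # fold any carry out of the low 16 bits back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

original = b"\x12\x34\xab\xcd"
reordered = b"\xab\xcd\x12\x34"  # same 16-bit words, swapped
# the payloads differ, but the checksum cannot tell them apart
same_checksum = internet_checksum(original) == internet_checksum(reordered)
```

that is one class of corruption the checksum misses entirely, which is why an end-to-end md5 over the file is worth doing even on TCP mounts.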
you should definitely examine /proc/net/dev on involved machines.
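a minimal sketch of pulling the error-type counters out of /proc/net/dev follows. it assumes the usual layout of two header lines followed by one line per interface with 8 receive and 8 transmit counters; the function names and the choice of which counters to flag are mine, not anything standard.

```python
RX_FIELDS = ("bytes", "packets", "errs", "drop",
             "fifo", "frame", "compressed", "multicast")
TX_FIELDS = ("bytes", "packets", "errs", "drop",
             "fifo", "colls", "carrier", "compressed")

def parse_net_dev(text):
    """Parse the contents of /proc/net/dev into {iface: {"rx": {...}, "tx": {...}}}."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        # older kernels may print "eth0:1234" with no space after the colon
        name, values = line.split(":", 1)
        numbers = [int(v) for v in values.split()]
        if len(numbers) < 16:
            continue  # unexpected layout; skip rather than guess
        stats[name.strip()] = {
            "rx": dict(zip(RX_FIELDS, numbers[:8])),
            "tx": dict(zip(TX_FIELDS, numbers[8:16])),
        }
    return stats

def suspicious_counters(stats):
    """Yield (iface, direction, field, value) for every nonzero error-type counter."""
    watch = {
        "rx": ("errs", "drop", "fifo", "frame"),
        "tx": ("errs", "drop", "fifo", "colls", "carrier"),
    }
    for iface, directions in stats.items():
        for direction, fields in watch.items():
            for field in fields:
                value = directions[direction][field]
                if value:
                    yield iface, direction, field, value
```

on a linux box you would feed it open("/proc/net/dev").read(), ideally once before and once after a test transfer, and look at which counters moved.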
>> We are in the process of upgrading and thus replacing all the machines we
>> have of that configuration due to space limitations and their age, but I'm
>> still curious what the problem could be.
I would attempt to reduce the complexity of your testing.
for instance, can a node write to its local disk and verify the data
without problems? can it stream data over tcp sockets (netcat
or the like) without corruption or obvious problems reflected
in /proc/net/dev? does ethtool tell you anything about the
config of the nic? comparing tcp vs udp NFS would be sensible
as well, and varying the packet size, too. switching client and/or
server to a modern 2.6 kernel may be instructive.
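the simplest of those tests, write and verify on local disk, could be sketched like this. the function name, the default sizes and the use of os.urandom for test data are all assumptions for illustration. note that the read-back may well be served from the page cache rather than the disk, so for a real test the file should be larger than RAM, or the cache should be dropped between the write and the read.

```python
import hashlib
import os

def write_and_verify(path, total_bytes=64 * 1024 * 1024, chunk_size=1 << 20):
    """Write random data to `path`, read it back, and compare MD5 digests.

    Returns True if the data read back matches what was written.
    """
    written = hashlib.md5()
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            chunk = os.urandom(min(chunk_size, remaining))
            f.write(chunk)
            written.update(chunk)  # hash the data as it is written
            remaining -= len(chunk)
        f.flush()
        os.fsync(f.fileno())  # ask the kernel to push the data to the device

    read_back = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            read_back.update(chunk)
    return written.hexdigest() == read_back.hexdigest()
```

pointing the same function at a path on the NFS mount gives a like-for-like comparison: if local disk passes repeatedly and NFS does not, the disk is off the suspect list.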