
[Beowulf] NFS Read Errors

Mark Hahn hahn at mcmaster.ca
Mon Dec 3 22:31:05 PST 2007


> does the same thing with a slightly simpler syntax.  There is mounting 
> evidence that you should use sha1sum rather than md5sum.

for general checking, md5 is still fine (ie not security-related stuff).
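A minimal sketch of this kind of integrity check in Python, assuming the goal is only to compare copies of a file and not to resist tampering. The helper name file_digest and its parameters are illustrative, not something from this thread; switching the algorithm argument to "sha1" gives the alternative suggested in the quoted text.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in fixed-size chunks.

    file_digest is a hypothetical helper for illustration only.
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Computing the digest on the server's local copy and again through the NFS mount, then comparing the two strings, shows whether the bytes differ.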

> I am guessing you are using TCP NFS mounts as well?  TCP forces retries in 
> the event of bad packets.  UDP doesn't force this, but the NFS protocol will

UDP has a checksum as well, though it's only 16 bits.  then again, the TCP
checksum is also 16 bits and isn't all that strong for today's data rates either.

you should definitely examine /proc/net/dev on involved machines.
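One way to look at /proc/net/dev is sketched below, assuming the usual Linux layout: two header lines, then one line per interface with eight receive and eight transmit counters. The function names parse_net_dev and nonzero_error_counters are hypothetical, and which counters count as "interesting" is a judgment call.

```python
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo",
             "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo",
             "colls", "carrier", "compressed"]

# Counters that usually deserve a closer look when they are non-zero.
ERROR_KEYS = ["rx_errs", "rx_drop", "rx_fifo", "rx_frame",
              "tx_errs", "tx_drop", "tx_fifo", "tx_colls", "tx_carrier"]


def parse_net_dev(text):
    """Parse the contents of /proc/net/dev into {interface: {counter: value}}."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        values = [int(v) for v in data.split()]
        if len(values) < 16:
            continue  # unexpected layout; skip rather than guess
        counters = {"rx_" + f: v for f, v in zip(RX_FIELDS, values[:8])}
        counters.update({"tx_" + f: v for f, v in zip(TX_FIELDS, values[8:16])})
        stats[name.strip()] = counters
    return stats


def nonzero_error_counters(stats):
    """Return only the interfaces and error-type counters that are non-zero."""
    report = {}
    for iface, counters in stats.items():
        bad = {k: counters[k] for k in ERROR_KEYS if counters[k]}
        if bad:
            report[iface] = bad
    return report
```

On a Linux node, passing the text read from /proc/net/dev to parse_net_dev and then to nonzero_error_counters, before and after a test transfer, shows which counters moved.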

>> We are in the process of upgrading and thus replacing all the machines we 
>> have of that configuration due to space limitations and their age, but I'm 
>> still curious what the problem could be.

I would attempt to reduce the complexity of your testing.
for instance, can a node write and verify to its local disk
without problem?  can it stream data over TCP sockets (netcat 
or the like) without corruption or obvious problems reflected
in /proc/net/dev?  does ethtool tell you anything about the 
config of the NIC?  comparing TCP vs UDP NFS would be sensible
as well - varying the packet size, too.  switching client and/or 
server to a modern 2.6 kernel may be instructive.
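The first step, write and verify on local disk, could be sketched roughly as follows. This is an illustration under assumptions, not a tested diagnostic: write_and_verify, its default size, and the use of md5 are all choices made here. Note that the read-back may be served from the page cache rather than the disk, so a file larger than RAM (or dropping caches first) gives a more meaningful result.

```python
import hashlib
import os


def write_and_verify(path, total_bytes=64 * 1024 * 1024, chunk_size=1 << 20):
    """Write random data to path, read it back, and compare checksums.

    Hypothetical helper: returns True when the data read back matches
    what was written.
    """
    written = hashlib.md5()
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            chunk = os.urandom(min(chunk_size, remaining))
            f.write(chunk)
            written.update(chunk)
            remaining -= len(chunk)
        # Push the data out of Python's buffer and ask the OS to commit it.
        f.flush()
        os.fsync(f.fileno())

    read_back = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            read_back.update(chunk)

    return written.hexdigest() == read_back.hexdigest()
```

Running the same function against a path on the NFS mount, after it passes on local disk, narrows the problem to the network or NFS layer.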


