Cluster programming...

Robert G. Brown rgb at phy.duke.edu
Wed Jan 22 11:07:12 PST 2003


On Wed, 22 Jan 2003, Karl Bellve wrote:

> 
> I am running into a little problem with multiple writes to a single 
> file via NFS.
> 
> 
> An application is spawned on a number of nodes. When they are done, they 
> each write to a specific, non-overlapping area of the NFS-mounted 
> file. I use fcntl(fd, F_SETLKW, &lck) to lock the file, or wait until it 
> can lock the file for writing. fcntl() is capable of locking across NFS. 
> However, some nodes fail to write their result to the file. It isn't the 
> same nodes every time. I am not seeing any write errors. I tend to think 
> it is an NFS caching issue. All writes get flushed before releasing the 
> lock via fsync() and close().
> 
> The fileserver is a Red Hat 8.0 system. I upgraded to the latest kernel 
> offered for RH 8.0. That didn't fix the problem. I compiled a new kernel 
> (2.4.20) and that didn't fix the problem. The nodes are Alphas running 
> RH 6.2.
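
For reference, the sequence you describe would look roughly like the
sketch below. This is only my reading of your description, not your
actual code: the helper name write_region() and its parameters are made
up for illustration, and I am assuming a byte-range write lock over
exactly the region being written. It checks every return value, since a
failed write(), fsync() or close() on NFS is easy to miss otherwise.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the original post): lock the byte range
 * [offset, offset + len), write buf there, flush, then unlock.
 * Returns 0 on success, -1 on any failure (errno is left set). */
int write_region(const char *path, off_t offset, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    struct flock lck;
    memset(&lck, 0, sizeof lck);
    lck.l_type = F_WRLCK;      /* exclusive (write) lock */
    lck.l_whence = SEEK_SET;   /* l_start is an absolute file offset */
    lck.l_start = offset;
    lck.l_len = (off_t)len;

    /* F_SETLKW blocks until the range can be locked. */
    if (fcntl(fd, F_SETLKW, &lck) < 0) {
        close(fd);
        return -1;
    }

    int rc = 0;
    const char *p = buf;
    size_t left = len;
    off_t pos = offset;
    while (left > 0) {         /* loop to handle short writes */
        ssize_t n = pwrite(fd, p, left, pos);
        if (n < 0) {
            rc = -1;
            break;
        }
        p += n;
        left -= (size_t)n;
        pos += n;
    }

    /* Push the data to the server before the lock is dropped. */
    if (rc == 0 && fsync(fd) < 0)
        rc = -1;

    lck.l_type = F_UNLCK;
    if (fcntl(fd, F_SETLK, &lck) < 0)
        rc = -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

Each node would call something like write_region(path, my_offset,
result, result_len) and log errno whenever it returns -1.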

Things to check:

  a) chkconfig --list nfslock

(not running equals a problem:-).  On the alphas you may have to look
for rpc.lockd.

  b) whether the problem exists with e.g. Intel or AMD nodes running RH
7.x or 8.x.  This seems to me to be your most likely problem -- alphas
have always had more bugs that get less attention as there are fewer of
them and nobody makes them or buys them, to speak of, any more.  RH 6.2
ships a really OLD kernel, right -- 2.2.16 or so?  Does it even have kernel
level NFS?  Anything approximating the latest NFS?  NFS used to be a bit
of a bug farm in Linux, although it would usually "work well enough"
when one wasn't doing things like file locking...

If the problem goes away with Intel and a modern kernel, I'm going to
guess that your problem is alphas running 6.2.  This won't tell you how
to SOLVE your problem -- dumping your alphas probably won't be appealing
to you, and neither will trying to get a more recent RH release running on
the alphas -- but it might get you to where you can at least face the
problem squarely and figure out which of the unpleasant solutions to
attempt.  Probably an "alternate" one...
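
One cheap way to compare nodes is a little probe that tries a
non-blocking lock on a scratch file in the NFS-mounted directory and
reports the errno it gets back. The sketch below is just that, a
sketch: probe_lock() is a name I made up, and whether an old 2.2 client
actually reports a dead lock daemon this way is an assumption. On
current Linux, ENOLCK is the documented errno when the remote locking
protocol fails, and EACCES or EAGAIN just means somebody else holds the
lock.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical probe (illustration only): try to take and release a
 * one-byte write lock on path without blocking.
 * Returns 0 if both steps worked, otherwise the errno of the failure. */
int probe_lock(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return errno;

    struct flock lck;
    memset(&lck, 0, sizeof lck);
    lck.l_type = F_WRLCK;
    lck.l_whence = SEEK_SET;
    lck.l_start = 0;
    lck.l_len = 1;             /* lock just the first byte */

    int err = 0;
    if (fcntl(fd, F_SETLK, &lck) < 0) {
        err = errno;           /* e.g. ENOLCK, EACCES, EAGAIN */
    } else {
        lck.l_type = F_UNLCK;
        if (fcntl(fd, F_SETLK, &lck) < 0)
            err = errno;
    }

    close(fd);
    return err;
}
```

Run it against the same scratch file from an alpha node and from an
Intel node and print strerror() of whatever comes back; a difference
between the two would point at the client side.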

> I am thinking about alternate means of locking but fcntl() should do the 
> trick.

   rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu

More information about the Beowulf mailing list