Cluster programming...
Robert G. Brown rgb at phy.duke.edu
Wed Jan 22 11:07:12 PST 2003
On Wed, 22 Jan 2003, Karl Bellve wrote:

> I am running into a little problem with multiple writes to a single
> file via NFS.
>
> An application is spawned on a number of nodes. When they are done, they
> all write to a specific, but non-overlapping, area of the NFS-mounted
> file. I use fcntl(fd, F_SETLKW, &lck) to lock the file, or wait until it
> can lock the file for writing. fcntl() is capable of locking across NFS.
> However, some nodes fail to write their result to the file. It isn't the
> same nodes every time. I am not seeing any write errors. I tend to think
> it is an NFS caching issue. All writes get flushed before releasing the
> lock via fsync() and close().
>
> The fileserver is a Red Hat 8.0 system. I upgraded to the latest kernel
> offered for RH 8.0. That didn't fix the problem. I compiled a new kernel
> (2.4.20) and that didn't fix the problem. The nodes are Alphas running
> RH 6.2.

Things to check:

a) chkconfig --list nfslock (not running equals a problem :-). On the
Alphas you may have to look for rpc.lockd.

b) Whether the problem exists with e.g. Intel or AMD nodes running RH 7.x
or 8.x. This seems to me to be your most likely problem -- Alphas have
always had more bugs that get less attention, as there are fewer of them
and nobody makes them or buys them, to speak of, any more. RH 6.2 has a
really OLD kernel, right -- 2.2.16 or so? Does it even have kernel-level
NFS? Anything approximating the latest NFS? NFS used to be a bit of a bug
farm in Linux, although it would usually "work well enough" when one
wasn't doing things like file locking...

If the problem goes away with Intel and a modern kernel, I'm going to
guess that your problem is Alphas running 6.2.
This won't tell you how to SOLVE your problem -- dumping your Alphas
probably won't be appealing to you, and neither will trying to get a more
recent RH release running on the Alphas -- but it might get you to where
you can at least face the problem squarely and figure out which of the
unpleasant solutions to attempt. Probably an "alternate" one...

> I am thinking about alternate means of locking but fcntl() should do the
> trick.

   rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
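
For reference, here is a minimal sketch in POSIX C of the sequence described in the quoted question: take a blocking byte-range write lock with fcntl(fd, F_SETLKW, &lck), write the region, fsync(), release the lock, close(). The function name write_region_locked and its parameters are hypothetical and this is not Karl's actual code. The sketch checks the return value of every call, since errors on a network filesystem may only be reported at fsync() or close() rather than at write time. It does not address a missing or broken lock daemon, which is the first thing to rule out above.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes at offset while holding an
 * exclusive byte-range lock on exactly that region.
 * Returns 0 on success, -1 on failure. */
int write_region_locked(const char *path, off_t offset,
                        const void *buf, size_t len)
{
    if (len == 0)
        return 0;               /* l_len == 0 would mean "lock to EOF" */

    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    struct flock lck = {0};
    lck.l_type = F_WRLCK;       /* exclusive (write) lock */
    lck.l_whence = SEEK_SET;
    lck.l_start = offset;
    lck.l_len = (off_t)len;

    /* F_SETLKW blocks until the lock is granted; retry on signals. */
    while (fcntl(fd, F_SETLKW, &lck) == -1) {
        if (errno != EINTR) {
            close(fd);
            return -1;
        }
    }

    int rc = 0;
    const char *p = buf;
    size_t done = 0;
    while (done < len) {        /* handle short writes */
        ssize_t n = pwrite(fd, p + done, len - done, offset + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            rc = -1;
            break;
        }
        done += (size_t)n;
    }

    /* Flush to the server BEFORE unlocking, and check the result. */
    if (rc == 0 && fsync(fd) == -1)
        rc = -1;

    lck.l_type = F_UNLCK;
    if (fcntl(fd, F_SETLK, &lck) == -1)
        rc = -1;

    /* close() can also report a deferred write error. */
    if (close(fd) == -1)
        rc = -1;

    return rc;
}
```

Each node would call this once with its own non-overlapping offset, e.g. write_region_locked("/mnt/nfs/results.dat", rank * 4096, buf, 4096), where the path, rank, and record size are placeholders for illustration.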
More information about the Beowulf mailing list
