Cluster programming...

Steffen Persvold sp at scali.com
Wed Jan 22 15:39:55 PST 2003


On Wed, 22 Jan 2003, Robert G. Brown wrote:

> On Wed, 22 Jan 2003, Karl Bellve wrote:
> 
> > 
> > I am running into a little problem about multiple writes to a single 
> > file via NFS.
> > 
> > 
> > An application is spawned on a number of nodes. When they are done, they 
> > all write to a specific, but non-overlapping area of the NFS mounted 
> > file. I use fcntl (fd, F_SETLKW, &lck) to lock the file, or wait until it 
> > can lock the file for writing. fcntl() is capable of locking across NFS. 
> > However, some nodes fail to write their result to the file. It isn't the 
> > same nodes every time. I am not seeing any write errors. I tend to think 
> > it is an NFS caching issue. All writes get flushed before releasing the 
> > lock via fsync() and close().
> > 
> > The fileserver is a Redhat 8.0 system. I upgraded to the latest kernel 
> > offered for RH8.0. That didn't fix the problem. I compiled a new kernel 
> > (2.4.20) and that didn't fix the problem. The nodes are Alphas running 
> > RH6.2.
> 
> Things to check:
> 
>   a) chkconfig --list nfslock
> 
> (not running equals a problem:-).  On the alphas you may have to look
> for rpc.lockd.
> 
>   b) whether the problem exists with e.g. Intel or AMD nodes running RH
> 7.x or 8.x.  This seems to me to be your most likely problem -- alphas
> have always had more bugs that get less attention as there are fewer of
> them and nobody makes them or buys them, to speak of, any more.  RH 6.2
> is a really OLD kernel, right -- 2.2.16 or so?  Does it even have kernel
> level NFS?  Anything approximating the latest NFS?  NFS used to be a bit
> of a bug farm in linux, although it would usually "work well enough"
> when one wasn't doing things like file locking...

AFAIK the latest 2.2 kernel is 2.2.23, and 2.2 kernels had kernel level 
NFS from the beginning, although not always correctly implemented, as you 
point out :)

> 
> If the problem goes away with Intel and a modern kernel, I'm going to
> guess that your problem is alphas running 6.2.  This won't tell you how
> to SOLVE your problem -- dumping your alphas probably won't be appealing
> to you, neither will trying to get a more recent RH release running on
> the alphas -- but it might get you to where you can at least face the
> problem squarely and figure out which of the unpleasant solutions to
> attempt.  Probably an "alternate" one...

RH 7.1 is the latest RedHat distribution released for the Alphas. The 
latest errata kernel for this distribution is 2.4.9-40 IIRC, however it 
should not be any problem to compile and run, for example, a stock 2.4.20 
kernel with RH 7.1 (with the latest upgrades).
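
For reference, the write pattern Karl describes (lock a non-overlapping
region with fcntl(fd, F_SETLKW, &lck), write, fsync(), unlock, close())
could be sketched roughly as below. This is only an illustration under
assumptions, not his actual code: the helper name write_region_locked and
its parameters are made up here, the lock covers just the byte range being
written, and pwrite() is used so the file offset is explicit. It does not
by itself diagnose the NFS problem; it just makes each step and each
return value visible, so a silent failure in locking, writing or flushing
is easier to spot.

```c
#define _XOPEN_SOURCE 700   /* for pwrite() and fsync() under strict modes */
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the original post): write len bytes of
 * buf at the given offset of path while holding an exclusive fcntl()
 * lock on exactly that byte range. Returns 0 on success, -1 on failure. */
int write_region_locked(const char *path, off_t offset,
                        const void *buf, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    struct flock lck;
    memset(&lck, 0, sizeof lck);
    lck.l_type = F_WRLCK;      /* exclusive (write) lock */
    lck.l_whence = SEEK_SET;   /* l_start is relative to start of file */
    lck.l_start = offset;
    lck.l_len = (off_t)len;    /* note: 0 would mean "to end of file" */

    /* F_SETLKW blocks until the region can be locked. */
    if (fcntl(fd, F_SETLKW, &lck) < 0) {
        close(fd);
        return -1;
    }

    int rc = 0;
    const char *p = buf;
    size_t done = 0;
    while (done < len) {
        /* pwrite() may write fewer bytes than requested, so loop. */
        ssize_t n = pwrite(fd, p + done, len - done, offset + (off_t)done);
        if (n <= 0) {
            rc = -1;
            break;
        }
        done += (size_t)n;
    }

    /* Flush to the server before the lock is released. */
    if (rc == 0 && fsync(fd) < 0)
        rc = -1;

    lck.l_type = F_UNLCK;
    if (fcntl(fd, F_SETLK, &lck) < 0)
        rc = -1;

    /* close() can also report a deferred write error, so check it. */
    if (close(fd) < 0)
        rc = -1;

    return rc;
}
```

Each node would call something like write_region_locked(path,
rank * record_size, result, record_size), where rank and record_size are
again just placeholder names, and log any -1 return together with errno.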


Regards,
-- 
  Steffen Persvold   |       Scali AS      
 mailto:sp at scali.com |  http://www.scali.com
Tel: (+47) 2262 8950 |   Olaf Helsets vei 6
Fax: (+47) 2262 8951 |   N0621 Oslo, NORWAY




More information about the Beowulf mailing list