[Beowulf] network filesystem

Mark Hahn hahn at mcmaster.ca
Mon Mar 5 08:08:28 PST 2007


> Our developers had that issue of inconsistent file system view in RHEL
> based systems, some of it is solved by disabling dir list caching, another
> by using noac,

well, developers should be smart enough to know what FS they're using,
and how it's intended to behave.  turning off attribute caching (noac)
is a nice option, but smarter is to leave it on and not try to cause
race conditions.  (I expect that such race-friendly behavior will fail
on some other non-NFS filesystems too, though it's probably harder to
trigger there.)
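
to make "not try to cause race conditions" concrete, here is a minimal
sketch of one pattern that tolerates attribute caching rather than fighting
it: the writer publishes a file by renaming it into place, and the reader
polls instead of assuming the file is visible the instant it was written.
the names publish and wait_and_read and the 90-second timeout are my own
assumptions for illustration, not anything from the original report; the
timeout is simply chosen to exceed the Linux NFS client's default acdirmax
of 60 seconds.

```python
import os
import time


def publish(path, data):
    """Write data under a temporary name, then rename it into place."""
    tmp = f"{path}.tmp.{os.getpid()}"  # hypothetical naming scheme
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the data out before the name appears
    # rename is atomic: a reader sees either no file or the complete file
    os.replace(tmp, path)


def wait_and_read(path, timeout=90.0, interval=0.5):
    """Poll for path instead of assuming it is visible immediately."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path, "rb") as f:
                return f.read()
        except FileNotFoundError:
            # a cached directory view may lag behind the server for a while
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)
```

the point of the design is that the reader never has to guess whether a
file is half-written, and a stale cached directory listing only costs it
some waiting, not correctness.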

> what the other was doing was writing simultaneously to the
> same file partitioned over several nodes, I told him this is probably not the
> right way to do file writing. apparently he used to do it in Sun Solaris and
> it worked flawlessly.

I would spank any developer who said "but it works on platform X"!
developers must be aware of the spec, not merely what they can get 
away with somewhere, sometime.  of course, this is the thinking 
behind apps having "supported" platforms - just a fancy way of saying
"no, we don't know what standards-conformance we need, or how we 
violate the standard, but here's a few places we haven't yet noticed 
any bad-enough bugs".

writing to different sections of a file from several clients at once is
probably wrong on any networked FS, since there will inherently be obscure
interactions with the size and alignment of the writes vs client pagecache,
network transport, actual network FS, server pagecache and underlying
server/disk FS.  in my experience, people who expect it to "just work"
have an incredibly naive model of how a network FS works (i.e., that
write() produces an RPC direct to the server).
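
one way to sidestep the shared-file problem entirely is to have each node
write its own file and let a single process stitch them together
afterwards.  the following is only a sketch under assumptions: the names
write_part and merge_parts, the part-NNNNN naming and the integer rank
are all hypothetical, and it assumes something outside the code (an MPI
barrier, a job dependency, whatever) guarantees every writer has closed
its file before the merge starts, so that NFS close-to-open consistency
applies.

```python
import os
import shutil


def part_name(directory, rank):
    """Path of the private file owned by one writer (hypothetical scheme)."""
    return os.path.join(directory, f"part-{rank:05d}")


def write_part(directory, rank, data):
    """Each node writes only its own file, so no byte ranges are shared."""
    path = part_name(directory, rank)
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # data is pushed out before close returns
    return path


def merge_parts(directory, nranks, out_path):
    """Run on ONE node, after all writers have finished and closed."""
    with open(out_path, "wb") as out:
        for rank in range(nranks):
            with open(part_name(directory, rank), "rb") as part:
                shutil.copyfileobj(part, out)  # append parts in rank order
        out.flush()
        os.fsync(out.fileno())
```

since no two clients ever touch the same file, the size and alignment of
each write relative to anybody's pagecache stops mattering; the price is
one extra copy at merge time.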


