[Beowulf] network filesystem
hahn at mcmaster.ca
Mon Mar 5 08:08:28 PST 2007
> Our developers had that issue of inconsistent file system view in RHEL
> based systems, some of it is solved by disabling dir list caching, another
> by using noac,
well, developers should be smart enough to know what FS they're using,
and how it's intended to behave. turning off attribute caching (noac)
is a nice option, but smarter is to leave it on and not try to cause
race conditions.
(I expect that such race-friendly behavior will fail on some other
non-NFS filesystems, though probably harder to trigger.)
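
one way to stay out of such races with caching left on is to never let
another client see a half-written file. a minimal sketch, assuming a
single producer, a POSIX-like FS, and a server that makes rename atomic
(the function name publish_atomically is made up for illustration, not
from any library). readers on other clients may still see the old file
until their cached attributes expire, but they should not see a partial
one.

```python
import os
import tempfile

def publish_atomically(path, data):
    """Write data to a temp file next to path, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the rename stays within one filesystem.
    # Note: mkstemp creates the file with mode 0600.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Ask the OS to flush the data out (on NFS, to the server)
            # before the new name becomes visible.
            os.fsync(tmp.fileno())
        # Readers see either the old file or the complete new one.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

a consumer then only ever opens the final name, and only after the
producer has signalled (by whatever means the job uses) that it is done.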
> what the other was doing was writing simultaneously to the
> same file partitioned over several nodes, I told him this is probably not the
> right way to do file writing. apparently he used to do it in Sun Solaris and
> it worked flawlessly.
I would spank any developer who said "but it works on platform X"!
developers must be aware of the spec, not merely what they can get
away with somewhere, sometime. of course, this is the thinking
behind apps having "supported" platforms - just a fancy way of saying
"no, we don't know what standards-conformance we need, or how we
violate the standard, but here's a few places we haven't yet noticed
any bad-enough bugs".
writing to different sections of a file is probably wrong on any
networked FS, since there will inherently be obscure interactions
with the size and alignment of the writes vs client pagecache,
network transport, actual network FS, server pagecache and underlying
server/disk FS. in my experience, people who expect it to "just work"
have an incredibly naive model of how a network FS works (i.e., that
write() produces an RPC direct to the server).
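
a common alternative to shared-file writes, sketched here under stated
assumptions rather than as the one right answer: each node writes its
own part file, and a single process concatenates the parts once every
writer has closed its file. the names part_path, write_part and
merge_parts are hypothetical, and the barrier between the write phase
and the merge phase (MPI, the batch system, whatever) is not shown.

```python
import os
import shutil

def part_path(output_path, rank):
    """Name of the private part file for one writer (hypothetical scheme)."""
    return f"{output_path}.part{rank:05d}"

def write_part(output_path, rank, data):
    """Called on each node: write only this rank's data to its own file."""
    path = part_path(output_path, rank)
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # flush before close so the merger sees it all
    return path

def merge_parts(output_path, num_ranks):
    """Called on ONE node, after all writers have finished and closed."""
    with open(output_path, "wb") as out:
        for rank in range(num_ranks):
            # Parts are appended in rank order, regardless of write order.
            with open(part_path(output_path, rank), "rb") as part:
                shutil.copyfileobj(part, out)
        out.flush()
        os.fsync(out.fileno())
```

no two clients ever touch the same file at the same time, so the
size and alignment of individual writes stop mattering.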