[Beowulf] network filesystem

Mark Hahn hahn at mcmaster.ca
Tue Mar 6 08:09:18 PST 2007


>> writing to different sections of a file is probably wrong on any
>> networked FS, since there will inherently be obscure interactions
>> with the size and alignment of the writes vs client pagecache,
>
> I'm rather surprised to see that sentiment on a mailing list for high
> performance clusters :>

smiley noted, but I would suggest that HPC is not about convenience first - 
simply having each node write to a separate file eliminates any such issue,
and is hardly an egregious complication to the code.
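
to make "hardly an egregious complication" concrete, here is a minimal sketch of the file-per-node approach. the names (rank_path, write_rank_output, merge_outputs), the output.NNNNN.dat naming scheme and the optional merge step are all made up for illustration; the rank is assumed to come from whatever launcher or MPI library the job already uses.

```python
import os


def rank_path(out_dir, rank):
    # Hypothetical naming scheme: one file per rank, zero-padded so
    # the files sort in rank order.
    return os.path.join(out_dir, f"output.{rank:05d}.dat")


def write_rank_output(out_dir, rank, payload):
    """Write one rank's bytes to its own file and return the path.

    No other process ever opens this file for writing, so there is no
    shared-file interaction between clients to reason about.
    """
    path = rank_path(out_dir, rank)
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # push the data out before closing
    return path


def merge_outputs(out_dir, nranks, merged_path):
    """Optional post-processing on a single node: concatenate the
    per-rank files in rank order into one file."""
    with open(merged_path, "wb") as out:
        for rank in range(nranks):
            with open(rank_path(out_dir, rank), "rb") as f:
                out.write(f.read())
```

each process calls write_rank_output() with its own rank; if a single file is really needed afterwards, one node can run merge_outputs() once the job is done.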

> I would contend that writing to different sections of a file *must* be
> supported by any file system deployed on a cluster.  How else would
> you get good performance from MPI-IO?

who uses MPI-IO?  straight question - I don't believe any of our 1500 users do.

> PVFS, GPFS, and Lustre all support simultaneous writes to different
> sections of a file.

NFS certainly does as well.  you just have to know the constraints.
are you saying you can never get pathological or incorrect results from
parallel operations on the same file on any of those FS's?
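
as a rough sketch of what respecting one such constraint might look like: each writer owns a fixed-size region whose start and length are multiples of an assumed page size, so no two clients ever dirty the same cached page. BLOCK = 4096, the region layout and the function names are assumptions made for illustration, not a rule from any particular FS, and this says nothing about locking, attribute caching or when other clients will see the data.

```python
import os

BLOCK = 4096  # assumed client page size; check the real value on your system


def region_offset(rank, blocks_per_rank):
    # Each rank's region starts on a BLOCK boundary.
    return rank * blocks_per_rank * BLOCK


def write_region(path, rank, blocks_per_rank, payload):
    """Write payload into this rank's own block-aligned region.

    The payload is zero-padded to the full region size, so every write
    covers whole blocks and never shares a block with another rank.
    """
    size = blocks_per_rank * BLOCK
    if len(payload) > size:
        raise ValueError("payload does not fit in this rank's region")
    padded = payload.ljust(size, b"\0")
    offset = region_offset(rank, blocks_per_rank)

    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        written = 0
        while written < size:
            # pwrite takes an explicit offset, so there is no shared
            # file position to race on; it may write fewer bytes than
            # asked, hence the loop.
            written += os.pwrite(fd, padded[written:], offset + written)
        os.fsync(fd)  # flush this region before closing
    finally:
        os.close(fd)
```

a reader then finds rank r's data at region_offset(r, blocks_per_rank). whether this is enough on a given FS and mount configuration is exactly the kind of thing you have to know, not assume.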

>> in my experience, people who expect it to "just work" have an
>> incredibly naive model of how a network FS works (ie, write()
>> produces an RPC direct to the server)
>
> I agree that the POSIX API and consistency semantics make it difficult
> to achieve high I/O rates for common scientific workloads, and that
> NFS is probably not the best solution for those truly parallel workloads.
>
> Fortunately, there are good alternatives out there.

starting with the question: "do you have a good reason to be writing in 
parallel to the same file?".  I'm not saying the answer is never yes.

I guess I tend to value portability, which I get by avoiding obscure features.
not if it makes life utter hell, of course, but...


