[Beowulf] network filesystem

Robert Latham robl at mcs.anl.gov
Tue Mar 6 10:44:10 PST 2007

On Tue, Mar 06, 2007 at 11:09:18AM -0500, Mark Hahn wrote:
> >I would contend that writing to different sections of a file *must* be
> >supported by any file system deployed on a cluster.  How else would
> >you get good performance from MPI-IO?
> who uses MPI-IO?  straight question - I don't believe any of our 1500 users 
> do.

Excellent question.  Direct users?  Probably not very many.

We do find that straight-up MPI-IO isn't a good fit for a lot of
scientific applications.  The convenience factor you mentioned is
indeed important.  MPI-IO thinks of data as a "stream of bytes", while
applications think in terms of "multidimensional typed data" (a slice
of upper atmosphere, say).

Libraries like Parallel-HDF5 and Parallel-NetCDF bridge the gap and
provide a convenient, familiar API.  The app is still using MPI-IO,
just not directly.

> NFS certainly does as well.  you just have to know the constraints.
> are you saying you can never get pathological or incorrect results from
> parallel operations on the same file on any of those FS's?

You observe correctly that file systems offer a set of rules on what
to expect from I/O patterns.  These consistency semantics are not set
in stone: MPI-IO consistency semantics are more relaxed than POSIX,
yet generally sufficient for parallel scientific applications.

We would consider it a serious bug in PVFS if simultaneous
non-overlapping writes corrupted data.
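
To make "simultaneous non-overlapping writes" concrete, here is a
minimal sketch in Python.  It is not MPI-IO and not PVFS: threads stand
in for processes, the file lives on whatever local Unix-like file
system you run it on, and BLOCK, write_block, shared_file_write and
verify are names made up for this illustration.  Each "rank" writes
only its own byte range of one shared file, and the check afterwards is
the property described above: every region holds exactly what its owner
wrote.

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096  # bytes owned by each writer (hypothetical size)


def write_block(fd, rank):
    """Write this rank's block at its own offset; regions never overlap."""
    payload = bytes([rank % 256]) * BLOCK
    written = os.pwrite(fd, payload, rank * BLOCK)
    if written != BLOCK:
        raise OSError("short write for rank %d" % rank)
    return written


def shared_file_write(path, nranks):
    """Have nranks concurrent writers fill disjoint regions of one file."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        with ThreadPoolExecutor(max_workers=nranks) as pool:
            # list() forces completion and re-raises any worker exception
            list(pool.map(lambda r: write_block(fd, r), range(nranks)))
    finally:
        os.close(fd)


def verify(path, nranks):
    """Return True if every rank's region contains that rank's bytes."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) != nranks * BLOCK:
        return False
    return all(
        data[r * BLOCK:(r + 1) * BLOCK] == bytes([r % 256]) * BLOCK
        for r in range(nranks)
    )
```

Calling shared_file_write("shared.dat", 8) and then
verify("shared.dat", 8) exercises the pattern.  An MPI program would
express the same idea with each rank writing at its own offset.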

If the only file system I had access to was NFS, I'd do one file per
process as well. 

> starting with the question: "do you have a good reason to be writing in 
> parallel to the same file?".  I'm not saying the answer is never yes.
> I guess I tend to value portability by obscurity-avoidance.  not if it makes
> life utter hell, of course, but...

One file per processor falls down on systems like BGL (where even a
small run is 1024 processes, and 128k is not unheard of).

One file per process also robs the higher layers of the I/O software
stack of an opportunity to optimize access patterns.  All processes
reading a column out of a row-major array is noncontiguous (and
generally slow) in file-per-processor, but can be contiguous in
single-file after applying data shipping or two-phase collective
buffering optimizations.
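
A small sketch of the arithmetic behind that claim, in Python.  This is
not ROMIO's implementation, only the observation that collective
optimizations exploit; column_extents and coalesce are hypothetical
helper names, and the 8-byte element size is an assumption.  One
process reading one column of a row-major array touches one small
extent per row, but the union of every process's requests covers the
file contiguously.

```python
def column_extents(nrows, ncols, col, elem_size=8):
    """Byte (offset, length) pairs for one column of a row-major array."""
    return [((r * ncols + col) * elem_size, elem_size) for r in range(nrows)]


def coalesce(extents):
    """Merge extents that touch or overlap into larger contiguous ones."""
    merged = []
    for off, length in sorted(extents):
        if merged and off <= merged[-1][0] + merged[-1][1]:
            last_off, last_len = merged[-1]
            merged[-1] = (last_off, max(last_len, off + length - last_off))
        else:
            merged.append((off, length))
    return merged


nrows, ncols = 4, 4

# One process per column: each sees nrows separate strided requests.
per_process = [column_extents(nrows, ncols, c) for c in range(ncols)]
alone = coalesce(per_process[1])

# Taken together, the requests merge into a single contiguous extent.
combined = coalesce([e for extents in per_process for e in extents])

print(len(alone), "extents for one process alone")
print(len(combined), "extent for all processes combined:", combined)
```

In a two-phase scheme a few processes would read that one large extent
and ship each element to the process that asked for it.  With one file
per process there is no shared file for the library to see this
pattern in.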

Jeff touched on the data management issues of file-per-processor.

If file-per-processor really is the most portable and convenient way
to work on data, well, I can't argue with that.  On NFS, that's
probably the only way to get correct results.  The single-file
approach, however, has significant benefits on the modern parallel
file systems available today.

As I hope you could tell, this kind of discussion is a lot of fun for
me.  Thanks!


Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B
