[Beowulf] network filesystem
Jeffrey B. Layton
laytonjb at charter.net
Tue Mar 6 09:00:24 PST 2007
Mark Hahn wrote:
>>> writing to different sections of a file is probably wrong on any
>>> networked FS, since there will inherently be obscure interactions
>>> with the size and alignment of the writes vs client pagecache,
>> I'm rather surprised to see that sentiment on a mailing list for high
>> performance clusters :>
> smiley noted, but I would suggest that HPC is not about convenience
> first - simply having each node write to a separate file eliminates
> any such issue,
> and is hardly an egregious complication to the code.
Actually this can greatly complicate code. If I do a CFD run on n
processes and they each write the solution to a separate file, then if I
later run on 1.5*n processes, how do I read the n files? I can write some
code to read the n files, and then write out a single file or 1.5*n files
for instance. To me this is a wasteful use of cycles when something like
MPI-IO is so much better and I can stick with a single file.
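
To sketch why a single shared file avoids that conversion step: if each
process derives its own offset from the total problem size and its rank,
the same file can be split across n or 1.5*n processes with no change to
the data on disk. The Python below is only an illustration under that
assumption, not code from any CFD package; block_range, layout, and the
10-cell problem size are made-up names and numbers.

```python
def block_range(n_items, n_procs, rank):
    """Return (start, count) of the contiguous block assigned to `rank`.

    Hypothetical helper: spreads n_items as evenly as possible, giving
    the first (n_items % n_procs) ranks one extra item each.
    """
    base, extra = divmod(n_items, n_procs)
    start = rank * base + min(rank, extra)
    count = base + (1 if rank < extra else 0)
    return start, count


def layout(n_items, n_procs):
    """List every rank's (start, count) for a given process count."""
    return [block_range(n_items, n_procs, r) for r in range(n_procs)]


if __name__ == "__main__":
    n_cells = 10                 # illustrative problem size
    print(layout(n_cells, 4))    # blocks when writing with n = 4
    print(layout(n_cells, 6))    # blocks when re-reading with 1.5*n = 6
```

Both layouts cover the same 10 cells of the one file; only the block
boundaries move when the process count changes.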
I don't want to speak for the entire CFD community, but I haven't
seen anyone write out n files. That concept was proven to be a huge pain
many years ago.
Other disciplines may have other opinions of course.
>> I would contend that writing to different sections of a file *must* be
>> supported by any file system deployed on a cluster. How else would
>> you get good performance from MPI-IO?
> who uses MPI-IO? straight question - I don't believe any of our 1500
> users do.
I do. I also know that some ISVs are moving rapidly to use MPI-IO.
>>> in my experience, people who expect it to "just work" have an
>>> incredibly naive model of how a network FS works (ie, write()
>>> produces an RPC direct to the server)
>> I agree that the POSIX API and consistency semantics make it difficult
>> to achieve high I/O rates for common scientific workloads, and that
>> NFS is probably not the best solution for those truly parallel
>> workloads.
>> Fortunately, there are good alternatives out there.
> starting with the question: "do you have a good reason to be writing
> in parallel to the same file?". I'm not saying the answer is never yes.
As Rob mentioned, writing in parallel to the same file gets you good
performance.
I think this is a fundamental underpinning of parallel IO. You can do
it with or without MPI-IO. MPI-IO just makes it easier, standard, and
portable.
Of course you would not have different processes writing to the same region
of a file. But if you can have each process write to a distinct region
of the file without worrying about having another process stepping on that
one, then why not write in parallel? It's easy to do using MPI-IO. Take
a look at the tutorials on MPI-IO around the web and give them a try.
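
For the "without MPI-IO" case, here is a minimal sketch of several
processes each writing a distinct region of one file. It assumes a
Unix-like system (os.pwrite and the "fork" start method) and a local
POSIX file system; how well a particular networked file system handles
this pattern is a separate question. CHUNK, write_region, and
parallel_write are hypothetical names. In MPI-IO the corresponding
per-rank call would be something like MPI_File_write_at with an
explicit offset.

```python
import multiprocessing as mp
import os
import tempfile

CHUNK = 8  # bytes owned by each process (illustrative value)


def write_region(path, rank):
    """Write only this rank's region: bytes [rank*CHUNK, (rank+1)*CHUNK)."""
    fd = os.open(path, os.O_WRONLY)
    try:
        payload = bytes([ord("A") + rank]) * CHUNK
        # pwrite takes an explicit offset, so no shared file pointer is used
        os.pwrite(fd, payload, rank * CHUNK)
    finally:
        os.close(fd)


def parallel_write(n_procs):
    """Launch n_procs writers on one shared file and return its contents."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    ctx = mp.get_context("fork")  # assumption: Unix-like host
    procs = [ctx.Process(target=write_region, args=(path, r))
             for r in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    with open(path, "rb") as f:
        data = f.read()
    os.remove(path)
    return data


if __name__ == "__main__":
    print(parallel_write(4))
```

Because the regions never overlap, the result does not depend on the
order in which the processes happen to run.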