[Beowulf] network filesystem

Jeffrey B. Layton laytonjb at charter.net
Tue Mar 6 09:00:24 PST 2007


Mark Hahn wrote:
>>> writing to different sections of a file is probably wrong on any
>>> networked FS, since there will inherently be obscure interactions
>>> with the size and alignment of the writes vs client pagecache,
>>
>> I'm rather surprised to see that sentiment on a mailing list for high
>> performance clusters :>
>
> smiley noted, but I would suggest that HPC is not about convenience 
> first - simply having each node write to a separate file eliminates 
> any such issue,
> and is hardly an egregious complication to the code.

Actually, this can greatly complicate the code. If I run a CFD case on n
processes and each one writes its part of the solution to a separate file,
and I then run on 1.5*n processes, how do I read the n files? I can write
some code to take the n files and write out a single file, or 1.5*n files,
for instance. To me this is a wasteful use of cycles when something like
MPI-IO is so much better and I can stick with a single file.
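
To make the single-file point concrete, here is a minimal sketch, assuming
the solution is stored as one flat array of n_items values in a single file.
The helper names block_range and decomposition are hypothetical and are not
part of MPI-IO; they only show that the region each process owns can be
recomputed for any process count, so the same file serves n or 1.5*n
processes without a conversion step.

```python
def block_range(n_items, n_procs, rank):
    """Return (start, count) of the contiguous block owned by `rank`.

    Hypothetical helper: a plain block decomposition of a flat array.
    """
    base, extra = divmod(n_items, n_procs)
    # The first `extra` ranks each take one additional item.
    start = rank * base + min(rank, extra)
    count = base + (1 if rank < extra else 0)
    return start, count


def decomposition(n_items, n_procs):
    """List the (start, count) block of every rank, in rank order."""
    return [block_range(n_items, n_procs, r) for r in range(n_procs)]


if __name__ == "__main__":
    # The same 1000-item file split for 4 processes, then for 6.
    for procs in (4, 6):
        print(procs, decomposition(1000, procs))
```

With per-process files, the file layout is tied to the process count that
wrote it; here only the offsets change.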

I don't want to speak for the entire CFD community, but I haven't seen
anyone write out n files. That approach proved to be a huge pain many
years ago.

Other disciplines may have other opinions of course.

>> I would contend that writing to different sections of a file *must* be
>> supported by any file system deployed on a cluster.  How else would
>> you get good performance from MPI-IO?
>
> who uses MPI-IO?  straight question - I don't believe any of our 1500 
> users do.

I do. I also know that some ISVs are moving rapidly to use MPI-IO.

>>> in my experience, people who expect it to "just work" have an
>>> incredibly naive model of how a network FS works (ie, write()
>>> produces an RPC direct to the server)
>>
>> I agree that the POSIX API and consistency semantics make it difficult
>> to achieve high I/O rates for common scientific workloads, and that
>> NFS is probably not the best solution for those truly parallel 
>> workloads.
>>
>> Fortunately,  there are good alternatives out there.
>
> starting with the question: "do you have a good reason to be writing 
> in parallel to the same file?".  I'm not saying the answer is never yes.

As Rob mentioned, writing in parallel to the same file gets you good
performance. I think this is a fundamental underpinning of parallel I/O.
You can do this with or without MPI-IO; MPI-IO just makes it easier,
standard, and portable.

Of course, you would not have different processes writing to the same
region of a file. But if each process can write to a distinct region or
section of the file without worrying about another process stepping on it,
then why not write in parallel? It's easy to do using MPI-IO. Take a look
at the MPI-IO tutorials around the web and give them a try.
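
The "distinct region per process" pattern can be sketched without MPI-IO,
under some loud assumptions: threads stand in for the parallel processes,
the file is on a local filesystem, and the data is a flat array of float64
values. The names block_range, write_region, write_shared_file and read_all
are hypothetical. This shows only the offset arithmetic; it says nothing
about how a given network filesystem behaves under the same access pattern.

```python
import struct
import threading

ITEM_SIZE = 8  # bytes per little-endian float64


def block_range(n_items, n_workers, rank):
    """Return (start, count) of the contiguous block owned by `rank`."""
    base, extra = divmod(n_items, n_workers)
    start = rank * base + min(rank, extra)
    count = base + (1 if rank < extra else 0)
    return start, count


def write_region(path, start, values):
    """Write `values` at item offset `start`, touching no other region."""
    payload = struct.pack(f"<{len(values)}d", *values)
    # Each worker opens its own handle, so file positions are not shared.
    with open(path, "r+b") as f:
        f.seek(start * ITEM_SIZE)
        f.write(payload)


def write_shared_file(path, n_items, n_workers):
    """Have every worker write its own block of one shared file."""
    # Create the file at its final size before any worker writes.
    with open(path, "wb") as f:
        f.truncate(n_items * ITEM_SIZE)
    workers = []
    for rank in range(n_workers):
        start, count = block_range(n_items, n_workers, rank)
        # Stand-in data: item i simply holds the value float(i).
        values = [float(i) for i in range(start, start + count)]
        t = threading.Thread(target=write_region, args=(path, start, values))
        workers.append(t)
        t.start()
    for t in workers:
        t.join()


def read_all(path, n_items):
    """Read the whole file back as a list of floats."""
    with open(path, "rb") as f:
        return list(struct.unpack(f"<{n_items}d", f.read(n_items * ITEM_SIZE)))
```

In an MPI program the same offsets would be passed to MPI_File_write_at (or
the collective MPI_File_write_at_all) on a file opened with MPI_File_open,
which is where the "easier, standard, and portable" part comes in.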

Jeff



