[Beowulf] network filesystem
Jeffrey B. Layton laytonjb at charter.net
Tue Mar 6 09:00:24 PST 2007
Mark Hahn wrote:
>>> writing to different sections of a file is probably wrong on any
>>> networked FS, since there will inherently be obscure interactions
>>> with the size and alignment of the writes vs client pagecache,
>>
>> I'm rather surprised to see that sentiment on a mailing list for high
>> performance clusters :>
>
> smiley noted, but I would suggest that HPC is not about convenience
> first - simply having each node write to a separate file eliminates
> any such issue,
> and is hardly an egregious complication to the code.

Actually this can greatly complicate code. If I do a CFD run on n
processes and each one writes the solution to a separate file, then
how do I read the n files when I restart on 1.5*n processes? I can
write some code that takes the n files and writes out a single file,
or 1.5*n files, for instance. To me this is a wasteful use of cycles
when something like MPI-IO is so much better and I can stick with a
single file.

I don't want to speak for the entire CFD community, but I haven't
seen anyone write out n files. That concept was proven to be a huge
pain many years ago. Other disciplines may have other opinions, of
course.

>> I would contend that writing to different sections of a file *must* be
>> supported by any file system deployed on a cluster. How else would
>> you get good performance from MPI-IO?
>
> who uses MPI-IO? straight question - I don't believe any of our 1500
> users do.

I do. I also know that some ISVs are moving rapidly to use MPI-IO.

>>> in my experience, people who expect it to "just work" have an
>>> incredibly naive model of how a network FS works (ie, write()
>>> produces an RPC direct to the server)
>>
>> I agree that the POSIX API and consistency semantics make it difficult
>> to achieve high I/O rates for common scientific workloads, and that
>> NFS is probably not the best solution for those truly parallel
>> workloads.
>>
>> Fortunately, there are good alternatives out there.
>
> starting with the question: "do you have a good reason to be writing
> in parallel to the same file?". I'm not saying the answer is never yes.

As Rob mentioned, writing in parallel to the same file gets you good
performance. I think this is a fundamental underpinning of parallel
I/O. You can do this with or without MPI-IO. MPI-IO just makes it
easier, standard, and portable.

Of course you would not have different processes writing to the same
region of a file. But if each process can write to its own distinct
region or section of the file without worrying about another process
stepping on it, then why not write in parallel? It's easy to do using
MPI-IO. Take a look at the tutorials on MPI-IO around the web and give
them a try.

Jeff
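
To make the "distinct region per process" idea concrete, here is a
minimal sketch of the non-MPI-IO route, using plain POSIX positioned
I/O from Python (os.pwrite / os.pread, Unix only). Everything in it is
an assumption for illustration: the helper names, the flat file of
8-byte doubles, the even block split, and threads standing in for
ranks. It is not how any particular CFD code or MPI library does it.
In MPI-IO the per-rank write would typically be one call such as
MPI_File_write_at with the same kind of offset.

```python
import os
import struct
from concurrent.futures import ThreadPoolExecutor

DOUBLE = struct.Struct("<d")  # assumed layout: one little-endian double per cell


def block_range(rank, nranks, ncells):
    """Half-open cell range [lo, hi) owned by `rank` in an even block split."""
    base, extra = divmod(ncells, nranks)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def write_block(path, rank, nranks, solution):
    """Write this rank's cells at their own byte offset in the shared file."""
    lo, hi = block_range(rank, nranks, len(solution))
    payload = b"".join(DOUBLE.pack(v) for v in solution[lo:hi])
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        # Positioned write: no shared file pointer, no overlap with other ranks.
        # (A production version would loop on short writes.)
        os.pwrite(fd, payload, lo * DOUBLE.size)
    finally:
        os.close(fd)


def read_block(path, rank, nranks, ncells):
    """Read the cells this rank owns under a possibly different rank count."""
    lo, hi = block_range(rank, nranks, ncells)
    fd = os.open(path, os.O_RDONLY)
    try:
        raw = os.pread(fd, (hi - lo) * DOUBLE.size, lo * DOUBLE.size)
    finally:
        os.close(fd)
    return [v for (v,) in DOUBLE.iter_unpack(raw)]


def parallel_write(path, nranks, solution):
    """Threads stand in for ranks; each writes only its own region."""
    with ThreadPoolExecutor(max_workers=nranks) as pool:
        list(pool.map(lambda r: write_block(path, r, nranks, solution),
                      range(nranks)))


def parallel_read(path, nranks, ncells):
    """Read the whole file back using `nranks` readers, in rank order."""
    with ThreadPoolExecutor(max_workers=nranks) as pool:
        parts = list(pool.map(lambda r: read_block(path, r, nranks, ncells),
                              range(nranks)))
    return [v for part in parts for v in part]
```

Because the file layout depends only on the cell index and not on how
many writers there were, a solution written with n = 4 can be read
back with 1.5*n = 6 readers simply by calling
parallel_read(path, 6, ncells); no merge or re-split step is needed.
Whether this performs well on a given network filesystem is a separate
question, which is what the rest of this thread is about.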
