[Beowulf] PVFS on 80 proc (40 node) cluster

Robert Latham robl at mcs.anl.gov
Mon Nov 1 08:46:44 PST 2004


On Sun, Oct 31, 2004 at 10:14:44PM -0500, Brian Smith wrote:
> PVFS2 has much improved fault tolerance over PVFS1 in that there can be
> redundant file nodes, whereas with PVFS1, if one node dropped dead, your
> FS was toast.

Just wanted to point out that through shared storage, 'heartbeat', and
enough hardware, you can have redundant PVFS1 and PVFS2 nodes.  We do
not at this time have *software* redundancy.  It's an area of active
research, though.

Please don't let the lack of software redundancy scare you off!  Many
sites have run PVFS and not found reliability to be a problem.
Your application can do its I/O, writing out checkpoints or reading
datafiles or whatever I/O it does, to PVFS.  After your application
runs, move the data to tape or long-term storage at your leisure.
PVFS is fast scratch space, and as long as you treat it as such,
everything should work just fine.
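
As a rough sketch of that "scratch now, archive later" workflow, something
like the following could run after a job finishes.  This is only an
illustration, not a PVFS tool: the function names and the example paths are
made up, and it assumes both the PVFS volume and the long-term storage are
mounted as ordinary directories.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_out(scratch_dir, archive_dir):
    """Copy each file under scratch_dir to archive_dir, verify it, then
    delete the scratch copy.  Returns the list of archived paths."""
    scratch_dir = Path(scratch_dir)
    archive_dir = Path(archive_dir)
    moved = []
    for src in sorted(scratch_dir.rglob("*")):
        if not src.is_file():
            continue
        dst = archive_dir / src.relative_to(scratch_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies contents and timestamps
        # Only free the scratch space once the archived copy checks out.
        if sha256_of(src) != sha256_of(dst):
            raise IOError(f"checksum mismatch for {src}")
        src.unlink()
        moved.append(dst)
    return moved


# Hypothetical usage; both paths are placeholders:
# stage_out("/mnt/pvfs/myrun", "/archive/myrun")
```

The checksum comparison before deleting is a design choice for the sketch:
since the scratch copy is the only copy until the archive has it, nothing is
removed until the long-term copy has been read back and matches.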

> If you go to their web site, there should be plenty of documentation on
> how to set it up.  

Yes.  Also, feel free to take up this discussion on the PVFS
mailing lists. 

==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B
