[Beowulf] copying big files (Henning Fehrmann)
mm at yuhu.biz
Sun Aug 10 06:56:50 PDT 2008
On Sunday 10 August 2008 15:02:52 Scott Atchley wrote:
> On Aug 10, 2008, at 7:57 AM, Scott Atchley wrote:
> > You may want to look at http://loci.cs.utk.edu. If you need to
> > distribute large files within a cluster or across the WAN, you can
> > use the LoRS tools to stripe the file over multiple servers and the
> > clients then try pulling blocks off of each server in parallel.
> > Using Internet2 and one client at Vanderbilt and a couple servers at
> > Univ of Tennessee, they were able to saturate UT's ~400 Mb/s I2 link
> > (much to the disbelief of the Vandy IT staff). I have seen ~5 Gb/s
> > within a cluster using good 10G NICs. :-)
> > Scott
> I forgot to mention LoRS optionally uses MD5 for checksums and AES-128
> for encryption (you can use either, both or neither).
> The stored file is represented by an XML file called an exNode. If you
> want to share the data, you can email the exNode to someone and they
> can then download the data. You control the download offset and length
> so that you can extract just the parts of the file that you want. I
> believe there is a NetCDF version that can use exNodes and there may
> be an HDF5 version as well.
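
For anyone who wants to see the striping idea from the quoted text in concrete terms, here is a minimal sketch in Python. It is not the LoRS API: the "servers" are just in-memory dicts, and the names stripe, fetch_block and download are made up for illustration. It only shows the general technique described above: blocks placed round-robin on several servers, a layout table standing in for the exNode, parallel block fetches, an offset/length download, and an MD5 check of the result.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4  # bytes per block; tiny on purpose so the example is easy to follow


def stripe(data, num_servers, block_size=BLOCK_SIZE):
    """Split data into blocks and place them round-robin on the servers.

    Returns (servers, layout). servers[i] maps a block's byte offset to its
    bytes; layout lists (offset, length, server_index) for every block and
    plays roughly the role of the exNode (hypothetical simplification).
    """
    servers = [{} for _ in range(num_servers)]
    layout = []
    for n, offset in enumerate(range(0, len(data), block_size)):
        block = data[offset:offset + block_size]
        idx = n % num_servers
        servers[idx][offset] = block
        layout.append((offset, len(block), idx))
    return servers, layout


def fetch_block(servers, entry):
    """Stand-in for a network read of one block from one server."""
    offset, _length, idx = entry
    return offset, servers[idx][offset]


def download(servers, layout, offset=0, length=None):
    """Fetch the blocks overlapping [offset, offset + length) in parallel."""
    total = sum(blen for _, blen, _ in layout)
    end = total if length is None else min(total, offset + length)
    # Keep only the blocks that overlap the requested byte range.
    wanted = [e for e in layout if e[0] < end and e[0] + e[1] > offset]
    if not wanted:
        return b""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        blocks = dict(pool.map(lambda e: fetch_block(servers, e), wanted))
    first = min(blocks)
    joined = b"".join(blocks[o] for o in sorted(blocks))
    # Trim the partial blocks at both ends of the requested range.
    return joined[offset - first:end - first]


data = b"The quick brown fox jumps over the lazy dog"
servers, layout = stripe(data, num_servers=3)

whole = download(servers, layout)
checksum_ok = hashlib.md5(whole).hexdigest() == hashlib.md5(data).hexdigest()

part = download(servers, layout, offset=10, length=9)  # same bytes as data[10:19]
```

In a real setup fetch_block would be a network read, and the speedup comes from the reads to different servers overlapping in time.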
I'm new to the list, so I don't know whether this has been discussed before, but
when I need to provision a file to all machines in my cluster I use a cluster
file system such as GlusterFS (http://www.gluster.org/docs/index.php/GlusterFS)
or GFarm (http://datafarm.apgrid.org/). I started with NFS, but once you have
more than 50-60 machines the NFS server becomes a bottleneck that every machine
runs into, and the usual cure for that is an expensive hardware purchase.