[Beowulf] copying big files (Henning Fehrmann)
David Mathog mathog at caltech.edu
Fri Aug 8 10:55:19 PDT 2008
- Previous message: [Beowulf] copying big files (Henning Fehrmann)
- Next message: [Beowulf] copying big files (Henning Fehrmann)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
> I will say that my dream would be for something like dolly to get some sort
> of transfer recovery mechanism, though I realize that would be quite
> difficult in such a topology.

nettee has some failover and continuation capabilities at different
points - but not what I think you want. The development version has a
few extra modes for cases where data is being merged, but that isn't
relevant to this discussion.

When setting up the initial chain nettee can connect to an alternate
node (from a list of failovers) if the target node will not answer. It
also has the ability to keep going if the local disk becomes unwritable,
and it can continue a download on a chain down to the node above the
point of failure. However, nettee cannot at present rewire around a
failed node to continue a download to the node(s) below it. That would
indeed be quite difficult, since one could have a situation like this:

A -> B  (A knows it has sent 100 MB)
B -> C  (B knows it has sent 98 MB, then it blows up)
C       (C knows it has received 98 MB)

A and C will eventually figure out that B has died, and they could
conceivably negotiate a new connection, but A may no longer have the
missing 2 MB (it might have been sent out a pipe, processed, and not
stored in the raw state anywhere).

On the other hand, the development version uses ring buffers, and one
could set those to be very large, enabling a certain level of "redo"
from A. So if C comes back and says "I only have 98 MB", A can see if it
has the missing parts and go on if it does. It still might not, though.
If B has stalled for long enough, the ring buffer on A may have
completely filled from the previous node, overwriting the data needed to
recover. I guess it would be possible to implement a "safety region" in
the ring buffer which could not be overwritten.

> As an aside, I know that the dolly author (Felix) reads this list. I assume
> dolly itself is now unmaintained?

AFAIK

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
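
For readers who want to see the "safety region" idea from the message in concrete form, here is a minimal sketch in Python. It is not nettee's code and makes no claim about how nettee's ring buffers work; the class name `ReplayRingBuffer` and the methods `write`, `send`, and `rewind` are hypothetical, invented only for this illustration. The assumption is that the sender tracks absolute byte offsets and refuses to overwrite the last `safety` bytes it has already sent, so a downstream node that reconnects can ask for a short replay.

```python
class ReplayRingBuffer:
    """Illustrative ring buffer with a protected "safety region".

    Hypothetical sketch only. Offsets are absolute byte counts since the
    start of the stream; the buffer index is the offset modulo capacity.
    """

    def __init__(self, capacity, safety):
        if not 0 <= safety < capacity:
            raise ValueError("safety must satisfy 0 <= safety < capacity")
        self.buf = bytearray(capacity)
        self.capacity = capacity
        self.safety = safety
        self.head = 0   # total bytes accepted from the upstream node
        self.sent = 0   # total bytes handed to the downstream node
        self.floor = 0  # lowest offset still guaranteed to be intact

    def write(self, data):
        """Accept bytes from upstream; return how many were stored.

        Bytes in [floor, head) are never overwritten, so a full buffer
        returns 0 and the caller has to apply back-pressure.
        """
        free = self.capacity - (self.head - self.floor)
        n = min(free, len(data))
        for i in range(n):
            self.buf[(self.head + i) % self.capacity] = data[i]
        self.head += n
        return n

    def send(self, max_bytes):
        """Return up to max_bytes of unsent data and advance `sent`."""
        n = min(max_bytes, self.head - self.sent)
        out = bytes(self.buf[(self.sent + i) % self.capacity] for i in range(n))
        self.sent += n
        # Only data older than the safety region becomes overwritable.
        self.floor = max(self.floor, self.sent - self.safety)
        return out

    def rewind(self, offset):
        """Try to resume sending from `offset` (what downstream reports).

        Returns False if that data has already left the protected range.
        """
        if self.floor <= offset <= self.sent:
            self.sent = offset
            return True
        return False
```

In terms of the A -> B -> C example, A would call `rewind()` with the offset C reports having received (98 MB) after reconnecting, and would continue the transfer only if that call succeeds. The trade-off is that a larger `safety` leaves less room for new data from the previous node, so a stalled downstream node stalls the upstream side sooner.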
