[Beowulf] copying big files (Henning Fehrmann)
David Mathog mathog at caltech.edu
Mon Aug 18 08:38:09 PDT 2008
Henning Fehrmann wrote:
>
> I successfully spread a 10G file to 50 nodes. The rate was 140Mb/s for
> nettee and a bit slower using dolly.
> I guess that was due to a busy node somewhere in the chain.
> Increasing the number of clients to 100 failed in both cases.
>
> For nettee I got:
> nettee: fatal error writing to child: Connection reset by peer
>
> I will do more systematic tests over the next few days.
> David Mathog, are you interested in bug reports?

Yes, please.

If memory serves, you will see that error whenever a child node, or nettee on that child, crashes. For instance, if you "kill -9" nettee on a child, the parent should report that error. The command option -colwf will let the chain continue if the failure is caused by a full disk or a failing stdout pipe. The option -conwf should let the chain continue the transfer down to the node just above the failed one, and it should tell you which node failed, as long as -v is used with the appropriate bits set.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
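The failure behavior described in the reply can be illustrated with a toy, single-process Python model of a relay chain, where each node writes a chunk locally and then forwards it to the next node. This is only a sketch under stated assumptions: it is not nettee's code, the names `Node`, `relay`, `crash_after` and `continue_on_net_fail` are hypothetical, and `continue_on_net_fail` merely mimics what the reply says -conwf should do. The local-write case handled by -colwf is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class ChildWriteError(Exception):
    """Stands in for 'fatal error writing to child: Connection reset by peer'."""


@dataclass
class Node:
    name: str
    # Hypothetical: number of chunks this node accepts before it dies (None = never).
    crash_after: Optional[int] = None
    data: bytearray = field(default_factory=bytearray)
    chunks_seen: int = 0

    def alive(self) -> bool:
        return self.crash_after is None or self.chunks_seen < self.crash_after


def relay(chunks: List[bytes], chain: List[Node],
          continue_on_net_fail: bool = False) -> List[str]:
    """Push each chunk from an implicit source down a linear chain of nodes.

    Without continue_on_net_fail, the first dead child aborts the whole
    transfer. With it, data keeps flowing to every node above the dead
    one, and the names of failed nodes are returned.
    """
    depth = len(chain)  # how far down the chain data still flows
    failures: List[str] = []
    for chunk in chunks:
        for i in range(depth):
            node = chain[i]
            if not node.alive():
                # The upstream writer (source or chain[i-1]) loses its child.
                if not continue_on_net_fail:
                    raise ChildWriteError(f"fatal error writing to child {node.name}")
                failures.append(node.name)
                depth = i  # keep feeding only the nodes above the dead one
                break
            node.data.extend(chunk)  # local write on this node
            node.chunks_seen += 1    # then forward to chain[i + 1]
    return failures
```

For example, with five 4-byte chunks and a middle node that dies after two chunks, `relay(data, chain, continue_on_net_fail=True)` returns `["n2"]`, and the nodes hold 20, 8 and 8 bytes respectively. Without the flag the same run raises `ChildWriteError` on the third chunk, which corresponds to the whole chain stopping.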
