[Beowulf] Re: HPC fault tolerance using virtualization
Dave Love d.love at liverpool.ac.uk
Mon Jun 29 05:33:37 PDT 2009
- Previous message: [Beowulf] Re: HPC fault tolerance using virtualization
- Next message: [Beowulf] HPC fault tolerance using virtualization
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Greg Lindahl <lindahl at pbm.com> writes:

>> What I typically see from smartd is alerts when one or more sectors has
>> already gone bad, although that tends not to be something that will
>> clobber the running job. How should it be configured to do better
>> (without noise)?
>
> That isn't noise, that's signal.

Of course I didn't mean that bad block alerts were noise. However,
there is what I and a hardware expert think is noise from the default
smartd configuration. I'm interested in how best to configure it for
useful warnings. I did have a look OTW, of course.

> You're just lucky that your running
> job doesn't need the data off the bad sector.

Not if the problem is, say, on /usr, which the job normally isn't going
to need before it finishes.

> You can try waiting
> until the job finishes before taking the node out of service; from the
> sounds of it, you will usually win. But if you don't have
> application-level end-to-end checksums of your data, how do you know
> if you won or not?

I know where the job is doing i/o, and I'm not going to kill multi-day,
multi-node jobs -- especially not automatically -- because there's a bad
sector somewhere irrelevant. Also we have better things to worry about
here, at least, than application checksums, much as they might feature
in an ideal world.
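
For readers unfamiliar with the "application-level end-to-end checksums" mentioned in the quoted text, here is a minimal sketch of the idea in Python. It is not taken from either poster; the function names and the `.sha256` sidecar-file convention are assumptions made purely for illustration. The application records a digest when it writes its output and re-checks it when the data is read back.

```python
import hashlib


def write_with_checksum(path, data):
    """Write bytes to path and store their SHA-256 digest in a sidecar file.

    The '<path>.sha256' sidecar naming is a hypothetical convention.
    """
    digest = hashlib.sha256(data).hexdigest()
    with open(path, "wb") as f:
        f.write(data)
    with open(path + ".sha256", "w") as f:
        f.write(digest)
    return digest


def verify_checksum(path):
    """Re-read path and compare its digest with the one in the sidecar file."""
    with open(path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    with open(path + ".sha256") as f:
        expected = f.read().strip()
    return actual == expected
```

Note that a read-back on the same node shortly after the write may be served from the page cache rather than from the disk, so a check like this is most meaningful when done later or by the consumer of the data.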
