[Beowulf] HPC fault tolerance using virtualization
John Hearns hearnsj at googlemail.com
Tue Jun 16 07:21:53 PDT 2009
2009/6/16 Ashley Pittman <ashley at pittman.co.uk>

>
> > elements (or slots) allocated for the job on the node - if the VM is
> > able to adapt itself to such a situation, f.e. by starting several MPI
> > ranks and using shared memory for MPI communication. Further, to
> > cleanly stop the job, the queueing system will have to stop the VMs,
> > sending first a "shutdown" and then a "destroy" command, similar to
> > sending SIGTERM and SIGKILL today.
>

I will provide a counter-example here - I think that a lot of people have thought about re-booting nodes every time they finish a job. There are codes out there which leave processes running, or leave shared memory segments, if the code is not properly terminated. I think everyone has had to run clean-ipcs at some time!

Yes, you're right, the codes should be written properly and should not do this. However it is very tempting to put a reboot in as a step following every job, which means you get a machine in a known state for the next job. Running virtual machines will make that easy (depends how long they take to boot up).

I agree with you about the 5% figure - the point I was making is that there will come a point where the advantages of running a virtual machine will outweigh a few percent of performance loss. Who knows where that point will be!
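
As a rough illustration of the "shutdown" then "destroy" escalation mentioned in the quoted text, here is a minimal Python sketch that applies the same idea to an ordinary process rather than a VM. The function name stop_gracefully and the grace period are assumptions made for illustration only; a real queueing system would issue the equivalent commands through whatever hypervisor tooling it uses.

```python
import subprocess
import sys


def stop_gracefully(proc, grace_seconds=5.0):
    """Ask a process to stop, then force it if it does not exit in time.

    Hypothetical helper: returns "terminated" if the polite request was
    enough, or "killed" if the forced step was needed.
    """
    proc.terminate()  # SIGTERM on POSIX: the polite "shutdown" request
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: the forced "destroy"
        proc.wait()  # reap the process so no zombie is left behind
        return "killed"


if __name__ == "__main__":
    # Stand-in for a job process: a child interpreter that just sleeps.
    job = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    print(stop_gracefully(job, grace_seconds=2.0))
```

Note that this only stops the process itself. Leftover shared memory segments of the kind described above would still need separate cleanup, which is part of why a reboot or a fresh VM per job is tempting.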
